Re: [PATCH 00/13] [RFC] Rust support
On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote:
> On Fri, Apr 16, 2021 at 10:39 AM Willy Tarreau wrote:
> >
> > resources usage, I'm really not convinced at all it's suited for
> > low-level development. I understand the interest of the experiment
> > to help the language evolve into that direction, but I fear that
> > the kernel will soon be as bloated and insecure as a browser, and
> > that's really not to please me.
>
> Dunno, I don't think the introduction of Rust made Firefox _more_ insecure.
> https://wiki.mozilla.org/Oxidation#Within_Firefox
>
> I pray no executives ever see Dmitry Vyukov's 2019 Linux Plumbers Conf
> talk "Reflections on kernel quality, development process and testing."
> https://www.youtube.com/watch?v=iAfrrNdl2f4
> or his 2018 Linux Security Summit talk "Syzbot and the Tale of a
> Thousand Kernel Bugs" https://www.youtube.com/watch?v=qrBVXxZDVQY
> (and they're just fuzzing the syscall interface and USB devices.
> Imagine once folks can more easily craft malformed bluetooth and wifi
> packets.)
>
> I'd imagine the first term that comes to mind for them might be
> "liability."  They are quite sensitive to these vulnerabilities with
> silly names, logos, and websites.  There are many of us that believe
> an incremental approach of introducing a memory safe language to our
> existing infrastructure, at the very least to attempt to improve the
> quality of drivers for those that choose to use such tools, is a
> better approach.

I would LOVE it if some "executives" would see the above presentations,
because then they would maybe actually fund developers to fix bugs and
maintain the kernel code, instead of only allowing them to add new
features.

Seriously, that's the real problem that Dmitry's work has exposed: the
lack of people allowed to do this type of bugfixing and maintenance on
company time, for something that the company relies on, is a huge issue.
"executives" feel that they are willing to fund the initial work and
then "throw it over the wall to the community" once it is merged, so
that they can forget about it as "the community" will maintain it for
them for free. And that's a lie, as Dmitry's work shows.

The world creates new use cases and testing ability all the time, which
exposes bugs that have been around in old code. Once the bugs are fixed
in that layer of code, the next layer down can finally be tested, and
then look, more corner cases of issues.

Rewriting the kernel in another language is not going to automagically
fix the majority of the issues that fuzzing finds here, as that work has
exposed us to things like fault-injection and malicious USB packets that
no language would have saved us from "automatically". All of those code
paths deal with "unsafe" data that doesn't magically become "safe"
because we switch languages.

And over time, what we have determined is "safe" has changed! People
forget that only a few years ago did we decide that the kernel now has
to protect userspace programs from malicious hardware. That's a major
shift in thinking: data that we used to blindly trust can not be
trusted at all anymore. And "executives" want us to fix all of those
issues for free, when really it's a major design shift for loads of
kernel subsystems to handle this new threat model.

So yes, please spread that talk around. Maybe then we will actually get
funding and support to FIX the bugs that those tools find. Right now,
the primary fixers of those findings are _INTERNS_, as that's all
companies are willing to fund to fix this type of thing.

And don't get me started on the inability of "executives" to fund other
parts of Linux that they rely on, because they want "other companies"
to do it instead. The tragedy-of-the-commons is a real threat to Linux,
and always has been...

thanks,

greg k-h
[PATCH v3 2/2] mmc: block: Update ext_csd.cache_ctrl if it was written
The cache function can be turned ON and OFF by writing to the CACHE_CTRL
byte (EXT_CSD byte [33]). However, card->ext_csd.cache_ctrl is only set
on init if cache size > 0.

Fix that by explicitly setting ext_csd.cache_ctrl on ext-csd write.

Signed-off-by: Avri Altman
---
 drivers/mmc/core/block.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5b6501fc9fb7..8b07ed5e08de 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -572,6 +572,18 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
 		main_md->part_curr = value & EXT_CSD_PART_CONFIG_ACC_MASK;
 	}

+	/*
+	 * Make sure to update CACHE_CTRL in case it was changed. The cache
+	 * will get turned back on if the card is re-initialized, e.g.
+	 * suspend/resume or hw reset in recovery.
+	 */
+	if ((MMC_EXTRACT_INDEX_FROM_ARG(cmd.arg) == EXT_CSD_CACHE_CTRL) &&
+	    (cmd.opcode == MMC_SWITCH)) {
+		u8 value = MMC_EXTRACT_VALUE_FROM_ARG(cmd.arg) & 1;
+
+		card->ext_csd.cache_ctrl = value;
+	}
+
 	/*
 	 * According to the SD specs, some commands require a delay after
 	 * issuing the command.
--
2.25.1
[PATCH v3 1/2] mmc: block: Issue flush only if allowed
The cache may be flushed to the nonvolatile storage by writing to the
FLUSH_CACHE byte (EXT_CSD byte [32]). When in command queueing mode, the
cache may be flushed by issuing a CMDQ_TASK_DEV_MGMT (CMD48) with a
FLUSH_CACHE op-code. Either way, verify that the cache function is
turned ON before doing so.

Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")
Reported-by: Brendan Peter
Tested-by: Brendan Peter
Signed-off-by: Avri Altman
---
 drivers/mmc/core/block.c   | 7 +++++++
 drivers/mmc/core/mmc.c     | 2 +-
 drivers/mmc/core/mmc_ops.h | 5 +++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8bfd4d95b386..5b6501fc9fb7 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1476,6 +1476,11 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
 	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
 	struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req);

+	if (mmc_card_mmc(mq->card) && !mmc_flush_allowed(mq->card)) {
+		blk_mq_end_request(req, BLK_STS_OK);
+		return -EPERM;
+	}
+
 	mrq->cmd->opcode = MMC_SWITCH;
 	mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
 			(EXT_CSD_FLUSH_CACHE << 16) |
@@ -2226,6 +2231,8 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
 		switch (req_op(req)) {
 		case REQ_OP_FLUSH:
 			ret = mmc_blk_cqe_issue_flush(mq, req);
+			if (ret == -EPERM)
+				return MMC_REQ_FINISHED;
 			break;
 		case REQ_OP_READ:
 		case REQ_OP_WRITE:
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 9ad4aa537867..e3da62ffcb5e 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -2037,7 +2037,7 @@ static int _mmc_flush_cache(struct mmc_card *card)
 {
 	int err = 0;

-	if (card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1) {
+	if (mmc_flush_allowed(card)) {
 		err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
 				 EXT_CSD_FLUSH_CACHE, 1,
 				 CACHE_FLUSH_TIMEOUT_MS);
diff --git a/drivers/mmc/core/mmc_ops.h b/drivers/mmc/core/mmc_ops.h
index 5782fdf4e8e9..2682bf66708a 100644
--- a/drivers/mmc/core/mmc_ops.h
+++ b/drivers/mmc/core/mmc_ops.h
@@ -19,6 +19,11 @@ enum mmc_busy_cmd {
 struct mmc_host;
 struct mmc_card;

+static inline bool mmc_flush_allowed(struct mmc_card *card)
+{
+	return card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1;
+}
+
 int mmc_select_card(struct mmc_card *card);
 int mmc_deselect_cards(struct mmc_host *host);
 int mmc_set_dsr(struct mmc_host *host);
--
2.25.1
[PATCH v3 0/2] Do not flush cache when it is disabled
v2 -> v3:
 - rebase onto recent cache changes
v1 -> v2:
 - Address Adrian's comments

Cache is a temporary storage space in an eMMC device. Volatile by
nature, the cache should in a typical case reduce the access time
compared to an access to the main nonvolatile storage.

The cache function can be turned ON and OFF. Once OFF, the host is not
expected to issue a flush-cache command to the device.

Avri Altman (2):
  mmc: block: Issue flush only if allowed
  mmc: block: Update ext_csd.cache_ctrl if it was written

 drivers/mmc/core/block.c   | 19 +++++++++++++++++++
 drivers/mmc/core/mmc.c     |  2 +-
 drivers/mmc/core/mmc_ops.h |  5 +++++
 3 files changed, 25 insertions(+), 1 deletion(-)

--
2.25.1
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 10:36:48PM CDT, Guenter Roeck wrote:
>On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote:
>>On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote:
>>>On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote:
>>>> On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote:
>>>> >
>>>> > Okay, to expand a bit on the description in my initial message --
>>>> > we've got a single chassis with multiple server boards and a single
>>>> > manager board that handles, among other things, power control for
>>>> > the servers. The manager board has one LM25066 for each attached
>>>> > server, which acts as the "power switch" for that server. There
>>>> > thus really isn't any driver to speak of for the downstream device.
>>>>
>>>> This sounds like you need a driver representing those server boards
>>>> (or the slots they plug into perhaps) that represents everything
>>>> about those boards to userspace, including power switching. I don't
>>>> see why you wouldn't have a driver for that - it's a thing that
>>>> physically exists and clearly has some software control, and you'd
>>>> presumably also expect to represent some labelling about the slot as
>>>> well.
>>>
>>>Absolutely agree.
>>>
>>>Thanks,
>>>Guenter
>>
>>Hi Guenter, Mark,
>>
>>I wanted to return to this to try to explain why structuring the kernel
>>support for this in a way that's specific to the device behind the PMIC
>>seems like an awkward fit to me, and ultimately kind of artificial.
>>
>>In the system I described, the manager board with the LM25066 devices
>>is its own little self-contained BMC system running its own Linux
>>kernel (though "BMC" is perhaps a slightly misleading term because
>>there's no host system that it manages). The PMICs are really the only
>>relevant connection it has to the servers it controls power for --
>>they have their own dedicated local BMCs on board as well doing all
>>the usual BMC tasks.
>>
>>A hypothetical dedicated driver for this on the manager board wouldn't
>>have any other hardware to touch aside from the pmbus interface of the
>>LM25066 itself, so calling it a server board driver seems pretty
>>arbitrary -- and in fact the same system also has another LM25066 that
>>controls the power supply to the chassis cooling fans (a totally
>>different downstream device, but one that would be equally well-served
>>by the same software).
>>
>>More recently, another system has entered the picture for us that
>>might illustrate it more starkly -- the Delta Open19 power shelf [0]
>>supported by a recent code release from LinkedIn [1].
>
>All I can see is that this code is a mess.
>
>>This is a rackmount power supply with fifty outputs, each
>>independently switchable via an LM25066 attached to an on-board
>>BMC-style management controller (though again, no host system
>>involved). We (Equinix Metal) are interested in porting a modern
>>OpenBMC to it to replace the dated, crufty, pre-Linux-Foundation
>>version of OpenBMC it currently runs (as found in the linked repo).
>>The exact nature of the devices powered by its outputs is well outside
>>the scope of the firmware running on that controller (it could be any
>>arbitrary thing that runs on 12VDC), but we still want to be able to
>>both (a) retrieve per-output voltage/current/power readings as
>>provided by the existing LM25066 hwmon support, and (b) control the
>>on/off state of those outputs from userspace.
>>
>>Given the array of possible use-cases, an approach of adding
>>power-switch functionality to the existing LM25066 support seems like
>>the most obvious way to support this, so I'm hoping to see if I can
>>get some idea of what an acceptable implementation of that might look
>>like.
>
>Sorry, that is simply a no-go for the LM25066 driver. I mentioned it
>before, and it is still true: The hwmon subsystem is not in the
>business of power control.
>
>Have you looked into enclosures (drivers/misc/enclosure.c)? Maybe that
>or something similar could be used for your purposes.
>
>Thanks,
>Guenter

I had a glance at the enclosure driver; it looks pretty geared toward
SES-like things (drivers/scsi/ses.c being its only usage I can see in
the kernel at the moment) and while it could perhaps be pressed into
working for this, it seems like it would probably drag in a fair amount
of boilerplate and result in a somewhat gratuitously confusing driver
arrangement (calling the things involved in the cases we're looking at
"enclosures" seems like a bit of a stretch).

As an alternative, would something like the patch below be more along
the lines of what you're suggesting? And if so, would it make sense to
generalize it into something like 'pmbus-switch.c' and add a
PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code
instead of hardcoding it for only LM25066 support?

Thanks,
Zev

From f4c0a3637371d69a414b6fb882497d22e5de3ba0 Mon Sep 17 00:00:00 2001
From: Zev Weiss
Date: Wed, 31 Mar 2021 01:58:35 -0500
Subject: [PATCH] misc: add lm25066-switch driver

This driver allows an lm25066 to be switched on and off from userspace
via
Doubt regarding memory allocation in KVM
Hi,

I'm learning about qemu KVM, looking into the code and experimenting on
it. I have the following doubts regarding it; I would be grateful if
you could help me get some idea of them.

1. I observe that KVM allocates memory to guests when it needs it but
doesn't take it back (except for the ballooning case). Also, the
Qemu/KVM process does not free the memory even when the guest is
rebooted. In this case, does the guest VM get access to memory already
pre-filled with some garbage from the previous run? (Since the host
would allocate zeroed pages to the guest the first time it requests
them, but after that it's up to the guest.) Can it be a security issue?

2. How does KVM know if a GPFN (guest physical frame number) is backed
by an actual machine frame number in the host? If not mapped, then it
faults in the host and allocates a physical frame for the guest in the
host (kvm_mmu_page_fault).

3. How/where can I access the GPFNs in the host? Is "gfn_t gfn = gpa >>
PAGE_SHIFT" with "gpa_t cr2_or_gpa" in the KVM page fault handler (x86)
the same as the GPFN? (That is, can I use pfn_to_page in the guest VM
to access the struct page in the guest?)

Thank You.

Best Regards,
Shivank Garg
M.Tech Student, IIT Kanpur
Re: [PATCH] watchdog: qcom: Move suspend/resume to suspend_late/resume_early
Hi Guenter,

On 2021-03-11 01:53, Guenter Roeck wrote:
> On Thu, Mar 11, 2021 at 01:50:04AM +0530, Sai Prakash Ranjan wrote:
> > During suspend/resume usecases and tests, it is common to see issues
> > such as lockups either in the suspend path or the resume path because
> > of bugs in the corresponding device driver pm handling code. In such
> > cases, it is important that the watchdog is active to make sure that
> > we either receive a watchdog pretimeout notification or a bite
> > causing a reset, instead of a hang causing us to hard reset the
> > machine.
> >
> > There are good reasons as to why we need this:
> >
> > * We can have a watchdog pretimeout governor set to panic, in which
> >   case we can have a backtrace which would help identify the issue
> >   with the particular driver and cause a normal reboot.
> >
> > * Even in the case where there is no pretimeout support, a watchdog
> >   bite is still useful because some firmware has debug support to
> >   dump CPU core context on a watchdog bite for post-mortem analysis.
> >
> > * One more usecase which comes to mind is warm reboot. In case we
> >   hard reset the target, a cold reboot could be induced, resulting
> >   in loss of ddr contents and thereby losing all the debug info.
> >
> > Currently, the watchdog pm callback just invokes the usual suspend
> > and resume callbacks, which do not have any special ordering, in the
> > sense that the watchdog can be suspended before the buggy device
> > driver suspend callback, and watchdog resume can happen after the
> > buggy device driver resume callback. This would mean that the
> > watchdog will not be active when the buggy driver causes the lockup,
> > thereby hanging the system. To make sure this doesn't happen, move
> > the watchdog pm to use the late/early system pm callbacks, which
> > will ensure that the watchdog is suspended late and resumed early,
> > so that it can catch such issues.
> >
> > Signed-off-by: Sai Prakash Ranjan
>
> Reviewed-by: Guenter Roeck

Gentle Ping. I don't see this in linux-next or linux-watchdog; please
let me know if anything is pending from my side.
Thanks,
Sai

> > ---
> >  drivers/watchdog/qcom-wdt.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
> > index e38a87ffe5f5..0d2209c5eaca 100644
> > --- a/drivers/watchdog/qcom-wdt.c
> > +++ b/drivers/watchdog/qcom-wdt.c
> > @@ -329,7 +329,9 @@ static int __maybe_unused qcom_wdt_resume(struct device *dev)
> >  	return 0;
> >  }
> >
> > -static SIMPLE_DEV_PM_OPS(qcom_wdt_pm_ops, qcom_wdt_suspend, qcom_wdt_resume);
> > +static const struct dev_pm_ops qcom_wdt_pm_ops = {
> > +	SET_LATE_SYSTEM_SLEEP_PM_OPS(qcom_wdt_suspend, qcom_wdt_resume)
> > +};
> >
> >  static const struct of_device_id qcom_wdt_of_table[] = {
> >  	{ .compatible = "qcom,kpss-timer", .data = &match_data_apcs_tmr },

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH] x86/mm: Fix copy error in comments
Direct page mapping in the bottom-up way will allocate memory from low
addresses for page structures in a range, which is the *bottom*, not
the *end*.

Signed-off-by: Cao jin
---
 arch/x86/mm/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e26f5c5c6565..bc2f871c75f1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -663,7 +663,7 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 	/*
 	 * We start from the bottom (@map_start) and go to the top (@map_end).
 	 * The memblock_find_in_range() gets us a block of RAM from the
-	 * end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
+	 * bottom of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
 	 * for page table.
 	 */
 	while (start < map_end) {
--
2.30.1
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On Tue, 2021-04-20 at 15:18 +1000, Alexey Kardashevskiy wrote:
>
> On 20/04/2021 14:54, Leonardo Bras wrote:
> > As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS)
> > it's possible to use direct DMA mapping even with pmem region.
> >
> > But, if that happens, the window size (len) is set to
> > (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing
> > a pagesize times smaller DDW to be created, being insufficient for
> > correct usage.
> >
> > Fix this so the correct window size is used in this case.
>
> Good find indeed.
>
> afaict this does not create a huge problem though as
> query.largest_available_block is always smaller than
> (MAX_PHYSMEM_BITS - page_shift) where it matters (phyp).
>
> Reviewed-by: Alexey Kardashevskiy

Thanks for reviewing!

Leonardo Bras
Re: [PATCH 5/7] mfd: sec: Simplify getting of_device_id match data
On 19.04.2021 10:17, Krzysztof Kozlowski wrote:
> Use of_device_get_match_data() to make the code slightly smaller.
>
> Signed-off-by: Krzysztof Kozlowski
> ---
>  drivers/mfd/sec-core.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/mfd/sec-core.c b/drivers/mfd/sec-core.c
> index 8d55992da19e..3126c39f3203 100644
> --- a/drivers/mfd/sec-core.c
> +++ b/drivers/mfd/sec-core.c
> @@ -10,6 +10,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -324,12 +325,8 @@ static inline unsigned long
>  sec_i2c_get_driver_data(struct i2c_client *i2c,
>  			const struct i2c_device_id *id)
>  {
>  #ifdef CONFIG_OF
> -	if (i2c->dev.of_node) {
> -		const struct of_device_id *match;
> -
> -		match = of_match_node(sec_dt_match, i2c->dev.of_node);
> -		return (unsigned long)match->data;
> -	}
> +	if (i2c->dev.of_node)
> +		return (unsigned long)of_device_get_match_data(&i2c->dev);
>  #endif

Does it make any sense to keep the #ifdef CONFIG_OF after this change?
I would also skip the (i2c->dev.of_node) check, because
of_device_get_match_data() already does that (although indirectly).

>  	return id->driver_data;
>  }

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources
On Tue, Apr 13, 2021 at 08:12:40PM -0700, Ilya Lipnitskiy wrote:
> Follow the reasoning from commit 842de40d93e0 ("PCI: add generic
> pci_enable_resources()"):
>
>   The only functional difference from the MIPS version is that the
>   generic one uses "!r->parent" to check for resource collisions
>   instead of "!r->start && r->end".
>
> That should have no effect on any pci-legacy driver.

Unfortunately it does. With this patch in place, all my mips qemu
emulations fail to boot from an ide/ata drive; the driver is not
instantiated. The error message is:

ata_piix :00:0a.1: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed
ata_piix: probe of :00:0a.1 failed with error -22

Reverting this patch fixes the problem, and the message displayed by
the driver is:

ata_piix :00:0a.1: enabling device ( -> 0001)

Bisect log is attached.

Guenter

---
# bad: [50b8b1d699ac313c0a07a3c185ffb23aecab8abb] Add linux-next specific files for 20210419
# good: [bf05bf16c76bb44ab5156223e1e58e26dfe30a88] Linux 5.12-rc8
git bisect start 'HEAD' 'v5.12-rc8'
# bad: [c4bb91fc07e59241cde97f913d7a2fbedc248f0d] Merge remote-tracking branch 'crypto/master'
git bisect bad c4bb91fc07e59241cde97f913d7a2fbedc248f0d
# bad: [499f739ad70f2a58aac985dceb25ca7666da88be] Merge remote-tracking branch 'jc_docs/docs-next'
git bisect bad 499f739ad70f2a58aac985dceb25ca7666da88be
# bad: [11b56408a328d1c5c4dfa7667c5dc46956b64aec] Merge remote-tracking branch 'parisc-hd/for-next'
git bisect bad 11b56408a328d1c5c4dfa7667c5dc46956b64aec
# good: [09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f] Merge branch 'arm/defconfig' into for-next
git bisect good 09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f
# good: [a5b76c2f17330e266a5c56dde21430e27b0d0dbb] Merge remote-tracking branch 'arm-soc/for-next'
git bisect good a5b76c2f17330e266a5c56dde21430e27b0d0dbb
# good: [1e4241f6813f1c1a0027d96df075ffd01808b3cf] Merge remote-tracking branch 'ti-k3/ti-k3-next'
git bisect good 1e4241f6813f1c1a0027d96df075ffd01808b3cf
# good: [7496a43be7a362391607d78e49a3f28de80029ce] Merge remote-tracking branch 'h8300/h8300-next'
git bisect good 7496a43be7a362391607d78e49a3f28de80029ce
# good: [66633abd0642f1e89d26e15f36fb13d3a1c535ff] MIPS/bpf: Enable bpf_probe_read{, str}() on MIPS again
git bisect good 66633abd0642f1e89d26e15f36fb13d3a1c535ff
# good: [2c92ef8ff8d327797c1920ae7f938bcc6f3f7421] MIPS: Fix strnlen_user access check
git bisect good 2c92ef8ff8d327797c1920ae7f938bcc6f3f7421
# good: [3a070801c61f4e3987d59b1068368ba71d727208] Merge remote-tracking branch 'microblaze/next'
git bisect good 3a070801c61f4e3987d59b1068368ba71d727208
# good: [317f553bb677e324c9c865ff7f14597bc5ceeb9c] MIPS: pci-legacy: remove redundant info messages
git bisect good 317f553bb677e324c9c865ff7f14597bc5ceeb9c
# bad: [6ce48897ce476bed86fde28752c27596e8753277] MIPS: Loongson64: Add kexec/kdump support
git bisect bad 6ce48897ce476bed86fde28752c27596e8753277
# bad: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources
git bisect bad 99bca615d89510917864fac6b26fd343eff2aba2
# good: [0af83d2e447af3e5098583cb6320bb1b1fb0976b] MIPS: pci-legacy: remove busn_resource field
git bisect good 0af83d2e447af3e5098583cb6320bb1b1fb0976b
# first bad commit: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic pci_enable_resources
Re: [PATCH] mips: kdump: Crash kernel should be able to see old memories
Hi, Jiaxun

On 04/20/2021 09:11 AM, Jiaxun Yang wrote:
> On 2021/4/19 18:56, Youling Tang wrote:
> > From: Huacai Chen
> >
> > kexec-tools use mem=X@Y to pass usable memories to the crash kernel,
> > but in commit a94e4f24ec836c8984f83959 ("MIPS: init: Drop
> > boot_mem_map") all BIOS-passed memories are removed by
> > early_parse_mem(). I think this is reasonable for a normal kernel
> > but not for a crash kernel, because a crash kernel should be able to
> > see all old memories, even though it is not supposed to use them.
> >
> > Fixes: a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map")
> > Signed-off-by: Huacai Chen
> > Signed-off-by: Youling Tang
> > ---
> >  arch/mips/kernel/setup.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> > index b86e241..ac90d3b 100644
> > --- a/arch/mips/kernel/setup.c
> > +++ b/arch/mips/kernel/setup.c
> > @@ -351,8 +351,10 @@ static int __init early_parse_mem(char *p)
> >  	 */
> >  	if (usermem == 0) {
> >  		usermem = 1;
> > +#ifndef CONFIG_CRASH_DUMP
>
> Why depend on a config instead of a runtime variable?

If we do not depend on the config, we can determine whether the command
line contains the "elfcorehdr=" parameter, because the "mem=" and
"elfcorehdr=" parameters are automatically added by kexec-tools. So if
there is an "elfcorehdr=" parameter in the command line, it means that
the currently running kernel is a capture kernel, and the
memblock_remove() operation is not called. The revised patch is as
follows:

 	if (usermem == 0) {
 		usermem = 1;
-		memblock_remove(memblock_start_of_DRAM(),
-				memblock_end_of_DRAM() - memblock_start_of_DRAM());
+		if (!strstr(boot_command_line, "elfcorehdr")) {
+			memblock_remove(memblock_start_of_DRAM(),
+					memblock_end_of_DRAM() - memblock_start_of_DRAM());
+		}

Do you think it is feasible?

> Btw as you are fixing my commit please keep me CCed.

Sorry, I will keep you CCed.

Thanks,
Youling

> Thanks.
> - Jiaxun
Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
On 20/04/2021 14:54, Leonardo Bras wrote:
> As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS)
> it's possible to use direct DMA mapping even with pmem region.
>
> But, if that happens, the window size (len) is set to
> (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
> pagesize times smaller DDW to be created, being insufficient for
> correct usage.
>
> Fix this so the correct window size is used in this case.

Good find indeed.

afaict this does not create a huge problem though as
query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS
- page_shift) where it matters (phyp).

Reviewed-by: Alexey Kardashevskiy

> Fixes: bf6e2d562bbc4 ("powerpc/dma: Fallback to dma_ops when
> persistent memory present")
> Signed-off-by: Leonardo Bras
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 9fc5217f0c8e..836cbbe0ecc5 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  	if (pmem_present) {
>  		if (query.largest_available_block >=
>  		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
> -			len = MAX_PHYSMEM_BITS - page_shift;
> +			len = MAX_PHYSMEM_BITS;
>  		else
>  			dev_info(&dev->dev, "Skipping ibm,pmemory");
>  	}

--
Alexey
[tip:master] BUILD SUCCESS 75ce17f4d207d047b991e28cd5243f2345889c0c
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
branch HEAD: 75ce17f4d207d047b991e28cd5243f2345889c0c  Merge branch 'x86/build'

elapsed time: 722m

configs tested: 117
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm      defconfig
arm64    allyesconfig
arm64    defconfig
arm      allyesconfig
arm      allmodconfig
x86_64   allyesconfig
riscv    allmodconfig
i386     allyesconfig
riscv    allyesconfig
powerpc  motionpro_defconfig
sh       ap325rxa_defconfig
mips     pic32mzda_defconfig
powerpc  xes_mpc85xx_defconfig
m68k     mvme16x_defconfig
arm      versatile_defconfig
s390     alldefconfig
mips     pistachio_defconfig
arm      spear3xx_defconfig
sh       rsk7203_defconfig
powerpc  mpc8272_ads_defconfig
mips     rb532_defconfig
sh       edosk7705_defconfig
powerpc  asp8347_defconfig
arm      realview_defconfig
sparc    sparc32_defconfig
um       alldefconfig
sh       rts7751r2dplus_defconfig
sh       se7705_defconfig
powerpc  redwood_defconfig
powerpc  katmai_defconfig
sh       ecovec24_defconfig
sh       rsk7201_defconfig
ia64     allmodconfig
ia64     defconfig
parisc   alldefconfig
powerpc  holly_defconfig
sh       sh7785lcr_defconfig
m68k     multi_defconfig
arm      omap2plus_defconfig
sh       sh7724_generic_defconfig
mips     loongson1b_defconfig
powerpc  g5_defconfig
arm      footbridge_defconfig
powerpc  sequoia_defconfig
ia64     allyesconfig
m68k     allmodconfig
m68k     defconfig
m68k     allyesconfig
nios2    defconfig
arc      allyesconfig
nds32    allnoconfig
nds32    defconfig
nios2    allyesconfig
csky     defconfig
alpha    defconfig
alpha    allyesconfig
xtensa   allyesconfig
h8300    allyesconfig
arc      defconfig
sh       allmodconfig
parisc   defconfig
s390     allyesconfig
s390     allmodconfig
parisc   allyesconfig
s390     defconfig
sparc    allyesconfig
sparc    defconfig
i386     defconfig
mips     allyesconfig
mips     allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc  allnoconfig
x86_64   randconfig-a003-20210419
x86_64   randconfig-a001-20210419
x86_64   randconfig-a005-20210419
x86_64   randconfig-a002-20210419
x86_64   randconfig-a006-20210419
x86_64   randconfig-a004-20210419
i386     randconfig-a003-20210419
i386     randconfig-a001-20210419
i386     randconfig-a006-20210419
i386     randconfig-a005-20210419
i386     randconfig-a004-20210419
i386     randconfig-a002-20210419
i386     randconfig-a012-20210420
i386     randconfig-a014-20210420
i386     randconfig-a011-20210420
i386     randconfig-a013-20210420
i386     randconfig-a015-20210420
i386     randconfig-a016-20210420
i386     randconfig-a015-20210419
i386     randconfig-a013-20210419
i386     randconfig-a014-20210419
i386     randconfig-a016-20210419
i386     randconfig-a012-20210419
i386     randconfig-a011-20210419
riscv    nommu_k210_defconfig
riscv    nommu_virt_defconfig
riscv    allnoconfig
riscv    defconfig
riscv
[syzbot] KASAN: use-after-free Read in sctp_do_8_2_transport_strike
Hello,

syzbot found the following issue on:

HEAD commit:    bf05bf16 Linux 5.12-rc8
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16a00dfed0
kernel config:  https://syzkaller.appspot.com/x/.config?x=9404cfa686df2c05
dashboard link: https://syzkaller.appspot.com/bug?extid=bbe538efd1046586f587

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+bbe538efd1046586f...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531
Read of size 4 at addr 888024d65154 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531
 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1636 [inline]
 sctp_side_effects net/sctp/sm_sideeffect.c:1185 [inline]
 sctp_do_sm+0x114f/0x5120 net/sctp/sm_sideeffect.c:1156
 sctp_generate_timeout_event+0x1bb/0x3d0 net/sctp/sm_sideeffect.c:295
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431
 expire_timers kernel/time/timer.c:1476 [inline]
 __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745
 __run_timers kernel/time/timer.c:1726 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
 invoke_softirq kernel/softirq.c:221 [inline]
 __irq_exit_rcu kernel/softirq.c:422 [inline]
 irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:517
Code: ed 56 6e f8 84 db 75 ac e8 34 50 6e f8 e8 3f 3f 74 f8 e9 0c 00 00 00 e8 25 50 6e f8 0f 00 2d be a5 c7 00 e8 19 50 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 fa 31 ff 48 89 de e8 24 58 6e f8 48 85 db
RSP: 0018:c9d57d18 EFLAGS: 0293
RAX: RBX: RCX: RDX: 8880111c54c0 RSI: 8905a167 RDI:
RBP: 888141433864 R08: 0001 R09: 0001 R10: 8179e0c8 R11: R12: 0001
R13: 888141433800 R14: 888141433864 R15: 888017879004
 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:652
 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x3e1/0x590 kernel/sched/idle.c:300
 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397
 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272
 secondary_startup_64_no_verify+0xb0/0xbb

Allocated by task 14433:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc mm/kasan/common.c:506 [inline]
 kasan_kmalloc mm/kasan/common.c:465 [inline]
 __kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 sctp_transport_new+0x8c/0x690 net/sctp/transport.c:96
 sctp_assoc_add_peer+0x28f/0x1160 net/sctp/associola.c:618
 sctp_process_init+0x12a/0x2940 net/sctp/sm_make_chunk.c:2345
 sctp_sf_do_dupcook_a net/sctp/sm_statefuns.c:1800 [inline]
 sctp_sf_do_5_2_4_dupcook+0x1401/0x2d50 net/sctp/sm_statefuns.c:2200
 sctp_do_sm+0x179/0x5120 net/sctp/sm_sideeffect.c:1153
 sctp_assoc_bh_rcv+0x386/0x6c0 net/sctp/associola.c:1048
 sctp_inq_push+0x1da/0x270 net/sctp/inqueue.c:80
 sctp_rcv+0xf64/0x2f10 net/sctp/input.c:256
 sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1077
 ip6_protocol_deliver_rcu+0x2e9/0x17f0 net/ipv6/ip6_input.c:422
 ip6_input_finish+0x7f/0x160 net/ipv6/ip6_input.c:463
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472
 dst_input include/net/dst.h:458 [inline]
 ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 ipv6_rcv+0x28e/0x3c0
Re: [PATCH] bonding: 3ad: update slave arr after initialize
jin yiting wrote:

[...]

>> The described issue is a race condition (in that
>> ad_agg_selection_logic clears agg->is_active under mode_lock, but
>> bond_open -> bond_update_slave_arr is inspecting agg->is_active outside
>> the lock). I don't see how the above change will reliably manage this;
>> the real issue looks to be that bond_update_slave_arr is committing
>> changes to the array (via bond_reset_slave_arr) based on a racy
>> inspection of the active aggregator state while it is in flux.
>>
>> Also, the description of the issue says "The best aggregator in
>> ad_agg_selection_logic has not changed, no need to update slave arr,"
>> but the change above does the opposite, and will set update_slave_arr
>> when the aggregator has not changed (update_slave_arr remains false at
>> return of ad_agg_selection_logic).
>>
>> I believe I understand the described problem, but I don't see
>> how the patch fixes it. I suspect (but haven't tested) that the proper
>> fix is to acquire mode_lock in bond_update_slave_arr while calling
>> bond_3ad_get_active_agg_info to avoid conflict with the state machine.
>>
>> -J
>>
>> ---
>> -Jay Vosburgh, jay.vosbu...@canonical.com
>
> Thank you for your reply. The last patch does have a redundant
> update of the slave arr. Thank you for your correction.
>
> As you said, holding mode_lock in bond_update_slave_arr while
> calling bond_3ad_get_active_agg_info can avoid conflict with the state
> machine. I have tested this patch, with ifdown/ifup operations for bond
> or slaves.
>
> But bond_update_slave_arr is expected to hold RTNL only and NO
> other lock. And it has WARN_ON(lockdep_is_held(&bond->mode_lock)); in
> bond_update_slave_arr. I'm not sure that holding mode_lock in
> bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a
> correct action.

	That WARN_ON came up in discussion recently, and my opinion is
that it's incorrect, and is trying to ensure that bond_update_slave_arr
is safe for a potential sleep when allocating memory.
https://lore.kernel.org/netdev/20210322123846.3024549-1-maxi...@nvidia.com/

	The original authors haven't replied, so I would suggest you
remove the WARN_ON and the surrounding CONFIG_LOCKDEP ifdefs as part of
your patch and replace it with a call to might_sleep.

	The other callers of bond_3ad_get_active_agg_info are generally
obtaining the state in order to report it to user space, so I think it's
safe to leave those calls not holding the mode_lock. The race is still
there, but the data returned to user space is a snapshot and so may
reflect an incomplete state during a transition. Further, having the
inspection functions acquire the mode_lock permits user space to spam
the lock with little effort.

	-J

>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 74cbbb2..db988e5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>		struct ad_info ad_info;
>
>+		spin_lock_bh(&bond->mode_lock);
>		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
>+			spin_unlock_bh(&bond->mode_lock);
>			pr_debug("bond_3ad_get_active_agg_info failed\n");
>			/* No active aggragator means it's not safe to use
>			 * the previous array.
>@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>			bond_reset_slave_arr(bond);
>			goto out;
>		}
>+		spin_unlock_bh(&bond->mode_lock);
>		agg_id = ad_info.aggregator_id;
>	}
>	bond_for_each_slave(bond, slave, iter) {

---
	-Jay Vosburgh, jay.vosbu...@canonical.com
Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr
On 19/04/2021 at 23:39, Randy Dunlap wrote:

On 4/19/21 6:16 AM, Michael Ellerman wrote:

Randy Dunlap writes: Sure. I'll post them later today. They keep FPU and
ALTIVEC as independent (build) features.

Those patches look OK. But I don't think it makes sense to support that
configuration, FPU=n ALTIVEC=y. No one is ever going to make a CPU like
that. We have enough testing surface due to configuration options,
without adding artificial combinations that no one is ever going to use.
IMHO :)

So I'd rather we just make ALTIVEC depend on FPU.

That's rather simple. See below. I'm doing a bunch of randconfig builds
with it now.

---
From: Randy Dunlap
Subject: [PATCH] powerpc: make ALTIVEC depend on PPC_FPU

On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled, there are
build errors:

drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration]
   enable_kernel_fp();

../arch/powerpc/lib/sstep.c: In function 'do_vec_load':
../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 'put_vr' [-Werror=implicit-function-declaration]
  637 |   put_vr(rn, &u.v);
      |   ^~

../arch/powerpc/lib/sstep.c: In function 'do_vec_store':
../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration]
  660 |   get_vr(rn, &u.v);
      |   ^~

In theory ALTIVEC is independent of PPC_FPU, but in practice nobody is
going to build such a machine, so make ALTIVEC require PPC_FPU by
depending on it.
Signed-off-by: Randy Dunlap
Reported-by: kernel test robot
Cc: Michael Ellerman
Cc: linuxppc-...@lists.ozlabs.org
Cc: Christophe Leroy
Cc: Segher Boessenkool
Cc: l...@intel.com
---
 arch/powerpc/platforms/86xx/Kconfig    | 1 +
 arch/powerpc/platforms/Kconfig.cputype | 2 ++
 2 files changed, 3 insertions(+)

--- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig
+++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig
@@ -4,6 +4,7 @@ menuconfig PPC_86xx
 	bool "86xx-based boards"
 	depends on PPC_BOOK3S_32
 	select FSL_SOC
+	select PPC_FPU
 	select ALTIVEC
 	help
 	  The Freescale E600 SoCs have 74xx cores.
--- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype
@@ -186,6 +186,7 @@ config E300C3_CPU

 config G4_CPU
 	bool "G4 (74xx)"
 	depends on PPC_BOOK3S_32
+	select PPC_FPU
 	select ALTIVEC

 endchoice

@@ -309,6 +310,7 @@ config PHYS_64BIT

 config ALTIVEC
 	bool "AltiVec Support"
+	depends on PPC_FPU

Shouldn't we do it the other way round? In extenso, make ALTIVEC select
PPC_FPU and avoid the two selects that are above?

 	depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64)
 	help
 	  This option enables kernel support for the Altivec extensions to the
Re: [PATCH v5 09/12] media: uvcvideo: Implement UVC_QUIRK_PRIVACY_DURING_STREAM
On Wed, Dec 23, 2020 at 9:56 PM Ricardo Ribalda wrote: > > Hi again > > On Wed, Dec 23, 2020 at 9:31 AM Ricardo Ribalda wrote: > > > > Hi Laurent > > > > On Wed, Dec 23, 2020 at 9:05 AM Laurent Pinchart > > wrote: > > > > > > Hi Ricardo, > > > > > > On Tue, Dec 22, 2020 at 09:04:19PM +0100, Ricardo Ribalda wrote: > > > > On Tue, Dec 22, 2020 at 11:30 AM Laurent Pinchart wrote: > > > > > On Mon, Dec 21, 2020 at 05:48:16PM +0100, Ricardo Ribalda wrote: > > > > > > Some devices, can only read the privacy_pin if the device is > > > > > > > > > > s/devices,/devices/ > > > > > > > > > > > streaming. > > > > > > > > > > > > This patch implement a quirk for such devices, in order to avoid > > > > > > invalid > > > > > > reads and/or spurious events. > > > > > > > > > > > > Signed-off-by: Ricardo Ribalda > > > > > > --- > > > > > > drivers/media/usb/uvc/uvc_driver.c | 57 > > > > > > -- > > > > > > drivers/media/usb/uvc/uvc_queue.c | 3 ++ > > > > > > drivers/media/usb/uvc/uvcvideo.h | 4 +++ > > > > > > 3 files changed, 61 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > > > > > b/drivers/media/usb/uvc/uvc_driver.c > > > > > > index 72516101fdd0..7af37d4bd60a 100644 > > > > > > --- a/drivers/media/usb/uvc/uvc_driver.c > > > > > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > > > > > @@ -7,6 +7,7 @@ > > > > > > */ > > > > > > > > > > > > #include > > > > > > +#include > > > > > > #include > > > > > > #include > > > > > > #include > > > > > > @@ -1472,6 +1473,17 @@ static int uvc_parse_control(struct > > > > > > uvc_device *dev) > > > > > > /* > > > > > > - > > > > > > * Privacy GPIO > > > > > > */ > > > > > > > > > > There should be a blank line here. 
> > > > > >
> > > > > > +static bool uvc_gpio_is_streaming(struct uvc_device *dev)
> > > > > > +{
> > > > > > +	struct uvc_streaming *streaming;
> > > > > > +
> > > > > > +	list_for_each_entry(streaming, &dev->streams, list) {
> > > > > > +		if (uvc_queue_streaming(&streaming->queue))
> > > > > > +			return true;
> > > > > > +	}
> > > > > > +
> > > > > > +	return false;
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > >
> > > > > But not two blank lines here.
> > > > >
> > > > > >  static u8 uvc_gpio_update_value(struct uvc_device *dev,
> > > > > > @@ -1499,7 +1511,12 @@ static int uvc_gpio_get_cur(struct uvc_device *dev, struct uvc_entity *entity,
> > > > > >  	if (cs != UVC_CT_PRIVACY_CONTROL || size < 1)
> > > > > >  		return -EINVAL;
> > > > > >
> > > > > > +	if ((dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) &&
> > > > > > +	    !uvc_gpio_is_streaming(dev))
> > > > > > +		return -EBUSY;
> > > > > > +
> > > > > >  	*(uint8_t *)data = uvc_gpio_update_value(dev, entity);
> > > > > > +
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > @@ -1528,19 +1545,50 @@ static struct uvc_entity *uvc_gpio_find_entity(struct uvc_device *dev)
> > > > > >  	return NULL;
> > > > > >  }
> > > > > >
> > > > > > -static irqreturn_t uvc_gpio_irq(int irq, void *data)
> > > > > > +void uvc_privacy_gpio_event(struct uvc_device *dev)
> > > > > >  {
> > > > > > -	struct uvc_device *dev = data;
> > > > > >  	struct uvc_entity *unit;
> > > > > >
> > > > > > +
> > > > > >  	unit = uvc_gpio_find_entity(dev);
> > > > > >  	if (!unit)
> > > > > > -		return IRQ_HANDLED;
> > > > > > +		return;
> > > > > >
> > > > > >  	uvc_gpio_update_value(dev, unit);
> > > > > > +}
> > > > > > +
> > > > > > +static irqreturn_t uvc_gpio_irq(int irq, void *data)
> > > > > > +{
> > > > > > +	struct uvc_device *dev = data;
> > > > > > +
> > > > > > +	/* Ignore privacy events during streamoff */
> > > > > > +	if (dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM)
> > > > > > +		if (!uvc_gpio_is_streaming(dev))
> > > > > > +			return
IRQ_HANDLED; > > > > > > > > > > I'm still a bit concerned of race conditions. When stopping the > > > > > stream, > > > > > vb2_queue.streaming is set to 0 after calling the driver's > > > > > .stop_stream() > > > > > handler. This means that the device will cut power before > > > > > uvc_gpio_is_streaming() can detect that streaming has stopped, and the > > > > > GPIO could thus trigger an IRQ. > > > > > > > > On the affected devices I have not seen this. I guess it takes some > > > > time to discharge. Anyway I am implementing a workaround. Tell me if > > > > it is too ugly. > > > > > > > > > You mentioned that devices have a pull-up or pull-down on the GPIO > > > > > line. > > > > > As there are only two devices affected, do you know if it's a pull-up > > > > > or > > > > > pull-down ? Would it be worse to expose that state to userspace than > > > > > to > > > > > return -EBUSY when reading the control ? > >
[PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem
As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS), it's
possible to use direct DMA mapping even with a pmem region. But if that
happens, the window size (len) is set to (MAX_PHYSMEM_BITS - page_shift)
instead of MAX_PHYSMEM_BITS, so the created DDW is a page-size factor
smaller than intended, which is insufficient for correct usage.

Fix this so the correct window size is used in this case.

Fixes: bf6e2d562bbc ("powerpc/dma: Fallback to dma_ops when persistent memory present")
Signed-off-by: Leonardo Bras
---
 arch/powerpc/platforms/pseries/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..836cbbe0ecc5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	if (pmem_present) {
 		if (query.largest_available_block >=
 		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
-			len = MAX_PHYSMEM_BITS - page_shift;
+			len = MAX_PHYSMEM_BITS;
 		else
 			dev_info(&dev->dev, "Skipping ibm,pmemory");
 	}
--
2.30.2
Re: [PATCH] bpf: Fix backport of "bpf: restrict unknown scalars of mixed signed bounds for unprivileged"
On Mon, Apr 19, 2021 at 04:56:41PM -0700, Samuel Mendoza-Jonas wrote: > The 4.14 backport of 9d7eceede ("bpf: restrict unknown scalars of mixed > signed bounds for unprivileged") adds the PTR_TO_MAP_VALUE check to the > wrong location in adjust_ptr_min_max_vals(), most likely because 4.14 > doesn't include the commit that updates the if-statement to a > switch-statement (aad2eeaf4 "bpf: Simplify ptr_min_max_vals adjustment"). > > Move the check to the proper location in adjust_ptr_min_max_vals(). > > Fixes: 17efa65350c5a ("bpf: restrict unknown scalars of mixed signed bounds > for unprivileged") > Signed-off-by: Samuel Mendoza-Jonas > Reviewed-by: Frank van der Linden > Reviewed-by: Ethan Chen > --- Thanks for catching it :) Reviewed-by: Balbir Singh
RE: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
From: Andrea Parri (Microsoft) Sent: Monday, April 19, 2021 6:44 PM
>
> If a malicious or compromised Hyper-V sends a spurious message of type
> CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will
> call complete() on an uninitialized event, and cause an oops.
>
> Reported-by: Michael Kelley
> Signed-off-by: Andrea Parri (Microsoft)
> ---
> Changes since v1[1]:
>   - add inline comment in vmbus_unload_response()
>
> [1] https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/
>
>  drivers/hv/channel_mgmt.c | 7 ++-
>  drivers/hv/connection.c   | 2 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)

Reviewed-by: Michael Kelley

> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 4c9e45d1f462c..335a10ee03a5e 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -826,6 +826,11 @@ static void vmbus_unload_response(struct vmbus_channel_message_header *hdr)
>  	/*
>  	 * This is a global event; just wakeup the waiting thread.
>  	 * Once we successfully unload, we can cleanup the monitor state.
> +	 *
> +	 * NB. A malicious or compromised Hyper-V could send a spurious
> +	 * message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call
> +	 * of the complete() below. Make sure that unload_event has been
> +	 * initialized by the time this complete() is executed.
>  	 */
>  	complete(&vmbus_connection.unload_event);
>  }
> @@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash)
>  	if (vmbus_proto_version < VERSION_WIN8_1)
>  		return;
>
> -	init_completion(&vmbus_connection.unload_event);
> +	reinit_completion(&vmbus_connection.unload_event);
>  	memset(&hdr, 0, sizeof(struct vmbus_channel_message_header));
>  	hdr.msgtype = CHANNELMSG_UNLOAD;
>  	vmbus_post_msg(&hdr, sizeof(struct vmbus_channel_message_header),
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index dc19d5ae4373c..311cd005b3be6 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -26,6 +26,8 @@
>
>  struct vmbus_connection vmbus_connection = {
>  	.conn_state		= DISCONNECTED,
> +	.unload_event		= COMPLETION_INITIALIZER(
> +				  vmbus_connection.unload_event),
>  	.next_gpadl_handle	= ATOMIC_INIT(0xE1E10),
>
>  	.ready_for_suspend_event = COMPLETION_INITIALIZER(
> --
> 2.25.1
[PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload
When running in Azure, disks may be connected to a Linux VM with
read/write caching enabled. If a VM panics and issues a VMbus UNLOAD
request to Hyper-V, the response is delayed until all dirty data in the
disk cache is flushed. In extreme cases, this flushing can take tens of
seconds, depending on the disk speed and the amount of dirty data.

If kdump is configured for the VM, the current 10 second timeout in
vmbus_wait_for_unload() may be exceeded, and the UNLOAD complete message
may arrive well after the kdump kernel is already running, causing
problems. Note that no problem occurs if kdump is not enabled because
Hyper-V waits for the cache flush before doing a reboot through the
BIOS/UEFI code.

Fix this problem by increasing the timeout in vmbus_wait_for_unload() to
100 seconds. Also output periodic messages so that if anyone is watching
the serial console, they won't think the VM is completely hung.

Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload")
Signed-off-by: Michael Kelley
---
Changed in v2: Fixed silly error in the argument to mdelay()
---
 drivers/hv/channel_mgmt.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f3cf4af..ef4685c 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -755,6 +755,12 @@ static void init_vp_index(struct vmbus_channel *channel)
 	free_cpumask_var(available_mask);
 }

+#define UNLOAD_DELAY_UNIT_MS	10		/* 10 milliseconds */
+#define UNLOAD_WAIT_MS		(100*1000)	/* 100 seconds */
+#define UNLOAD_WAIT_LOOPS	(UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
+#define UNLOAD_MSG_MS		(5*1000)	/* Every 5 seconds */
+#define UNLOAD_MSG_LOOPS	(UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
+
 static void vmbus_wait_for_unload(void)
 {
 	int cpu;
@@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void)
 	 * vmbus_connection.unload_event. If not, the last thing we can do is
 	 * read message pages for all CPUs directly.
 	 *
-	 * Wait no more than 10 seconds so that the panic path can't get
-	 * hung forever in case the response message isn't seen.
+	 * Wait up to 100 seconds since an Azure host must writeback any dirty
+	 * data in its disk cache before the VMbus UNLOAD request will
+	 * complete. This flushing has been empirically observed to take up
+	 * to 50 seconds in cases with a lot of dirty data, so allow additional
+	 * leeway and for inaccuracies in mdelay(). But eventually time out so
+	 * that the panic path can't get hung forever in case the response
+	 * message isn't seen.
 	 */
-	for (i = 0; i < 1000; i++) {
+	for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
 		if (completion_done(&vmbus_connection.unload_event))
-			break;
+			goto completed;

 		for_each_online_cpu(cpu) {
 			struct hv_per_cpu_context *hv_cpu
@@ -800,9 +811,18 @@ static void vmbus_wait_for_unload(void)
 			vmbus_signal_eom(msg, message_type);
 		}

-		mdelay(10);
+		/*
+		 * Give a notice periodically so someone watching the
+		 * serial output won't think it is completely hung.
+		 */
+		if (!(i % UNLOAD_MSG_LOOPS))
+			pr_notice("Waiting for VMBus UNLOAD to complete\n");
+
+		mdelay(UNLOAD_DELAY_UNIT_MS);
 	}
+	pr_err("Continuing even though VMBus UNLOAD did not complete\n");

+completed:
 	/*
 	 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
 	 * maybe-pending messages on all CPUs to be able to receive new
--
1.8.3.1
Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API
On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi wrote:
>
> This adds some basic tests for the low level bpf_tc_cls_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen
> Signed-off-by: Kumar Kartikeya Dwivedi
> ---
>  .../selftests/bpf/prog_tests/test_tc_bpf.c | 112 ++
>  .../selftests/bpf/progs/test_tc_bpf_kern.c |  12 ++
>  2 files changed, 124 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index ..945f3a1a72f8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_cls_internal(int fd, __u32 parent_id)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
> +			    .class_id = TC_H_MAKE(1UL << 16, 1),
> +			    .chain_index = 5);
> +	struct bpf_tc_cls_attach_id id = {};
> +	struct bpf_tc_cls_info info = {};
> +	int ret;
> +
> +	ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +	if (CHECK_FAIL(ret < 0))
> +		return ret;
> +
> +	ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	ret = -1;
> +
> +	if (CHECK_FAIL(info.id.handle != id.handle) ||
> +	    CHECK_FAIL(info.id.chain_index != id.chain_index) ||
> +	    CHECK_FAIL(info.id.priority != id.priority) ||
> +	    CHECK_FAIL(info.id.handle != 1) ||
> +	    CHECK_FAIL(info.id.priority != 10) ||
> +	    CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
> +	    CHECK_FAIL(info.id.chain_index != 5))
> +		goto end;
> +
> +	ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
> +	if (CHECK_FAIL(ret < 0))
> +		return ret;
> +
> +	if
(CHECK_FAIL(info.id.handle != 1) ||
> +	    CHECK_FAIL(info.id.priority != 10) ||
> +	    CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
> +		goto end;
> +
> +	/* Demonstrate changing attributes */
> +	opts.class_id = TC_H_MAKE(1UL << 16, 2);
> +
> +	ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &id);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
> +		goto end;
> +	if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
> +		goto end;
> +
> +end:
> +	ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
> +	CHECK_FAIL(ret < 0);
> +	return ret;
> +}
> +
> +void test_test_tc_bpf(void)
> +{
> +	const char *file = "./test_tc_bpf_kern.o";
> +	struct bpf_program *clsp;
> +	struct bpf_object *obj;
> +	int cls_fd, ret;
> +
> +	obj = bpf_object__open(file);
> +	if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
> +		return;
> +
> +	clsp = bpf_object__find_program_by_title(obj, "classifier");
> +	if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
> +		goto end;
> +
> +	ret = bpf_object__load(obj);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	cls_fd = bpf_program__fd(clsp);
> +
> +	system("tc qdisc del dev lo clsact");
> +
> +	ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
> +		goto end;
> +
> +	ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> +	if (CHECK_FAIL(ret < 0))
> +		goto end;
> +
> +	CHECK_FAIL(system("tc qdisc del dev lo clsact"));

please don't use CHECK_FAIL. And prefer ASSERT_xxx over CHECK().
> +
> +end:
> +	bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index ..3dd40e21af8e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include
> +#include
> +
> +// Dummy prog to test TC-BPF API

no C++-style comments, please (except for SPDX header, of course)

> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +	return 0;
> +}
> --
> 2.30.2
>
Re: [PATCH] arm64: dts: ls1028a-rdb: enable optee node
On Tue, Mar 30, 2021 at 9:53 AM Sahil Malhotra wrote:
>
> From: Sahil Malhotra
>
> optee node was disabled by default, enabling it for ls1028a-rdb.
>
> Signed-off-by: Michael Walle
> Signed-off-by: Sahil Malhotra

Acked-by: Li Yang

> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 4
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi    | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> index 41ae6e7675ba..1bdf0104d492 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> @@ -274,6 +274,10 @@
>  	status = "okay";
>  };
>
> +&optee {
> +	status = "okay";
> +};
> +
>  {
>  	status = "okay";
>  };
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index 262fbad8f0ec..fb155ac192b1 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -92,7 +92,7 @@
>  	};
>
>  	firmware {
> -		optee {
> +		optee: optee {
>  			compatible = "linaro,optee-tz";
>  			method = "smc";
>  			status = "disabled";
> --
> 2.17.1
>
[PATCH v2 2/2] usb: cdnsp: Fix lack of removing request from pending list.
From: Pawel Laszczak

This patch fixes a failure to remove a request from ep->pending_list when
the stop endpoint command fails. Even when that command fails, the driver
must still remove the request from ep->pending_list. Without this fix the
driver can get stuck in cdnsp_gadget_ep_disable() in the loop:

	while (!list_empty(&pep->pending_list)) {
		preq = next_request(&pep->pending_list);
		cdnsp_ep_dequeue(pep, preq);
	}

Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak
---
Changelog:
v2:
- removed blank space
- added "Fixes" tag

 drivers/usb/cdns3/cdnsp-gadget.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index 56707b6b0f57..c083985e387b 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -422,17 +422,17 @@ int cdnsp_ep_enqueue(struct cdnsp_ep *pep, struct cdnsp_request *preq)
 int cdnsp_ep_dequeue(struct cdnsp_ep *pep, struct cdnsp_request *preq)
 {
 	struct cdnsp_device *pdev = pep->pdev;
-	int ret;
+	int ret_stop = 0;
+	int ret_rem;

 	trace_cdnsp_request_dequeue(preq);

-	if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) {
-		ret = cdnsp_cmd_stop_ep(pdev, pep);
-		if (ret)
-			return ret;
-	}
+	if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING)
+		ret_stop = cdnsp_cmd_stop_ep(pdev, pep);
+
+	ret_rem = cdnsp_remove_request(pdev, preq, pep);

-	return cdnsp_remove_request(pdev, preq, pep);
+	return ret_rem ? ret_rem : ret_stop;
 }

 static void cdnsp_zero_in_ctx(struct cdnsp_device *pdev)
--
2.25.1
Re: linux-next: Tree for Apr 19 (bcache)
On 4/20/21 1:50 AM, Jens Axboe wrote:
> On 4/19/21 10:26 AM, Coly Li wrote:
>> On 4/19/21 11:40 PM, Randy Dunlap wrote:
>>> On 4/19/21 3:23 AM, Stephen Rothwell wrote:
>>>> Hi all,
>>>>
>>>> Changes since 20210416:
>>>
>>> on x86_64:
>>>
>>> when
>>> # CONFIG_BLK_DEV is not set
>>>
>>> WARNING: unmet direct dependencies detected for LIBNVDIMM
>>>   Depends on [n]: PHYS_ADDR_T_64BIT [=y] && HAS_IOMEM [=y] && BLK_DEV [=n]
>>>   Selected by [y]:
>>>   - BCACHE_NVM_PAGES [=y] && MD [=y] && BCACHE [=y] && PHYS_ADDR_T_64BIT [=y]
>>>
>>> Full randconfig file is attached.
>>
>> I need a hint from a kbuild expert.
>>
>> My original idea in using "select LIBNVDIMM" was to keep the
>> BCACHE_NVM_PAGES option from disappearing if LIBNVDIMM is not enabled.
>> Otherwise, if the nvdimm driver is not configured, users won't know
>> there is a BCACHE_NVM_PAGES option unless they read the bcache Kconfig
>> file.
>
> But why? That's exactly how it should work. Just use depends to set the
> dependency.
>
>> But I see nvdimm's Kconfig uses "depends on BLK_DEV", so I understand
>> it is acceptable for the LIBNVDIMM option to disappear from "make
>> menuconfig" if BLK_DEV is not enabled.
>>
>> For such a condition, which is the proper way to set the dependency?
>> - Change "select LIBNVDIMM" and "select DAX" to "depends on LIBNVDIMM"
>>   and "depends on DAX" in the bcache Kconfig, or
>> - Change "depends on BLK_DEV" to "select BLK_DEV" in the nvdimm Kconfig.
>
> The former.

Copied. Thanks for the hint. I will post a fix soon.

Coly Li
Re: [PATCH] riscv: Remove 32b kernel mapping from page table dump
On Sun, Apr 18, 2021 at 4:59 PM Alexandre Ghiti wrote: > > The 32b kernel mapping lies in the linear mapping, there is no point in > printing its address in page table dump, so remove this leftover that > comes from moving the kernel mapping outside the linear mapping for 64b > kernel. > > Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses") > Signed-off-by: Alexandre Ghiti Looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/mm/ptdump.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c > index 0aba4421115c..a4ed4bdbbfde 100644 > --- a/arch/riscv/mm/ptdump.c > +++ b/arch/riscv/mm/ptdump.c > @@ -76,8 +76,8 @@ enum address_markers_idx { > PAGE_OFFSET_NR, > #ifdef CONFIG_64BIT > MODULES_MAPPING_NR, > -#endif > KERNEL_MAPPING_NR, > +#endif > END_OF_SPACE_NR > }; > > @@ -99,8 +99,8 @@ static struct addr_marker address_markers[] = { > {0, "Linear mapping"}, > #ifdef CONFIG_64BIT > {0, "Modules mapping"}, > -#endif > {0, "Kernel mapping (kernel, BPF)"}, > +#endif > {-1, NULL}, > }; > > @@ -379,8 +379,8 @@ static int ptdump_init(void) > address_markers[PAGE_OFFSET_NR].start_address = PAGE_OFFSET; > #ifdef CONFIG_64BIT > address_markers[MODULES_MAPPING_NR].start_address = MODULES_VADDR; > -#endif > address_markers[KERNEL_MAPPING_NR].start_address = kernel_virt_addr; > +#endif > > kernel_ptd_info.base_addr = KERN_VIRT_START; > > -- > 2.20.1 >
Re: [PATCH] riscv: Fix 32b kernel caused by 64b kernel mapping moving outside linear mapping
On Sat, Apr 17, 2021 at 10:52 PM Alexandre Ghiti wrote: > > Fix multiple leftovers when moving the kernel mapping outside the linear > mapping for 64b kernel that left the 32b kernel unusable. > > Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping") > Signed-off-by: Alexandre Ghiti Quite a few #ifdef but I don't see any better way at the moment. Maybe we can clean this later. Otherwise looks good to me. Reviewed-by: Anup Patel Regards, Anup > --- > arch/riscv/include/asm/page.h| 9 + > arch/riscv/include/asm/pgtable.h | 16 > arch/riscv/mm/init.c | 25 - > 3 files changed, 45 insertions(+), 5 deletions(-) > > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h > index 22cfb2be60dc..f64b61296c0c 100644 > --- a/arch/riscv/include/asm/page.h > +++ b/arch/riscv/include/asm/page.h > @@ -90,15 +90,20 @@ typedef struct page *pgtable_t; > > #ifdef CONFIG_MMU > extern unsigned long va_pa_offset; > +#ifdef CONFIG_64BIT > extern unsigned long va_kernel_pa_offset; > +#endif > extern unsigned long pfn_base; > #define ARCH_PFN_OFFSET(pfn_base) > #else > #define va_pa_offset 0 > +#ifdef CONFIG_64BIT > #define va_kernel_pa_offset0 > +#endif > #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT) > #endif /* CONFIG_MMU */ > > +#ifdef CONFIG_64BIT > extern unsigned long kernel_virt_addr; > > #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + > va_pa_offset)) > @@ -112,6 +117,10 @@ extern unsigned long kernel_virt_addr; > (_x < kernel_virt_addr) ? 
> \ > linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x); > \ > }) > +#else > +#define __pa_to_va_nodebug(x) ((void *)((unsigned long) (x) + va_pa_offset)) > +#define __va_to_pa_nodebug(x) ((unsigned long)(x) - va_pa_offset) > +#endif > > #ifdef CONFIG_DEBUG_VIRTUAL > extern phys_addr_t __virt_to_phys(unsigned long x); > diff --git a/arch/riscv/include/asm/pgtable.h > b/arch/riscv/include/asm/pgtable.h > index 80e63a93e903..5afda75cc2c3 100644 > --- a/arch/riscv/include/asm/pgtable.h > +++ b/arch/riscv/include/asm/pgtable.h > @@ -16,19 +16,27 @@ > #else > > #define ADDRESS_SPACE_END (UL(-1)) > -/* > - * Leave 2GB for kernel and BPF at the end of the address space > - */ > + > +#ifdef CONFIG_64BIT > +/* Leave 2GB for kernel and BPF at the end of the address space */ > #define KERNEL_LINK_ADDR (ADDRESS_SPACE_END - SZ_2G + 1) > +#else > +#define KERNEL_LINK_ADDR PAGE_OFFSET > +#endif > > #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) > #define VMALLOC_END (PAGE_OFFSET - 1) > #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE) > > -/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_SIZE(SZ_128M) > +#ifdef CONFIG_64BIT > +/* KASLR should leave at least 128MB for BPF after the kernel */ > #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) > #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE) > +#else > +#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) > +#define BPF_JIT_REGION_END (VMALLOC_END) > +#endif > > /* Modules always live before the kernel */ > #ifdef CONFIG_64BIT > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c > index 093f3a96ecfc..dc9b988e0778 100644 > --- a/arch/riscv/mm/init.c > +++ b/arch/riscv/mm/init.c > @@ -91,8 +91,10 @@ static void print_vm_layout(void) > (unsigned long)VMALLOC_END); > print_mlm("lowmem", (unsigned long)PAGE_OFFSET, > (unsigned long)high_memory); > +#ifdef CONFIG_64BIT > print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR, > (unsigned 
long)ADDRESS_SPACE_END); > +#endif > } > #else > static void print_vm_layout(void) { } > @@ -165,9 +167,11 @@ static struct pt_alloc_ops pt_ops; > /* Offset between linear mapping virtual address and kernel load address */ > unsigned long va_pa_offset; > EXPORT_SYMBOL(va_pa_offset); > +#ifdef CONFIG_64BIT > /* Offset between kernel mapping virtual address and kernel load address */ > unsigned long va_kernel_pa_offset; > EXPORT_SYMBOL(va_kernel_pa_offset); > +#endif > unsigned long pfn_base; > EXPORT_SYMBOL(pfn_base); > > @@ -410,7 +414,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) > load_sz = (uintptr_t)(&_end) - load_pa; > > va_pa_offset = PAGE_OFFSET - load_pa; > +#ifdef CONFIG_64BIT > va_kernel_pa_offset = kernel_virt_addr - load_pa; > +#endif > > pfn_base = PFN_DOWN(load_pa); > > @@ -469,12 +475,16 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) >pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL); > dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1)); > #else /* CONFIG_BUILTIN_DTB */ > +#ifdef CONFIG_64BIT > /* >
Re: [PATCH v20 1/2] scsi: ufs: Enable power management for wlun
On 20/04/21 12:53 am, Asutosh Das (asd) wrote: > On 4/19/2021 11:37 AM, Adrian Hunter wrote: >> On 16/04/21 10:49 pm, Asutosh Das wrote: >>> >>> Co-developed-by: Can Guo >>> Signed-off-by: Can Guo >>> Signed-off-by: Asutosh Das >>> --- >> >> I came across 3 issues while testing. See comments below. >> > Hi Adrian > Thanks for the comments. >> >> >>> @@ -5794,7 +5839,7 @@ static void ufshcd_err_handling_unprepare(struct >>> ufs_hba *hba) >>> if (ufshcd_is_clkscaling_supported(hba)) >>> ufshcd_clk_scaling_suspend(hba, false); >>> ufshcd_clear_ua_wluns(hba); >> >> ufshcd_clear_ua_wluns() deadlocks trying to clear UFS_UPIU_RPMB_WLUN >> if sdev_rpmb is suspended and sdev_ufs_device is suspending. >> e.g. ufshcd_wl_suspend() is waiting on host_sem while ufshcd_err_handler() >> is running, at which point sdev_rpmb has already suspended. >> > Umm, I didn't understand this deadlock. > When you say, sdev_rpmb is suspended, does it mean runtime_suspended? > sdev_ufs_device is suspending - this can't be runtime_suspending, while > ufshcd_err_handling_unprepare is running. > > If you've a call-stack of this deadlock, please can you share it with me. > I'll also try to reproduce this. Yes it is system suspend. sdev_rpmb has suspended, sdev_ufs_device is waiting on host_sem. ufshcd_err_handler() holds host_sem. ufshcd_clear_ua_wlun(UFS_UPIU_RPMB_WLUN) gets stuck. I will get some call-stacks. > > I'll address the other comments in the next version. > > > Thank you! > >>> - pm_runtime_put(hba->dev); >>> + ufshcd_rpm_put(hba); >>> } >> >> >> >>> +void ufshcd_resume_complete(struct device *dev) >>> +{ >
linux-next: manual merge of the ftrace tree with the bpf-next tree
Hi all, Today's linux-next merge of the ftrace tree got a conflict in: kernel/trace/bpf_trace.c between commit: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") from the bpf-next tree and commit: f2cc020d7876 ("tracing: Fix various typos in comments") from the ftrace tree. I fixed it up (the former removed the comment updated by the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: [PATCH 0/2] Change struct page layout for page_pool
"Matthew Wilcox (Oracle)" writes: > The first patch here fixes two bugs on ppc32, and mips32. It fixes one > bug on arc and arm32 (in certain configurations). It probably makes > sense to get it in ASAP through the networking tree. I'd like to see > testing on those four architectures if possible? Sorry I don't have easy access to any hardware that fits the bill. At some point I'll be able to go to the office and setup a machine that (I think) can test these paths, but I don't have an ETA on that. You and others seem to have done lots of analysis on this though, so I think you should merge the fixes without waiting on ppc32 testing. cheers > > The second patch enables new functionality. It is much less urgent. > I'd really like to see Mel & Michal's thoughts on it. > > I have only compile-tested these patches. > > Matthew Wilcox (Oracle) (2): > mm: Fix struct page layout on 32-bit systems > mm: Indicate pfmemalloc pages in compound_head > > include/linux/mm.h | 12 +++- > include/linux/mm_types.h | 9 - > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 4 files changed, 29 insertions(+), 16 deletions(-) > > -- > 2.30.2
RE: [PATCH 1/2] usb: gadget: f_uac2: Stop endpoint before enabling it.
>On 21-04-19 09:50:53, Pawel Laszczak wrote: >> From: Pawel Laszczak >> >> This patch disables the endpoint before enabling it when changing the >> alternate setting. Without this, in some cases uac2 queues the same >> request multiple times. >> Such a situation can occur when the host sends Set Interface with >> alternate setting 1 twice. >> >> Signed-off-by: Pawel Laszczak >> --- >> drivers/usb/gadget/function/f_uac2.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/usb/gadget/function/f_uac2.c >> b/drivers/usb/gadget/function/f_uac2.c >> index 9cc5c512a5cd..7d20a9d8a1b4 100644 >> --- a/drivers/usb/gadget/function/f_uac2.c >> +++ b/drivers/usb/gadget/function/f_uac2.c >> @@ -890,17 +890,17 @@ afunc_set_alt(struct usb_function *fn, unsigned intf, >> unsigned alt) >> if (intf == uac2->as_out_intf) { >> uac2->as_out_alt = alt; >> >> +u_audio_stop_capture(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_capture(&uac2->g_audio); >> -else >> -u_audio_stop_capture(&uac2->g_audio); >> } else if (intf == uac2->as_in_intf) { >> uac2->as_in_alt = alt; >> >> +u_audio_stop_playback(&uac2->g_audio); >> + >> if (alt) >> ret = u_audio_start_playback(&uac2->g_audio); >> -else >> -u_audio_stop_playback(&uac2->g_audio); >> } else { >> dev_err(dev, "%s:%d Error!\n", __func__, __LINE__); >> return -EINVAL; > >To avoid this, you may use prm->ep_enabled to judge if the endpoint has >already been enabled. Such a check is the first instruction inside the u_audio_stop_playback->free_ep function, so we don't need to duplicate it here. > >-- > >Thanks, >Peter Chen -- Regards, Paweł Laszczak
[PATCH] mhi: add MHI_STATE_M2 to resume success criteria
During system resume, the mhi driver triggers the M3->M0 transition and then waits for the target device to enter the M0 state. Once done, the device queues a state change event into the ctrl event ring and notifies the mhi driver by raising an interrupt, where a tasklet is scheduled to process this event. In most cases, the tasklet is served in a timely manner and the wait operation succeeds. However, there are cases where the CPU is busy and cannot serve this tasklet for some time. Once the delay grows long enough, the device moves itself to the M1 state and also interrupts the mhi driver after inserting a new state change event into the ctrl ring. Later, the CPU finally has time to process the ring; however, there are two events in it now: 1. for the M3->M0 event, which is processed first as it was queued first, the tasklet handler updates the device state to M0 and wakes up the task, i.e., the mhi driver. 2. for the M0->M1 event, which is processed later, the tasklet handler triggers the M1->M2 transition and updates the device state to M2 directly, then wakes up the mhi driver (if still sleeping on this wait queue). Note that although the mhi driver has been woken up while processing the first event, it may still have had no chance to run before the second event is processed. In other words, the mhi driver has to keep waiting until timeout because the M0 state has been missed. kernel log here: ... Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi :06:00.0: Entered with PM state: M3, MHI state: M3 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi :06:00.0: State change event to state: M0 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi :06:00.0: State change event to state: M1 Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi :06:00.0: Did not enter M0 state, MHI state: M2, PM state: M2 ... Fix this issue by simply adding M2 as a valid state for resume.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1 Signed-off-by: Baochen Qiang --- drivers/bus/mhi/core/pm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index ce73cfa63cb3..ca5f2feed9d5 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -900,6 +900,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl) ret = wait_event_timeout(mhi_cntrl->state_event, mhi_cntrl->dev_state == MHI_STATE_M0 || +mhi_cntrl->dev_state == MHI_STATE_M2 || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state), msecs_to_jiffies(mhi_cntrl->timeout_ms)); -- 2.25.1
Re: [PATCH v4 4/4] pinctrl: add rsel setting on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > @@ -176,6 +180,12 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_get) > + err = hw->soc->rsel_get(hw, desc, ); > + else > + err = -EOPNOTSUPP; I think that wants to be -ENOTSUPP to align with the other occurrences. > + break; > default: > err = -ENOTSUPP; > } > @@ -295,6 +305,12 @@ static int mtk_pinconf_set(struct pinctrl_dev *pctldev, > unsigned int pin, > else > err = -ENOTSUPP; > break; > + case MTK_PIN_CONFIG_RSEL: > + if (hw->soc->rsel_set) > + err = hw->soc->rsel_set(hw, desc, arg); > + else > + err = -EOPNOTSUPP; Ditto > + break; > default: > err = -ENOTSUPP; > } > -- > 2.18.0 >
linux-next: build warning after merge of the spi tree
Hi all, After merging the spi tree, today's linux-next build (x86_64 allmodconfig) produced this warning: In file included from include/linux/printk.h:409, from include/linux/kernel.h:16, from include/linux/clk.h:13, from drivers/spi/spi-stm32-qspi.c:7: drivers/spi/spi-stm32-qspi.c: In function 'stm32_qspi_dirmap_read': drivers/spi/spi-stm32-qspi.c:481:21: warning: format '%x' expects argument of type 'unsigned int', but argument 5 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~ include/linux/dynamic_debug.h:129:15: note: in definition of macro '__dynamic_func_call' 129 | func(, ##__VA_ARGS__); \ | ^~~ include/linux/dynamic_debug.h:161:2: note: in expansion of macro '_dynamic_func_call' 161 | _dynamic_func_call(fmt,__dynamic_dev_dbg, \ | ^~ include/linux/dev_printk.h:123:2: note: in expansion of macro 'dynamic_dev_dbg' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ include/linux/dev_printk.h:123:23: note: in expansion of macro 'dev_fmt' 123 | dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__) | ^~~ drivers/spi/spi-stm32-qspi.c:481:2: note: in expansion of macro 'dev_dbg' 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ^~~ drivers/spi/spi-stm32-qspi.c:481:34: note: format string is defined here 481 | dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", __func__, len, offs, buf); | ~^ | | | unsigned int | %lx Introduced by commit 18674dee3cd6 ("spi: stm32-qspi: Add dirmap support") -- Cheers, Stephen Rothwell
Re: [PATCH 00/13] [RFC] Rust support
Hi Nick, On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote: > I don't think the introduction of Rust made Firefox _more_ insecure. > https://wiki.mozilla.org/Oxidation#Within_Firefox Browsers are human interfaces and do not fundamentally require low-level access to memory/hardware/whatever. They can be written in about any language, only the resource usage and performance will make a difference. As such, some were even written in Java or JS for example. Operating systems, and particularly drivers, *do* require low-level accesses, and stuff that can hardly be abstracted or understood by a compiler. You may have to perform two 16-bit reads/writes on a 32-bit MMIO address to perform an operation, and the compiler does not have to know it, just to obey. > Really, a key point is that a lot of common mistakes in C are compile > time errors in Rust. I know no "true" kernel dev would make such > mistakes in C, Everyone makes mistakes, the level of attention varies over time, and the focus often changes when dealing with build errors. How many times have some of us, facing a bug, remembered having changed the code very late after a build error, and been less careful from that point on, when the goal changed from "let's do it right" to "let's get this to build"? > but is there nothing we can do to help our peers > writing drivers? The point is to transfer cost from runtime to > compile time to avoid costs at runtime; like all of the memory safety > bugs which are costing our industry. And do we have stats on the number of logical bugs, some of which are caused by developers trying to work around compilers' stubbornness? For me, personally speaking, they have *increased* over time, usually trying to avoid some annoying modern gcc warnings, resulting in integer casts being placed close to string formats, or returns being placed in switch/case to avoid the fall-through warning, etc.
Thus I'm worried that a non-negligible part of the 70% of bugs caused by memory safety issues could be replaced with logic bugs to get to the point where the rust compiler finally agrees to compile the code. It makes me think about researchers trying to reduce the causes of certain deaths and claiming to "save lives" while in the end the people they "save" will simply die from something else. And I'm not particularly trying to blindly defend C here. I'm complaining every single day about some of its shortcomings like the vast amount of UB, stupid type promotion, counter-intuitive operator precedence when combining bit-ops with arithmetic, limited size of enums, lack of a rotate operator, strict aliasing, or the recourse to asm() statements every 10 lines to do stuff that can hardly be expressed in a way understandable by a compiler. I'm just seeing that a lot of the grievances I have against C come from the compiler trying to be too smart or too stubborn, so giving even more of the handle to a compiler doesn't appeal to me at all. In addition, we all know how painful it is to work around compiler bugs by writing complex code that carefully avoids certain constructs. I'm wondering if we'll still have that luxury with a stricter compiler, or if the only choice will be between "let's disable this driver that does not compile" and "please force distros to upgrade their compilers". But we'll see :-/ Regards, Willy
Re: [PATCH v4 3/4] pinctrl: add drive for I2C related pins on MT8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This patch provides the advanced drive raw data setting version > for I2C used pins on MT8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 22 +++ > .../pinctrl/mediatek/pinctrl-mtk-common-v2.c | 14 > .../pinctrl/mediatek/pinctrl-mtk-common-v2.h | 5 + > 3 files changed, 41 insertions(+) > > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > index 063f164d7c9b..a7500e18bb1d 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -760,6 +760,25 @@ static const struct mtk_pin_field_calc > mt8195_pin_drv_range[] = { > PIN_FIELD_BASE(143, 143, 1, 0x020, 0x10, 24, 3), > }; > > +static const struct mtk_pin_field_calc mt8195_pin_drv_adv_range[] = { > + PIN_FIELD_BASE(8, 8, 4, 0x020, 0x10, 15, 3), > + PIN_FIELD_BASE(9, 9, 4, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(10, 10, 4, 0x020, 0x10, 18, 3), > + PIN_FIELD_BASE(11, 11, 4, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(12, 12, 4, 0x020, 0x10, 21, 3), > + PIN_FIELD_BASE(13, 13, 4, 0x020, 0x10, 6, 3), > + PIN_FIELD_BASE(14, 14, 4, 0x020, 0x10, 24, 3), > + PIN_FIELD_BASE(15, 15, 4, 0x020, 0x10, 9, 3), > + PIN_FIELD_BASE(16, 16, 4, 0x020, 0x10, 27, 3), > + PIN_FIELD_BASE(17, 17, 4, 0x020, 0x10, 12, 3), > + PIN_FIELD_BASE(29, 29, 2, 0x020, 0x10, 0, 3), > + PIN_FIELD_BASE(30, 30, 2, 0x020, 0x10, 3, 3), > + PIN_FIELD_BASE(34, 34, 1, 0x040, 0x10, 0, 3), > + PIN_FIELD_BASE(35, 35, 1, 0x040, 0x10, 3, 3), > + PIN_FIELD_BASE(44, 44, 1, 0x040, 0x10, 6, 3), > + PIN_FIELD_BASE(45, 45, 1, 0x040, 0x10, 9, 3), > +}; > + > static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > [PINCTRL_PIN_REG_MODE] = MTK_RANGE(mt8195_pin_mode_range), > [PINCTRL_PIN_REG_DIR] = MTK_RANGE(mt8195_pin_dir_range), > @@ -773,6 +792,7 @@ static const struct mtk_pin_reg_calc > mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = { > 
[PINCTRL_PIN_REG_PUPD] = MTK_RANGE(mt8195_pin_pupd_range), > [PINCTRL_PIN_REG_R0] = MTK_RANGE(mt8195_pin_r0_range), > [PINCTRL_PIN_REG_R1] = MTK_RANGE(mt8195_pin_r1_range), > + [PINCTRL_PIN_REG_DRV_ADV] = MTK_RANGE(mt8195_pin_drv_adv_range), > }; > > static const char * const mt8195_pinctrl_register_base_names[] = { > @@ -801,6 +821,8 @@ static const struct mtk_pin_soc mt8195_data = { > .bias_get_combo = mtk_pinconf_bias_get_combo, > .drive_set = mtk_pinconf_drive_set_rev1, > .drive_get = mtk_pinconf_drive_get_rev1, > + .adv_drive_get = mtk_pinconf_adv_drive_get_raw, > + .adv_drive_set = mtk_pinconf_adv_drive_set_raw, > }; > > static const struct of_device_id mt8195_pinctrl_of_match[] = { > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > index 72f17f26acd8..2b51f4a9b860 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c > @@ -1027,6 +1027,20 @@ int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > } > EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get); > > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg) > +{ > + return mtk_hw_set_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, arg); > +} > +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_set_raw); > + > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 *val) > +{ > + return mtk_hw_get_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, val); > +} > +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get_raw); > + > MODULE_LICENSE("GPL v2"); > MODULE_AUTHOR("Sean Wang "); > MODULE_DESCRIPTION("Pin configuration library module for mediatek SoCs"); > diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > index e2aae285b5fc..fd5ce9c5dcbd 100644 > --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h > +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h 
> @@ -66,6 +66,7 @@ enum { > PINCTRL_PIN_REG_DRV_EN, > PINCTRL_PIN_REG_DRV_E0, > PINCTRL_PIN_REG_DRV_E1, > + PINCTRL_PIN_REG_DRV_ADV, > PINCTRL_PIN_REG_MAX, > }; > > @@ -314,6 +315,10 @@ int mtk_pinconf_adv_drive_set(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 arg); > int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw, > const struct mtk_pin_desc *desc, u32 *val); > +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw, > + const struct mtk_pin_desc *desc, u32 arg); > +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl
[PATCH] drivers: regulator: core.c: Fix indentation of comment
Shifted the closing */ of multiline comments to a new line. This is done to maintain code uniformity. Signed-off-by: Shubhankar Kuranagatti --- drivers/regulator/core.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 16114aea099a..06fbf18b6524 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -538,7 +538,8 @@ static int regulator_mode_constrain(struct regulator_dev *rdev, /* The modes are bitmasks, the most power hungry modes having * the lowest values. If the requested mode isn't supported -* try higher modes. */ +* try higher modes. +*/ while (*mode) { if (rdev->constraints->valid_modes_mask & *mode) return 0; @@ -931,7 +932,8 @@ static DEVICE_ATTR(bypass, 0444, regulator_bypass_show, NULL); /* Calculate the new optimum regulator operating mode based on the new total - * consumer load. All locks held by caller */ + * consumer load. All locks held by caller + */ static int drms_uA_update(struct regulator_dev *rdev) { struct regulator *sibling; @@ -1219,7 +1221,8 @@ static int machine_constraints_voltage(struct regulator_dev *rdev, int cmax = constraints->max_uV; /* it's safe to autoconfigure fixed-voltage supplies - and the constraints are used by list_voltage. */ +* and the constraints are used by list_voltage. +*/ if (count == 1 && !cmin) { cmin = 1; cmax = INT_MAX; @@ -2525,7 +2528,8 @@ static int _regulator_do_enable(struct regulator_dev *rdev) /* Allow the regulator to ramp; it would be useful to extend * this for bulk operations so that the regulators can ramp -* together. 
+*/ trace_regulator_enable_delay(rdev_get_name(rdev)); /* If poll_enabled_time is set, poll upto the delay calculated @@ -5337,10 +5341,12 @@ regulator_register(const struct regulator_desc *regulator_desc, ret = set_machine_constraints(rdev); if (ret == -EPROBE_DEFER) { /* Regulator might be in bypass mode and so needs its supply -* to set the constraints */ +* to set the constraints +*/ /* FIXME: this currently triggers a chicken-and-egg problem * when creating -SUPPLY symlink in sysfs to a regulator -* that is just being created */ +* that is just being created +*/ rdev_dbg(rdev, "will resolve supply early: %s\n", rdev->supply_name); ret = regulator_resolve_supply(rdev); @@ -5899,7 +5905,8 @@ static int regulator_late_cleanup(struct device *dev, void *data) if (have_full_constraints()) { /* We log since this may kill the system if it goes -* wrong. */ +* wrong. +*/ rdev_info(rdev, "disabling\n"); ret = _regulator_do_disable(rdev); if (ret != 0) -- 2.17.1
Re: [PATCH v4 2/4] pinctrl: add pinctrl driver on mt8195
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao wrote: > > This commit includes pinctrl driver for mt8195. > > Signed-off-by: Zhiyong Tao Acked-by: Sean Wang > --- > drivers/pinctrl/mediatek/Kconfig |6 + > drivers/pinctrl/mediatek/Makefile |1 + > drivers/pinctrl/mediatek/pinctrl-mt8195.c | 828 > drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 + > 4 files changed, 2504 insertions(+) > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c > create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h > > diff --git a/drivers/pinctrl/mediatek/Kconfig > b/drivers/pinctrl/mediatek/Kconfig > index eef17f228669..90f0c8255eaf 100644 > --- a/drivers/pinctrl/mediatek/Kconfig > +++ b/drivers/pinctrl/mediatek/Kconfig > @@ -147,6 +147,12 @@ config PINCTRL_MT8192 > default ARM64 && ARCH_MEDIATEK > select PINCTRL_MTK_PARIS > > +config PINCTRL_MT8195 > + bool "Mediatek MT8195 pin control" > + depends on OF > + depends on ARM64 || COMPILE_TEST > + select PINCTRL_MTK_PARIS > + > config PINCTRL_MT8516 > bool "Mediatek MT8516 pin control" > depends on OF > diff --git a/drivers/pinctrl/mediatek/Makefile > b/drivers/pinctrl/mediatek/Makefile > index 01218bf4dc30..06fde993ace2 100644 > --- a/drivers/pinctrl/mediatek/Makefile > +++ b/drivers/pinctrl/mediatek/Makefile > @@ -21,5 +21,6 @@ obj-$(CONFIG_PINCTRL_MT8167) += pinctrl-mt8167.o > obj-$(CONFIG_PINCTRL_MT8173) += pinctrl-mt8173.o > obj-$(CONFIG_PINCTRL_MT8183) += pinctrl-mt8183.o > obj-$(CONFIG_PINCTRL_MT8192) += pinctrl-mt8192.o > +obj-$(CONFIG_PINCTRL_MT8195)+= pinctrl-mt8195.o > obj-$(CONFIG_PINCTRL_MT8516) += pinctrl-mt8516.o > obj-$(CONFIG_PINCTRL_MT6397) += pinctrl-mt6397.o > diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c > b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > new file mode 100644 > index ..063f164d7c9b > --- /dev/null > +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c > @@ -0,0 +1,828 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2020 MediaTek Inc. 
> + * > + * Author: Zhiyong Tao > + * > + */ > + > +#include "pinctrl-mtk-mt8195.h" > +#include "pinctrl-paris.h" > + > +/* MT8195 have multiple bases to program pin configuration listed as the > below: > + * iocfg[0]:0x10005000, iocfg[1]:0x11d1, iocfg[2]:0x11d3, > + * iocfg[3]:0x11d4, iocfg[4]:0x11e2, iocfg[5]:0x11eb, > + * iocfg[6]:0x11f4. > + * _i_based could be used to indicate what base the pin should be mapped > into. > + */ > + > +#define PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) > \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, \ > + 32, 0) > + > +#define PINS_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, > x_bits) \ > + PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, > \ > + 32, 1) > + > +static const struct mtk_pin_field_calc mt8195_pin_mode_range[] = { > + PIN_FIELD(0, 144, 0x300, 0x10, 0, 4), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_dir_range[] = { > + PIN_FIELD(0, 144, 0x0, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_di_range[] = { > + PIN_FIELD(0, 144, 0x200, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_do_range[] = { > + PIN_FIELD(0, 144, 0x100, 0x10, 0, 1), > +}; > + > +static const struct mtk_pin_field_calc mt8195_pin_ies_range[] = { > + PIN_FIELD_BASE(0, 0, 4, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(1, 1, 4, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(2, 2, 4, 0x040, 0x10, 2, 1), > + PIN_FIELD_BASE(3, 3, 4, 0x040, 0x10, 3, 1), > + PIN_FIELD_BASE(4, 4, 4, 0x040, 0x10, 4, 1), > + PIN_FIELD_BASE(5, 5, 4, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(6, 6, 4, 0x040, 0x10, 6, 1), > + PIN_FIELD_BASE(7, 7, 4, 0x040, 0x10, 7, 1), > + PIN_FIELD_BASE(8, 8, 4, 0x040, 0x10, 13, 1), > + PIN_FIELD_BASE(9, 9, 4, 0x040, 0x10, 8, 1), > + PIN_FIELD_BASE(10, 10, 4, 0x040, 0x10, 14, 1), > + PIN_FIELD_BASE(11, 11, 4, 0x040, 0x10, 9, 1), > + PIN_FIELD_BASE(12, 12, 4, 0x040, 0x10, 15, 1), > + PIN_FIELD_BASE(13, 13, 
4, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(14, 14, 4, 0x040, 0x10, 16, 1), > + PIN_FIELD_BASE(15, 15, 4, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(16, 16, 4, 0x040, 0x10, 17, 1), > + PIN_FIELD_BASE(17, 17, 4, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(18, 18, 2, 0x040, 0x10, 5, 1), > + PIN_FIELD_BASE(19, 19, 2, 0x040, 0x10, 12, 1), > + PIN_FIELD_BASE(20, 20, 2, 0x040, 0x10, 11, 1), > + PIN_FIELD_BASE(21, 21, 2, 0x040, 0x10, 10, 1), > + PIN_FIELD_BASE(22, 22, 2, 0x040, 0x10, 0, 1), > + PIN_FIELD_BASE(23, 23, 2, 0x040, 0x10, 1, 1), > + PIN_FIELD_BASE(24, 24, 2, 0x040, 0x10, 2, 1), > +
Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus
Tyrel Datwyler writes: > On 4/17/21 5:30 AM, Michael Ellerman wrote: >> Tyrel Datwyler writes: >>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote: Currently, neither the vio_bus nor vio_driver structures provide support for a shutdown() routine. Add support for shutdown() by allowing drivers to provide an implementation via a function pointer in their vio_driver struct, and provide a proper implementation in the driver template for the vio_bus that calls a vio driver's shutdown() if defined. In the case that no shutdown() is defined by a vio driver and a kexec is in progress, we implement a big hammer that calls remove() to ensure no further DMA for the devices is possible. Signed-off-by: Tyrel Datwyler --- >>> >>> Ping... any comments, problems with this approach? >> >> The kexec part seems like a bit of a hack. >> >> It also doesn't help for kdump, when none of the shutdown code is run. > > If I understand correctly, for kdump we have a reserved memory space where the > kdump kernel is loaded, but for kexec the memory region isn't reserved ahead > of > time, meaning we can try to load the kernel over memory potentially used for DMA > by the current kernel. That's correct. >> How many drivers do we have? Can we just implement a proper shutdown for >> them? > > Well, that is the end goal. I just don't currently have the bandwidth to do > each > driver myself with a proper shutdown sequence, and thought this was a > launching-off > point to at least introduce the shutdown callback to the VIO bus. Fair enough. > Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, > vmc, > pseries_rng, nx, nx842, hvcs, hvc_vio. > > I can drop the kexec_in_progress hammer and just have each driver call > remove() > themselves in their shutdown function. Leave it to each maintainer to decide > if > remove() is enough or if there is a more lightweight quiesce sequence they > choose to implement. That's OK, you've convinced me. I'll take it as-is.
Eventually it would be good for drivers to implement shutdown in the optimal way for their device, but that can be done incrementally. cheers
Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote: > On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote: > > cap_setfcap is required to create file capabilities. > > > > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a > > process running as uid 0 but without cap_setfcap is able to work around > > this as follows: unshare a new user namespace which maps parent uid 0 > > into the child namespace. While this task will not have new > > capabilities against the parent namespace, there is a loophole due to > > the way namespaced file capabilities are represented as xattrs. File > > capabilities valid in userns 1 are distinguished from file capabilities > > valid in userns 2 by the kuid which underlies uid 0. Therefore the > > restricted root process can unshare a new self-mapping namespace, add a > > namespaced file capability onto a file, then use that file capability in > > the parent namespace. > > > > To prevent that, do not allow mapping parent uid 0 if the process which > > opened the uid_map file does not have CAP_SETFCAP, which is the capability > > for setting file capabilities. > > > > As a further wrinkle: a task can unshare its user namespace, then > > open its uid_map file itself, and map (only) its own uid. In this > > case we do not have the credential from before unshare, which was > > potentially more restricted. So, when creating a user namespace, we > > record whether the creator had CAP_SETFCAP. Then we can use that > > during map_write(). > > > > With this patch: > > > > 1. Unprivileged user can still unshare -Ur > > > > ubuntu@caps:~$ unshare -Ur > > root@caps:~# logout > > > > 2. Root user can still unshare -Ur > > > > ubuntu@caps:~$ sudo bash > > root@caps:/home/ubuntu# unshare -Ur > > root@caps:/home/ubuntu# logout > > > > 3. 
Root user without CAP_SETFCAP cannot unshare -Ur: > > > > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- > > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap > > unable to set CAP_SETFCAP effective capability: Operation not permitted > > root@caps:/home/ubuntu# unshare -Ur > > unshare: write failed /proc/self/uid_map: Operation not permitted > > > > Note: an alternative solution would be to allow uid 0 mappings by > > processes without CAP_SETFCAP, but to prevent such a namespace from > > writing any file capabilities. This approach can be seen here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 > > > > Ah, can you link to the previous fix and its revert, please? I think > that was mentioned in the formerly private thread as well but we forgot: > > commit 95ebabde382c371572297915b104e55403674e73 > Author: Eric W. Biederman > Date: Thu Dec 17 09:42:00 2020 -0600 > > capabilities: Don't allow writing ambiguous v3 file capabilities > > commit 3b0c2d3eaa83da259d7726192cf55a137769012f > Author: Eric W. Biederman > Date: Fri Mar 12 15:07:09 2021 -0600 > > Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file > capabilities") Sure. Is there a tag for that kind of thing or do I just mention it at the end of the description?
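For what it's worth, the policy being added reduces to a small predicate. A userspace model for illustration only (the names here are made up; the real check lives in kernel/user_namespace.c and consults the CAP_SETFCAP flag recorded at namespace creation time):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the map_write() policy described above.
 * Illustrative, not the kernel's code: a uid_map write that maps
 * parent uid 0 is refused unless the namespace creator (or the
 * opener of uid_map) had CAP_SETFCAP.
 */
static bool uid0_mapping_allowed(bool maps_parent_uid0, bool opener_had_setfcap)
{
	/* Mappings that do not include parent uid 0 are unaffected. */
	if (!maps_parent_uid0)
		return true;
	/* Mapping parent uid 0 now requires CAP_SETFCAP. */
	return opener_had_setfcap;
}
```

This matches the three scenarios in the commit message: unprivileged `unshare -Ur` maps a non-zero parent uid (allowed), root with CAP_SETFCAP maps uid 0 (allowed), root that dropped CAP_SETFCAP is refused.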
Re: [PATCH v4 5/7] cpufreq: qcom-hw: Implement CPRh aware OSM programming
On 19-04-21, 13:52, Bjorn Andersson wrote: > On Tue 19 Jan 11:45 CST 2021, AngeloGioacchino Del Regno wrote: > @Viresh, do you have any suggestion regarding my last comment? > > static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev) > > { > > + const struct qcom_cpufreq_soc_data *soc_data; > > + struct device_node *pd_node; > > + struct platform_device *pd_dev; > > struct device *cpu_dev; > > struct clk *clk; > > - int ret; > > + int clk_div, ret; > > + > > + cpu_dev = get_cpu_device(0); > > + if (!cpu_dev) > > + return -EPROBE_DEFER; > > + > > + soc_data = of_device_get_match_data(&pdev->dev); > > + if (!soc_data) > > + return -EINVAL; > > + > > + if (!soc_data->uses_tz) { > > + /* > > + * When the OSM is not pre-programmed from TZ, we will > > + * need to program the sequencer through SCM calls. > > + */ > > + if (!qcom_scm_is_available()) > > + return -EPROBE_DEFER; > > + > > + /* > > + * If there are no power-domains, OSM programming cannot be > > + * performed, as in that case, we wouldn't know where to take > > + * the params from... > > + */ > > + pd_node = of_parse_phandle(cpu_dev->of_node, > > + "power-domains", 0); > > + if (!pd_node) { > > + ret = PTR_ERR(pd_node); > > + dev_err(cpu_dev, "power domain not found: %d\n", ret); > > + return ret; > > + } > > + > > + /* > > + * If the power domain device is not registered yet, then > > + * defer probing this driver until that is available. > > + */ > > + pd_dev = of_find_device_by_node(pd_node); > > + if (!pd_dev || !pd_dev->dev.driver || > > + !device_is_bound(&pd_dev->dev)) > > + return -EPROBE_DEFER; > > I wonder if there's a more appropriate way to probe defer on resources > described in the CPU nodes... Recently we made some updates to the OPP core to start returning EPROBE_DEFER on failure to acquire resources. I think you can get rid of many checks for resources here by just trying to create the OPP table and check its return value for EPROBE_DEFER. -- viresh
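Concretely, the simplification Viresh suggests collapses the per-resource checks into a single call whose error is propagated. A standalone sketch with the OPP call stubbed out (only dev_pm_opp_of_add_table() is a real kernel entry point; the stub, the flag, and the probe shape here are illustrative):

```c
#include <assert.h>

#define EPROBE_DEFER 517	/* matches include/linux/errno.h */

/* Flag standing in for "the OPP table's backing resources exist". */
static int opp_table_ready;

/* Stub standing in for dev_pm_opp_of_add_table(): the OPP core now
 * returns -EPROBE_DEFER itself while clocks / power domains /
 * required-opps behind the table are not yet available. */
static int dev_pm_opp_of_add_table_stub(void)
{
	return opp_table_ready ? 0 : -EPROBE_DEFER;
}

static int probe(void)
{
	int ret = dev_pm_opp_of_add_table_stub();

	/* No per-resource checks in the driver: any failure, including
	 * -EPROBE_DEFER, is simply propagated and the driver core
	 * retries the probe later. */
	if (ret)
		return ret;
	return 0;
}
```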
Re: Enabling pmbus power control
On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote: > On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote: > > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote: > > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote: > > > > > > > Okay, to expand a bit on the description in my initial message -- we've > > > > got a single chassis with multiple server boards and a single manager > > > > board > > > > that handles, among other things, power control for the servers. > > > > The manager board has one LM25066 for each attached server, which acts > > > > as > > > > the "power switch" for that server. There thus really isn't any driver > > > > to > > > > speak of for the downstream device. > > > > > > This sounds like you need a driver representing those server boards (or > > > the slots they plug into perhaps) that represents everything about those > > > boards to userspace, including power switching. I don't see why you > > > wouldn't have a driver for that - it's a thing that physically exists > > > and clearly has some software control, and you'd presumably also expect > > > to represent some labelling about the slot as well. > > > > Absolutely agree. > > > > Thanks, > > Guenter > > Hi Guenter, Mark, > > I wanted to return to this to try to explain why structuring the kernel > support for this in a way that's specific to the device behind the PMIC > seems like an awkward fit to me, and ultimately kind of artificial. > > In the system I described, the manager board with the LM25066 devices is its > own little self-contained BMC system running its own Linux kernel (though > "BMC" is perhaps a slightly misleading term because there's no host system > that it manages). The PMICs are really the only relevant connection it has > to the servers it controls power for -- they have their own dedicated local > BMCs on board as well doing all the usual BMC tasks. 
A hypothetical > dedicated driver for this on the manager board wouldn't have any other > hardware to touch aside from the pmbus interface of the LM25066 itself, so > calling it a server board driver seems pretty arbitrary -- and in fact the > same system also has another LM25066 that controls the power supply to the > chassis cooling fans (a totally different downstream device, but one that > would be equally well-served by the same software). > > More recently, another system has entered the picture for us that might > illustrate it more starkly -- the Delta Open19 power shelf [0] supported by > a recent code release from LinkedIn [1]. This is a rackmount power supply All I can see is that this code is a mess. > with fifty outputs, each independently switchable via an LM25066 attached to > an on-board BMC-style management controller (though again, no host system > involved). We (Equinix Metal) are interested in porting a modern OpenBMC to > it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it > currently runs (as found in the linked repo). The exact nature of the > devices powered by its outputs is well outside the scope of the firmware > running on that controller (it could be any arbitrary thing that runs on > 12VDC), but we still want to be able to both (a) retrieve per-output > voltage/current/power readings as provided by the existing LM25066 hwmon > support, and (b) control the on/off state of those outputs from userspace. > > Given the array of possible use-cases, an approach of adding power-switch > functionality to the existing LM25066 support seems like the most obvious > way to support this, so I'm hoping to see if I can get some idea of what an > acceptable implementation of that might look like. > Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before, and it is still true: The hwmon subsystem is not in the business of power control. Have you looked into enclosures (drivers/misc/enclosure.c) ? 
Maybe that or something similar could be used for your purposes. Thanks, Guenter
[Patch v3 7/7] crypto: qce: aead: Schedule fallback algorithm
Qualcomm crypto engine does not handle the following scenarios and will issue an abort. In such cases, pass on the transformation to a fallback algorithm. - DES3 algorithms with all three keys the same. - AES192 algorithms. - 0-length messages. Signed-off-by: Thara Gopinath --- v1->v2: - Updated crypto_aead_set_reqsize to include the size of the fallback request as well. drivers/crypto/qce/aead.c | 64 --- drivers/crypto/qce/aead.h | 3 ++ 2 files changed, 56 insertions(+), 11 deletions(-) diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c index ef66ae21eae3..6d06a19b48e4 100644 --- a/drivers/crypto/qce/aead.c +++ b/drivers/crypto/qce/aead.c @@ -512,7 +512,23 @@ static int qce_aead_crypt(struct aead_request *req, int encrypt) /* CE does not handle 0 length messages */ if (!rctx->cryptlen) { if (!(IS_CCM(rctx->flags) && IS_DECRYPT(rctx->flags))) - return -EINVAL; + ctx->need_fallback = true; + } + + /* If fallback is needed, schedule and exit */ + if (ctx->need_fallback) { + /* Reset need_fallback in case the same ctx is used for another transaction */ + ctx->need_fallback = false; + + aead_request_set_tfm(&rctx->fallback_req, ctx->fallback); + aead_request_set_callback(&rctx->fallback_req, req->base.flags, + req->base.complete, req->base.data); + aead_request_set_crypt(&rctx->fallback_req, req->src, + req->dst, req->cryptlen, req->iv); + aead_request_set_ad(&rctx->fallback_req, req->assoclen); + + return encrypt ? crypto_aead_encrypt(&rctx->fallback_req) : + crypto_aead_decrypt(&rctx->fallback_req); } /* @@ -553,7 +569,7 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->ccm4309_salt, key + keylen, QCE_CCM4309_SALT_SIZE); } - if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256) + if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256 && keylen != AES_KEYSIZE_192) return -EINVAL; ctx->enc_keylen = keylen; @@ -562,7 +578,12 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, const u8 *key, memcpy(ctx->enc_key, key, keylen); memcpy(ctx->auth_key, key, keylen); - return 0; + if (keylen == AES_KEYSIZE_192) + ctx->need_fallback = true; + + return IS_CCM_RFC4309(flags) ? + crypto_aead_setkey(ctx->fallback, key, keylen + QCE_CCM4309_SALT_SIZE) : + crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int keylen) @@ -593,20 +614,21 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int * The crypto engine does not support any two keys * being the same for triple des algorithms. The * verify_skcipher_des3_key does not check for all the - * below conditions. Return -EINVAL in case any two keys - * are the same. Revisit to see if a fallback cipher - * is needed to handle this condition. + * below conditions. Schedule fallback in this case. 
*/ memcpy(_key, authenc_keys.enckey, DES3_EDE_KEY_SIZE); if (!((_key[0] ^ _key[2]) | (_key[1] ^ _key[3])) || !((_key[2] ^ _key[4]) | (_key[3] ^ _key[5])) || !((_key[0] ^ _key[4]) | (_key[1] ^ _key[5]))) - return -EINVAL; + ctx->need_fallback = true; } else if (IS_AES(flags)) { /* No random key sizes */ if (authenc_keys.enckeylen != AES_KEYSIZE_128 && + authenc_keys.enckeylen != AES_KEYSIZE_192 && authenc_keys.enckeylen != AES_KEYSIZE_256) return -EINVAL; + if (authenc_keys.enckeylen == AES_KEYSIZE_192) + ctx->need_fallback = true; } ctx->enc_keylen = authenc_keys.enckeylen; @@ -617,7 +639,7 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned int memset(ctx->auth_key, 0, sizeof(ctx->auth_key)); memcpy(ctx->auth_key, authenc_keys.authkey, authenc_keys.authkeylen); - return 0; + return crypto_aead_setkey(ctx->fallback, key, keylen); } static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) @@ -632,15 +654,33 @@ static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize) return -EINVAL; } ctx->authsize = authsize; - return 0; + + return crypto_aead_setauthsize(ctx->fallback, authsize); }
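The triple-DES condition in the hunk above is easy to misread, so as a sanity check, here is the same XOR comparison as a standalone userspace function (illustrative name; the driver keeps the logic inline on the `_key` word array):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* K1, K2, K3 are two 32-bit words each (DES3_EDE_KEY_SIZE = 24 bytes).
 * The CE cannot handle a 3DES key where any two of the three DES keys
 * are identical, so such keys are routed to the fallback. */
static bool des3_needs_fallback(const uint32_t k[6])
{
	return !((k[0] ^ k[2]) | (k[1] ^ k[3])) ||	/* K1 == K2 */
	       !((k[2] ^ k[4]) | (k[3] ^ k[5])) ||	/* K2 == K3 */
	       !((k[0] ^ k[4]) | (k[1] ^ k[5]));	/* K1 == K3 */
}
```

Note this is strictly wider than what verify_skcipher_des3_key() rejects, which is why the driver adds its own check.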
[Patch v3 6/7] crypto: qce: common: Add support for AEAD algorithms
Add the register programming sequence for enabling AEAD algorithms on the Qualcomm crypto engine. Signed-off-by: Thara Gopinath --- v2->v3: - Made qce_be32_to_cpu_array truly be32-to-cpu endian by using be32_to_cpup instead of cpu_to_be32p. Also removed the (u32 *) typecasting of arrays obtained as output from qce_be32_to_cpu_array, as per Bjorn's review comments. - Wrapped the newly introduced std_iv_sha1, std_iv_sha256 and qce_be32_to_cpu_array in CONFIG_CRYPTO_DEV_QCE_AEAD to prevent W=1 warnings as reported by the kernel test robot. v1->v2: - Minor fixes like removing unneeded initialization of variables and using bool values in lieu of 0 and 1, as pointed out by Bjorn. - Introduced qce_be32_to_cpu_array, which converts a u8 string in big-endian order to an array of u32 and returns the total number of words, as per Bjorn's review comments. Presently this function is used only by qce_setup_regs_aead to format keys, iv and nonce. Cipher and hash algorithms can be made to use this function in a separate cleanup patch. 
drivers/crypto/qce/common.c | 162 +++- 1 file changed, 160 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b3d6caec1b2..6d6b3792323b 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -15,6 +15,7 @@ #include "core.h" #include "regs-v5.h" #include "sha.h" +#include "aead.h" static inline u32 qce_read(struct qce_device *qce, u32 offset) { @@ -96,7 +97,7 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } -#ifdef CONFIG_CRYPTO_DEV_QCE_SHA +#if defined(CONFIG_CRYPTO_DEV_QCE_SHA) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; @@ -139,7 +140,9 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SHA static int qce_setup_regs_ahash(struct crypto_async_request *async_req) { struct ahash_request *req = ahash_request_cast(async_req); @@ -225,7 +228,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) } #endif -#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER +#if defined(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD) static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) { u32 cfg = 0; @@ -271,7 +274,9 @@ static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size) return cfg; } +#endif +#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize) { u8 swap[QCE_AES_IV_LENGTH]; @@ -386,6 +391,155 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) } #endif +#ifdef CONFIG_CRYPTO_DEV_QCE_AEAD +static const u32 std_iv_sha1[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4, 0, 0, 0 +}; + +static const u32 std_iv_sha256[SHA256_DIGEST_SIZE / sizeof(u32)] = { + SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3, + SHA256_H4, 
SHA256_H5, SHA256_H6, SHA256_H7 +}; + +static unsigned int qce_be32_to_cpu_array(u32 *dst, const u8 *src, unsigned int len) +{ + u32 *d = dst; + const u8 *s = src; + unsigned int n; + + n = len / sizeof(u32); + for (; n > 0; n--) { + *d = be32_to_cpup((const __be32 *)s); + s += sizeof(u32); + d++; + } + return DIV_ROUND_UP(len, sizeof(u32)); +} + +static int qce_setup_regs_aead(struct crypto_async_request *async_req) +{ + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + u32 enckey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0}; + u32 enciv[QCE_MAX_IV_SIZE / sizeof(u32)] = {0}; + u32 authkey[QCE_SHA_HMAC_KEY_SIZE / sizeof(u32)] = {0}; + u32 authiv[SHA256_DIGEST_SIZE / sizeof(u32)] = {0}; + u32 authnonce[QCE_MAX_NONCE / sizeof(u32)] = {0}; + unsigned int enc_keylen = ctx->enc_keylen; + unsigned int auth_keylen = ctx->auth_keylen; + unsigned int enc_ivsize = rctx->ivsize; + unsigned int auth_ivsize; + unsigned int enckey_words, enciv_words; + unsigned int authkey_words, authiv_words, authnonce_words; + unsigned long flags = rctx->flags; + u32 encr_cfg, auth_cfg, config, totallen; + u32 iv_last_word; + + qce_setup_config(qce); + + /* Write encryption key */ + enckey_words = qce_be32_to_cpu_array(enckey, ctx->enc_key,
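For reference, the conversion helper introduced above can be modeled in userspace like this (ntohl() standing in for be32_to_cpup(); the name is illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Userspace model of qce_be32_to_cpu_array(): converts len bytes of
 * big-endian data into host-order u32 words and returns
 * DIV_ROUND_UP(len, 4), as in the patch. Note that, as in the
 * original, only len / 4 complete words are actually written. */
static unsigned int be32_to_cpu_array(uint32_t *dst, const uint8_t *src,
				      unsigned int len)
{
	unsigned int i, n = len / sizeof(uint32_t);

	for (i = 0; i < n; i++) {
		uint32_t w;

		memcpy(&w, src + i * sizeof(uint32_t), sizeof(w));
		dst[i] = ntohl(w);	/* be32_to_cpup() equivalent */
	}
	return (len + sizeof(uint32_t) - 1) / sizeof(uint32_t);	/* DIV_ROUND_UP */
}
```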
[Patch v3 5/7] crypto: qce: common: Clean up qce_auth_cfg
Remove various redundant checks in qce_auth_cfg. Also allow qce_auth_cfg to take auth_size as a parameter which is a required setting for ccm(aes) algorithms Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index 7b5bc5a6ae81..7b3d6caec1b2 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -97,11 +97,11 @@ static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA -static u32 qce_auth_cfg(unsigned long flags, u32 key_size) +static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size) { u32 cfg = 0; - if (IS_AES(flags) && (IS_CCM(flags) || IS_CMAC(flags))) + if (IS_CCM(flags) || IS_CMAC(flags)) cfg |= AUTH_ALG_AES << AUTH_ALG_SHIFT; else cfg |= AUTH_ALG_SHA << AUTH_ALG_SHIFT; @@ -119,15 +119,16 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) cfg |= AUTH_SIZE_SHA256 << AUTH_SIZE_SHIFT; else if (IS_CMAC(flags)) cfg |= AUTH_SIZE_ENUM_16_BYTES << AUTH_SIZE_SHIFT; + else if (IS_CCM(flags)) + cfg |= (auth_size - 1) << AUTH_SIZE_SHIFT; if (IS_SHA1(flags) || IS_SHA256(flags)) cfg |= AUTH_MODE_HASH << AUTH_MODE_SHIFT; - else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags) || -IS_CBC(flags) || IS_CTR(flags)) + else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags)) cfg |= AUTH_MODE_HMAC << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CCM(flags)) + else if (IS_CCM(flags)) cfg |= AUTH_MODE_CCM << AUTH_MODE_SHIFT; - else if (IS_AES(flags) && IS_CMAC(flags)) + else if (IS_CMAC(flags)) cfg |= AUTH_MODE_CMAC << AUTH_MODE_SHIFT; if (IS_SHA(flags) || IS_SHA_HMAC(flags)) @@ -136,10 +137,6 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size) if (IS_CCM(flags)) cfg |= QCE_MAX_NONCE_WORDS << AUTH_NONCE_NUM_WORDS_SHIFT; - if (IS_CBC(flags) || IS_CTR(flags) || IS_CCM(flags) || - IS_CMAC(flags)) - cfg |= BIT(AUTH_LAST_SHIFT) | 
BIT(AUTH_FIRST_SHIFT); - return cfg; } @@ -171,7 +168,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_clear_array(qce, REG_AUTH_KEY0, 16); qce_clear_array(qce, REG_AUTH_BYTECNT0, 4); - auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen); + auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen, digestsize); } if (IS_SHA_HMAC(rctx->flags) || IS_CMAC(rctx->flags)) { @@ -199,7 +196,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) qce_write_array(qce, REG_AUTH_BYTECNT0, (u32 *)rctx->byte_count, 2); - auth_cfg = qce_auth_cfg(rctx->flags, 0); + auth_cfg = qce_auth_cfg(rctx->flags, 0, digestsize); if (rctx->last_blk) auth_cfg |= BIT(AUTH_LAST_SHIFT); -- 2.25.1
[Patch v3 2/7] crypto: qce: common: Make result dump optional
Qualcomm crypto engine allows for IV registers and status register to be concatenated to the output. This option is enabled by setting the RESULTS_DUMP field in GOPROC register. This is useful for most of the algorithms to either retrieve status of operation or in case of authentication algorithms to retrieve the mac. But for ccm algorithms, the mac is part of the output stream and not retrieved from the IV registers, thus needing a separate buffer to retrieve it. Make enabling RESULTS_DUMP field optional so that algorithms can choose whether or not to enable the option. Note that in this patch, the enabled algorithms always choose RESULTS_DUMP to be enabled. But later with the introduction of ccm algorithms, this changes. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- drivers/crypto/qce/common.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dd76175d5c62..7b5bc5a6ae81 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -88,9 +88,12 @@ static void qce_setup_config(struct qce_device *qce) qce_write(qce, REG_CONFIG, config); } -static inline void qce_crypto_go(struct qce_device *qce) +static inline void qce_crypto_go(struct qce_device *qce, bool result_dump) { - qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + if (result_dump) + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT)); + else + qce_write(qce, REG_GOPROC, BIT(GO_SHIFT)); } #ifdef CONFIG_CRYPTO_DEV_QCE_SHA @@ -219,7 +222,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 0; } @@ -380,7 +383,7 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req) config = qce_config_reg(qce, 1); qce_write(qce, REG_CONFIG, config); - qce_crypto_go(qce); + qce_crypto_go(qce, true); return 
0; } -- 2.25.1
[Patch v3 4/7] crypto: qce: Add support for AEAD algorithms
Introduce support to enable following algorithms in Qualcomm Crypto Engine. - authenc(hmac(sha1),cbc(des)) - authenc(hmac(sha1),cbc(des3_ede)) - authenc(hmac(sha256),cbc(des)) - authenc(hmac(sha256),cbc(des3_ede)) - authenc(hmac(sha256),cbc(aes)) - ccm(aes) - rfc4309(ccm(aes)) Signed-off-by: Thara Gopinath --- v1->v2: - Removed spurious totallen from qce_aead_async_req_handle since it was unused as pointed out by Hubert. - Updated qce_dma_prep_sgs to use #nents returned by dma_map_sg rather than using precomputed #nents. drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 799 drivers/crypto/qce/aead.h | 53 +++ drivers/crypto/qce/common.h | 2 + drivers/crypto/qce/core.c | 4 + 6 files changed, 874 insertions(+) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index 9a4c275a1335..1fe5b7eafc02 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -627,6 +627,12 @@ config CRYPTO_DEV_QCE_SHA select CRYPTO_SHA1 select CRYPTO_SHA256 +config CRYPTO_DEV_QCE_AEAD + bool + depends on CRYPTO_DEV_QCE + select CRYPTO_AUTHENC + select CRYPTO_LIB_DES + choice prompt "Algorithms enabled for QCE acceleration" default CRYPTO_DEV_QCE_ENABLE_ALL @@ -647,6 +653,7 @@ choice bool "All supported algorithms" select CRYPTO_DEV_QCE_SKCIPHER select CRYPTO_DEV_QCE_SHA + select CRYPTO_DEV_QCE_AEAD help Enable all supported algorithms: - AES (CBC, CTR, ECB, XTS) @@ -672,6 +679,14 @@ choice - SHA1, HMAC-SHA1 - SHA256, HMAC-SHA256 + config CRYPTO_DEV_QCE_ENABLE_AEAD + bool "AEAD algorithms only" + select CRYPTO_DEV_QCE_AEAD + help + Enable AEAD algorithms only: + - authenc() + - ccm(aes) + - rfc4309(ccm(aes)) endchoice config CRYPTO_DEV_QCE_SW_MAX_LEN diff --git a/drivers/crypto/qce/Makefile b/drivers/crypto/qce/Makefile index 14ade8a7d664..2cf8984e1b85 100644 --- a/drivers/crypto/qce/Makefile +++ b/drivers/crypto/qce/Makefile @@ -6,3 +6,4 @@ 
qcrypto-objs := core.o \ qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SHA) += sha.o qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) += skcipher.o +qcrypto-$(CONFIG_CRYPTO_DEV_QCE_AEAD) += aead.o diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c new file mode 100644 index ..ef66ae21eae3 --- /dev/null +++ b/drivers/crypto/qce/aead.c @@ -0,0 +1,799 @@ +// SPDX-License-Identifier: GPL-2.0-only + +/* + * Copyright (C) 2021, Linaro Limited. All rights reserved. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "aead.h" + +#define CCM_NONCE_ADATA_SHIFT 6 +#define CCM_NONCE_AUTHSIZE_SHIFT 3 +#define MAX_CCM_ADATA_HEADER_LEN6 + +static LIST_HEAD(aead_algs); + +static void qce_aead_done(void *data) +{ + struct crypto_async_request *async_req = data; + struct aead_request *req = aead_request_cast(async_req); + struct qce_aead_reqctx *rctx = aead_request_ctx(req); + struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm); + struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req)); + struct qce_device *qce = tmpl->qce; + struct qce_result_dump *result_buf = qce->dma.result_buf; + enum dma_data_direction dir_src, dir_dst; + bool diff_dst; + int error; + u32 status; + unsigned int totallen; + unsigned char tag[SHA256_DIGEST_SIZE] = {0}; + int ret = 0; + + diff_dst = (req->src != req->dst) ? true : false; + dir_src = diff_dst ? DMA_TO_DEVICE : DMA_BIDIRECTIONAL; + dir_dst = diff_dst ? 
DMA_FROM_DEVICE : DMA_BIDIRECTIONAL; + + error = qce_dma_terminate_all(&qce->dma); + if (error) + dev_dbg(qce->dev, "aead dma termination error (%d)\n", + error); + if (diff_dst) + dma_unmap_sg(qce->dev, rctx->src_sg, rctx->src_nents, dir_src); + + dma_unmap_sg(qce->dev, rctx->dst_sg, rctx->dst_nents, dir_dst); + + if (IS_CCM(rctx->flags)) { + if (req->assoclen) { + sg_free_table(&rctx->src_tbl); + if (diff_dst) + sg_free_table(&rctx->dst_tbl); + } else { + if (!(IS_DECRYPT(rctx->flags) && !diff_dst)) + sg_free_table(&rctx->dst_tbl); + } + } else { + sg_free_table(&rctx->dst_tbl); + } + + error = qce_check_status(qce, &status); + if (error < 0 && (error !=
[Patch v3 3/7] crypto: qce: Add mode for rfc4309
rfc4309 is the specification that covers using aes ccm algorithms with IPsec security packets. Add a submode to identify the rfc4309 ccm(aes) algorithm in the crypto driver. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Moved up the QCE_ENCRYPT and QCE_DECRYPT bit positions so that addition of other algorithms in future will not affect these macros. drivers/crypto/qce/common.h | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/qce/common.h b/drivers/crypto/qce/common.h index 3bc244bcca2d..b135440bf72b 100644 --- a/drivers/crypto/qce/common.h +++ b/drivers/crypto/qce/common.h @@ -51,9 +51,11 @@ #define QCE_MODE_CCM BIT(12) #define QCE_MODE_MASK GENMASK(12, 8) +#define QCE_MODE_CCM_RFC4309 BIT(13) + /* cipher encryption/decryption operations */ -#define QCE_ENCRYPT BIT(13) -#define QCE_DECRYPT BIT(14) +#define QCE_ENCRYPT BIT(30) +#define QCE_DECRYPT BIT(31) #define IS_DES(flags) (flags & QCE_ALG_DES) #define IS_3DES(flags) (flags & QCE_ALG_3DES) @@ -73,6 +75,7 @@ #define IS_CTR(mode) (mode & QCE_MODE_CTR) #define IS_XTS(mode) (mode & QCE_MODE_XTS) #define IS_CCM(mode) (mode & QCE_MODE_CCM) +#define IS_CCM_RFC4309(mode) ((mode) & QCE_MODE_CCM_RFC4309) #define IS_ENCRYPT(dir) (dir & QCE_ENCRYPT) #define IS_DECRYPT(dir) (dir & QCE_DECRYPT) -- 2.25.1
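As a sanity check on the flag layout, here is a standalone model of the mode tests (bit values copied from the hunk above; it assumes, as the templates in this series appear to, that an rfc4309 request carries both the CCM bit and the new RFC4309 submode bit, so IS_CCM() keeps matching while IS_CCM_RFC4309() narrows the selection for salt/nonce handling):

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)			(1UL << (n))
#define QCE_MODE_CCM		BIT(12)
#define QCE_MODE_CCM_RFC4309	BIT(13)
#define QCE_ENCRYPT		BIT(30)	/* moved up, out of the mode range */

static bool is_ccm(unsigned long mode)
{
	return mode & QCE_MODE_CCM;
}

static bool is_ccm_rfc4309(unsigned long mode)
{
	return mode & QCE_MODE_CCM_RFC4309;
}
```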
[Patch v3 0/7] Add support for AEAD algorithms in Qualcomm Crypto Engine driver
Enable support for AEAD algorithms in the Qualcomm CE driver. The first three patches in this series are cleanups and add a few missing pieces required to add support for AEAD algorithms. Patch 4 introduces the supported AEAD transformations on Qualcomm CE. Patches 5 and 6 implement the h/w infrastructure needed to enable and run the AEAD transformations on Qualcomm CE. Patch 7 adds support to queue fallback algorithms in case of unsupported special inputs. This patch series has been tested with the in-kernel crypto testing module tcrypt.ko, with fuzz tests enabled as well. Thara Gopinath (7): crypto: qce: common: Add MAC failed error checking crypto: qce: common: Make result dump optional crypto: qce: Add mode for rfc4309 crypto: qce: Add support for AEAD algorithms crypto: qce: common: Clean up qce_auth_cfg crypto: qce: common: Add support for AEAD algorithms crypto: qce: aead: Schedule fallback algorithm drivers/crypto/Kconfig | 15 + drivers/crypto/qce/Makefile | 1 + drivers/crypto/qce/aead.c | 841 drivers/crypto/qce/aead.h | 56 +++ drivers/crypto/qce/common.c | 196 - drivers/crypto/qce/common.h | 9 +- drivers/crypto/qce/core.c | 4 + 7 files changed, 1102 insertions(+), 20 deletions(-) create mode 100644 drivers/crypto/qce/aead.c create mode 100644 drivers/crypto/qce/aead.h -- 2.25.1
[Patch v3 1/7] crypto: qce: common: Add MAC failed error checking
MAC_FAILED gets set in the status register if authentication fails for ccm algorithms (during decryption). Add support to catch and flag this error. Reviewed-by: Bjorn Andersson Signed-off-by: Thara Gopinath --- v1->v2: - Split the error checking for -ENXIO and -EBADMSG into an if-else clause so that the code is more readable, as per Bjorn's review comment. drivers/crypto/qce/common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c index dceb9579d87a..dd76175d5c62 100644 --- a/drivers/crypto/qce/common.c +++ b/drivers/crypto/qce/common.c @@ -419,6 +419,8 @@ int qce_check_status(struct qce_device *qce, u32 *status) */ if (*status & STATUS_ERRORS || !(*status & BIT(OPERATION_DONE_SHIFT))) ret = -ENXIO; + else if (*status & BIT(MAC_FAILED_SHIFT)) + ret = -EBADMSG; return ret; } -- 2.25.1
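The check ordering matters here: transfer-level errors and an unset OPERATION_DONE still win over MAC_FAILED. A standalone model of the resulting qce_check_status() logic (the bit positions below are placeholders, not the hardware's; the real ones come from regs-v5.h):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT(n)			(1U << (n))
/* Placeholder bit positions for illustration only. */
#define OPERATION_DONE_SHIFT	1
#define MAC_FAILED_SHIFT	3
#define STATUS_ERRORS		(BIT(4) | BIT(5))

static int check_status(uint32_t status)
{
	if ((status & STATUS_ERRORS) || !(status & BIT(OPERATION_DONE_SHIFT)))
		return -ENXIO;		/* transfer-level failure */
	else if (status & BIT(MAC_FAILED_SHIFT))
		return -EBADMSG;	/* CCM authentication failed */
	return 0;
}
```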
Re: [PATCH] net: ethernet: ravb: Fix release of refclk
On Mon, Apr 19, 2021 at 5:45 PM David Miller wrote: > > From: Adam Ford > Date: Sat, 17 Apr 2021 08:23:29 -0500 > > > The call to clk_disable_unprepare() can happen before priv is > > initialized. This means moving clk_disable_unprepare out of > > out_release into a new label. > > > > Fixes: 8ef7adc6beb2("net: ethernet: ravb: Enable optional refclk") > > Signed-off-by: Adam Ford > This does not apply cleanly, please rebase and resubmit. Which branch should I use for the rebase? I used net-next because that's where the bug is, but I know it changes frequently. > > Please fix the formatting of your Fixes tag while you are at it, thank you. No problem. Sorry about that. adam
RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die
> -----Original Message----- > From: Greg KH > Sent: Friday, March 19, 2021 11:02 PM > To: Jonathan Cameron > Cc: Song Bao Hua (Barry Song); linux-arm-kernel; linux-kernel; linux-acpi; x86 > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die > > On Fri, Mar 19, 2021 at 09:36:16AM +, Jonathan Cameron wrote: > > On Fri, 19 Mar 2021 06:57:08 + > > "Song Bao Hua (Barry Song)" wrote: > > > > -----Original Message----- > > > > From: Greg KH > > > > Sent: Friday, March 19, 2021 7:35 PM > > > > To: Song Bao Hua (Barry Song) > > > > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die > > > > > > > > On Fri, Mar 19, 2021 at 05:16:15PM +1300, Barry Song wrote: > > > > > diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst > > > > > index b90dafc..f9d3745 100644 > > > > > --- a/Documentation/admin-guide/cputopology.rst > > > > > +++ b/Documentation/admin-guide/cputopology.rst > > > > > @@ -24,6 +24,12 @@ core_id: > > > > > identifier (rather than the kernel's). The actual value is > > > > > architecture and platform dependent. > > > > > > > > > > +cluster_id: > > > > > + > > > > > + the Cluster ID of cpuX. Typically it is the hardware platform's > > > > > + identifier (rather than the kernel's). The actual value is > > > > > + architecture and platform dependent. > > > > > + > > > > > book_id: > > > > > > > > > > the book ID of cpuX. Typically it is the hardware platform's > > > > > @@ -56,6 +62,14 @@ package_cpus_list: > > > > > human-readable list of CPUs sharing the same > > > > > physical_package_id. > > > > > (deprecated name: "core_siblings_list") > > > > > > > > > > +cluster_cpus: > > > > > + > > > > > + internal kernel map of CPUs within the same cluster. > > > > > + > > > > > +cluster_cpus_list: > > > > > + > > > > > + human-readable list of CPUs within the same cluster. > > > > > + > > > > > die_cpus: > > > > > > > > > > internal kernel map of CPUs within the same die. > > > > > > > > Why are these sysfs files in this file, and not in a Documentation/ABI/ > > > > file which can be correctly parsed and shown to userspace? > > > > > > Well. Those ABIs have been there for quite a long time.
It is like: > > > > > > [root@ceph1 topology]# ls > > > core_id core_siblings core_siblings_list physical_package_id thread_siblings thread_siblings_list > > > [root@ceph1 topology]# pwd > > > /sys/devices/system/cpu/cpu100/topology > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > [root@ceph1 topology]# > > > > > > > > > > > Any chance you can fix that up here as well? > > > > > > Yes, we will send a separate patch to address this, which won't > > > be in this patchset. This patchset will be based on that one. > > > > > > > > > > > Also note that "list" is not something that goes in sysfs, sysfs is "one > > > > value per file", and a list is not "one value". How do you prevent > > > > overflowing the buffer of the sysfs file if you have a "list"? > > > > > > > > > > At a glance, the list is using "-" rather than a real list > > > [root@ceph1 topology]# cat core_siblings_list > > > 64-127 > > > > > > Anyway, I will take a look at whether it has any chance to overflow. > > > > It could in theory be alternate CPUs as a comma-separated list. > > So it would get interesting around 500-1000 cpus (guessing).
Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode
On Sat, Apr 17, 2021 at 1:16 AM Christophe Leroy wrote: > > > > On 16/04/2021 at 01:49, Alexei Starovoitov wrote: > > On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet > > wrote: > >> > >> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann > >>> On 4/15/21 11:32 AM, Jianlin Lv wrote: > For debugging JITs, dumping the JITed image to kernel log is discouraged, > "bpftool prog dump jited" is a much better way to examine JITed dumps. > This patch gets rid of the code related to bpf_jit_enable=2 mode, > updates the proc handler of bpf_jit_enable, and adds auxiliary > information to explain how to use the bpf_jit_disasm tool after this change. > > Signed-off-by: Jianlin Lv > >> > >> Hello, > >> > >> For what it's worth, I have already seen people dump the JIT image in > >> kernel logs in Qemu VMs running with just a busybox, not for kernel > >> development, but in a context where building/using bpftool was not > >> possible. > > > > If building/using bpftool is not possible then the majority of selftests won't > > be exercised. I don't think such an environment is suitable for any kind > > of bpf development. Even more so for JIT debugging. > > While bpf_jit_enable=2 is nothing but the debugging tool for JIT developers. > > I'd rather nuke that code instead of carrying it from kernel to kernel. > > > > When I implemented JIT for PPC32, it was extremely helpful. > > As far as I understand, for the time being bpftool is not usable in my > environment because it > doesn't support cross-compilation when the target's endianness differs from > the building host > endianness, see discussion at > https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/ > > That's right that selftests can't be exercised because they don't build. > > The question might be candid as I didn't investigate much about the > replacement of "bpf_jit_enable=2 > debugging mode" by bpftool, how do we use bpftool exactly for that? > Especially when using the BPF > test module?
The kernel developers can add any amount of printk and dumps to debug their code, but such a debugging aid should not be part of the production kernel. That sysctl was two things at once: a debugging tool for kernel devs and introspection for users. bpftool's JIT dump solves the second part: it provides JIT introspection to users. Debugging of the kernel can be done with any amount of auxiliary code, including calling print_hex_dump() during JITing.
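For readers who do want a quick look at a JITed image during development, the print_hex_dump() approach mentioned here boils down to formatting the instruction bytes as hex. A minimal userspace sketch of that formatting (function name and exact layout are illustrative, not the kernel's print_hex_dump() output):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Format one line of a hex dump, roughly what a JIT developer
 * would print for a slice of the generated image. Userspace
 * sketch; the kernel's print_hex_dump() layout differs. */
static void hexdump_line(char *out, size_t outsz,
			 const uint8_t *buf, size_t len)
{
	size_t i, pos = 0;

	out[0] = '\0';
	for (i = 0; i < len && pos + 4 < outsz; i++)
		pos += snprintf(out + pos, outsz - pos,
				i ? " %02x" : "%02x", buf[i]);
}
```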
Re: [PATCH] bonding: 3ad: update slave arr after initialize
On 2021/4/16 12:28, Jay Vosburgh wrote: jinyiting wrote: From: jin yiting The bond works in mode 4, and down/up operations are performed on a bond that has negotiated normally. There is a chance that bond->slave_arr ends up NULL. Test commands: ifconfig bond1 down ifconfig bond1 up The conflict occurs in the following process: __dev_open (CPU A) --bond_open --queue_delayed_work(bond->wq, &bond->ad_work, 0); --bond_update_slave_arr --bond_3ad_get_active_agg_info ad_work (CPU B) --bond_3ad_state_machine_handler --ad_agg_selection_logic ad_work runs on CPU B. In the function ad_agg_selection_logic, all agg->is_active flags will be cleared. Before the new active aggregator is selected on CPU B, bond_3ad_get_active_agg_info fails on CPU A, and bond->slave_arr will be set to NULL. The best aggregator in ad_agg_selection_logic has not changed, so there is no need to update the slave arr. Signed-off-by: jin yiting --- drivers/net/bonding/bond_3ad.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 6908822..d100079 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2327,6 +2327,12 @@ void bond_3ad_state_machine_handler(struct work_struct *work) aggregator = __get_first_agg(port); ad_agg_selection_logic(aggregator, &update_slave_arr); + if (!update_slave_arr) { + struct aggregator *active = __get_active_agg(aggregator); + + if (active && active->is_active) + update_slave_arr = true; + } } bond_3ad_set_carrier(bond); } The described issue is a race condition (in that ad_agg_selection_logic clears agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr is inspecting agg->is_active outside the lock). I don't see how the above change will reliably manage this; the real issue looks to be that bond_update_slave_arr is committing changes to the array (via bond_reset_slave_arr) based on a racy inspection of the active aggregator state while it is in flux.
Also, the description of the issue says "The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr," but the change above does the opposite, and will set update_slave_arr when the aggregator has not changed (update_slave_arr remains false at return of ad_agg_selection_logic). I believe I understand the described problem, but I don't see how the patch fixes it. I suspect (but haven't tested) that the proper fix is to acquire mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info to avoid conflict with the state machine. -J --- -Jay Vosburgh, jay.vosbu...@canonical.com . Thank you for your reply. The last patch does have a redundant slave arr update. Thank you for your correction. As you said, holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info can avoid conflict with the state machine. I have tested this patch, with ifdown/ifup operations for the bond or slaves. But bond_update_slave_arr is expected to hold RTNL only and NO other lock, and it has WARN_ON(lockdep_is_held(&bond->mode_lock)); in bond_update_slave_arr. I'm not sure that holding mode_lock in bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a correct action. diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 74cbbb2..db988e5 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info ad_info; + spin_lock_bh(&bond->mode_lock); if (bond_3ad_get_active_agg_info(bond, &ad_info)) { + spin_unlock_bh(&bond->mode_lock); pr_debug("bond_3ad_get_active_agg_info failed\n"); /* No active aggragator means it's not safe to use * the previous array.
@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave) bond_reset_slave_arr(bond); goto out; } + spin_unlock_bh(&bond->mode_lock); agg_id = ad_info.aggregator_id; } bond_for_each_slave(bond, slave, iter) {
[PATCH v7 9/9] Documentation: arm64: Document PMU counters access from userspace
From: Raphael Gault Add documentation to describe the access to the pmu hardware counters from userspace. Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- v7: - Merge into existing arm64 perf.rst v6: - Update the chained event section with attr.config1 details v2: - Update links to test examples Changes from Raphael's v4: - Convert to rST - Update chained event status - Add section for heterogeneous systems --- Documentation/arm64/perf.rst | 67 +++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst index b567f177d385..c40fc2adc451 100644 --- a/Documentation/arm64/perf.rst +++ b/Documentation/arm64/perf.rst @@ -2,7 +2,10 @@ .. _perf_index: -===================== +==== +Perf +==== + Perf Event Attributes ===================== @@ -88,3 +91,65 @@ exclude_host. However when using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured. On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +========================================== + +Overview +-------- +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +------ +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counter it is necessary to open the event +using the perf tool interface: the sys_perf_event_open syscall returns a fd which +can subsequently be used with the mmap syscall in order to retrieve a page of +memory containing information about the event.
+The PMU driver uses this page to expose to the user the hardware counter's +index and other necessary data. Using this index enables the user to access the +PMU registers using the `mrs` instruction. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and 64-bit counters + +Chained events are not supported in conjunction with userspace counter +access. If a 64-bit counter is requested (attr.config1:0), then +userspace access must also be requested with attr.config1:1 set. This +will disable counter chaining. The 'pmc_width' in the user page will +indicate the actual width of the counter which could be only 32-bits +depending on event and PMU features. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. _tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c -- 2.27.0
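The pmc_width field mentioned above matters to a self-monitoring reader: the raw value read with `mrs` (or rdpmc on x86) is only pmc_width bits wide and must be sign-extended before being combined with the offset published in the user page. A sketch of that step (the helper name is mine; the shift idiom is the usual one for rdpmc-style readers):

```c
#include <stdint.h>

/* Sign-extend a raw counter value that is only 'width' bits wide,
 * as reported by pmc_width in the perf mmap page. This is the step
 * a self-monitoring reader performs before adding the result to
 * the offset from the user page. */
static int64_t pmc_sign_extend(uint64_t raw, unsigned int width)
{
	raw <<= 64 - width;
	return (int64_t)raw >> (64 - width);
}
```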
[PATCH v7 7/9] perf: arm64: Add test for userspace counter access on heterogeneous systems
Userspace counter access only works on heterogeneous systems with some restrictions. The userspace process must be pinned to a homogeneous subset of CPUs and must open the corresponding PMU for those CPUs. This commit adds a test implementing these requirements. Signed-off-by: Rob Herring --- v6: - Add a check on cap_user_rdpmc v5: - Adapt to libperf mmap API changes v4: - Update perf_evsel__mmap params v2: - Drop all but heterogeneous test as others covered by libperf tests - Rework to use libperf --- tools/perf/arch/arm64/include/arch-tests.h | 7 + tools/perf/arch/arm64/tests/Build | 1 + tools/perf/arch/arm64/tests/arch-tests.c | 4 + tools/perf/arch/arm64/tests/user-events.c | 177 + 4 files changed, 189 insertions(+) create mode 100644 tools/perf/arch/arm64/tests/user-events.c diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 90ec4c8cb880..380ad34a3f09 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -2,11 +2,18 @@ #ifndef ARCH_TESTS_H #define ARCH_TESTS_H +#include + #ifdef HAVE_DWARF_UNWIND_SUPPORT struct thread; struct perf_sample; +int test__arch_unwind_sample(struct perf_sample *sample, +struct thread *thread); #endif extern struct test arch_tests[]; +int test__rd_pinned(struct test __maybe_unused *test, + int __maybe_unused subtest); + #endif diff --git a/tools/perf/arch/arm64/tests/Build b/tools/perf/arch/arm64/tests/Build index a61c06bdb757..3f9a20c17fc6 100644 --- a/tools/perf/arch/arm64/tests/Build +++ b/tools/perf/arch/arm64/tests/Build @@ -1,4 +1,5 @@ perf-y += regs_load.o perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o +perf-y += user-events.o perf-y += arch-tests.o diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 5b1543c98022..80ce7bd3c16d 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -10,6 +10,10 @@ struct test arch_tests[] = 
{ .func = test__dwarf_unwind, }, #endif + { + .desc = "Pinned CPU user counter access", + .func = test__rd_pinned, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c new file mode 100644 index ..c8efc6b369e6 --- /dev/null +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -0,0 +1,177 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include + +#include +#include +#include + +#include "pmu.h" +#include "debug.h" +#include "tests/tests.h" +#include "arch-tests.h" + +static int run_test(struct perf_evsel *evsel) +{ + int n; + volatile int tmp = 0; + u64 delta, i, loops = 1000; + struct perf_counts_values counts = { .val = 0 }; + + for (n = 0; n < 6; n++) { + u64 stamp, now; + + perf_evsel__read(evsel, 0, 0, &counts); + stamp = counts.val; + + for (i = 0; i < loops; i++) + tmp++; + + perf_evsel__read(evsel, 0, 0, &counts); + now = counts.val; + loops *= 10; + + delta = now - stamp; + pr_debug("%14d: %14llu\n", n, (long long)delta); + + if (!delta) + break; + } + return delta ?
0 : -1; +} + +static struct perf_pmu *pmu_for_cpu(int cpu) +{ + int acpu, idx; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (pmu->is_uncore) + continue; + perf_cpu_map__for_each_cpu(acpu, idx, pmu->cpus) + if (acpu == cpu) + return pmu; + } + return NULL; +} + +static bool pmu_is_homogeneous(void) +{ + int core_cnt = 0; + struct perf_pmu *pmu = NULL; + + while ((pmu = perf_pmu__scan(pmu)) != NULL) { + if (!pmu->is_uncore && !perf_cpu_map__empty(pmu->cpus)) + core_cnt++; + } + return core_cnt == 1; +} + +static int libperf_print(enum libperf_print_level level, +const char *fmt, va_list ap) +{ + (void)level; + return vfprintf(stderr, fmt, ap); +} + +static struct perf_evsel *perf_init(struct perf_event_attr *attr) +{ + int err; + struct perf_thread_map *threads; + struct perf_evsel *evsel; + struct perf_event_mmap_page *pc; + + libperf_init(libperf_print); + + threads = perf_thread_map__new_dummy(); + if (!threads) { + pr_err("failed to create threads\n"); + return NULL; + } + + perf_thread_map__set_pid(threads, 0, 0); + + evsel = perf_evsel__new(attr); + if (!evsel) { +
[PATCH v7 8/9] perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access
Add arm64 specific tests for 32-bit and 64-bit counter user access. On arm64, counters default to 32-bit unless attr.config1:0 is set to 1. In order to enable user access for 64-bit counters, attr.config1 must be set to 3. Signed-off-by: Rob Herring --- v6: - New patch --- tools/perf/arch/arm64/include/arch-tests.h | 5 +++ tools/perf/arch/arm64/tests/arch-tests.c | 8 + tools/perf/arch/arm64/tests/user-events.c | 38 ++ 3 files changed, 51 insertions(+) diff --git a/tools/perf/arch/arm64/include/arch-tests.h b/tools/perf/arch/arm64/include/arch-tests.h index 380ad34a3f09..ddfa7460e1e1 100644 --- a/tools/perf/arch/arm64/include/arch-tests.h +++ b/tools/perf/arch/arm64/include/arch-tests.h @@ -15,5 +15,10 @@ extern struct test arch_tests[]; int test__rd_pinned(struct test __maybe_unused *test, int __maybe_unused subtest); +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest); + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest); #endif diff --git a/tools/perf/arch/arm64/tests/arch-tests.c b/tools/perf/arch/arm64/tests/arch-tests.c index 80ce7bd3c16d..bbdb81aa3229 100644 --- a/tools/perf/arch/arm64/tests/arch-tests.c +++ b/tools/perf/arch/arm64/tests/arch-tests.c @@ -14,6 +14,14 @@ struct test arch_tests[] = { .desc = "Pinned CPU user counter access", .func = test__rd_pinned, }, + { + .desc = "User 64-bit counter access", + .func = test__rd_64bit, + }, + { + .desc = "User 32-bit counter access", + .func = test__rd_32bit, + }, { .func = NULL, }, diff --git a/tools/perf/arch/arm64/tests/user-events.c b/tools/perf/arch/arm64/tests/user-events.c index c8efc6b369e6..546323b5242c 100644 --- a/tools/perf/arch/arm64/tests/user-events.c +++ b/tools/perf/arch/arm64/tests/user-events.c @@ -175,3 +175,41 @@ int test__rd_pinned(struct test __maybe_unused *test, perf_evsel__delete(evsel); return ret; } + +static int test__rd_counter_size(struct test __maybe_unused *test, +int config1) +{ + int ret; + struct perf_evsel 
*evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_INSTRUCTIONS, + .config1 = config1, + .exclude_kernel = 1, + }; + + if (!pmu_is_homogeneous()) + return TEST_SKIP; + + evsel = perf_init(&attr); + if (!evsel) + return -1; + + ret = run_test(evsel); + + perf_evsel__close(evsel); + perf_evsel__delete(evsel); + return ret; +} + +int test__rd_64bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x3); +} + +int test__rd_32bit(struct test __maybe_unused *test, + int __maybe_unused subtest) +{ + return test__rd_counter_size(test, 0x2); +} -- 2.27.0
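The two magic values passed to test__rd_counter_size() follow the config1 encoding this series uses: bit 0 selects a 64-bit ("long") counter and bit 1 requests userspace access. A standalone restatement of that decoding (helper names are mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* attr.config1 bit meanings as defined by this series:
 *   bit 0 - 64-bit ("long") counter requested
 *   bit 1 - userspace counter access requested
 * So 0x2 = user access to a 32-bit counter, and 0x3 = user access
 * to a full 64-bit counter (which disables chaining). */
static bool config1_is_64bit(uint64_t config1)
{
	return config1 & 0x1;
}

static bool config1_wants_user_access(uint64_t config1)
{
	return config1 & 0x2;
}
```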
[PATCH v7 6/9] libperf: Add arm64 support to perf_mmap__read_self()
Add the arm64 variants for read_perf_counter() and read_timestamp(). Unfortunately the counter number is encoded into the instruction, so the code is a bit verbose to enumerate all possible counters. Signed-off-by: Rob Herring --- v7: - Move enabling of libperf user read test for arm64 to this patch --- tools/lib/perf/mmap.c | 98 +++ tools/lib/perf/tests/test-evsel.c | 2 +- 2 files changed, 99 insertions(+), 1 deletion(-) diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c index 915469f00cf4..216e6c6212a2 100644 --- a/tools/lib/perf/mmap.c +++ b/tools/lib/perf/mmap.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev, @@ -294,6 +295,103 @@ static u64 read_timestamp(void) return low | ((u64)high) << 32; } +#elif defined(__aarch64__) +#define read_sysreg(r) ({ \ + u64 __val; \ + asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \ + __val; \ +}) + +static u64 read_pmccntr(void) +{ + return read_sysreg(pmccntr_el0); +} + +#define PMEVCNTR_READ(idx) \ + static u64 read_pmevcntr_##idx(void) { \ + return read_sysreg(pmevcntr##idx##_el0);\ + } + +PMEVCNTR_READ(0); +PMEVCNTR_READ(1); +PMEVCNTR_READ(2); +PMEVCNTR_READ(3); +PMEVCNTR_READ(4); +PMEVCNTR_READ(5); +PMEVCNTR_READ(6); +PMEVCNTR_READ(7); +PMEVCNTR_READ(8); +PMEVCNTR_READ(9); +PMEVCNTR_READ(10); +PMEVCNTR_READ(11); +PMEVCNTR_READ(12); +PMEVCNTR_READ(13); +PMEVCNTR_READ(14); +PMEVCNTR_READ(15); +PMEVCNTR_READ(16); +PMEVCNTR_READ(17); +PMEVCNTR_READ(18); +PMEVCNTR_READ(19); +PMEVCNTR_READ(20); +PMEVCNTR_READ(21); +PMEVCNTR_READ(22); +PMEVCNTR_READ(23); +PMEVCNTR_READ(24); +PMEVCNTR_READ(25); +PMEVCNTR_READ(26); +PMEVCNTR_READ(27); +PMEVCNTR_READ(28); +PMEVCNTR_READ(29); +PMEVCNTR_READ(30); + +/* + * Read a value direct from PMEVCNTR + */ +static u64 read_perf_counter(unsigned int counter) +{ + static u64 (* const read_f[])(void) = { + read_pmevcntr_0, + read_pmevcntr_1, + read_pmevcntr_2, + read_pmevcntr_3, + 
read_pmevcntr_4, + read_pmevcntr_5, + read_pmevcntr_6, + read_pmevcntr_7, + read_pmevcntr_8, + read_pmevcntr_9, + read_pmevcntr_10, + read_pmevcntr_11, + read_pmevcntr_12, + read_pmevcntr_13, + read_pmevcntr_14, + read_pmevcntr_15, + read_pmevcntr_16, + read_pmevcntr_17, + read_pmevcntr_18, + read_pmevcntr_19, + read_pmevcntr_20, + read_pmevcntr_21, + read_pmevcntr_22, + read_pmevcntr_23, + read_pmevcntr_24, + read_pmevcntr_25, + read_pmevcntr_26, + read_pmevcntr_27, + read_pmevcntr_28, + read_pmevcntr_29, + read_pmevcntr_30, + read_pmccntr + }; + + if (counter < ARRAY_SIZE(read_f)) + return (read_f[counter])(); + + return 0; +} + +static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); } + #else static u64 read_perf_counter(unsigned int counter) { return 0; } static u64 read_timestamp(void) { return 0; } diff --git a/tools/lib/perf/tests/test-evsel.c b/tools/lib/perf/tests/test-evsel.c index 288b5feaefe2..670fcdb6d6af 100644 --- a/tools/lib/perf/tests/test-evsel.c +++ b/tools/lib/perf/tests/test-evsel.c @@ -148,7 +148,7 @@ static int test_stat_user_read(int event) pc = perf_evsel__mmap_base(evsel, 0, 0); -#if defined(__i386__) || defined(__x86_64__) +#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__) __T("userspace counter access not supported", pc->cap_user_rdpmc); __T("userspace counter access not enabled", pc->index); __T("userspace counter width not set", pc->pmc_width >= 32); -- 2.27.0
[PATCH v7 5/9] arm64: perf: Add userspace counter access disable switch
Like x86, some users may want to disable userspace PMU counter access altogether. Add a sysfs 'rdpmc' file to control userspace counter access. The default is '1', which means enabled. Writing '0' disables access. In the case of multiple PMUs (i.e. big.LITTLE), the control is per PMU and userspace must disable access on each PMU. Note that x86 also supports writing '2' to globally enable user access. As there's no existing userspace support to worry about, this shouldn't be necessary for Arm. It could be added later if the need arises. Signed-off-by: Rob Herring --- arch/arm64/kernel/perf_event.c | 61 -- include/linux/perf/arm_pmu.h | 4 ++- 2 files changed, 61 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index bfbb7f449aca..1ab6308ca89c 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -336,6 +336,54 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { .attrs = armv8_pmuv3_caps_attrs, }; +static void armv8pmu_disable_user_access(void *mm); + +static ssize_t get_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + + return snprintf(buf, 40, "%d\n", cpu_pmu->attr_rdpmc); +} + +static ssize_t set_attr_rdpmc(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pmu *pmu = dev_get_drvdata(dev); + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu); + unsigned long val; + ssize_t ret; + + ret = kstrtoul(buf, 0, &val); + if (ret) + return ret; + + if (val > 1) + return -EINVAL; + + if (val != cpu_pmu->attr_rdpmc) { + cpu_pmu->attr_rdpmc = !!val; + if (!val) + on_each_cpu_mask(&cpu_pmu->supported_cpus, armv8pmu_disable_user_access, NULL, 1); + } + + return count; +} + +static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc); + +static struct attribute
*armv8_pmuv3_rdpmc_attrs[] = { + &dev_attr_rdpmc.attr, + NULL, +}; + +static const struct attribute_group armv8_pmuv3_rdpmc_attr_group = { + .attrs = armv8_pmuv3_rdpmc_attrs, +}; + /* * Perf Events' indices */ @@ -950,7 +998,8 @@ static void armv8pmu_sched_task(struct perf_event_context *ctx, bool sched_in) * If a new task has user access enabled, clear the dirty counters * to prevent the potential leak. */ - if (ctx && current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { + if (ctx && to_arm_pmu(ctx->pmu)->attr_rdpmc && + current->mm && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); armv8pmu_clear_dirty_counters(to_arm_pmu(ctx->pmu)); } else { @@ -1093,7 +1142,8 @@ static int __armv8_pmuv3_map_event(struct perf_event *event, &armv8_pmuv3_perf_cache_map, ARMV8_PMU_EVTYPE_EVENT); - if (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event)) + if (armpmu->attr_rdpmc && + (armv8pmu_event_want_user_access(event) || !armv8pmu_event_is_64bit(event))) event->hw.flags |= ARMPMU_EL0_RD_CNTR; /* @@ -1218,7 +1268,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu) static int armv8pmu_undef_handler(struct pt_regs *regs, u32 insn) { - if (atomic_read(&current->mm->context.pmu_direct_access)) { + struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu); + + if (armpmu->attr_rdpmc && atomic_read(&current->mm->context.pmu_direct_access)) { armv8pmu_enable_user_access(); return 0; } @@ -1277,6 +1329,9 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name, cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ?
caps : &armv8_pmuv3_caps_attr_group; + cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_RDPMC] = &armv8_pmuv3_rdpmc_attr_group; + cpu_pmu->attr_rdpmc = true; + return 0; } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 1daad3b2cce5..9303cd07ac57 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -82,6 +82,7 @@ enum armpmu_attr_groups { ARMPMU_ATTR_GROUP_EVENTS, ARMPMU_ATTR_GROUP_FORMATS, ARMPMU_ATTR_GROUP_CAPS, + ARMPMU_ATTR_GROUP_RDPMC, ARMPMU_NR_ATTR_GROUPS }; @@ -107,7 +108,8 @@ struct arm_pmu { int (*map_event)(struct perf_event *event); int (*filter_match)(struct perf_event *event); int num_events; - bool
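The sysfs handler in this patch accepts only "0" and "1". A userspace-side mirror of that validation, useful before writing to the per-PMU rdpmc file (the helper name and userspace framing are mine; the kernel side uses kstrtoul() plus the same val > 1 check):

```c
#include <errno.h>
#include <stdlib.h>

/* Validate a value destined for the per-PMU 'rdpmc' sysfs file:
 * parse like kstrtoul(), then reject anything other than 0 or 1,
 * matching the kernel's set_attr_rdpmc(). */
static int parse_rdpmc_val(const char *buf, unsigned long *val)
{
	char *end;

	errno = 0;
	*val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	if (*val > 1)
		return -EINVAL;
	return 0;
}
```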
[PATCH v7 3/9] arm64: perf: Enable PMU counter direct access for perf event
From: Raphael Gault Keep track of event opened with direct access to the hardware counters and modify permissions while they are open. The strategy used here is the same which x86 uses: every time an event is mapped, the permissions are set if required. The atomic field added in the mm_context helps keep track of the different event opened and de-activate the permissions when all are unmapped. We also need to update the permissions in the context switch code so that tasks keep the right permissions. In order to enable 64-bit counters for userspace when available, a new config1 bit is added for userspace to indicate it wants userspace counter access. This bit allows the kernel to decide if chaining should be disabled and chaining and userspace access are incompatible. The modes for config1 are as follows: config1 = 0 or 2 : user access enabled and always 32-bit config1 = 1 : user access disabled and always 64-bit (using chaining if needed) config1 = 3 : user access enabled and counter size matches underlying counter. User access is enabled with config1 == 0 so that we match x86 behavior and don't need Arm specific code (outside the counter read). Signed-off-by: Raphael Gault Signed-off-by: Rob Herring --- Peter Z says (event->oncpu == smp_processor_id()) in the user page update is always true, but my testing says otherwise[1]. v7: - Clear disabled counters when user access is enabled for a task to avoid leaking other tasks counter data. - Rework context switch handling utilizing sched_task callback - Add armv8pmu_event_can_chain() helper - Rework config1 flags handling structure - Use ARMV8_IDX_CYCLE_COUNTER_USER define for remapped user cycle counter index v6: - Add new attr.config1 rdpmc bit for userspace to hint it wants userspace access when also requesting 64-bit counters. v5: - Only set cap_user_rdpmc if event is on current cpu - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. 
v2: - Move mapped/unmapped into arm64 code. Fixes arm32. - Rebase on cap_user_time_short changes Changes from Raphael's v4: - Drop homogeneous check - Disable access for chained counters - Set pmc_width in user page [1] https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/ --- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/perf_event.c | 179 +++-- include/linux/perf/arm_pmu.h | 6 ++ 3 files changed, 183 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h index 75beffe2ee8a..ee08447455da 100644 --- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h @@ -18,6 +18,11 @@ typedef struct { atomic64_t id; + /* + * non-zero if userspace has access to hardware + * counters directly. + */ + atomic_t pmu_direct_access; #ifdef CONFIG_COMPAT void *sigpage; #endif diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 40cf59455ce8..bfbb7f449aca 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -8,9 +8,11 @@ * This code is based heavily on the ARMv7 perf event code.
*/ +#include #include #include +#include #include #include @@ -288,15 +290,22 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = { PMU_FORMAT_ATTR(event, "config:0-15"); PMU_FORMAT_ATTR(long, "config1:0"); +PMU_FORMAT_ATTR(rdpmc, "config1:1"); static inline bool armv8pmu_event_is_64bit(struct perf_event *event) { return event->attr.config1 & 0x1; } +static inline bool armv8pmu_event_want_user_access(struct perf_event *event) +{ + return event->attr.config1 & 0x2; +} + static struct attribute *armv8_pmuv3_format_attrs[] = { &format_attr_event.attr, &format_attr_long.attr, + &format_attr_rdpmc.attr, NULL, }; @@ -344,6 +353,15 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5); } +static inline bool armv8pmu_event_can_chain(struct perf_event *event) +{ + struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); + + return !armv8pmu_event_want_user_access(event) && + armv8pmu_event_is_64bit(event) && + !armv8pmu_has_long_event(cpu_pmu); +} + /* * We must chain two programmable counters for 64 bit events, * except when we have allocated the 64bit cycle counter (for CPU @@ -353,11 +371,9 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu) static inline bool armv8pmu_event_is_chained(struct perf_event *event) { int idx = event->hw.idx; - struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu); return !WARN_ON(idx < 0) && - armv8pmu_event_is_64bit(event) && - !armv8pmu_has_long_event(cpu_pmu) && +
[PATCH v7 4/9] drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu
The arm64 PMU driver needs to retrieve the struct arm_pmu pointer for the current CPU. As the common code already maintains this with the percpu cpu_armpmu, let's make it global. Signed-off-by: Rob Herring --- drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 2d10d84fb79c..6ac56280b9c7 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -99,7 +99,7 @@ static const struct pmu_irq_ops percpu_pmunmi_ops = { .free_pmuirq = armpmu_free_percpu_pmunmi }; -static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); +DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu); static DEFINE_PER_CPU(int, cpu_irq); static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops); diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 63af052e3909..1daad3b2cce5 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -127,6 +127,8 @@ struct arm_pmu { #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu)) +DECLARE_PER_CPU(struct arm_pmu *, cpu_armpmu); + u64 armpmu_event_update(struct perf_event *event); int armpmu_event_set_period(struct perf_event *event); -- 2.27.0
[PATCH v7 2/9] arm64: pmu: Add function implementation to update event index in userpage
From: Raphael Gault

In order to be able to access the counter directly from userspace, we
need to provide the index of the counter using the userpage. We thus
need to override the event_idx function to retrieve and convert the
perf_event index to the armv8 hardware index.

Since the arm_pmu driver can be used by any implementation, even if not
armv8, two components play a role into making sure the behaviour is
correct and consistent with the PMU capabilities:

* the ARMPMU_EL0_RD_CNTR flag which denotes the capability to access
  counter from userspace.
* the event_idx call back, which is implemented and initialized by the
  PMU implementation: if no callback is provided, the default behaviour
  applies, returning 0 as index value.

Signed-off-by: Raphael Gault
Signed-off-by: Rob Herring
---
v7:
 - Add define ARMV8_IDX_CYCLE_COUNTER_USER for userspace index of
   cycle counter
---
 arch/arm64/kernel/perf_event.c | 20 +++-
 include/linux/perf/arm_pmu.h   |  2 ++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4658fcf88c2b..40cf59455ce8 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -332,7 +332,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
  */
 #define	ARMV8_IDX_CYCLE_COUNTER	0
 #define	ARMV8_IDX_COUNTER0	1
-
+#define	ARMV8_IDX_CYCLE_COUNTER_USER	32

 /*
  * We unconditionally enable ARMv8.5-PMU long event counter support
@@ -871,6 +871,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
 	clear_bit(idx - 1, cpuc->used_mask);
 }

+static int armv8pmu_access_event_idx(struct perf_event *event)
+{
+	if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
+		return 0;
+
+	/*
+	 * We remap the cycle counter index to 32 to
+	 * match the offset applied to the rest of
+	 * the counter indices.
+	 */
+	if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
+		return ARMV8_IDX_CYCLE_COUNTER_USER;
+
+	return event->hw.idx;
+}
+
 /*
  * Add an event filter to a given event.
 */
@@ -1098,6 +1114,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 	cpu_pmu->filter_match		= armv8pmu_filter_match;

+	cpu_pmu->pmu.event_idx		= armv8pmu_access_event_idx;
+
 	cpu_pmu->name			= name;
 	cpu_pmu->map_event		= map_event;
 	cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 505480217cf1..d29aa981d989 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -26,6 +26,8 @@
  */
 /* Event uses a 64bit counter */
 #define ARMPMU_EVT_64BIT		1
+/* Allow access to hardware counter from userspace */
+#define ARMPMU_EL0_RD_CNTR		2

 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
--
2.27.0
[PATCH v7 0/9] arm64 userspace counter access support
Hi all, Another version of arm64 userspace counter access support. I sent out the libperf bits separately and those are now applied (Thanks!), so this is just the arm64 bits. This originally resurrected Raphael's series[1] to enable userspace counter access on arm64. My previous versions are here[2][3][4][5][6][7]. A git branch is here[8]. Changes in v7: - Handling of dirty counter leakage and reworking of context switch and user access enabling. The .sched_task hook and undef instruction handler are now utilized. (Patch 3) - Add a userspace disable switch like x86. (Patch 5) Changes in v6: - Reworking of the handling of 64-bit counters and user access. There's a new config1 flag to request user access. This takes priority over the 64-bit flag and the user will get the maximum size the h/w supports without chaining. - The libperf evsel mmap struct is stored in its own xyarray - New tests for user 64-bit and 32-bit counters - Rebase to v5.12-rc2 Changes in v5: - Limit enabling/disabling access to CPUs associated with the PMU (supported_cpus) and with the mm_struct matching current->active_mm. The x86 method of using mm_cpumask doesn't work for arm64 as it is not updated. - Only set cap_user_rdpmc if event is on current cpu. See patch 2. - Create an mmap for every event in an evsel. This results in some changes to the libperf mmap API from the last version. - Rebase to v5.11-rc2 Changes in v4: - Dropped 'arm64: pmu: Add hook to handle pmu-related undefined instructions'. The onus is on userspace to pin itself to a homogeneous subset of CPUs and avoid any aborts on heterogeneous systems, so the hook is not needed. - Make perf_evsel__mmap() take pages rather than bytes for size - Fix building arm64 heterogeneous test. 
Changes in v3: - Dropped removing x86 rdpmc test until libperf tests can run via 'perf test' - Added verbose prints for tests - Split adding perf_evsel__mmap() to separate patch The following changes to the arm64 support have been made compared to Raphael's last version: The major change is support for heterogeneous systems with some restrictions. Specifically, userspace must pin itself to like CPUs, open a specific PMU by type, and use h/w specific events. The tests have been reworked to demonstrate this. Chained events are not supported. The problem with supporting chained events was there's no way to distinguish between a chained event and a native 64-bit counter. We could add some flag, but do self monitoring processes really need that? Native 64-bit counters are supported if the PMU h/w has support. As there's already an explicit ABI to request 64-bit counters, userspace can request 64-bit counters and if user access is not enabled, then it must retry with 32-bit counters. Prior versions broke the build on arm32 (surprisingly never caught by 0-day). As a result, event_mapped and event_unmapped implementations have been moved into the arm64 code. There was a bug in that pmc_width was not set in the user page. The tests now check for this. The documentation has been converted to rST. I've added sections on chained events and heterogeneous. The tests have been expanded to test the cycle counter access. 
Rob [1] https://lore.kernel.org/r/20190822144220.27860-1-raphael.ga...@arm.com/ [2] https://lore.kernel.org/r/20200707205333.624938-1-r...@kernel.org/ [3] https://lore.kernel.org/r/20200828205614.3391252-1-r...@kernel.org/ [4] https://lore.kernel.org/r/20200911215118.2887710-1-r...@kernel.org/ [5] https://lore.kernel.org/r/20201001140116.651970-1-r...@kernel.org/ [6] https://lore.kernel.org/r/20210114020605.3943992-1-r...@kernel.org/ [7] https://lore.kernel.org/r/20210311000837.3630499-1-r...@kernel.org/ [8] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git user-perf-event-v7 Raphael Gault (4): arm64: Restrict undef hook for cpufeature registers arm64: pmu: Add function implementation to update event index in userpage arm64: perf: Enable PMU counter direct access for perf event Documentation: arm64: Document PMU counters access from userspace Rob Herring (5): drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu arm64: perf: Add userspace counter access disable switch libperf: Add arm64 support to perf_mmap__read_self() perf: arm64: Add test for userspace counter access on heterogeneous systems perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access Documentation/arm64/perf.rst | 67 +- arch/arm64/include/asm/mmu.h | 5 + arch/arm64/kernel/cpufeature.c | 4 +- arch/arm64/kernel/perf_event.c | 254 - drivers/perf/arm_pmu.c | 2 +- include/linux/perf/arm_pmu.h | 14 +- tools/lib/perf/mmap.c | 98 tools/lib/perf/tests/test-evsel.c | 2 +- tools/perf/arch/arm64/include/arch-tests.h | 12 +
[PATCH v7 1/9] arm64: Restrict undef hook for cpufeature registers
From: Raphael Gault

This commit modifies the mask of the mrs_hook declared in
arch/arm64/kernel/cpufeature.c, which emulates only feature register
access. This is necessary because the hook's mask was too large and
thus matched any mrs instruction, even ones unrelated to the emulated
registers, which made the PMU emulation inefficient.

Signed-off-by: Raphael Gault
Signed-off-by: Rob Herring
---
v7:
 - Split off from Raphael's original undef hook patch as this change
   stands on its own.
---
 arch/arm64/kernel/cpufeature.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 066030717a4c..aa0777690ab1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2888,8 +2888,8 @@ static int emulate_mrs(struct pt_regs *regs, u32 insn)
 }

 static struct undef_hook mrs_hook = {
-	.instr_mask = 0xfff00000,
-	.instr_val = 0xd5300000,
+	.instr_mask = 0xffff0000,
+	.instr_val = 0xd5380000,
 	.pstate_mask = PSR_AA32_MODE_MASK,
 	.pstate_val = PSR_MODE_EL0t,
 	.fn = emulate_mrs,
--
2.27.0
Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner wrote: > > On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote: > > On Wed, Apr 14, 2021 at 2:14 AM Greg KH wrote: > >> To give context, the commit is now 46eb1701c046 ("hrtimer: Update > >> softirq_expires_next correctly after __hrtimer_get_next_event()") and is > >> attached below. > >> > >> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode > >> HRTIMER_MODE_REL_SOFT for I think every packet it gets. If that should > >> not be happening, we can easily fix it but I guess no one has actually > >> had fast USB devices that noticed this until now :) > > > > AIUI the purpose of this timer is to support packet aggregation. USB > > transfers are relatively expensive/high latency. 6 Gbps is 500k > > 1500-byte packets per second, or one every 2us. So f_ncm buffers as > > many packets as will fit into 16k (usually, 10 1500-byte packets), and > > only initiates a USB transfer when those packets have arrived. That > > ends up doing only one transfer every 20us. It sets a 300us timer to > > ensure that if the 10 packets haven't arrived, it still sends out > > whatever it has when the timer fires. The timer is set/updated on > > every packet buffered by ncm. > > > > Is this somehow much more expensive in 5.10.24 than it was before? > > Even if this driver is somehow "holding it wrong", might there not be > > other workloads that have a similar problem? What about regressions on > > those workloads? > > Let's put the question of whether this hrtimer usage is sensible or not > aside for now. > > I stared at the change for a while and did some experiments to recreate > the problem, but that didn't get me anywhere. > > Could you please do the following? 
> > Enable tracing and enable the following tracepoints: > > timers/hrtimer_cancel > timers/hrtimer_start > timers/hrtimer_expire_entry > irq/softirq_raise > irq/softirq_enter > irq/softirq_exit > > and function tracing filtered on ncm_wrap_ntb() and > package_for_tx() only (to reduce the noise). > > Run the test on a kernels with and without that commit and collect trace > data for both. > > That should give me a pretty clear picture what's going on. Lorenzo is trying to get the traces you asked for, or rather he’s trying to get confirmation he can post them. Our initial observation of these results seems to suggest that updating the timer (hrtimer_start, which seems to also call hrtimer_cancel) takes twice as long as it used to. My gut feeling is that softirq_activated is usually false, and the old code in such a case calls just __hrtimer_get_next_event(, HRTIMER_ACTIVE_ALL). While the new code will first call __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT) and then __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD) Perhaps __hrtimer_get_next_event() should return both soft and hard event times in one function call? Or perhaps hrtimer_start should not call hrtimer_cancel? We've also been looking at the f_ncm driver itself, and trying to reduce the number of hrtimer_cancel/start calls. It's pretty easy to reduce this by a factor of 10x (we’re not yet 100% certain this is race free), which does drastically improve performance. However, it still only takes the regression from 60% to 20%. - Maciej
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
On Tue, Apr 20, 2021 at 02:48:17AM +, Vineet Gupta wrote: > > 32-bit architectures which expect 8-byte alignment for 8-byte integers > > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct > > page inadvertently expanded in 2019. > > FWIW, ARC doesn't require 8 byte alignment for 8 byte integers. This is > only needed for 8-byte atomics due to the requirements of LLOCKD/SCOND > instructions. Ah, like x86? OK, great, I'll drop your arch from the list of affected. Thanks!
RE: [EXT] Re: [net-next] net: dsa: felix: disable always guard band bit for TAS config
Hi Vladimir,

On Mon, Apr 19, 2021 at 20:38PM +0800, Vladimir Oltean wrote:
>
> What is a scheduled queue? When time-aware scheduling is enabled on the
> port, why are some queues scheduled and some not?

The felix vsc9959 device uses the SCH_TRAFFIC_QUEUES field bits to
define which queues are scheduled. Only the queues whose bits are set
serve scheduled traffic. In this driver we set all 8 queues to be
scheduled by default, so all traffic goes to scheduled queues.

Thanks,
Xiaoliang Yang
Re: [PATCH V4 1/3] vDPA/ifcvf: deduce VIRTIO device ID when probe
On 2021/4/19 2:33 PM, Zhu Lingshan wrote:

This commit deduces the VIRTIO device ID as the device type at probe
time, then ifcvf_vdpa_get_device_id() can simply return the ID.
ifcvf_vdpa_get_features() and ifcvf_vdpa_get_config_size() can work
properly based on the device ID.

Signed-off-by: Zhu Lingshan
Acked-by: Jason Wang
---
 drivers/vdpa/ifcvf/ifcvf_base.h |  1 +
 drivers/vdpa/ifcvf/ifcvf_main.c | 27 +++
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index b2eeb16b9c2c..1c04cd256fa7 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -84,6 +84,7 @@ struct ifcvf_hw {
 	u32 notify_off_multiplier;
 	u64 req_features;
 	u64 hw_features;
+	u32 dev_type;
 	struct virtio_pci_common_cfg __iomem *common_cfg;
 	void __iomem *net_cfg;
 	struct vring_info vring[IFCVF_MAX_QUEUE_PAIRS * 2];
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 44d7586019da..66927ec81fa5 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -323,19 +323,9 @@ static u32 ifcvf_vdpa_get_generation(struct vdpa_device *vdpa_dev)

 static u32 ifcvf_vdpa_get_device_id(struct vdpa_device *vdpa_dev)
 {
-	struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
-	struct pci_dev *pdev = adapter->pdev;
-	u32 ret = -ENODEV;
-
-	if (pdev->device < 0x1000 || pdev->device > 0x107f)
-		return ret;
-
-	if (pdev->device < 0x1040)
-		ret = pdev->subsystem_device;
-	else
-		ret = pdev->device - 0x1040;
+	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

-	return ret;
+	return vf->dev_type;
 }

 static u32 ifcvf_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
@@ -466,6 +456,19 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)

 	pci_set_drvdata(pdev, adapter);
 	vf = &adapter->vf;
+
+	/* This driver drives both modern virtio devices and transitional
+	 * devices in modern mode.
+	 * vDPA requires feature bit VIRTIO_F_ACCESS_PLATFORM,
+	 * so legacy devices and transitional devices in legacy
+	 * mode will not work for vDPA, this driver will not
+	 * drive devices with legacy interface.
+	 */
+	if (pdev->device < 0x1040)
+		vf->dev_type = pdev->subsystem_device;
+	else
+		vf->dev_type = pdev->device - 0x1040;
+
 	vf->base = pcim_iomap_table(pdev);

 	adapter->pdev = pdev;
[RESEND v2] iommu/vt-d: Use passthrough mode for the Intel IPUs
Intel IPU (Image Processing Unit) has its own (IO)MMU hardware. The IPU
driver allocates its own page table that is not mapped via the DMA API,
and thus the Intel IOMMU driver blocks access, giving this error:

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [00:05.0] PASID fault addr 76406000
[fault reason 06] PTE Read access is not set

As the IPU is not an external-facing device, it is not risky, so use
IOMMU passthrough mode for the Intel IPUs.

Fixes: 26f5689592e2 ("media: staging/intel-ipu3: mmu: Implement driver")
Signed-off-by: Bingbu Cao
---
 drivers/iommu/intel/iommu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee0932307d64..7e2fbdae467e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -55,6 +55,12 @@
 #define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY)
 #define IS_USB_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB)
 #define IS_ISA_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+#define IS_INTEL_IPU(pdev) ((pdev)->vendor == PCI_VENDOR_ID_INTEL &&	\
+			    ((pdev)->device == 0x9a19 ||	\
+			     (pdev)->device == 0x9a39 ||	\
+			     (pdev)->device == 0x4e19 ||	\
+			     (pdev)->device == 0x465d ||	\
+			     (pdev)->device == 0x1919))
 #define IS_AZALIA(pdev) ((pdev)->vendor == 0x8086 && (pdev)->device == 0x3a3e)

 #define IOAPIC_RANGE_START	(0xfee00000)
@@ -360,6 +366,7 @@ int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);

 static int dmar_map_gfx = 1;
+static int dmar_map_ipu = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
@@ -368,6 +375,7 @@ static int iommu_skip_te_disable;

 #define IDENTMAP_GFX		2
 #define IDENTMAP_AZALIA		4
+#define IDENTMAP_IPU		8

 int intel_iommu_gfx_mapped;
 EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -2839,6 +2847,9 @@ static int device_def_domain_type(struct device *dev)
 		if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
 			return IOMMU_DOMAIN_IDENTITY;
+
+		if ((iommu_identity_mapping & IDENTMAP_IPU) && IS_INTEL_IPU(pdev))
+			return IOMMU_DOMAIN_IDENTITY;
 	}

 	return 0;
@@ -3278,6 +3289,9 @@ static int __init init_dmars(void)
 	if (!dmar_map_gfx)
 		iommu_identity_mapping |= IDENTMAP_GFX;

+	if (!dmar_map_ipu)
+		iommu_identity_mapping |= IDENTMAP_IPU;
+
 	check_tylersburg_isoch();

 	ret = si_domain_init(hw_pass_through);
@@ -5622,6 +5636,18 @@ static void quirk_iommu_igfx(struct pci_dev *dev)
 	dmar_map_gfx = 0;
 }

+static void quirk_iommu_ipu(struct pci_dev *dev)
+{
+	if (!IS_INTEL_IPU(dev))
+		return;
+
+	if (risky_device(dev))
+		return;
+
+	pci_info(dev, "Passthrough IOMMU for integrated Intel IPU\n");
+	dmar_map_ipu = 0;
+}
+
 /* G4x/GM45 integrated gfx dmar support is totally busted. */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_igfx);
@@ -5657,6 +5683,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1632, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163A, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);

+/* disable IPU dmar support */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_iommu_ipu);
+
 static void quirk_iommu_rwbf(struct pci_dev *dev)
 {
 	if (risky_device(dev))
--
2.7.4
[PATCH] scsi: ufs: Add batched WB buffer flush
Currently, the WriteBooster (WB) buffer is always flushed during
hibern8. However, this is inefficient because data in the WB buffer can
become invalid due to the spatial locality of the IO workload: if the
WB buffer is instead flushed in a batched manner, the amount of data
migration and power consumption can be reduced, because the overwritten
data in the WB buffer may already be invalid.

This patch supports batched flush of the WB buffer. When batched flush
is enabled, fWriteBoosterBufferFlushDuringHibernate is set only when
b_rpm_dev_flush_capable is true during runtime suspend. When the device
is resumed, fWriteBoosterBufferFlushDuringHibernate is cleared to stop
flush during hibern8.

Co-developed-by: Keoseong Park
Signed-off-by: Keoseong Park
Signed-off-by: Daejun Park
---
 Documentation/ABI/testing/sysfs-driver-ufs |  9
 drivers/scsi/ufs/ufs-sysfs.c | 50 ++
 drivers/scsi/ufs/ufshcd.c | 14 --
 drivers/scsi/ufs/ufshcd.h | 2 +
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-ufs b/Documentation/ABI/testing/sysfs-driver-ufs
index d1bc23cb6a9d..b67b8449e840 100644
--- a/Documentation/ABI/testing/sysfs-driver-ufs
+++ b/Documentation/ABI/testing/sysfs-driver-ufs
@@ -1172,3 +1172,12 @@ Description: This node is used to set or display whether UFS WriteBooster is
 		(if the platform supports UFSHCD_CAP_CLK_SCALING). For a
 		platform that doesn't support UFSHCD_CAP_CLK_SCALING, we can
 		disable/enable WriteBooster through this sysfs node.
+
+What:		/sys/bus/platform/drivers/ufshcd/*/wb_batched_flush
+Date:		April 2021
+Contact:	Daejun Park
+Description:	This entry shows whether batch flushing of UFS WriteBooster
+		buffers is enabled. Writing 1 to this entry allows the device to flush
+		the WriteBooster buffer only when it needs to perform a buffer flush
+		during runtime suspend. Writing 0 to this entry allows the device to
+		flush the WriteBooster buffer during link hibernation.
diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index d7c3cff9662f..943ac12e59c6 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -253,6 +253,54 @@ static ssize_t wb_on_store(struct device *dev, struct device_attribute *attr,
 	return res < 0 ? res : count;
 }

+static ssize_t wb_batched_flush_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct ufs_hba *hba = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%d\n", hba->vps->wb_batched_flush);
+}
+
+static ssize_t wb_batched_flush_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct ufs_hba *hba = dev_get_drvdata(dev);
+	unsigned int wb_batched_flush;
+	ssize_t res;
+
+	if (!ufshcd_is_wb_allowed(hba)) {
+		dev_warn(dev, "To control WB through wb_batched_flush is not allowed!\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (kstrtouint(buf, 0, &wb_batched_flush))
+		return -EINVAL;
+
+	if (wb_batched_flush != 0 && wb_batched_flush != 1)
+		return -EINVAL;
+
+	down(&hba->host_sem);
+	if (!ufshcd_is_user_access_allowed(hba)) {
+		res = -EBUSY;
+		goto out;
+	}
+
+	if (wb_batched_flush == hba->vps->wb_batched_flush)
+		goto out;
+
+	pm_runtime_get_sync(hba->dev);
+	res = ufshcd_wb_toggle_flush_during_h8(hba, !wb_batched_flush);
+	pm_runtime_put_sync(hba->dev);
+	if (!res)
+		hba->vps->wb_batched_flush = wb_batched_flush;
+
+out:
+	up(&hba->host_sem);
+	return res < 0 ? res : count;
+}
+
 static DEVICE_ATTR_RW(rpm_lvl);
 static DEVICE_ATTR_RO(rpm_target_dev_state);
 static DEVICE_ATTR_RO(rpm_target_link_state);
@@ -261,6 +309,7 @@ static DEVICE_ATTR_RO(spm_target_dev_state);
 static DEVICE_ATTR_RO(spm_target_link_state);
 static DEVICE_ATTR_RW(auto_hibern8);
 static DEVICE_ATTR_RW(wb_on);
+static DEVICE_ATTR_RW(wb_batched_flush);

 static struct attribute *ufs_sysfs_ufshcd_attrs[] = {
 	&dev_attr_rpm_lvl.attr,
@@ -271,6 +320,7 @@ static struct attribute *ufs_sysfs_ufshcd_attrs[] = {
 	&dev_attr_spm_target_link_state.attr,
 	&dev_attr_auto_hibern8.attr,
 	&dev_attr_wb_on.attr,
+	&dev_attr_wb_batched_flush.attr,
 	NULL
 };

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 0625da7a42ee..e11dc578a17c 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -244,7 +244,6 @@ static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on);
 static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
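For completeness, the proposed sysfs node would be toggled like any other ufshcd attribute. A hedged usage sketch (the path comes from the ABI document above; a temp file stands in for the node here so the example runs anywhere):

```shell
# Stand-in for /sys/bus/platform/drivers/ufshcd/*/wb_batched_flush on a
# real system; writes of anything but 0 or 1 would return -EINVAL there.
node=$(mktemp)

echo 1 > "$node"   # batched mode: flush WB only when needed at runtime suspend
mode=$(cat "$node")
echo "wb_batched_flush: $mode"

echo 0 > "$node"   # legacy behaviour: flush during link hibern8
mode=$(cat "$node")
echo "wb_batched_flush: $mode"

rm -f "$node"
```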
[PATCH 3/4] MIPS: Reinstate platform `__div64_32' handler
Our current MIPS platform `__div64_32' handler is inactive, because it
is incorrectly only enabled for 64-bit configurations, for which generic
`do_div' code does not call it anyway.

The handler is not suitable for being called from there though, as it
only calculates 32 bits of the quotient under the assumption the 64-bit
dividend has been suitably reduced. Code for such reduction used to be
there, however it has been incorrectly removed with commit c21004cd5b4c
("MIPS: Rewrite to work with gcc 4.4.0."), which should have only
updated an obsoleted constraint for an inline asm involving $hi and $lo
register outputs, while possibly wiring the original MIPS variant of the
`do_div' macro as the `__div64_32' handler for the generic `do_div'
implementation.

Correct the handler as follows then:

- Revert most of the commit referred, however retaining the current
  formatting, except for the final two instructions of the inline asm
  sequence, which the original commit missed. Omit the original 64-bit
  parts though.

- Rename the original `do_div' macro to `__div64_32'. Use the combined
  `x' constraint referring to the MD accumulator as a whole, replacing
  the original individual `h' and `l' constraints used for $hi and $lo
  registers respectively, of which `h' has been obsoleted with GCC 4.4.
  Update surrounding code accordingly.

  We have since removed support for GCC versions before 4.9, so no need
  for a special arrangement here; GCC has supported the `x' constraint
  since forever anyway, or at least going back to 1991.

- Rename the `__base' local variable in `__div64_32' to `__radix' to
  avoid a conflict with a local variable in `do_div'.

- Actually enable this code for 32-bit rather than 64-bit
  configurations by qualifying it with BITS_PER_LONG being 32 instead
  of 64. Include for this macro rather than as we don't need anything
  else.

- Finally include last rather than first.
This has passed correctness verification with test_div64 and reduced the module's average execution time down to 1.0668s and 0.2629s from 2.1529s and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. For a reference 64-bit `do_div' code where we have the DDIVU instruction available to do the whole calculation right away averages at 0.0660s for the latter CPU. Reported-by: Huacai Chen Signed-off-by: Maciej W. Rozycki Fixes: c21004cd5b4c ("MIPS: Rewrite to work with gcc 4.4.0.") Cc: sta...@vger.kernel.org # v2.6.30+ --- Our handcrafted handler seems to run at ~25% of the performance of the 64-bit hardware instruction; not too bad I would say. Though there's likely some overhead from surrounding code that interferes with the figures. Then there are a couple of `checkpatch.pl' nits about trailing whitespace in inline asm, which however makes it more readable. So the change stays as it is. --- arch/mips/include/asm/div64.h | 57 ++ 1 file changed, 41 insertions(+), 16 deletions(-) linux-mips-div64-generic-fix.diff Index: linux-3maxp-div64/arch/mips/include/asm/div64.h === --- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h +++ linux-3maxp-div64/arch/mips/include/asm/div64.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2000, 2004 Maciej W. Rozycki + * Copyright (C) 2000, 2004, 2021 Maciej W. Rozycki * Copyright (C) 2003, 07 Ralf Baechle (r...@linux-mips.org) * * This file is subject to the terms and conditions of the GNU General Public @@ -9,25 +9,18 @@ #ifndef __ASM_DIV64_H #define __ASM_DIV64_H -#include - -#if BITS_PER_LONG == 64 +#include -#include +#if BITS_PER_LONG == 32 /* * No traps on overflows for any of these... 
*/ -#define __div64_32(n, base)\ -({ \ +#define do_div64_32(res, high, low, base) ({ \ unsigned long __cf, __tmp, __tmp2, __i; \ unsigned long __quot32, __mod32;\ - unsigned long __high, __low;\ - unsigned long long __n; \ \ - __high = *__n >> 32;\ - __low = __n;\ __asm__(\ " .setpush\n" \ " .setnoat\n" \ @@ -51,18 +44,50 @@ " subu%0, %0, %z6 \n" \ " addiu %2, %2, 1 \n" \ "3: \n" \ - " bnez
[PATCH 1/4] lib/math: Add a `do_div' test module
Implement a module for correctness and performance evaluation for the `do_div' function, often handled in an optimised manner by platform code. Use a somewhat randomly generated set of inputs that is supposed to be representative, using the same set of divisors twice, expressed as a constant and as a variable each, so as to verify the implementation for both cases should they be handled by different code execution paths. Reference results were produced with GNU bc. At the conclusion output the total execution time elapsed. Signed-off-by: Maciej W. Rozycki --- NB there's a `checkpatch.pl' warning for a split line, but I can see no gain from joining it. --- lib/Kconfig.debug | 10 ++ lib/math/Makefile |2 lib/math/test_div64.c | 249 ++ 3 files changed, 261 insertions(+) linux-div64-test.diff Index: linux-3maxp-div64/lib/Kconfig.debug === --- linux-3maxp-div64.orig/lib/Kconfig.debug +++ linux-3maxp-div64/lib/Kconfig.debug @@ -2027,6 +2027,16 @@ config TEST_SORT If unsure, say N. +config TEST_DIV64 + tristate "64bit/32bit division and modulo test" + depends on DEBUG_KERNEL || m + help + Enable this to turn on 'do_div()' function test. This test is + executed only once during system boot (so affects only boot time), + or at module load time. + + If unsure, say N. + config KPROBES_SANITY_TEST bool "Kprobes sanity tests" depends on DEBUG_KERNEL Index: linux-3maxp-div64/lib/math/Makefile === --- linux-3maxp-div64.orig/lib/math/Makefile +++ linux-3maxp-div64/lib/math/Makefile @@ -4,3 +4,5 @@ obj-y += div64.o gcd.o lcm.o int_pow.o i obj-$(CONFIG_CORDIC) += cordic.o obj-$(CONFIG_PRIME_NUMBERS)+= prime_numbers.o obj-$(CONFIG_RATIONAL) += rational.o + +obj-$(CONFIG_TEST_DIV64) += test_div64.o Index: linux-3maxp-div64/lib/math/test_div64.c === --- /dev/null +++ linux-3maxp-div64/lib/math/test_div64.c @@ -0,0 +1,249 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2021 Maciej W. 
Rozycki + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include + +#include + +#define TEST_DIV64_N_ITER 1024 + +static const u64 test_div64_dividents[] = { + 0xab275080, + 0x000fe73c1959, + 0x00e54c0a74b1, + 0x0d4398ff1ef9, + 0xa18c2ee1c097, + 0x00079fb80b072e4a, + 0x0072db27380dd689, + 0x0842f488162e2284, + 0xf66745411d8ab063, +}; +#define SIZE_DIV64_DIVIDENTS ARRAY_SIZE(test_div64_dividents) + +#define TEST_DIV64_DIVISOR_0 0x0009 +#define TEST_DIV64_DIVISOR_1 0x007c +#define TEST_DIV64_DIVISOR_2 0x0204 +#define TEST_DIV64_DIVISOR_3 0xcb5b +#define TEST_DIV64_DIVISOR_4 0x0001 +#define TEST_DIV64_DIVISOR_5 0x0008a880 +#define TEST_DIV64_DIVISOR_6 0x003fd3ae +#define TEST_DIV64_DIVISOR_7 0x0b658fac +#define TEST_DIV64_DIVISOR_8 0xdc08b349 + +static const u32 test_div64_divisors[] = { + TEST_DIV64_DIVISOR_0, + TEST_DIV64_DIVISOR_1, + TEST_DIV64_DIVISOR_2, + TEST_DIV64_DIVISOR_3, + TEST_DIV64_DIVISOR_4, + TEST_DIV64_DIVISOR_5, + TEST_DIV64_DIVISOR_6, + TEST_DIV64_DIVISOR_7, + TEST_DIV64_DIVISOR_8, +}; +#define SIZE_DIV64_DIVISORS ARRAY_SIZE(test_div64_divisors) + +static const struct { + u64 quotient; + u32 remainder; +} test_div64_results[SIZE_DIV64_DIVISORS][SIZE_DIV64_DIVIDENTS] = { + { + { 0x13045e47, 0x0001 }, + { 0x0161596c, 0x0030 }, + { 0x0054e9d4, 0x0130 }, + { 0xd776, 0x278e }, + { 0xab27, 0x5080 }, + { 0x13c4, 0x0004ce80 }, + { 0x02ae, 0x001e143c }, + { 0x000f, 0x0033e56c }, + { 0x, 0xab275080 }, + }, { + { 0x0001c45c02d1, 0x }, + { 0x20d5213c, 0x0049 }, + { 0x07e3d65f, 0x01dd }, + { 0x00140531, 0x65ee }, + { 0x000fe73c, 0x1959 }, + { 0x0001d637, 0x0004e5d9 }, + { 0x3fc9, 0x000713bb }, + { 0x0165, 0x029abe7d }, + { 0x0012, 0x6e9f7e37 }, + }, { + { 0x00197a3a0cf7, 0x0002 }, + { 0x0001d9632e5c, 0x0021 }, + { 0x71c28039, 0x01cd }, + { 0x0120a844, 0xb885 }, + { 0x00e54c0a, 0x74b1 }, + { 0x001a7bb3,
[PATCH 2/4] div64: Correct inline documentation for `do_div'
Correct inline documentation for `do_div', which is a function-like macro the `n' parameter of which has the semantics of a C++ reference: it is both read and written in the context of the caller without an explicit dereference such as with a pointer. In the C programming language it has no equivalent for proper functions, in terms of which the documentation expresses the semantics of `do_div', but substituting a pointer in documentation is misleading, and using the C++ notation should at least raise the reader's attention and encourage them to seek an explanation even if the C++ semantics is not readily understood. While at it observe that "semantics" is an uncountable noun, so refer to it with a singular rather than plural verb. Signed-off-by: Maciej W. Rozycki --- NB there's a `checkpatch.pl' warning for tabs preceded by spaces, but that is just the style of the piece of code quoted and I can see no gain from changing it or worse yet making it inconsistent. --- include/asm-generic/div64.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) linux-div64-doc-fix.diff Index: linux-3maxp-div64/include/asm-generic/div64.h === --- linux-3maxp-div64.orig/include/asm-generic/div64.h +++ linux-3maxp-div64/include/asm-generic/div64.h @@ -8,12 +8,14 @@ * Optimization for constant divisors on 32-bit machines: * Copyright (C) 2006-2015 Nicolas Pitre * - * The semantics of do_div() are: + * The semantics of do_div() is, in C++ notation, observing that the name + * is a function-like macro and the n parameter has the semantics of a C++ + * reference: * - * uint32_t do_div(uint64_t *n, uint32_t base) + * uint32_t do_div(uint64_t &n, uint32_t base) * { - * uint32_t remainder = *n % base; - * *n = *n / base; + * uint32_t remainder = n % base; + * n = n / base; * return remainder; * } *
[PATCH 4/4] MIPS: Avoid DIVU in `__div64_32' if result would be zero
We already check the high part of the dividend against zero to avoid the costly DIVU instruction in that case, needed to reduce the high part of the dividend, so we may well check against the divisor instead and set the high part of the quotient to zero right away. We need to treat the high part of the dividend in that case though as the remainder that would be calculated by the DIVU instruction we avoided. This has passed correctness verification with test_div64 and reduced the module's average execution time down to 1.0445s and 0.2619s from 1.0668s and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. Signed-off-by: Maciej W. Rozycki --- I have made an experimental change on top of this to put `__div64_32' out of line, and that increases the averages respectively up to 1.0785s and 0.2705s. Not a terrible loss, especially compared to generic times quoted with 3/4, but still, so I think it would best be done where optimising for size, as noted in the cover letter. --- arch/mips/include/asm/div64.h |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) Index: linux-3maxp-div64/arch/mips/include/asm/div64.h === --- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h +++ linux-3maxp-div64/arch/mips/include/asm/div64.h @@ -68,9 +68,11 @@ \ __high = __div >> 32; \ __low = __div; \ - __upper = __high; \ \ - if (__high) { \ + if (__high < __radix) { \ + __upper = __high; \ + __high = 0; \ + } else {\ __asm__("divu $0, %z1, %z2" \ : "=x" (__modquot) \ : "Jr" (__high), "Jr" (__radix)); \
[PATCH 0/4] Reinstate and improve MIPS `do_div' implementation
Hi, As Huacai has recently discovered, the MIPS backend for `do_div' has been broken and inadvertently disabled with commit c21004cd5b4c ("MIPS: Rewrite to work with gcc 4.4.0."). As it is code I have originally written myself and Huacai had issues bringing it back to life, leading to a request to discard it, I have eventually decided to step in. In the end I have fixed the code and measured its performance to be ~100% better on average than our generic code. I have decided it would be worth having the test module I have prepared for correctness evaluation as well as benchmarking, so I have included it with the series, also so that I can refer to the results easily. In the end I have included four patches on this occasion: 1/4 is the test module, 2/4 is an inline documentation fix/clarification for the `do_div' wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small performance improvement to it. I have investigated a fifth change as a potential improvement where I replaced the call to `do_div64_32' with a DIVU instruction for cases where the high part of the intermediate dividend is zero, but it has turned out to regress performance a little, so I have discarded it. Also a follow-up change might be worth having to reduce the code size and place `__div64_32' out of line for CC_OPTIMIZE_FOR_SIZE configurations, but I have not fully prepared such a change at this time. I did use the WIP form I have for performance evaluation however; see the figures quoted with 4/4. These changes have been verified with a DECstation system with an R3400 MIPS I processor @40MHz and an MTI Malta system with a 5Kc MIPS64 processor @160MHz. See individual change descriptions and any additional discussions for further details. Questions, comments or concerns? Otherwise please apply. Maciej
[PATCH v2] iommu/vt-d: Use passthrough mode for the Intel IPUs
The Intel IPU (Image Processing Unit) has its own (IO)MMU hardware. The IPU driver allocates its own page table that is not mapped via the DMA API, and thus the Intel IOMMU driver blocks access, giving this error: DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Read] Request device [00:05.0] PASID fault addr 76406000 [fault reason 06] PTE Read access is not set As the IPU is not an external-facing device, it is not risky, so use IOMMU passthrough mode for the Intel IPUs. Fixes: 26f5689592e2 ("media: staging/intel-ipu3: mmu: Implement driver") Signed-off-by: Bingbu Cao --- Changes since v1: - Use IPU PCI DID value instead of macros to align with others - Check IPU PCI device ID in quirk --- drivers/iommu/intel/iommu.c | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index ee0932307d64..7e2fbdae467e 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -55,6 +55,12 @@ #define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY) #define IS_USB_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB) #define IS_ISA_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) +#define IS_INTEL_IPU(pdev) ((pdev)->vendor == PCI_VENDOR_ID_INTEL && \ + ((pdev)->device == 0x9a19 ||\ +(pdev)->device == 0x9a39 ||\ +(pdev)->device == 0x4e19 ||\ +(pdev)->device == 0x465d ||\ +(pdev)->device == 0x1919)) #define IS_AZALIA(pdev) ((pdev)->vendor == 0x8086 && (pdev)->device == 0x3a3e) #define IOAPIC_RANGE_START (0xfee0) @@ -360,6 +366,7 @@ int intel_iommu_enabled = 0; EXPORT_SYMBOL_GPL(intel_iommu_enabled); static int dmar_map_gfx = 1; +static int dmar_map_ipu = 1; static int dmar_forcedac; static int intel_iommu_strict; static int intel_iommu_superpage = 1; @@ -368,6 +375,7 @@ static int iommu_skip_te_disable; #define IDENTMAP_GFX 2 #define IDENTMAP_AZALIA 4 +#define IDENTMAP_IPU 8 int intel_iommu_gfx_mapped; EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped); @@ -2839,6 +2847,9 @@ static int
device_def_domain_type(struct device *dev) if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev)) return IOMMU_DOMAIN_IDENTITY; + + if ((iommu_identity_mapping & IDENTMAP_IPU) && IS_INTEL_IPU(pdev)) + return IOMMU_DOMAIN_IDENTITY; } return 0; @@ -3278,6 +3289,9 @@ static int __init init_dmars(void) if (!dmar_map_gfx) iommu_identity_mapping |= IDENTMAP_GFX; + if (!dmar_map_ipu) + iommu_identity_mapping |= IDENTMAP_IPU; + check_tylersburg_isoch(); ret = si_domain_init(hw_pass_through); @@ -5622,6 +5636,18 @@ static void quirk_iommu_igfx(struct pci_dev *dev) dmar_map_gfx = 0; } +static void quirk_iommu_ipu(struct pci_dev *dev) +{ + if (!IS_INTEL_IPU(dev)) + return; + + if (risky_device(dev)) + return; + + pci_info(dev, "Passthrough IOMMU for integrated Intel IPU\n"); + dmar_map_ipu = 0; +} + /* G4x/GM45 integrated gfx dmar support is totally busted. */ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_igfx); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_igfx); @@ -5657,6 +5683,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1632, quirk_iommu_igfx); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163A, quirk_iommu_igfx); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx); +/* disable IPU dmar support */ +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_iommu_ipu); + static void quirk_iommu_rwbf(struct pci_dev *dev) { if (risky_device(dev)) -- 2.7.4
Re: [PATCH] Revert "MIPS: make userspace mapping young by default".
Hi, On Mon, Apr 19, 2021 at 10:21:40PM +0800, Zhou Yanjie wrote: > Hi > > On 2021/4/19 下午12:56, Huang Pei wrote: > > On Sat, Apr 17, 2021 at 12:45:59AM +0800, Zhou Yanjie wrote: > > > On 2021/4/16 下午5:20, 黄沛 wrote: > > > > Is there any log about the panic? > > > > > > Yes, below is the log: > > > > > > > > > [ 195.436017] CPU 0 Unable to handle kernel paging request at virtual > > > address 77eb8000, epc == 80117868, ra == 80118208 > > > [ 195.446709] Oops[#1]: > > > [ 195.448977] CPU: 0 PID: 1461 Comm: Xsession Not tainted > > > 5.12.0-rc6-00227-gc8fc6defbd2e-dirty #1 > > > [ 195.457661] $ 0 : 0001 80117864 77eb9000 > > > [ 195.462888] $ 4 : 77eb8000 82419600 838ea000 82482ba0 > > > [ 195.468116] $ 8 : 826f8b18 8306f800 72d5 8306f800 > > > [ 195.473343] $12 : 0002 0a03 0001 0402 > > > [ 195.478568] $16 : 77eb8000 809faf60 0004 82482ba0 > > > [ 195.483794] $20 : 77eb8000 82419600 82482ba0 8086 > > > [ 195.489021] $24 : 8086121c 80117864 > > > [ 195.494248] $28 : 838ea000 838ebd70 80118208 > > > [ 195.499475] Hi : 8c4e > > > [ 195.502343] Lo : 4627 > > > [ 195.505212] epc : 80117868 r4k_blast_dcache_page_dc32+0x4/0x9c > > > [ 195.511217] ra : 80118208 local_r4k_flush_cache_page+0x120/0x1b8 > > > [ 195.517476] Status: 10001403 KERNEL EXL IE > > > [ 195.521657] Cause : 4080800c (ExcCode 03) > > > [ 195.525654] BadVA : 77eb8000 > > > [ 195.528523] PrId : 00d00100 (Ingenic XBurst) > > > [ 195.532866] Modules linked in: > > > [ 195.535911] Process Xsession (pid: 1461, threadinfo=00975a3e, > > > task=3724fd66, tls=77ebd690) > > > [ 195.544162] Stack : 808a05ec f7edcbfd 8306f800 8086 > > > 809faf60 > > > 80990a3c 80117f90 > > > [ 195.552524] 809faf60 82419600 8306f800 801fd84c > > > 801180b4 > > > 838ebe80 80110b7c > > > [ 195.560887] 80990a3c 82482ba0 82482ba0 77eb8000 4627 > > > f7edcbfd > > > 838ebe80 801cbc08 > > > [ 195.569249] 0001 181b2000 801fa06c > > > 83999ae0 > > > 8086 0004 > > > [ 195.577610] 80990a3c f7edcbfd 80990a3c 838ebe80 0004 > > > 80990a3c > > > 
82482ba0 04627685 > > > [ 195.585973] ... > > > [ 195.588413] Call Trace: > > > [ 195.590849] [<80117868>] r4k_blast_dcache_page_dc32+0x4/0x9c > > > [ 195.596501] [<80118208>] local_r4k_flush_cache_page+0x120/0x1b8 > > > [ 195.602413] [<80117f90>] r4k_on_each_cpu.isra.8+0x24/0x58 > > > [ 195.607805] [<801180b4>] r4k_flush_cache_page+0x34/0x58 > > > [ 195.613023] [<801cbc08>] wp_page_copy+0x3a8/0x56c > > > [ 195.617723] [<801ce944>] do_swap_page+0x4cc/0x558 > > > [ 195.622419] [<801cf3f8>] handle_mm_fault+0x790/0x93c > > > [ 195.627374] [<8011025c>] do_page_fault+0x19c/0x540 > > > [ 195.632159] [<801142f0>] tlb_do_page_fault_1+0x10c/0x11c > > > [ 195.637465] > > > [ 195.638947] Code: 03e8 24831000 bc950020 > > > bc950040 bc950060 bc950080 bc9500a0 > > > [ 195.648706] > > > [ 195.650243] ---[ end trace 7cc7d7f611932c42 ]--- > > > [ 195.654857] Kernel panic - not syncing: Fatal exception > > > [ 195.660072] Rebooting in 10 seconds.. > > > > > > > > > this problem can be triggered stably (by use Microsoft Remote Desktop > > > client > > > to login to debian9 running on CU1830-Neo). > > > > > Could you print out the PTE value at 0x77eb8000 ? 
> > > Here is the new log: > > > [ 33.681712] CPU 0 Unable to handle kernel paging request at virtual > address 77ea4000, epc == 801178ac, ra == 80118250 > [ 33.692395] Oops[#1]: > [ 33.694662] CPU: 0 PID: 1389 Comm: Xsession Not tainted 5.12.0-rc8-dirty > #2 > [ 33.701612] $ 0 : 0001 801178a8 77ea5000 > [ 33.706839] $ 4 : 77ea4000 81bcd220 80118130 856712a0 > [ 33.712066] $ 8 : 833e4a80 8544b800 70a8 8544b800 > [ 33.717293] $12 : 0002 05b7 0001 > [ 33.722518] $16 : 81bcd220 77ea4000 80a11ad8 0004 > [ 33.727745] $20 : 77ea4000 81bcd220 856712a0 8086 > [ 33.732972] $24 : 001c 801178a8 > [ 33.738197] $28 : 82564000 82565d68 80118250 > [ 33.743424] Hi : f0cc > [ 33.746293] Lo : 7866 > [ 33.749162] epc : 801178ac r4k_blast_dcache_page_dc32+0x4/0x9c > [ 33.755166] ra : 80118250 local_r4k_flush_cache_page+0x120/0x2c8 > [ 33.761425] Status: 10001403 KERNEL EXL IE > [ 33.765605] Cause : 4080800c (ExcCode 03) > [ 33.769603] BadVA : 77ea4000 > [ 33.772472] PrId : 00d00100 (Ingenic XBurst) > [ 33.776816] Modules linked in: > [ 33.779861] Process Xsession (pid: 1389, threadinfo=c8bdf64c, > task=2372d853, tls=77ea9690) > [ 33.788111] Stack : 808a256c 808a256c bfa6939a 8544b800 8086 > 8094d308 80a11ad8 > [ 33.796474] 856712a0 80117fd8 8094d308 81bcd220 8544b800 801fdb10 > 80945ce8 801180fc > [ 33.804838] 82565e80 80110b8c 80a11ad8 856712a0 856712a0 77ea4000 > 7866 bfa6939a > [ 33.813201]
Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
Hi Matthew, On 4/16/21 7:45 PM, Matthew Wilcox wrote: > Replacement patch to fix compiler warning. > > From: "Matthew Wilcox (Oracle)" > Date: Fri, 16 Apr 2021 16:34:55 -0400 > Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems > To: bro...@redhat.com > Cc: linux-kernel@vger.kernel.org, > linux...@kvack.org, > net...@vger.kernel.org, > linuxppc-...@lists.ozlabs.org, > linux-arm-ker...@lists.infradead.org, > linux-m...@vger.kernel.org, > ilias.apalodi...@linaro.org, > mcr...@linux.microsoft.com, > grygorii.stras...@ti.com, > a...@kernel.org, > h...@lst.de, > linux-snps-...@lists.infradead.org, > mho...@kernel.org, > mgor...@suse.de > > 32-bit architectures which expect 8-byte alignment for 8-byte integers > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct > page inadvertently expanded in 2019. FWIW, ARC doesn't require 8 byte alignment for 8 byte integers. This is only needed for 8-byte atomics due to the requirements of LLOCKD/SCOND instructions. > When the dma_addr_t was added, > it forced the alignment of the union to 8 bytes, which inserted a 4 byte > gap between 'flags' and the union. > > Fix this by storing the dma_addr_t in one or two adjacent unsigned longs. > This restores the alignment to that of an unsigned long, and also fixes a > potential problem where (on a big endian platform), the bit used to denote > PageTail could inadvertently get set, and a racing get_user_pages_fast() > could dereference a bogus compound_head(). 
> > Fixes: c25fff7171be ("mm: add dma_addr_t to struct page") > Signed-off-by: Matthew Wilcox (Oracle) > --- > include/linux/mm_types.h | 4 ++-- > include/net/page_pool.h | 12 +++- > net/core/page_pool.c | 12 +++- > 3 files changed, 20 insertions(+), 8 deletions(-) > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 6613b26a8894..5aacc1c10a45 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -97,10 +97,10 @@ struct page { > }; > struct {/* page_pool used by netstack */ > /** > - * @dma_addr: might require a 64-bit value even on > + * @dma_addr: might require a 64-bit value on >* 32-bit architectures. >*/ > - dma_addr_t dma_addr; > + unsigned long dma_addr[2]; > }; > struct {/* slab, slob and slub */ > union { > diff --git a/include/net/page_pool.h b/include/net/page_pool.h > index b5b195305346..ad6154dc206c 100644 > --- a/include/net/page_pool.h > +++ b/include/net/page_pool.h > @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct > page_pool *pool, > > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16; > + return ret; > +} > + > +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr) > +{ > + page->dma_addr[0] = addr; > + if (sizeof(dma_addr_t) > sizeof(unsigned long)) > + page->dma_addr[1] = addr >> 16 >> 16; > } > > static inline bool is_page_pool_compiled_in(void) > diff --git a/net/core/page_pool.c b/net/core/page_pool.c > index ad8b0707af04..f014fd8c19a6 100644 > --- a/net/core/page_pool.c > +++ b/net/core/page_pool.c > @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct > page_pool *pool, > struct page *page, > unsigned int dma_sync_size) > { > + dma_addr_t dma_addr = page_pool_get_dma_addr(page); > + > dma_sync_size = min(dma_sync_size, pool->p.max_len); > 
- dma_sync_single_range_for_device(pool->p.dev, page->dma_addr, > + dma_sync_single_range_for_device(pool->p.dev, dma_addr, >pool->p.offset, dma_sync_size, >pool->p.dma_dir); > } > @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct > page_pool *pool, > put_page(page); > return NULL; > } > - page->dma_addr = dma; > + page_pool_set_dma_addr(page, dma); > > if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) > page_pool_dma_sync_for_device(pool, page, pool->p.max_len); > @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, > struct page *page) >*/ > goto skip_dma_unmap; > > - dma = page->dma_addr; > + dma = page_pool_get_dma_addr(page); > > - /* When page is unmapped, it cannot be returned our pool */ > + /* When page is unmapped, it cannot be
Re: [PATCH] dt-bindings: net: mediatek: support MT7621 SoC
On Sun, Apr 18, 2021 at 11:24 PM Bjørn Mork wrote: > > Ilya Lipnitskiy writes: > > > Add missing binding documentation for SoC support that has been in place > > since v5.1 > > > > Fixes: 889bcbdeee57 ("net: ethernet: mediatek: support MT7621 SoC ethernet > > hardware") > > Cc: Bjørn Mork > > Signed-off-by: Ilya Lipnitskiy > > --- > > Documentation/devicetree/bindings/net/mediatek-net.txt | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt > > b/Documentation/devicetree/bindings/net/mediatek-net.txt > > index 72d03e07cf7c..950ef6af20b1 100644 > > --- a/Documentation/devicetree/bindings/net/mediatek-net.txt > > +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt > > @@ -10,6 +10,7 @@ Required properties: > > - compatible: Should be > > "mediatek,mt2701-eth": for MT2701 SoC > > "mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC > > + "mediatek,mt7621-eth": for MT7621 SoC > > "mediatek,mt7622-eth": for MT7622 SoC > > "mediatek,mt7629-eth": for MT7629 SoC > > "ralink,rt5350-eth": for Ralink Rt5350F and MT7628/88 SoC > > > Thanks for taking care of this! > > Note, however, that this compatible value is defined in > Documentation/devicetree/bindings/net/ralink,rt2880-net.txt > > I believe that file should go away. These two files are both documenting > the same compatible property AFAICS. Removed along with two others in https://lore.kernel.org/lkml/20210420024222.101615-1-ilya.lipnits...@gmail.com/T/#u Ilya
[PATCH] dt-bindings: net: mediatek/ralink: remove unused bindings
Revert commit 663148e48a66 ("Documentation: DT: net: add docs for ralink/mediatek SoC ethernet binding") No in-tree drivers use the compatible strings present in these bindings, and some have been superseded by DSA-capable mtk_eth_soc driver, so remove these obsolete bindings. Cc: John Crispin Signed-off-by: Ilya Lipnitskiy --- .../bindings/net/mediatek,mt7620-gsw.txt | 24 .../bindings/net/ralink,rt2880-net.txt| 59 --- .../bindings/net/ralink,rt3050-esw.txt| 30 -- 3 files changed, 113 deletions(-) delete mode 100644 Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt delete mode 100644 Documentation/devicetree/bindings/net/ralink,rt2880-net.txt delete mode 100644 Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt diff --git a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt b/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt deleted file mode 100644 index 358fed2fab43.. --- a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt +++ /dev/null @@ -1,24 +0,0 @@ -Mediatek Gigabit Switch -=== - -The mediatek gigabit switch can be found on Mediatek SoCs (mt7620, mt7621). - -Required properties: -- compatible: Should be "mediatek,mt7620-gsw" or "mediatek,mt7621-gsw" -- reg: Address and length of the register set for the device -- interrupts: Should contain the gigabit switches interrupt -- resets: Should contain the gigabit switches resets -- reset-names: Should contain the reset names "gsw" - -Example: - -gsw@1011 { - compatible = "ralink,mt7620-gsw"; - reg = <0x1011 8000>; - - resets = < 23>; - reset-names = "gsw"; - - interrupt-parent = <>; - interrupts = <17>; -}; diff --git a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt b/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt deleted file mode 100644 index 9fe1a0a22e44.. 
--- a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt +++ /dev/null @@ -1,59 +0,0 @@ -Ralink Frame Engine Ethernet controller -=== - -The Ralink frame engine ethernet controller can be found on Ralink and -Mediatek SoCs (RT288x, RT3x5x, RT366x, RT388x, rt5350, mt7620, mt7621, mt76x8). - -Depending on the SoC, there is a number of ports connected to the CPU port -directly and/or via a (gigabit-)switch. - -* Ethernet controller node - -Required properties: -- compatible: Should be one of "ralink,rt2880-eth", "ralink,rt3050-eth", - "ralink,rt3050-eth", "ralink,rt3883-eth", "ralink,rt5350-eth", - "mediatek,mt7620-eth", "mediatek,mt7621-eth" -- reg: Address and length of the register set for the device -- interrupts: Should contain the frame engines interrupt -- resets: Should contain the frame engines resets -- reset-names: Should contain the reset names "fe". If a switch is present - "esw" is also required. - - -* Ethernet port node - -Required properties: -- compatible: Should be "ralink,eth-port" -- reg: The number of the physical port -- phy-handle: reference to the node describing the phy - -Example: - -mdio-bus { - ... - phy0: ethernet-phy@0 { - phy-mode = "mii"; - reg = <0>; - }; -}; - -ethernet@40 { - compatible = "ralink,rt2880-eth"; - reg = <0x0040 1>; - - #address-cells = <1>; - #size-cells = <0>; - - resets = < 18>; - reset-names = "fe"; - - interrupt-parent = <>; - interrupts = <5>; - - port@0 { - compatible = "ralink,eth-port"; - reg = <0>; - phy-handle = <>; - }; - -}; diff --git a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt b/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt deleted file mode 100644 index 87e315856efa.. --- a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt +++ /dev/null @@ -1,30 +0,0 @@ -Ralink Fast Ethernet Embedded Switch - - -The ralink fast ethernet embedded switch can be found on Ralink and Mediatek -SoCs (RT3x5x, RT5350, MT76x8). 
- -Required properties: -- compatible: Should be "ralink,rt3050-esw" -- reg: Address and length of the register set for the device -- interrupts: Should contain the embedded switches interrupt -- resets: Should contain the embedded switches resets -- reset-names: Should contain the reset names "esw" - -Optional properties: -- ralink,portmap: can be used to choose if the default switch setup is - w or w -- ralink,led_polarity: override the active high/low settings of the leds - -Example: - -esw@1011 { - compatible = "ralink,rt3050-esw"; - reg = <0x1011 8000>; - - resets = < 23>; - reset-names = "esw"; - - interrupt-parent = <>; - interrupts = <17>; -}; -- 2.31.1
Re: [PATCH 1/3] powerpc/8xx: Enhance readability of trap types
On Mon, Apr 19, 2021 at 11:48 PM Christophe Leroy wrote: > > This patch makes use of trap types in head_8xx.S > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/interrupt.h | 29 > arch/powerpc/kernel/head_8xx.S | 49 ++-- > 2 files changed, 47 insertions(+), 31 deletions(-) > > diff --git a/arch/powerpc/include/asm/interrupt.h > b/arch/powerpc/include/asm/interrupt.h > index ed2c4042c3d1..cf2c5c3ae716 100644 > --- a/arch/powerpc/include/asm/interrupt.h > +++ b/arch/powerpc/include/asm/interrupt.h > @@ -2,13 +2,6 @@ > #ifndef _ASM_POWERPC_INTERRUPT_H > #define _ASM_POWERPC_INTERRUPT_H > > -#include > -#include > -#include > -#include > -#include > -#include > - > /* BookE/4xx */ > #define INTERRUPT_CRITICAL_INPUT 0x100 > > @@ -39,9 +32,11 @@ > /* BookE/BookS/4xx/8xx */ > #define INTERRUPT_DATA_STORAGE0x300 > #define INTERRUPT_INST_STORAGE0x400 > +#define INTERRUPT_EXTERNAL 0x500 > #define INTERRUPT_ALIGNMENT 0x600 > #define INTERRUPT_PROGRAM 0x700 > #define INTERRUPT_SYSCALL 0xc00 > +#define INTERRUPT_TRACE0xd00 The INTERRUPT_TRACE macro is defined in BookS section. In BookE, 0xd00 stands for debug interrupt, so I defined it as INTERRUPT_DEBUG. I understand they are similar things, but the terminologies are different in reference manuals. Regards, Xiongwei
Re: [PATCH 2/2] USB: misc: Add driver for the WCH CH341 in I2C/GPIO mode
Hi-- On 4/19/21 7:25 PM, Frank Zago wrote: > From: frank zago > > > Signed-off-by: Frank Zago > Signed-off-by: frank zago Don't think you need both of those. > --- > MAINTAINERS | 6 + > drivers/usb/misc/Kconfig| 18 ++ > drivers/usb/misc/Makefile | 1 + > drivers/usb/misc/ch341/Kconfig | 0 > drivers/usb/misc/ch341/Makefile | 3 + > drivers/usb/misc/ch341/ch341-core.c | 87 + > drivers/usb/misc/ch341/ch341-gpio.c | 249 ++ > drivers/usb/misc/ch341/ch341-i2c.c | 267 > drivers/usb/misc/ch341/ch341.h | 50 ++ > 9 files changed, 681 insertions(+) > create mode 100644 drivers/usb/misc/ch341/Kconfig > create mode 100644 drivers/usb/misc/ch341/Makefile > create mode 100644 drivers/usb/misc/ch341/ch341-core.c > create mode 100644 drivers/usb/misc/ch341/ch341-gpio.c > create mode 100644 drivers/usb/misc/ch341/ch341-i2c.c > create mode 100644 drivers/usb/misc/ch341/ch341.h > > diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig > index 8f1144359012..2d4db92f0de4 100644 > --- a/drivers/usb/misc/Kconfig > +++ b/drivers/usb/misc/Kconfig > @@ -284,3 +284,21 @@ config BRCM_USB_PINMAP > This option enables support for remapping some USB external > signals, which are typically on dedicated pins on the chip, > to any gpio. > + > +config USB_CH341_CORE > + tristate "USB WinChipHead CH341 in I2C/SPI/GPIO mode" > + depends on USB && GPIOLIB && I2C && SPI > + help > + > + If you say yes to this option, support for the CH341 chips, > + running in I2C/SPI/GPIO mode will be included. Some versions > + of the chip do not support all the functionnalities but > + there is no way to differentiate them. For instance the > + CH341A and CH341B support everything while the CH341T can > + only do I2C. > + > + The serial mode is not supported by this driver. Use the > + CH341 USB serial driver. > + > + This driver can also be built as a module. If so, the > + module will be called ch341-buses. Above should be in drivers/usb/misc/ch341/Kconfig file instead? 
> diff --git a/drivers/usb/misc/Makefile b/drivers/usb/misc/Makefile > index 5f4e598573ab..95c0ca15b8c9 100644 > --- a/drivers/usb/misc/Makefile > +++ b/drivers/usb/misc/Makefile > @@ -32,3 +32,4 @@ obj-$(CONFIG_USB_CHAOSKEY) += chaoskey.o > obj-$(CONFIG_USB_SISUSBVGA) += sisusbvga/ > obj-$(CONFIG_USB_LINK_LAYER_TEST)+= lvstest.o > obj-$(CONFIG_BRCM_USB_PINMAP)+= brcmstb-usb-pinmap.o > +obj-$(CONFIG_USB_CH341_CORE) += ch341/ > diff --git a/drivers/usb/misc/ch341/Kconfig b/drivers/usb/misc/ch341/Kconfig > new file mode 100644 > index ..e69de29bb2d1 Is that file (above) empty on purpose? > diff --git a/drivers/usb/misc/ch341/Makefile b/drivers/usb/misc/ch341/Makefile > new file mode 100644 > index ..7c6429e7a46e > --- /dev/null > +++ b/drivers/usb/misc/ch341/Makefile > @@ -0,0 +1,3 @@ > +obj-$(CONFIG_USB_CH341_CORE) := ch341-buses.o > + > +ch341-buses-objs := ch341-core.o ch341-i2c.o ch341-gpio.o -- ~Randy
[PATCH 1/2] Revert "USB: serial: ch341: add new Product ID for CH341A"
From: frank zago The 0x5512 USB PID is for the I2C/GPIO/SPI interfaces. UART is still present but only the TX and RX pins are available; DTS, DTR, ... are used for other things. Remove the PID, and let an I2C driver bind to it. Existing CH341 boards usually have physical jumpers to switch between the 3 modes. This reverts commit 46ee4abb10a07bd8f8ce910ee6b4ae6a947d7f63. Signed-off-by: Frank Zago Signed-off-by: frank zago --- drivers/usb/serial/ch341.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/serial/ch341.c b/drivers/usb/serial/ch341.c index 2db917eab799..235adc77ee0e 100644 --- a/drivers/usb/serial/ch341.c +++ b/drivers/usb/serial/ch341.c @@ -81,7 +81,6 @@ #define CH341_QUIRK_SIMULATE_BREAK BIT(1) static const struct usb_device_id id_table[] = { - { USB_DEVICE(0x1a86, 0x5512) }, { USB_DEVICE(0x1a86, 0x5523) }, { USB_DEVICE(0x1a86, 0x7522) }, { USB_DEVICE(0x1a86, 0x7523) }, -- 2.27.0
[PATCH 2/2] USB: misc: Add driver for the WCH CH341 in I2C/GPIO mode
From: frank zago The CH341 is a multifunction chip, presenting 3 different USB PIDs. One of these functions is for I2C/SPI/GPIO. This new driver manages I2C and GPIO. A future update will manage the SPI part as well. The I2C interface can run at 4 different speeds. This driver currently only offers 100 kHz. Tested with a variety of I2C sensors, and the IIO subsystem. The GPIO interface offers 8 GPIOs. 6 are read/write, and 2 are read-only. However the SPI interface will use 6 of them, leaving 2 available GPIOs. Signed-off-by: Frank Zago Signed-off-by: frank zago --- MAINTAINERS | 6 + drivers/usb/misc/Kconfig| 18 ++ drivers/usb/misc/Makefile | 1 + drivers/usb/misc/ch341/Kconfig | 0 drivers/usb/misc/ch341/Makefile | 3 + drivers/usb/misc/ch341/ch341-core.c | 87 + drivers/usb/misc/ch341/ch341-gpio.c | 249 ++ drivers/usb/misc/ch341/ch341-i2c.c | 267 drivers/usb/misc/ch341/ch341.h | 50 ++ 9 files changed, 681 insertions(+) create mode 100644 drivers/usb/misc/ch341/Kconfig create mode 100644 drivers/usb/misc/ch341/Makefile create mode 100644 drivers/usb/misc/ch341/ch341-core.c create mode 100644 drivers/usb/misc/ch341/ch341-gpio.c create mode 100644 drivers/usb/misc/ch341/ch341-i2c.c create mode 100644 drivers/usb/misc/ch341/ch341.h diff --git a/MAINTAINERS b/MAINTAINERS index 95e6766718b0..17e3d03cf9bf 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -19349,6 +19349,12 @@ M: David Härdeman S: Maintained F: drivers/media/rc/winbond-cir.c +WINCHIPHEAD CH341 I2C/GPIO/SPI DRIVER +M: Frank Zago +L: linux-...@vger.kernel.org +S: Maintained +F: drivers/usb/misc/ch341/ + WINSYSTEMS EBC-C384 WATCHDOG DRIVER M: William Breathitt Gray L: linux-watch...@vger.kernel.org diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig index 8f1144359012..2d4db92f0de4 100644 --- a/drivers/usb/misc/Kconfig +++ b/drivers/usb/misc/Kconfig @@ -284,3 +284,21 @@ config BRCM_USB_PINMAP This option enables support for remapping some USB external signals, which are typically on dedicated pins on the
chip, to any gpio. + +config USB_CH341_CORE + tristate "USB WinChipHead CH341 in I2C/SPI/GPIO mode" + depends on USB && GPIOLIB && I2C && SPI + help + + If you say yes to this option, support for the CH341 chips, + running in I2C/SPI/GPIO mode will be included. Some versions + of the chip do not support all the functionalities but + there is no way to differentiate them. For instance the + CH341A and CH341B support everything while the CH341T can + only do I2C. + + The serial mode is not supported by this driver. Use the + CH341 USB serial driver. + + This driver can also be built as a module. If so, the + module will be called ch341-buses. diff --git a/drivers/usb/misc/Makefile b/drivers/usb/misc/Makefile index 5f4e598573ab..95c0ca15b8c9 100644 --- a/drivers/usb/misc/Makefile +++ b/drivers/usb/misc/Makefile @@ -32,3 +32,4 @@ obj-$(CONFIG_USB_CHAOSKEY)+= chaoskey.o obj-$(CONFIG_USB_SISUSBVGA)+= sisusbvga/ obj-$(CONFIG_USB_LINK_LAYER_TEST) += lvstest.o obj-$(CONFIG_BRCM_USB_PINMAP) += brcmstb-usb-pinmap.o +obj-$(CONFIG_USB_CH341_CORE) += ch341/ diff --git a/drivers/usb/misc/ch341/Kconfig b/drivers/usb/misc/ch341/Kconfig new file mode 100644 index ..e69de29bb2d1 diff --git a/drivers/usb/misc/ch341/Makefile b/drivers/usb/misc/ch341/Makefile new file mode 100644 index ..7c6429e7a46e --- /dev/null +++ b/drivers/usb/misc/ch341/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_USB_CH341_CORE) := ch341-buses.o + +ch341-buses-objs := ch341-core.o ch341-i2c.o ch341-gpio.o diff --git a/drivers/usb/misc/ch341/ch341-core.c b/drivers/usb/misc/ch341/ch341-core.c new file mode 100644 index ..19c531715ea3 --- /dev/null +++ b/drivers/usb/misc/ch341/ch341-core.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for the CH341A, and CH341B USB to I2C/GPIO adapter + * Driver for the CH341T USB to I2C adapter + * + * Copyright 2021, Frank Zago + * Copyright (c) 2017 Gunar Schorcht (gu...@schorcht.net) + * Copyright (c) 2016 Tse Lun Bien + * Copyright (c) 2014 Marco Gittler + *
Copyright (c) 2006-2007 Till Harbaum (t...@harbaum.org) + * + * The full UART functionality is handled by the CH341 serial driver + */ + +#include +#include + +#include "ch341.h" + +static void ch341_usb_free_device(struct ch341_device *dev) +{ + ch341_gpio_remove(dev); + ch341_i2c_remove(dev); + + usb_set_intfdata(dev->iface, NULL); + usb_put_dev(dev->usb_dev); + + kfree(dev); +} + +static int ch341_usb_probe(struct usb_interface *iface, + const struct usb_device_id *usb_id) +{ + struct ch341_device *dev; +
Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc
On 2021/4/20 7:55, Michal Kubecek wrote: > On Mon, Apr 19, 2021 at 05:29:46PM +0200, Michal Kubecek wrote: >> >> As pointed out in the discussion on v3, this patch may result in >> significantly higher CPU consumption with multiple threads competing on >> a saturated outgoing device. I missed this submission so that I haven't >> checked it yet but given the description of v3->v4 changes above, it's >> quite likely that it suffers from the same problem. > > And it indeed does. However, with the additional patch from the v3 > discussion, the numbers are approximately the same as with an unpatched > mainline kernel. > > As with v4, I tried this patch on top of 5.12-rc7 with real devices. > I used two machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs > (2 8-core CPUs with HT disabled) and 16 Rx/Tx queues, receiver has > 48 CPUs (2 12-core CPUs with HT enabled) and 48 Rx/Tx queues.
>
> threads   5.12-rc7   5.12-rc7 + v4   5.12-rc7 + v4 + stop
>    1        25.1%        38.1%              22.9%
>    8        66.2%       277.0%              74.1%
>   16        90.1%       150.7%              91.0%
>   32       107.2%       272.6%             108.3%
>   64       116.3%       487.5%             118.1%
>  128       126.1%       946.7%             126.9%
>
> (The values are normalized to one core, i.e. 100% corresponds to one > fully used logical CPU.) > > So it seems that repeated scheduling while the queue was stopped is > indeed the main performance issue and that other cases of the logic > being too pessimistic do not play a significant role. There is an > exception with 8 connections/threads and the result with just this > series also looks abnormally high (e.g. much higher than with > 16 threads). It might be worth investigating what happens there and > what the results with other thread counts around 8 look like. Will try to investigate the 8 connections/threads case. > > I'll run some more tests with other traffic patterns tomorrow and > I'm also going to take a closer look at the additional patch. Thanks for the detailed testing and review. > > Michal
Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface
Hi Keqian, On 4/20/21 9:25 AM, Keqian Zhu wrote: Hi Baolu, On 2021/4/19 21:33, Lu Baolu wrote: Hi Keqian, On 2021/4/19 17:32, Keqian Zhu wrote: +EXPORT_SYMBOL_GPL(iommu_split_block); Do you really have any consumers of this interface other than the dirty bit tracking? If not, I don't suggest making this a generic IOMMU interface. There is an implicit requirement for such interfaces. The iommu_map/unmap(iova, size) shouldn't be called at the same time. Currently there's no such sanity check in the iommu core. A poorly written driver could mess up the kernel by misusing this interface. Yes, I can't think of a scenario other than dirty tracking. Indeed, we'd better not make them a generic interface. Do you have any suggestion for how the underlying iommu drivers can share this code without making it a generic iommu interface? I have a not-so-good idea: make the "split" interfaces static functions, and pass the function pointer to start_dirty_log. But it looks weird and inflexible. I understand splitting/merging super pages is an optimization, but not a functional requirement. So is it possible to let the vendor iommu driver decide whether to split super pages when starting dirty bit tracking, and do the opposite operation when stopping it? The requirement for Right. If I understand you correctly, actually that is what this series does. I mean to say no generic APIs, just do it in the iommu subsystem itself. It's totally transparent to the upper level, just like what map() does. The upper layer doesn't care whether a super page or a small page is in use when doing a mapping, right? If you want to consolidate some code, how about putting them in start/stop_tracking()? Best regards, baolu We implemented split/merge in the IOMMU core layer, but don't force vendor drivers to use it. The problem is that when we expose these interfaces to the vendor IOMMU drivers, we also expose them to upper-level drivers.
upper layer is that starting/stopping dirty bit tracking and mapping/unmapping are mutually exclusive. OK, I will explicitly add the hints. Thanks. Thanks, Keqian On the other hand, if a driver calls map/unmap with split/merge at the same time, it's a bug in the driver; it should follow the rule. Best regards, baolu
Re: [PATCH 0/5] Bring the BusLogic host bus adapter driver up to Y2021
On 4/19/21 10:01 AM, Maciej W. Rozycki wrote: > On Mon, 19 Apr 2021, Khalid Aziz wrote: > >> On 4/18/21 2:21 PM, Ondrej Zary wrote: >>> >>> Found the 3000763 document here: >>> https://doc.lagout.org/science/0_Computer Science/0_Computer >>> History/old-hardware/buslogic/3000763_PCI_EISA_Wide_SCSI_Tech_Ref_Dec94.pdf >>> >>> There's also 3002593 there: >>> https://doc.lagout.org/science/0_Computer Science/0_Computer >>> History/old-hardware/buslogic/ >>> >> >> Thanks!!! > > Ondrej: Thanks a lot indeed! These documents seem to have the essential > interface details all covered, except for Fast-20 SCSI adapters, which I > guess are a minor modification from the software's point of view. > > Khalid: I have skimmed over these documents and I infer 24-bit addressing > can be verified with any MultiMaster adapter, including ones that do have > 32-bit addressing implemented, by using the legacy Initialize Mailbox HBA > command. That could be used to stop Christoph's recent changes for older > adapter support removal and replace them with proper fixes for whatever > has become broken. Is that something you'd be willing as the driver's > maintainer to look into, or shall I? > Hi Maciej, Do you mean use OpCode 01 (INITIALIZE MAILBOX) to set a 24-bit address for mailbox in place of OpCode 81? Verifying the change would be a challenge. Do you have an old adapter to test it with? If you do, go ahead and make the changes. I will be happy to review. I have only a BT-757 adapter. Thanks, Khalid
Re: [PATCH] MIPS: Fix cmdline "mem=" parameter parsing
Hi, Jiaxun On 04/20/2021 09:05 AM, Jiaxun Yang wrote: On 2021/4/19 18:50, Youling Tang wrote: This problem may only occur on NUMA platforms. When the machine starts with the "mem=" parameter on Loongson64, it cannot boot. When parsing the "mem=" parameter, first remove all RAM, and then add memory through memblock_add(), which causes the newly added memory to be located on MAX_NUMNODES. The solution is to add the current "mem=" parameter range to the memory area of the corresponding node, instead of adding all of it to the MAX_NUMNODES node area. Get the node number corresponding to the "mem=" parameter range through pa_to_nid(), and then add it to the corresponding node through memblock_add_node(). Signed-off-by: Jinyang He Signed-off-by: Youling Tang --- arch/mips/kernel/setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 279be01..b86e241 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -359,7 +359,7 @@ static int __init early_parse_mem(char *p) if (*p == '@') start = memparse(p + 1, &p); -memblock_add(start, size); +memblock_add_node(start, size, pa_to_nid(start)); pa_to_nid is not available for all platforms. Thanks for your correction. pa_to_nid() only has actual definitions in mach-ip27 and mach-loongson64 (i.e., only for NUMA platforms). In arch/mips/include/asm/mmzone.h: #ifndef pa_to_nid #define pa_to_nid(addr) 0 #endif So we only need to #include <asm/mmzone.h> to solve the "error: implicit declaration of function 'pa_to_nid'" compilation error. Thanks, Youling Thanks. - Jiaxun return 0; }
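For reference, on Loongson64 the node id is carved out of the upper physical-address bits, which is why pa_to_nid(start) can resolve the node for a "mem=" range. A standalone sketch of that mapping (the 44-bit node shift is an assumption based on mach-loongson64; other platforms define pa_to_nid() differently, and the generic fallback is simply 0):

```c
#include <assert.h>

/* Sketch of a Loongson64-style pa_to_nid(): the NUMA node id lives in
 * physical-address bits 47:44. NODE_ADDRSPACE_SHIFT == 44 is an
 * assumption for illustration, not a statement about every MIPS
 * platform; the generic asm/mmzone.h fallback maps everything to 0. */
#define NODE_ADDRSPACE_SHIFT 44

static int pa_to_nid(unsigned long long pa)
{
	return (int)((pa >> NODE_ADDRSPACE_SHIFT) & 0xf);
}
```

With this layout a "mem=" range starting at 0x100000000000 would land on node 1, which is exactly what memblock_add_node(start, size, pa_to_nid(start)) preserves and plain memblock_add() loses.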
Re: [v9,0/7] PCI: mediatek: Add new generation controller support
On Mon, 2021-04-19 at 11:44 +0100, Lorenzo Pieralisi wrote: > On Fri, Apr 16, 2021 at 02:21:00PM -0500, Bjorn Helgaas wrote: > > On Wed, Mar 24, 2021 at 11:05:03AM +0800, Jianjun Wang wrote: > > > This patch series adds pcie-mediatek-gen3.c and a dt-bindings file to > > > support the new generation PCIe controller. > > > > Incidental: b4 doesn't work on this thread, I suspect because the > > usual subject line format is: > > > > [PATCH v9 0/7] > > > > instead of: > > > > [v9,0/7] > > > > For b4 info, see > > https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst > > Jianjun will update the series accordingly (and please add to v10 the > review tags you received). > > Lorenzo Yes, I will update this series in v10 to fix the subject line format and use EXPORT_SYMBOL_GPL(), thanks for your comments. Thanks.
Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote: > This patch suggests doing a page table walk to find the error virtual > address. If we find multiple virtual addresses in walking, we now can't > determine which one is correct, so we fall back to sending SIGBUS in > kill_me_maybe() without error info as we do now. This corner case needs > to be solved in the future. Instead of walking the page tables, I wonder about the following idea: When failing to get the vaddr, memory_failure just ensures the mapping is removed and a hwpoisoned swap pte is put in place; or the original page is flagged with PG_HWPOISONED and kept in the radix tree (e.g., for SHMEM THP). NOTE: no SIGBUS is sent to user space. Then do_machine_check just returns to user space to resume execution; the re-execution will result in a #PF and should land in the exact page-fault handling code that generates a SIGBUS with the precise vaddr info: https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3290 https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3647 Thanks, -Jue
Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
On Mon, 2021-04-19 at 20:39 -0500, Rob Herring wrote: > On Mon, Apr 19, 2021 at 7:35 PM Leonardo Bras wrote: > > > > On Mon, 2021-04-19 at 10:44 -0500, Rob Herring wrote: > > > On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras wrote: > > > > > > > > Hello Rob, thanks for this feedback! > > > > > > > > On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote: > > > > > +PPC and PCI lists > > > > > > > > > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras > > > > > wrote: > > > > > > > > > > > > Many other resource flag parsers already add this flag when the > > > > > > input > > > > > > has bits 24 & 25 set, so update this one to do the same. > > > > > > > > > > Many others? Looks like sparc and powerpc to me. > > > > > > > > > > > > > s390 also does that, but it looks like it comes from a device-tree. > > > > > > I'm only looking at DT based platforms, and s390 doesn't use DT. > > > > Correct. > > Sorry, I somehow wrote above the opposite of what I was thinking. > > > > > > > > > > Those would be the > > > > > ones I worry about breaking. Sparc doesn't use of/address.c so it's > > > > > fine. Powerpc version of the flags code was only fixed in 2019, so I > > > > > don't think powerpc will care either. > > > > > > > > In powerpc I reach this function with this stack, while configuring a > > > > virtio-net device for a qemu/KVM pseries guest: > > > > > > > > pci_process_bridge_OF_ranges+0xac/0x2d4 > > > > pSeries_discover_phbs+0xc4/0x158 > > > > discover_phbs+0x40/0x60 > > > > do_one_initcall+0x60/0x2d0 > > > > kernel_init_freeable+0x308/0x3a8 > > > > kernel_init+0x2c/0x168 > > > > ret_from_kernel_thread+0x5c/0x70 > > > > > > > > For this, both MMIO32 and MMIO64 resources will have flags 0x200. > > > > > > Oh good, powerpc has 2 possible flags parsing functions. So in the > > > above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64? > > > > > > Does pci_parse_of_flags() get called in your case?
> > > > > > > It's called in some cases, but not for the device I am debugging > > (virtio-net pci@8002000). > > > > For the above device, here is an expanded stack trace: > > > > of_bus_pci_get_flags() (from parser->bus->get_flags()) > > of_pci_range_parser_one() (from macro for_each_of_pci_range) > > pci_process_bridge_OF_ranges+0xac/0x2d4 > > pSeries_discover_phbs+0xc4/0x158 > > discover_phbs+0x40/0x60 > > do_one_initcall+0x60/0x2d0 > > kernel_init_freeable+0x308/0x3a8 > > kernel_init+0x2c/0x168 > > ret_from_kernel_thread+0x5c/0x70 > > > > For other devices, I could also see the following stack trace: > > ## device ethernet@8 > > > > pci_parse_of_flags() > > of_create_pci_dev+0x7f0/0xa40 > > __of_scan_bus+0x248/0x320 > > pcibios_scan_phb+0x370/0x3b0 > > pcibios_init+0x8c/0x12c > > do_one_initcall+0x60/0x2d0 > > kernel_init_freeable+0x308/0x3a8 > > kernel_init+0x2c/0x168 > > ret_from_kernel_thread+0x5c/0x70 > > > > Devices that get parsed with of_bus_pci_get_flags() appears first at > > dmesg (around 0.015s in my test), while devices that get parsed by > > pci_parse_of_flags() appears later (0.025s in my test). > > > > I am not really used to this code, but having the term "discover phbs" > > in the first trace and the term "scan phb" in the second, makes me > > wonder if the first trace is seen on devices that are seen/described in > > the device-tree and the second trace is seen in devices not present in > > the device-tree and found scanning pci bus. > > That was my guess as well. I think on pSeries that most PCI devices > are in the DT whereas on Arm and other flattened DT (non OpenFirmware) > platforms PCI devices are not in DT. > It makes sense to me. > Of course, for virtio devices, > they would not be in DT in either case. I don't get this part... in pseries it looks like virtio devices can be in device-tree. Oh, I think I get it... 
this pci@8002000 looks like a bus (described in device-tree, so discovered), and then the devices are inside it, getting scanned. The virtio device gets the correct flags (from pci_parse_of_flags), but the bus (pci@8002000) does not seem to get them correctly, because it comes from of_bus_pci_get_flags(), which makes sense according to the name of the function. (see lspci below, output without patch) 00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01) Subsystem: Red Hat, Inc. Device 1100 Device tree node: /sys/firmware/devicetree/base/pci@8002000/ethernet@8 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- BAR=0 offset= size= Capabilities: [70] Vendor Specific Information: VirtIO: Notify BAR=4 offset=3000 size=1000 multiplier=0004 Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg BAR=4 offset=2000 size=1000 Capabilities:
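For context, the "bits 24 & 25" the patch title refers to are the space-select (ss) field of the first cell (phys.hi) of an OF PCI address, per the PCI Open Firmware binding. A minimal userspace model of the flag translation being discussed (the IORESOURCE_* constants are stand-ins for illustration, not guaranteed to match the kernel's values):

```c
#include <stdint.h>

/* Illustrative stand-ins for the kernel's resource flag bits. */
#define IORESOURCE_IO       0x00000100u
#define IORESOURCE_MEM      0x00000200u
#define IORESOURCE_PREFETCH 0x00002000u
#define IORESOURCE_MEM_64   0x00100000u

/* Model of of_bus_pci_get_flags(): decode the ss field, bits 25:24 of
 * phys.hi, per the PCI OF binding:
 *   00 = config, 01 = I/O, 10 = 32-bit MEM, 11 = 64-bit MEM.
 * The fix under discussion is the MEM_64 line for ss == 0b11. */
static unsigned int pci_of_flags(uint32_t phys_hi)
{
	unsigned int flags = 0;

	switch ((phys_hi >> 24) & 0x03) {
	case 0x01:
		flags |= IORESOURCE_IO;
		break;
	case 0x03:			/* 64-bit memory space */
		flags |= IORESOURCE_MEM_64;
		/* fall through */
	case 0x02:			/* 32-bit memory space */
		flags |= IORESOURCE_MEM;
		break;
	}
	if (phys_hi & 0x40000000)	/* 'p' bit: prefetchable */
		flags |= IORESOURCE_PREFETCH;
	return flags;
}
```

A phys.hi of 0x02000010 (ss=10) thus yields plain MEM — the "flags 0x200" case observed above — while 0x03000010 (ss=11) additionally sets MEM_64.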
Re: [PATCH v2] tools/power turbostat: Fix RAPL summary collection on AMD processors
On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote: > Turbostat fails to correctly collect and display RAPL summary information > on Family 17h and 19h AMD processors. Running turbostat on these processors > returns immediately. If turbostat is working correctly then RAPL summary > data is displayed until the user provided command completes. If a command > is not provided by the user then turbostat is designed to continuously > display RAPL information until interrupted. > > The issue is due to offset_to_idx() and idx_to_offset() missing support for > AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing > cases for AMD MSRs and idx_to_offset() does not include a path to return > AMD MSR(s) for any idx. > > The solution is add AMD MSR support to offset_to_idx() and idx_to_offset(). > These functions are split-out and renamed along architecture vendor lines > for supporting both AMD and Intel MSRs. > > Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display") > Signed-off-by: Terry Bowman Thanks for fixing, Terry, and previously there was a patch for this from Bas Nieuwenhuizen: https://lkml.org/lkml/2021/3/12/682 and it is expected to have been merged in Len's branch already. thanks, Chenyu
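The vendor split the fix describes could look roughly like the following standalone sketch. The AMD RAPL MSR addresses are taken from public AMD documentation for Family 17h/19h, but both the addresses and the index assignment should be treated as illustrative, not as turbostat's actual tables:

```c
/* AMD RAPL MSRs (Family 17h/19h); addresses per public AMD docs,
 * used here only to illustrate the offset<->idx split described
 * in the commit message. */
#define MSR_AMD_RAPL_POWER_UNIT   0xC0010299ULL
#define MSR_AMD_CORE_ENERGY_STAT  0xC001029AULL
#define MSR_AMD_PKG_ENERGY_STAT   0xC001029BULL

/* Hypothetical AMD counterpart of turbostat's offset_to_idx():
 * map an MSR offset to a small accumulator index, -1 if unknown. */
static int amd_rapl_offset_to_idx(unsigned long long offset)
{
	switch (offset) {
	case MSR_AMD_PKG_ENERGY_STAT:  return 0;
	case MSR_AMD_CORE_ENERGY_STAT: return 1;
	default:                       return -1;
	}
}

/* Inverse mapping; the bug was that the idx_to_offset() path had no
 * way to return an AMD MSR for any idx, so accumulation bailed out. */
static long long amd_rapl_idx_to_offset(int idx)
{
	switch (idx) {
	case 0: return (long long)MSR_AMD_PKG_ENERGY_STAT;
	case 1: return (long long)MSR_AMD_CORE_ENERGY_STAT;
	default: return -1;
	}
}
```

The point of splitting per vendor is that each function only has to round-trip its own MSR set, and an Intel offset simply maps to -1 on AMD (and vice versa) instead of silently aborting the RAPL summary loop.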
Re: [syzbot] INFO: task hung in __io_uring_cancel
Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll == BUG: KASAN: null-ptr-deref in instrument_atomic_read_write include/linux/instrumented.h:101 [inline] BUG: KASAN: null-ptr-deref in atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x150/0x310 fs/io_uring.c:8930 Write of size 4 at addr 0114 by task iou-sqp-31588/31596 CPU: 0 PID: 31596 Comm: iou-sqp-31588 Not tainted 5.12.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:403 [inline] kasan_report.cold+0x5f/0xd8 mm/kasan/report.c:416 check_region_inline mm/kasan/generic.c:180 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_uring_cancel_sqpoll+0x150/0x310 fs/io_uring.c:8930 io_sq_thread+0x47e/0x1310 fs/io_uring.c:6873 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 == Kernel panic - not syncing: panic_on_warn set ... 
CPU: 1 PID: 31596 Comm: iou-sqp-31588 Tainted: GB 5.12.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 panic+0x306/0x73d kernel/panic.c:231 end_report mm/kasan/report.c:102 [inline] end_report.cold+0x5a/0x5a mm/kasan/report.c:88 __kasan_report mm/kasan/report.c:406 [inline] kasan_report.cold+0x6a/0xd8 mm/kasan/report.c:416 check_region_inline mm/kasan/generic.c:180 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline] io_uring_cancel_sqpoll+0x150/0x310 fs/io_uring.c:8930 io_sq_thread+0x47e/0x1310 fs/io_uring.c:6873 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Kernel Offset: disabled Rebooting in 86400 seconds.. Tested on: commit: 734551df io_uring: fix shared sqpoll cancellation hangs git tree: git://git.kernel.dk/linux-block for-5.13/io_uring console output: https://syzkaller.appspot.com/x/log.txt?x=175fec6dd0 kernel config: https://syzkaller.appspot.com/x/.config?x=601d16d8cd22e315 dashboard link: https://syzkaller.appspot.com/bug?extid=47fc00967b06a3019bd2 compiler:
Re: [PATCH] libnvdimm.h: Remove duplicate struct declaration
> On Mon, Apr 19, 2021 at 07:27:25PM +0800, Wan Jiabing wrote: >> struct device is declared at line 133. >> The declaration here is unnecessary. Remove it. >> >> Signed-off-by: Wan Jiabing >> --- >> include/linux/libnvdimm.h | 1 - >> 1 file changed, 1 deletion(-) >> >> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h >> index 01f251b6e36c..89b69e645ac7 100644 >> --- a/include/linux/libnvdimm.h >> +++ b/include/linux/libnvdimm.h >> @@ -141,7 +141,6 @@ static inline void __iomem *devm_nvdimm_ioremap(struct >> device *dev, >> >> struct nvdimm_bus; >> struct module; >> -struct device; >> struct nd_blk_region; > > What is the coding style preference for pre-declarations like this? Should > they be placed at the top of the file? > > The patch is reasonable but if the intent is to declare right before use for > clarity, both devm_nvdimm_memremap() and nd_blk_region_desc() use struct > device. So perhaps this duplicate is on purpose? > > Ira OK, my script just caught this duplicate. It reports a duplicate only when there is no macro dependence. But I hadn't thought about whether the duplicate might be there on purpose. Sorry. Thanks for your reply. Wan Jiabing >> struct nd_blk_region_desc { >> int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev); >> -- >> 2.25.1 >>
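For what it's worth, the question in this thread is purely stylistic: repeating a forward declaration is legal C and the compiler treats the second one as a no-op, so keeping it near the point of use costs nothing but readability preferences. A tiny sketch:

```c
/* A forward declaration may legally be repeated; only the full
 * definition must be unique. The libnvdimm.h debate is therefore
 * about documenting intent near the point of use, not correctness. */
struct device;
struct device;		/* duplicate declaration: valid, just redundant */

struct device {		/* the (single) full definition */
	int id;
};

static int device_id(const struct device *dev)
{
	return dev->id;
}
```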
Re: [PATCH] f2fs: fix to cover allocate_segment() with lock
On 2021/4/20 0:57, Jaegeuk Kim wrote: On 04/14, Chao Yu wrote: As we did for other cases, in fix_curseg_write_pointer(), let's change as below: - use callback function s_ops->allocate_segment() instead of raw function allocate_segment_by_default(); - cover allocate_segment() with curseg_lock and sentry_lock. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b2ee6b7791b0..daf9531ec58f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) f2fs_notice(sbi, "Assign new section to curseg[%d]: " "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); - allocate_segment_by_default(sbi, type, true); + + down_read(&SM_I(sbi)->curseg_lock); + down_write(&SIT_I(sbi)->sentry_lock); + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); + up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); Seems f2fs_allocate_new_section()? f2fs_allocate_new_section() will allocate a new section only when the current section has been initialized and has valid block/ckpt_block. It looks like fix_curseg_write_pointer() wants to force migrating the current segment to a new section whenever the write pointer and curseg->next_blkoff are inconsistent. So how about adding a parameter to force f2fs_allocate_new_section() to allocate a new section? Thanks, /* check consistency of the zone curseg pointed to */ if (check_zone_write_pointer(sbi, zbd, )) -- 2.29.2
Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote: ... > + * This function is intended to handle "Action Required" MCEs on already > + * hardware poisoned pages. They could happen, for example, when > + * memory_failure() failed to unmap the error page at the first call, or > + * when multiple Action Optional MCE events races on different CPUs with > + * Local MCE enabled. +Tony Luck Hey Tony, I thought SRAO MCEs are broadcast to all cores in the system as they come without an execution context; is that correct? If yes, Naoya, I think we might want to remove the comments about the "multiple Action Optional MCE racing" part. Best, -Jue
Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc
On 2021/4/19 22:57, Michal Kubecek wrote: > On Mon, Apr 19, 2021 at 10:04:27AM +0800, Yunsheng Lin wrote: >>> >>> I tried this patch on top of 5.12-rc7 with real devices. I used two >>> machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs (2 8-core CPUs >>> with HT disabled) and 16 Rx/Tx queues, receiver has 48 CPUs (2 12-core >>> CPUs with HT enabled) and 48 Rx/Tx queues. With multiple TCP streams on >>> a saturated ethernet, the CPU consumption grows quite a lot:
>>>
>>> threads   unpatched 5.12-rc7   5.12-rc7 + v3
>>>    1           25.6%               30.6%
>>>    8           73.1%              241.4%
>>>  128          132.2%             1012.0%
>>>
>>> (The values are normalized to one core, i.e. 100% corresponds to one >>> fully used logical CPU.) I didn't perform a full statistical evaluation >>> but the growth is way beyond any statistical fluctuation with one >>> exception: 8-thread test of patched kernel showed values from 155.5% to >>> 311.4%. Closer look shows that most of the CPU time was spent in softirq >>> and running top in parallel with the test confirms that there are >>> multiple ksoftirqd threads running at 100% CPU. I had a similar problem >>> with earlier versions of my patch (work in progress, I still need to >>> check some corner cases and write a commit message explaining the logic) >> >> Great, if there is a better idea, maybe share the core idea first so >> that we both can work on that? > > I'm not sure if it's really better but to minimize the false positives > and unnecessary calls to __netif_schedule(), I replaced q->seqlock with > an atomic combination of a "running" flag (which corresponds to the > current seqlock being locked) and a "drainers" count (number of other > threads going to clean up the qdisc queue). This way we can keep track > of them and get reliable information whether another thread is going to > run a cleanup after we leave the qdisc_run() critical section (so that > there is no need to schedule).
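Michal's idea — packing the "running" flag and a "drainers" count into a single atomic word — could look roughly like this userspace sketch. The names, encoding, and hand-over protocol here are assumptions for illustration, not his work-in-progress patch:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic word replaces q->seqlock: bit 0 is the "running" flag,
 * the remaining bits count threads that still intend to drain the
 * queue. Encoding and function names are illustrative assumptions. */
#define QS_RUNNING 1u
#define QS_DRAINER 2u		/* one drainer == +2 */

static atomic_uint qstate;

/* Try to become the runner; on contention, register as a drainer so
 * the current runner knows more work is pending. */
static bool sketch_run_begin(void)
{
	unsigned int old = atomic_fetch_or(&qstate, QS_RUNNING);

	if (!(old & QS_RUNNING))
		return true;		/* we own the critical section */
	atomic_fetch_add(&qstate, QS_DRAINER);
	return false;
}

/* Leave the critical section. Returns true when nobody else is going
 * to drain, i.e. the caller may need __netif_schedule(); if a drainer
 * is registered, it will take over and no reschedule is needed. */
static bool sketch_run_end(void)
{
	unsigned int old = atomic_fetch_and(&qstate, ~QS_RUNNING);

	return (old & ~QS_RUNNING) == 0;
}

/* A registered drainer consumes its registration once it gets to run. */
static void sketch_drainer_done(void)
{
	atomic_fetch_sub(&qstate, QS_DRAINER);
}
```

The point of the single word is that the runner's exit (fetch_and) atomically observes the drainer count, so the "is someone else going to clean up?" answer is reliable — unlike the separate seqlock + flag, where the races discussed above force the pessimistic extra __netif_schedule() calls.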
It seems you are trying to match the skb enqueuing with the calling of __qdisc_run() here, which is not reliable when considering the dequeue batching, see try_bulk_dequeue_skb() or try_bulk_dequeue_skb_slow() in dequeue_skb(). > >>> The biggest problem IMHO is that the loop in __qdisc_run() may finish >>> without rescheduling not only when the qdisc queue is empty but also >>> when the corresponding device Tx queue is stopped, which devices tend to >>> do whenever they cannot send any more packets out. Thus whenever >>> __QDISC_STATE_NEED_RESCHEDULE is set with the device queue stopped or >>> frozen, we keep rescheduling the queue cleanup without any chance to >>> progress or clear __QDISC_STATE_NEED_RESCHEDULE. For this to happen, all >>> we need is another thread to fail the first spin_trylock() while the device >>> queue is stopped and the qdisc queue not empty. >> >> Yes, we could just return false before doing the second spin_trylock() if >> the netdev queue corresponding to the qdisc is stopped, and when the netdev queue >> is restarted, __netif_schedule() is called again, see netif_tx_wake_queue(). >> >> Maybe add a sch_qdisc_stopped() function and do the testing in >> qdisc_run_begin(): >> >> if (dont_retry || sch_qdisc_stopped()) >> return false; >> >> bool sch_qdisc_stopped(struct Qdisc *q) >> { >> const struct netdev_queue *txq = q->dev_queue; >> >> if (netif_xmit_frozen_or_stopped(txq)) >> return true; >> >> return false; >> } >> >> At least for qdiscs with the TCQ_F_ONETXQUEUE flag set this is doable? > > Either this or you can do the check in qdisc_run_end() - when the device > queue is stopped or frozen, there is no need to schedule as we know it's > going to be done when the flag is cleared again (and we cannot do > anything until then anyway). > >>> Another part of the problem may be that to avoid the race, the logic is >>> too pessimistic: consider e.g.
(dotted lines show "barriers" where ordering is important):
>>>
>>> CPU A                              CPU B
>>> spin_trylock() succeeds
>>>                                    pfifo_fast_enqueue()
>>> .......................................................
>>> skb_array empty, exit loop
>>>                                    first spin_trylock() fails
>>>                                    set __QDISC_STATE_NEED_RESCHEDULE
>>>                                    second spin_trylock() fails
>>> .......................................................
>>> spin_unlock()
>>>                                    call __netif_schedule()
>>>
>>> When we switch the order of spin_unlock() on CPU A and the second >>> spin_trylock() on CPU B (but keep setting __QDISC_STATE_NEED_RESCHEDULE >>> before CPU A tests it), we end up scheduling a queue cleanup even if >>> there is already one running. And either of these is quite realistic. >> >> But I am not sure whether it is a good thing or a bad thing when the above happens, >> because it may be able to enable the
[RFC] memory reserve for userspace oom-killer
Proposal: Provide memory guarantees to userspace oom-killer. Background: Issues with the kernel oom-killer: 1. Very conservative and prefers to reclaim. Applications can suffer for a long time. 2. Borrows the context of the allocator, which can be resource limited (low sched priority or limited CPU quota). 3. Serialized by a global lock. 4. Very simplistic oom victim selection policy. These issues are resolved through a userspace oom-killer by: 1. Ability to monitor arbitrary metrics (PSI, vmstat, memcg stats) to detect suffering early. 2. Independent process context which can be given dedicated CPU quota and high scheduling priority. 3. Can be more aggressive as required. 4. Can implement sophisticated business logic/policies. Android's LMKD and Facebook's oomd are the prime examples of userspace oom-killers. One of the biggest challenges for userspace oom-killers is that they potentially have to function under intense memory pressure and are prone to getting stuck in memory reclaim themselves. Current userspace oom-killers aim to avoid this situation by preallocating user memory and protecting themselves from global reclaim by either mlocking or memory.min. However, a new allocation from the userspace oom-killer can still get stuck in reclaim, and policy-rich oom-killers do trigger new allocations through syscalls or even the heap. Our attempt at a userspace oom-killer faces similar challenges. Particularly at the tail, on very highly utilized machines, we have observed the userspace oom-killer failing spectacularly in many possible ways in direct reclaim. We have seen the oom-killer stuck in direct reclaim throttling, stuck in reclaim with allocations from interrupts stealing the reclaimed memory. We have even observed systems where all the processes were stuck in throttle_direct_reclaim() and only kswapd was running, with interrupts stealing the memory reclaimed by kswapd. To reliably solve this problem, we need to give guaranteed memory to the userspace oom-killer.
At the moment we are contemplating between the following options and I would like to get some feedback. 1. prctl(PF_MEMALLOC) The idea is to give userspace oom-killer (just one thread which is finding the appropriate victims and will be sending SIGKILLs) access to MEMALLOC reserves. Most of the time the preallocation, mlock and memory.min will be good enough but for rare occasions, when the userspace oom-killer needs to allocate, the PF_MEMALLOC flag will protect it from reclaim and let the allocation dip into the memory reserves. The misuse of this feature would be risky but it can be limited to privileged applications. Userspace oom-killer is the only appropriate user of this feature. This option is simple to implement. 2. Mempool The idea is to preallocate mempool with a given amount of memory for userspace oom-killer. Preferably this will be per-thread and oom-killer can preallocate mempool for its specific threads. The core page allocator can check before going to the reclaim path if the task has private access to the mempool and return page from it if yes. This option would be more complicated than the previous option as the lifecycle of the page from the mempool would be more sophisticated. Additionally the current mempool does not handle higher order pages and we might need to extend it to allow such allocations. Though this feature might have more use-cases and it would be less risky than the previous option. Another idea I had was to use kthread based oom-killer and provide the policies through eBPF program. Though I am not sure how to make it monitor arbitrary metrics and if that can be done without any allocations. Please do provide feedback on these approaches. thanks, Shakeel
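As a baseline, the "preallocate and mlock" hardening the proposal argues is insufficient looks roughly like the following sketch. The function name and sizes are arbitrary, and mlock() is treated as best-effort since it can fail with EPERM/ENOMEM under a small RLIMIT_MEMLOCK or without CAP_IPC_LOCK:

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Preallocate and prefault an arena so a userspace oom-killer takes no
 * page faults (and thus no reclaim) on its hot path. Locking the arena
 * is best-effort here; a real oom-killer would keep the arena alive
 * for its whole lifetime instead of freeing it. Returns 0 on success. */
static int oomd_prefault(size_t bytes)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *arena;
	size_t i;

	if (page <= 0)
		page = 4096;
	arena = malloc(bytes);
	if (!arena)
		return -1;
	/* Touch every page so it is backed by real memory now. */
	for (i = 0; i < bytes; i += (size_t)page)
		arena[i] = 1;
	/* Best-effort pin: may fail in restricted environments. */
	if (mlock(arena, bytes) == 0)
		munlock(arena, bytes);
	free(arena);
	return 0;
}
```

The proposal's point is precisely that this pattern only covers allocations you made in advance: any later heap growth or in-kernel allocation on a syscall path can still enter direct reclaim, which is what the prctl(PF_MEMALLOC) and per-task mempool options below aim to cover.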
[PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically
If a malicious or compromised Hyper-V sends a spurious message of type CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will call complete() on an uninitialized event, and cause an oops. Reported-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) --- Changes since v1[1]: - add inline comment in vmbus_unload_response() [1] https://lkml.kernel.org/r/20210416143932.16512-1-parri.and...@gmail.com drivers/hv/channel_mgmt.c | 7 ++- drivers/hv/connection.c | 2 ++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c index 4c9e45d1f462c..335a10ee03a5e 100644 --- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -826,6 +826,11 @@ static void vmbus_unload_response(struct vmbus_channel_message_header *hdr) /* * This is a global event; just wakeup the waiting thread. * Once we successfully unload, we can cleanup the monitor state. +* +* NB. A malicious or compromised Hyper-V could send a spurious +* message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call +* of the complete() below. Make sure that unload_event has been +* initialized by the time this complete() is executed. 
*/ complete(&vmbus_connection.unload_event); } @@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash) if (vmbus_proto_version < VERSION_WIN8_1) return; - init_completion(&vmbus_connection.unload_event); + reinit_completion(&vmbus_connection.unload_event); memset(&hdr, 0, sizeof(struct vmbus_channel_message_header)); hdr.msgtype = CHANNELMSG_UNLOAD; vmbus_post_msg(&hdr, sizeof(struct vmbus_channel_message_header), diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index dc19d5ae4373c..311cd005b3be6 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -26,6 +26,8 @@ struct vmbus_connection vmbus_connection = { .conn_state = DISCONNECTED, + .unload_event = COMPLETION_INITIALIZER( + vmbus_connection.unload_event), .next_gpadl_handle = ATOMIC_INIT(0xE1E10), .ready_for_suspend_event = COMPLETION_INITIALIZER( -- 2.25.1
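The pattern this patch relies on can be modeled in plain C11: initialize the event statically so that even a spurious complete() arriving before any explicit initialization touches valid state, then re-arm it with a reinit on each unload instead of a fresh init. A toy analogue (not the kernel's struct completion — names and layout are illustrative):

```c
#include <stdatomic.h>

/* Toy analogue of the fix: with a statically initialized completion,
 * a spurious complete() from a malicious/compromised hypervisor can
 * never hit uninitialized memory; vmbus_initiate_unload() only has to
 * re-arm the event with a reinit before each use. */
struct toy_completion {
	atomic_uint done;
};

#define TOY_COMPLETION_INITIALIZER { 0 }

static struct toy_completion unload_event = TOY_COMPLETION_INITIALIZER;

static void toy_complete(struct toy_completion *c)
{
	atomic_fetch_add(&c->done, 1);	/* wake any waiter */
}

static void toy_reinit_completion(struct toy_completion *c)
{
	atomic_store(&c->done, 0);	/* re-arm for the next wait */
}

static unsigned int toy_completion_done(struct toy_completion *c)
{
	return atomic_load(&c->done);
}
```

With the old code, the equivalent of unload_event only became valid inside vmbus_initiate_unload(); the static initializer moves that to program start, closing the window the spurious message exploited.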