Re: [PATCH v1] dmaengine: imx-sdma: refine to load context only once
On 06-12-18, 06:36, Robin Gong wrote:
> The context loaded only one time before channel running,but
                                                         ^^ space after comma please
> currently sdma_config_channel() and dma_prep_* duplicated with
> sdma_load_context(), so refine it to load context only one time
> before channel running and reload after the channel terminated.

I was going to apply it, but it fails to apply for me, please rebase and
send

--
~Vinod
Re: [PATCH] scsi/mvsas/mv_init.c: Use dma_zalloc_coherent
On Fri, Jan 4, 2019 at 6:43 PM John Garry wrote:
>
> On 04/01/2019 12:48, Sabyasachi Gupta wrote:
> > On Wed, Dec 19, 2018 at 6:49 PM Sabyasachi Gupta wrote:
> >>
> >> On Sat, Dec 1, 2018 at 6:40 PM Sabyasachi Gupta wrote:
> >>>
> >>> On Wed, Nov 21, 2018 at 7:18 PM Sabyasachi Gupta wrote:
> >>>>
> > Replace dma_alloc_coherent + memset with dma_zalloc_coherent
>
> If you're going to make this change, then how about change these to the
> managed version, so that we can avoid the explicit free'ing at driver
> removal?

I can't get it

> > Signed-off-by: Sabyasachi Gupta
> >>>
> >>> Any comment on this patch?
> >>
> >> Any comment on this patch?
> >
> > Any comment on this patch?
>
> > ---
> >  drivers/scsi/mvsas/mv_init.c | 12 ++++--------
> >  1 file changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> > index 3ac3437..495bddb 100644
> > --- a/drivers/scsi/mvsas/mv_init.c
> > +++ b/drivers/scsi/mvsas/mv_init.c
> > @@ -253,33 +253,29 @@ static int mvs_alloc(struct mvs_info *mvi, struct Scsi_Host *shost)
> >  	/*
> >  	 * alloc and init our DMA areas
> >  	 */
> > -	mvi->tx = dma_alloc_coherent(mvi->dev,
> > +	mvi->tx = dma_zalloc_coherent(mvi->dev,
> >  				     sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ,
> >  				     &mvi->tx_dma, GFP_KERNEL);
>
> I'm guessing that this does not pass checkpatch with --strict option.
>
> Thanks,
> John

I have not checked with --strict option

> >  	if (!mvi->tx)
> >  		goto err_out;
> > -	memset(mvi->tx, 0, sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ);
> > -	mvi->rx_fis = dma_alloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
> > +	mvi->rx_fis = dma_zalloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
> >  					 &mvi->rx_fis_dma, GFP_KERNEL);
> >  	if (!mvi->rx_fis)
> >  		goto err_out;
> > -	memset(mvi->rx_fis, 0, MVS_RX_FISL_SZ);
> >
> > -	mvi->rx = dma_alloc_coherent(mvi->dev,
> > +	mvi->rx = dma_zalloc_coherent(mvi->dev,
> >  				     sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1),
> >  				     &mvi->rx_dma, GFP_KERNEL);
> >  	if (!mvi->rx)
> >  		goto err_out;
> > -	memset(mvi->rx, 0, sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1));
> >  	mvi->rx[0] = cpu_to_le32(0xfff);
> >  	mvi->rx_cons = 0xfff;
> >
> > -	mvi->slot = dma_alloc_coherent(mvi->dev,
> > +	mvi->slot = dma_zalloc_coherent(mvi->dev,
> >  				       sizeof(*mvi->slot) * slot_nr,
> >  				       &mvi->slot_dma, GFP_KERNEL);
> >  	if (!mvi->slot)
> >  		goto err_out;
> > -	memset(mvi->slot, 0, sizeof(*mvi->slot) * slot_nr);
> >
> >  	mvi->bulk_buffer = dma_alloc_coherent(mvi->dev,
> >  				       TRASH_BUCKET_SIZE,
> > --
> > 2.7.4
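A footnote on John's managed-version suggestion: the devres conversion he
hints at would look roughly as below. This is a kernel-style sketch only,
not a buildable example; on kernels where dma_alloc_coherent() already
returns zeroed memory the managed variant zeroes as well, and the
corresponding dma_free_coherent() calls in mvs_free() would then be
dropped.

```c
/* Sketch: managed allocation tied to mvi->dev's lifetime (devres). */
mvi->tx = dmam_alloc_coherent(mvi->dev,
			      sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ,
			      &mvi->tx_dma, GFP_KERNEL);
if (!mvi->tx)
	goto err_out;
/* No memset() here and no explicit dma_free_coherent() at driver removal. */
```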
Re: [PATCHv3 1/2] mm/memblock: extend the limit inferior of bottom-up after parsing hotplug attr
On Thu, Jan 03, 2019 at 10:47:06AM -0800, Tejun Heo wrote:
> Hello,
>
> On Wed, Jan 02, 2019 at 07:05:38PM +0200, Mike Rapoport wrote:
> > I agree that currently the bottom-up allocation after the kernel text
> > has issues with KASLR. But these issues are not necessarily related to
> > memory hotplug. Even with a single memory node, a bottom-up allocation
> > will fail if KASLR puts the kernel near the end of node0.
> >
> > What I am trying to understand is whether there is a fundamental
> > reason to prevent allocations from [0, kernel_start)?
> >
> > Maybe Tejun can recall why he suggested to start bottom-up allocations
> > from kernel_end.
>
> That's from 79442ed189ac ("mm/memblock.c: introduce bottom-up
> allocation mode"). I wasn't involved in that patch, so no idea why
> the restrictions were added, but FWIW it doesn't seem necessary to me.

I should have added the reference [1] in the first place :)

Thanks!

[1] https://lore.kernel.org/lkml/20130904192215.gg26...@mtj.dyndns.org/

> Thanks.
>
> --
> tejun

--
Sincerely yours,
Mike.
Re: [PATCH v5 05/11] platform/x86: intel_pmc: Make PCI dependency explicit
On 1/4/2019 9:20 AM, Andy Shevchenko wrote:
> Unfortunately, this is not how it should be fixed.
> First of all, we (Heikki, Sathya, me — all in Cc list) agreed that this
> driver should be refactored and unified with intel_scu_ipc.
> Then, it has two drivers inside, and PCI is just a glue for it which is
> optional. Thus, it needs to be split accordingly.

OK. Two questions:

1. How should I change this now?
2. Can you change the Kconfig accordingly after you merge the two
   drivers?

I'm focused on the build failure at this moment.
Re: [PATCH] riscv: remove redundant kernel-space generic-y
On Fri, Jan 4, 2019 at 11:16 AM Palmer Dabbelt wrote:
>
> On Sun, 16 Dec 2018 06:11:11 PST (-0800), yamada.masah...@socionext.com wrote:
> > This commit removes redundant generic-y defines in
> > arch/riscv/include/asm/Kbuild.
> >
> > [1] It is redundant to define the same generic-y in both
> >     arch/$(ARCH)/include/asm/Kbuild and
> >     arch/$(ARCH)/include/uapi/asm/Kbuild.
> >
> >     Remove the following generic-y:
> >
> >       errno.h
> >       fcntl.h
> >       ioctl.h
> >       ioctls.h
> >       ipcbuf.h
> >       mman.h
> >       msgbuf.h
> >       param.h
> >       poll.h
> >       posix_types.h
> >       resource.h
> >       sembuf.h
> >       setup.h
> >       shmbuf.h
> >       signal.h
> >       socket.h
> >       sockios.h
> >       stat.h
> >       statfs.h
> >       swab.h
> >       termbits.h
> >       termios.h
> >       types.h
> >
> > [2] It is redundant to define generic-y when arch-specific
> >     implementation exists in arch/$(ARCH)/include/asm/*.h
> >
> >     Remove the following generic-y:
> >
> >       cacheflush.h
> >       dma-mapping.h
> >       module.h
> >
> > Signed-off-by: Masahiro Yamada
> > ---
> >
> >  arch/riscv/include/asm/Kbuild | 26 --------------------------
> >  1 file changed, 26 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > index 6a646d9..f7068f3 100644
> > --- a/arch/riscv/include/asm/Kbuild
> > +++ b/arch/riscv/include/asm/Kbuild
> > @@ -1,5 +1,4 @@
> >  generic-y += bugs.h
> > -generic-y += cacheflush.h
> >  generic-y += checksum.h
> >  generic-y += compat.h
> >  generic-y += cputime.h
> > @@ -7,18 +6,12 @@ generic-y += device.h
> >  generic-y += div64.h
> >  generic-y += dma.h
> >  generic-y += dma-contiguous.h
> > -generic-y += dma-mapping.h
> >  generic-y += emergency-restart.h
> > -generic-y += errno.h
> >  generic-y += exec.h
> >  generic-y += fb.h
> > -generic-y += fcntl.h
> >  generic-y += hardirq.h
> >  generic-y += hash.h
> >  generic-y += hw_irq.h
> > -generic-y += ioctl.h
> > -generic-y += ioctls.h
> > -generic-y += ipcbuf.h
> >  generic-y += irq_regs.h
> >  generic-y += irq_work.h
> >  generic-y += kdebug.h
> > @@ -27,34 +20,15 @@ generic-y += kvm_para.h
> >  generic-y += local.h
> >  generic-y += local64.h
> >  generic-y += mm-arch-hooks.h
> > -generic-y += mman.h
> > -generic-y += module.h
> > -generic-y += msgbuf.h
> >  generic-y += mutex.h
> > -generic-y += param.h
> >  generic-y += percpu.h
> > -generic-y += poll.h
> > -generic-y += posix_types.h
> >  generic-y += preempt.h
> > -generic-y += resource.h
> >  generic-y += scatterlist.h
> >  generic-y += sections.h
> > -generic-y += sembuf.h
> >  generic-y += serial.h
> > -generic-y += setup.h
> > -generic-y += shmbuf.h
> >  generic-y += shmparam.h
> > -generic-y += signal.h
> > -generic-y += socket.h
> > -generic-y += sockios.h
> > -generic-y += stat.h
> > -generic-y += statfs.h
> > -generic-y += swab.h
> > -generic-y += termbits.h
> > -generic-y += termios.h
> >  generic-y += topology.h
> >  generic-y += trace_clock.h
> > -generic-y += types.h
> >  generic-y += unaligned.h
> >  generic-y += user.h
> >  generic-y += vga.h
>
> Thanks.  These just sort of collected there because I hadn't trimmed
> them.  Is there a script that checks these?

No script is available for now, but I'd like to turn on warnings to catch
some cases:

https://patchwork.kernel.org/patch/10746823/

BTW, can I apply this to my tree along with other cleanups?

--
Best Regards
Masahiro Yamada
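To Palmer's question about a checking script: until the warning series
lands, a duplicate check can be scripted by hand. The helper below is
hypothetical (not from the thread) and only covers redundancy case [1],
entries listed in both the kernel and uapi Kbuild files:

```shell
# list_redundant_generic_y FILE1 FILE2: print generic-y entries that
# appear in both Kbuild files (hypothetical helper, not from the thread).
list_redundant_generic_y() {
    sed -n 's/^generic-y += //p' "$1" | sort > /tmp/gy1.$$
    sed -n 's/^generic-y += //p' "$2" | sort > /tmp/gy2.$$
    # comm -12 keeps only lines common to both sorted inputs
    comm -12 /tmp/gy1.$$ /tmp/gy2.$$
    rm -f /tmp/gy1.$$ /tmp/gy2.$$
}

# Example, run from a kernel tree:
#   list_redundant_generic_y arch/riscv/include/asm/Kbuild \
#                            arch/riscv/include/uapi/asm/Kbuild
```

Case [2] (a generic-y entry shadowed by an arch header) would need a
second pass comparing the list against arch/$(ARCH)/include/asm/*.h.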
Re: [PATCH v3] mm/page_owner: fix for deferred struct page init
On 1/4/19 8:09 AM, Michal Hocko wrote:
>> Here is the number without DEFERRED_STRUCT_PAGE_INIT.
>>
>> == page_ext_init() after page_alloc_init_late() ==
>> Node 0, zone      DMA: page owner found early allocated 0 pages
>> Node 0, zone    DMA32: page owner found early allocated 7009 pages
>> Node 0, zone   Normal: page owner found early allocated 85827 pages
>> Node 4, zone   Normal: page owner found early allocated 75063 pages
>>
>> == page_ext_init() before kmemleak_init() ==
>> Node 0, zone      DMA: page owner found early allocated 0 pages
>> Node 0, zone    DMA32: page owner found early allocated 6654 pages
>> Node 0, zone   Normal: page owner found early allocated 41907 pages
>> Node 4, zone   Normal: page owner found early allocated 41356 pages
>>
>> So, it told us that it will miss tens of thousands of early page
>> allocation call sites.
>
> This is an answer for the first part of the question (how much). The
> second is _do_we_care_?

Well, the purpose of this simple "ugly" ifdef is to avoid a regression
for the existing page_owner users with DEFERRED_STRUCT_PAGE_INIT
deselected, who would otherwise start to miss tens of thousands of early
page allocation call sites.

The other option I can think of that would not hurt your eyes is to
rewrite the whole of page_ext_init(), init_page_owner() and
init_debug_guardpage() to use only early functions, so the code works
both with DEFERRED_STRUCT_PAGE_INIT=y and without it. However, I have a
hard time convincing myself that is a sensible thing to do.
Re: [PATCH v6 1/7] dmaengine: xilinx_dma: commonize DMA copy size calculation
On 20-11-18, 16:31, Andrea Merello wrote:
> This patch removes a bit of duplicated code by introducing a new
> function that implements calculations for DMA copy size, and
> prepares for changes to the copy size calculation that will
> happen in following patches.

Applied all, thanks

--
~Vinod
Re: [PATCH ghak90 (was ghak32) V4 01/10] audit: collect audit task parameters
On 2019-01-03 18:50, Guenter Roeck wrote:
> Hi Richard,
>
> On Tue, Jul 31, 2018 at 04:07:36PM -0400, Richard Guy Briggs wrote:
> > The audit-related parameters in struct task_struct should ideally be
> > collected together and accessed through a standard audit API.
> >
> > Collect the existing loginuid, sessionid and audit_context together
> > in a new struct audit_task_info called "audit" in struct task_struct.
> >
> > Use kmem_cache to manage this pool of memory.
> > Un-inline audit_free() to be able to always recover that memory.
> >
> > See: https://github.com/linux-audit/audit-kernel/issues/81
> >
> > Signed-off-by: Richard Guy Briggs
>
> Overall I am not sure if keeping task_struct a bit smaller is worth
> the added complexity, but I guess that is just me.

The motivation was to consolidate all the audit bits into one pointer,
isolating them from the rest of the kernel, restricting access to helper
functions only to prevent abuse by other subsystems, and trying to
reduce kABI issues in the future. I agree it is a bit more complex. It
was provoked by the need to add contid, which seemed to make the most
sense as a peer to loginuid and sessionid; adding it directly to
task_struct would have made it a bit too generic and available.

This is addressed at some length by Paul Moore here in v2:
https://lkml.org/lkml/2018/4/18/759

> Anyway, couple of nitpicks. Please feel free to ignore, and my
> apologies if some or all of the comments are duplicates.

Noted. They all look like reasonable improvements, particularly the
unnecessary else and default return. Thanks. The double context check
may go away anyway, based on the removal of audit_take_context() in
Paul's 2a1fe215e730 ("audit: use current whenever possible"), which has
yet to be incorporated.
> Guenter
>
> > ---
> >  include/linux/audit.h | 34 --
> >  include/linux/sched.h |  5 +
> >  init/init_task.c      |  3 +--
> >  init/main.c           |  2 ++
> >  kernel/auditsc.c      | 51 ++-
> >  kernel/fork.c         |  4 +++-
> >  6 files changed, 73 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 9334fbe..8964332 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -219,8 +219,15 @@ static inline void audit_log_task_info(struct audit_buffer *ab,
> >
> >  /* These are defined in auditsc.c */
> >  /* Public API */
>
> Not sure if the structure below belongs after "Public API".
> Is it part of the public API ?
>
> > +struct audit_task_info {
> > +	kuid_t			loginuid;
> > +	unsigned int		sessionid;
> > +	struct audit_context	*ctx;
> > +};
>
> Add empty line ?
>
> > +extern struct audit_task_info init_struct_audit;
> > +extern void __init audit_task_init(void);
> >  extern int audit_alloc(struct task_struct *task);
> > -extern void __audit_free(struct task_struct *task);
> > +extern void audit_free(struct task_struct *task);
> >  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
> >  				  unsigned long a2, unsigned long a3);
> >  extern void __audit_syscall_exit(int ret_success, long ret_value);
> > @@ -242,12 +249,15 @@ extern void audit_seccomp_actions_logged(const char *names,
> >
> >  static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> >  {
> > -	task->audit_context = ctx;
> > +	task->audit->ctx = ctx;
> >  }
> >
> >  static inline struct audit_context *audit_context(void)
> >  {
> > -	return current->audit_context;
> > +	if (current->audit)
> > +		return current->audit->ctx;
> > +	else
> > +		return NULL;
>
> Unnecessary else (and static checkers may complain).
>
> >  }
> >
> >  static inline bool audit_dummy_context(void)
> > @@ -255,11 +265,7 @@ static inline bool audit_dummy_context(void)
> >  	void *p = audit_context();
> >  	return !p || *(int *)p;
> >  }
> > -static inline void audit_free(struct task_struct *task)
> > -{
> > -	if (unlikely(task->audit_context))
> > -		__audit_free(task);
> > -}
> > +
> >  static inline void audit_syscall_entry(int major, unsigned long a0,
> >  				       unsigned long a1, unsigned long a2,
> >  				       unsigned long a3)
> > @@ -331,12 +337,18 @@ extern int auditsc_get_stamp(struct audit_context *ctx,
> >
> >  static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
> >  {
> > -	return tsk->loginuid;
> > +	if (tsk->audit)
> > +		return tsk->audit->loginuid;
> > +	else
> > +		return INVALID_UID;
>
> Unnecessary else.
>
> >  }
> >
> >  static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> >  {
> > -	return tsk->sessionid;
> > +	if (tsk->audit)
> > +		return tsk->audit->sessionid;
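Guenter's "unnecessary else" nitpick is the classic early-return style.
A minimal userspace mock of the pattern (struct names are simplified
stand-ins, and INVALID_SESSIONID is a made-up placeholder for the
kernel's invalid value):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures in the patch. */
struct audit_task_info {
	unsigned int sessionid;
};

struct task_struct_like {
	struct audit_task_info *audit;
};

#define INVALID_SESSIONID ((unsigned int)-1)	/* placeholder value */

/* Preferred form: early return, no else branch after a return. */
static unsigned int audit_get_sessionid(const struct task_struct_like *tsk)
{
	if (tsk->audit)
		return tsk->audit->sessionid;
	return INVALID_SESSIONID;
}
```

Behavior is identical to the if/else version; the win is purely that
static checkers stay quiet and the fallback value reads as the default.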
Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state
On Fri, Jan 04, 2019 at 03:18:05PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 04, 2019 at 02:08:32PM +0000, Dave Martin wrote:
> > On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote:
> > > > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote:
> > > > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote:
> > > > > > There is a lot of variation in the Arm ecosystem. Because of
> > > > > > this, there exist possible cases where the kernel cannot
> > > > > > authoritatively determine if a machine is vulnerable.
> > > > >
> > > > > Really?  Why not?  What keeps you from "knowing" this?  Can't
> > > > > the developer of the chip tell you?
> > > >
> > > > There tends to be a few cases, possibly incomplete
> > > > white/black lists,
> > >
> > > Then fix the lists :)
> > >
> > > > firmware that isn't responding correctly, or the user didn't
> > > > build in the code to check the mitigation (possibly because its
> > > > an embedded system and they know its not vulnerable?).
> > >
> > > If the firmware doesn't respond, that would imply it is
> > > vulnerable :)
> > >
> > > And if the code isn't built in, again, it's vulnerable.
> > >
> > > > I would hope that it is an exceptional case.
> > >
> > > Then have the default be vulnerable, don't give people false hope.
> > >
> > > > > > Rather than guess the vulnerability status in cases where
> > > > > > the mitigation is disabled or the firmware isn't responding
> > > > > > correctly, we need to display an "Unknown" state.
> > > > >
> > > > > Shouldn't "Unknown" really be the same thing as "Vulnerable"?
> > > > > A user should treat it the same way, "Unknown" makes it feel
> > > > > like "maybe I can just ignore this and hope I really am safe",
> > > > > which is not a good idea at all.
> > > >
> > > > I tend to agree its not clear what to do with "unknown".
> > > >
> > > > OTOH, I think there is a hesitation to declare something
> > > > vulnerable when it isn't. Meltdown for example, is fairly rare
> > > > given that it currently only affects a few arm parts, so
> > > > declaring someone vulnerable when they likely aren't is going to
> > > > be just as difficult to explain.
> > >
> > > If you know it is rare, then you know how to properly detect it so
> > > "unknown" is not needed, correct?
> > >
> > > Again, "unknown" is not going to help anyone out here, please don't
> > > do it.
> >
> > Thinking about it, "unknown" is actually the common case.
> >
> > Kernels that predate the sysfs vulnerabilities interface effectively
> > report this for all vulnerabilities by omitting the sysfs entries
> > entirely.
> >
> > Current kernels also don't know anything about future vulnerabilities
> > that may be added in sysfs later on (but which may nevertheless be
> > discovered subsequently to affect current hardware).
> >
> > So, can we simply omit the sysfs entries for which we can't provide a
> > good answer?
>
> As you say, we already do this for older systems.
>
> But don't add new logic to explicitly not create the files just because
> we "can not figure it out".  For those systems, I would default to
> "vulnerable" as I think that's what we do today, right?

Nope: currently the vulnerabilities directory doesn't even exist for
arm64 because we don't select GENERIC_CPU_VULNERABILITIES.

There are also a few other things to consider here:

  1. The action to take as an end-user is slightly different in the case
     that you know for sure that your system is vulnerable, as opposed
     to the case that you don't know whether your system is vulnerable
     or not. The former needs a firmware update; the second needs a
     statement about the CPU, which could result in a simple whitelist
     update in Linux.

  2. There's an unfortunate political angle to this. Whilst the Arm
     website [1] provides information for all of the Arm-designed CPUs
     (i.e. Cortex-A*), it doesn't comment on partner implementations.
     I'm not at all keen to be seen as branding them all as vulnerable
     in the Linux kernel, as this is likely to cause more problems than
     it solves. If we had complete whitelist information available in
     public, that would be ideal, but it's not the case.

  3. The architecture has added some ID registers to determine if a CPU
     is affected by Spectre and Meltdown, so a whitelist only needs to
     cover existing CPUs.

So I agree with Dave that continuing to omit the files when we don't
know whether or not the system is affected is the right thing to do.

Will

[1] https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability
Re: [PATCH 1/4] pmbus: associate PMBUS_SKIP_STATUS_CHECK with driver_data
On 1/4/19 12:55 AM, Xiaoting Liu wrote:
> Current code compares device name with name in i2c_device_id to decide
> whether PMBUS_SKIP_STATUS_CHECK should be set in pmbus_platform_data,
> which means adding new devices with PMBUS_SKIP_STATUS_CHECK must also
> modify code in pmbus_probe(). This patch adds pmbus_device_info to save
> pages and flags. Its pointer is put in driver_data of i2c_device_id,
> which makes adding new devices more straightforward.

Good idea, but I don't see at this time where the patch is needed.
Maybe in patch 3/4 which is missing from the series ?

> Signed-off-by: Shunyong Yang
> Signed-off-by: Xiaoting Liu
> ---
>  drivers/hwmon/pmbus/pmbus.c | 54 +++--
>  include/linux/pmbus.h       |  5 +
>  2 files changed, 37 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/hwmon/pmbus/pmbus.c b/drivers/hwmon/pmbus/pmbus.c
> index 7688dab32f6e..aa4cf9636e99 100644
> --- a/drivers/hwmon/pmbus/pmbus.c
> +++ b/drivers/hwmon/pmbus/pmbus.c
> @@ -172,13 +172,15 @@ static int pmbus_probe(struct i2c_client *client,
>  	struct pmbus_driver_info *info;
>  	struct pmbus_platform_data *pdata = NULL;
>  	struct device *dev = &client->dev;
> +	struct pmbus_device_info *device_info;
>
>  	info = devm_kzalloc(dev, sizeof(struct pmbus_driver_info), GFP_KERNEL);
>  	if (!info)
>  		return -ENOMEM;
>
> -	if (!strcmp(id->name, "dps460") || !strcmp(id->name, "dps800") ||
> -	    !strcmp(id->name, "sgd009")) {
> +	device_info = (struct pmbus_device_info *)id->driver_data;
> +
> +	if (device_info->flags & PMBUS_SKIP_STATUS_CHECK) {
>  		pdata = devm_kzalloc(dev, sizeof(struct pmbus_platform_data),
>  				     GFP_KERNEL);
>  		if (!pdata)
> @@ -187,36 +189,44 @@ static int pmbus_probe(struct i2c_client *client,
>  		pdata->flags = PMBUS_SKIP_STATUS_CHECK;
>  	}
>
> -	info->pages = id->driver_data;
> +	info->pages = device_info->pages;
>  	info->identify = pmbus_identify;
>  	dev->platform_data = pdata;
>
>  	return pmbus_do_probe(client, id, info);
>  }
>
> +static const struct pmbus_device_info default_pmbus_info = {1, 0};
> +static const struct pmbus_device_info dps460_pmbus_info = {
> +	1, PMBUS_SKIP_STATUS_CHECK};
> +static const struct pmbus_device_info dps800_pmbus_info = {
> +	1, PMBUS_SKIP_STATUS_CHECK};
> +static const struct pmbus_device_info sgd009_pmbus_info = {
> +	1, PMBUS_SKIP_STATUS_CHECK};

Three structures with exactly the same content does not add value.
Please merge into one with a common name that reflects its use.

> +static const struct pmbus_device_info pmbus_info = {0, 0};

default_pmbus_info and pmbus_info are badly named and ordered. The names
should reflect that one sets one page and that the other leaves the
number of pages unset. I would suggest three structures and names, such
as

	pmbus_info_one
	pmbus_info_one_skip
	pmbus_info_zero

though I am open to better names.

>  /*
>   * Use driver_data to set the number of pages supported by the chip.
>   */
>  static const struct i2c_device_id pmbus_id[] = {
> -	{"adp4000", 1},
> -	{"bmr453", 1},
> -	{"bmr454", 1},
> -	{"dps460", 1},
> -	{"dps800", 1},
> -	{"mdt040", 1},
> -	{"ncp4200", 1},
> -	{"ncp4208", 1},
> -	{"pdt003", 1},
> -	{"pdt006", 1},
> -	{"pdt012", 1},
> -	{"pmbus", 0},
> -	{"sgd009", 1},
> -	{"tps40400", 1},
> -	{"tps544b20", 1},
> -	{"tps544b25", 1},
> -	{"tps544c20", 1},
> -	{"tps544c25", 1},
> -	{"udt020", 1},
> +	{"adp4000", (kernel_ulong_t)&default_pmbus_info},
> +	{"bmr453", (kernel_ulong_t)&default_pmbus_info},
> +	{"bmr454", (kernel_ulong_t)&default_pmbus_info},
> +	{"dps460", (kernel_ulong_t)&dps460_pmbus_info},
> +	{"dps800", (kernel_ulong_t)&dps800_pmbus_info},
> +	{"mdt040", (kernel_ulong_t)&default_pmbus_info},
> +	{"ncp4200", (kernel_ulong_t)&default_pmbus_info},
> +	{"ncp4208", (kernel_ulong_t)&default_pmbus_info},
> +	{"pdt003", (kernel_ulong_t)&default_pmbus_info},
> +	{"pdt006", (kernel_ulong_t)&default_pmbus_info},
> +	{"pdt012", (kernel_ulong_t)&default_pmbus_info},
> +	{"pmbus", (kernel_ulong_t)&pmbus_info},
> +	{"sgd009", (kernel_ulong_t)&sgd009_pmbus_info},
> +	{"tps40400", (kernel_ulong_t)&default_pmbus_info},
> +	{"tps544b20", (kernel_ulong_t)&default_pmbus_info},
> +	{"tps544b25", (kernel_ulong_t)&default_pmbus_info},
> +	{"tps544c20", (kernel_ulong_t)&default_pmbus_info},
> +	{"tps544c25", (kernel_ulong_t)&default_pmbus_info},
> +	{"udt020", (kernel_ulong_t)&default_pmbus_info},
>  	{}
>  };
>
> diff --git a/include/linux/pmbus.h b/include/linux/pmbus.h
> index ee3c2aba2a8e..3c05edad7666 100644
> --- a/include/linux/pmbus.h
> +++ b/include/linux/pmbus.h
> @@ -46,4 +46,9 @@ struct pmbus_platform_data {
>  	struct regulator_init_data *reg_init_data;
>  };
>
> +struct pmbus_device_info {
> +	int pages;
> +	u32 flags;
> +};
> +

This should not be needed here. The structure
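Guenter's suggested consolidation could look like the sketch below. This
is a userspace mock (i2c_id_like, the lookup helper and the flag value
are stand-ins invented for illustration), showing structures keyed by
content rather than by device name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long kernel_ulong_t;

/* Stand-in value; the real flag lives in <linux/pmbus.h>. */
#define PMBUS_SKIP_STATUS_CHECK (1 << 0)

struct pmbus_device_info {
	int pages;
	unsigned int flags;
};

/* One structure per distinct content, named as suggested in the review. */
static const struct pmbus_device_info pmbus_info_one      = { 1, 0 };
static const struct pmbus_device_info pmbus_info_one_skip = { 1, PMBUS_SKIP_STATUS_CHECK };
static const struct pmbus_device_info pmbus_info_zero     = { 0, 0 };

/* Simplified stand-in for struct i2c_device_id. */
struct i2c_id_like {
	const char *name;
	kernel_ulong_t driver_data;
};

static const struct i2c_id_like pmbus_id[] = {
	{ "adp4000", (kernel_ulong_t)&pmbus_info_one },
	{ "dps460",  (kernel_ulong_t)&pmbus_info_one_skip },
	{ "pmbus",   (kernel_ulong_t)&pmbus_info_zero },
	{ NULL, 0 }
};

/* Mimics what pmbus_probe() does: recover the info from driver_data. */
static const struct pmbus_device_info *pmbus_lookup(const char *name)
{
	const struct i2c_id_like *id;

	for (id = pmbus_id; id->name; id++)
		if (!strcmp(id->name, name))
			return (const struct pmbus_device_info *)id->driver_data;
	return NULL;
}
```

Adding a new skip-status device then becomes a single table entry
pointing at pmbus_info_one_skip, with no probe-time string comparison.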
[GIT PULL] chrome-platform updates for v4.21
Hi Linus,

Happy new year. Sorry for the late-in-the-window pull request, but here
are the updates for chrome-platform. Mainly a maintainership change, and
a pair of patches from Brian to fix up wakeup handling for one kind of
EC event.

The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform.git tags/tag-chrome-platform-for-v4.21

for you to fetch changes up to cdd6a4a0e2ec590c30ad0e965fa08bf37461cadb:

  MAINTAINERS: add maintainers for ChromeOS EC sub-drivers (2019-01-03 12:26:43 -0800)

chrome platform changes for v4.21

Changes for EC_MKBP_EVENT_SENSOR_FIFO handling.

Also, maintainership changes. Olof out, Enric Balletbo in.

Benson Leung (1):
      MAINTAINERS: platform/chrome: Add Enric as a maintainer

Brian Norris (2):
      platform/chrome: straighten out cros_ec_get_{next,host}_event() error codes
      platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup

Enric Balletbo i Serra (1):
      MAINTAINERS: add maintainers for ChromeOS EC sub-drivers

Olof Johansson (1):
      MAINTAINERS: platform/chrome: remove myself as maintainer

 MAINTAINERS                             | 11 ++-
 drivers/platform/chrome/cros_ec_proto.c | 22 +-
 include/linux/mfd/cros_ec.h             |  6 --
 3 files changed, 31 insertions(+), 8 deletions(-)

Thanks,
Benson

--
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
ble...@google.com
Chromium OS Project
ble...@chromium.org
Re: [PATCH] dmaengine: fix dmaengine_desc_callback_valid() doesn't check for callback_result
On 16-11-18, 14:56, Andrea Merello wrote:
> There are two flavors of DMA completion callbacks: callback() and
> callback_result(); the latter takes an additional parameter that
> carries result information.
>
> Most dmaengine helper functions that work with callbacks take care of
> both flavors, i.e. dmaengine_desc_get_callback_invoke() first checks
> for callback_result() to be not NULL, and eventually it calls this one;
> otherwise it goes on checking for callback().
>
> It seems however that dmaengine_desc_callback_valid() does not care
> about callback_result(), and it returns false in case callback() is
> NULL but callback_result() is not; unless there is a (hidden to me)
> reason for doing so, then I'd say this is wrong.
>
> I've hit this by using a DMA controller driver (xilinx_dma) that
> doesn't trigger any callback invocation unless
> dmaengine_desc_callback_valid() returns true, while I had only
> callback_result() implemented in my client driver (which AFAICT is
> always fine since dmaengine documentation says that callback() will be
> deprecated).
>
> This patch fixes this by making dmaengine_desc_callback_valid() return
> true in the said scenario.
>
> Signed-off-by: Andrea Merello
> ---
>  drivers/dma/dmaengine.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma/dmaengine.h b/drivers/dma/dmaengine.h
> index 501c0b063f85..0ba2c1f3c55d 100644
> --- a/drivers/dma/dmaengine.h
> +++ b/drivers/dma/dmaengine.h
> @@ -168,7 +168,7 @@ dmaengine_desc_get_callback_invoke(struct dma_async_tx_descriptor *tx,
>  static inline bool
>  dmaengine_desc_callback_valid(struct dmaengine_desc_callback *cb)
>  {
> -	return (cb->callback) ? true : false;
> +	return (cb->callback || cb->callback_result);

So I do not think this one should take care of callback_result, it is
supposed to check if the callback is valid or not, nothing more. Of
course, usage of this may be incorrect, which should be fixed.

We do have dmaengine_desc_callback_invoke() which propagates the
callback_result to the user.

--
~Vinod
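For readers following the disagreement, the two callback flavors and the
patch's version of the validity check can be mimicked in plain C. The
structure below is a simplified userspace stand-in for struct
dmaengine_desc_callback, not the kernel code itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock of struct dmaengine_desc_callback: two flavors. */
struct desc_callback {
	void (*callback)(void *param);
	void (*callback_result)(void *param, const void *result);
};

/* The check as the patch proposes it: either flavor counts as valid. */
static bool desc_callback_valid(const struct desc_callback *cb)
{
	return cb->callback || cb->callback_result;
}
```

The pre-patch check tested only cb->callback, which is exactly why a
client implementing only callback_result() never saw its completion fire
with the xilinx_dma driver.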
Re: (Audio/Video): The Truth about Linux GPLv2 and license recission (revocation).
Sadly your informative videos were deleted (by the NSA?), could you
please reupload? This could be useful for removing your source code if
you disagree with the Code of Conduct

Fri, 4 Jan 2019 at 16:04, :
>
> Information regarding the rights of the linux programmers, regarding
> rescission (revocation) of their granted license to use their code.
>
> Video:
> http://www.liveleak.com/view?t=9O5vz_1546606404
>
> ( Audio Only: )
> (Part 1: http://www.liveleak.com/view?t=s3Sr9_1546605652 )
> (Part 2: http://www.liveleak.com/view?t=aOkfS_1546605889 )
[PATCH] regulator: act8945a: Use rdev_get_id() to access id of regulator
Use rdev_get_id() instead of directly accessing rdev->desc->id.

Signed-off-by: Axel Lin
---
 drivers/regulator/act8945a-regulator.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/regulator/act8945a-regulator.c b/drivers/regulator/act8945a-regulator.c
index 603db77723b6..caa61d306a69 100644
--- a/drivers/regulator/act8945a-regulator.c
+++ b/drivers/regulator/act8945a-regulator.c
@@ -87,7 +87,8 @@ static const struct regulator_linear_range act8945a_voltage_ranges[] = {
 static int act8945a_set_suspend_state(struct regulator_dev *rdev, bool enable)
 {
 	struct regmap *regmap = rdev->regmap;
-	int id = rdev->desc->id, reg, val;
+	int id = rdev_get_id(rdev);
+	int reg, val;
 
 	switch (id) {
 	case ACT8945A_ID_DCDC1:
@@ -159,7 +160,7 @@ static int act8945a_set_mode(struct regulator_dev *rdev, unsigned int mode)
 {
 	struct act8945a_pmic *act8945a = rdev_get_drvdata(rdev);
 	struct regmap *regmap = rdev->regmap;
-	int id = rdev->desc->id;
+	int id = rdev_get_id(rdev);
 	int reg, ret, val = 0;
 
 	switch (id) {
@@ -190,11 +191,11 @@ static int act8945a_set_mode(struct regulator_dev *rdev, unsigned int mode)
 
 	switch (mode) {
 	case REGULATOR_MODE_STANDBY:
-		if (rdev->desc->id > ACT8945A_ID_DCDC3)
+		if (id > ACT8945A_ID_DCDC3)
 			val = BIT(5);
 		break;
 	case REGULATOR_MODE_NORMAL:
-		if (rdev->desc->id <= ACT8945A_ID_DCDC3)
+		if (id <= ACT8945A_ID_DCDC3)
 			val = BIT(5);
 		break;
 	default:
@@ -213,7 +214,7 @@ static int act8945a_set_mode(struct regulator_dev *rdev, unsigned int mode)
 static unsigned int act8945a_get_mode(struct regulator_dev *rdev)
 {
 	struct act8945a_pmic *act8945a = rdev_get_drvdata(rdev);
-	int id = rdev->desc->id;
+	int id = rdev_get_id(rdev);
 
 	if (id < ACT8945A_ID_DCDC1 || id >= ACT8945A_ID_MAX)
 		return -EINVAL;
--
2.17.1
Re: [PATCH 4/4] pmbus (dps650ab): add power supply driver
On 1/4/19 1:05 AM, Xiaoting Liu wrote: The Delta dps650ab provides main power and standby power to server. dps650ab can be detected by MFR_ID and MFR_MODEL referring to manufacturer's feedback. This patch adds driver to moniter power supply status. Signed-off-by: Xiaoting Liu --- drivers/hwmon/pmbus/Kconfig| 10 + drivers/hwmon/pmbus/Makefile | 1 + drivers/hwmon/pmbus/dps650ab.c | 100 + drivers/hwmon/pmbus/pmbus.c| 3 ++ 4 files changed, 114 insertions(+) diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig index 629cb45f8557..de4638396592 100644 --- a/drivers/hwmon/pmbus/Kconfig +++ b/drivers/hwmon/pmbus/Kconfig @@ -184,4 +184,14 @@ config SENSORS_ZL6100 This driver can also be built as a module. If so, the module will be called zl6100. +config SENSORS_DPS650AB + tristate "Delta DPS650AB" + default n + help + If you say yes here you get hardware monitoring support for the + Delta DPS650AB controller. + + This driver can also be built as a module. If so, the module will + be called dps650ab. + endif # PMBUS diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile index ea0e39518c21..414818230a26 100644 --- a/drivers/hwmon/pmbus/Makefile +++ b/drivers/hwmon/pmbus/Makefile @@ -21,3 +21,4 @@ obj-$(CONFIG_SENSORS_TPS53679)+= tps53679.o obj-$(CONFIG_SENSORS_UCD9000) += ucd9000.o obj-$(CONFIG_SENSORS_UCD9200) += ucd9200.o obj-$(CONFIG_SENSORS_ZL6100) += zl6100.o +obj-$(CONFIG_SENSORS_DPS650AB) += dps650ab.o diff --git a/drivers/hwmon/pmbus/dps650ab.c b/drivers/hwmon/pmbus/dps650ab.c new file mode 100644 index ..3c300e621f5a --- /dev/null +++ b/drivers/hwmon/pmbus/dps650ab.c @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Hardware monitoring driver for DELTA DPS650AB + * + * Copyright (c) 2018 Huaxintong Semiconductor Technology Co., Ltd. 
+ */ + +#include +#include +#include +#include +#include +#include "pmbus.h" + +#define DPS650AB_MFR_ID"DELTA" +#define DPS650AB_MFR_MODEL "DPS-650AB-16 A" + +static int dps650ab_probe(struct i2c_client *client, +const struct i2c_device_id *id) +{ + struct pmbus_driver_info *info; + u8 buf[I2C_SMBUS_BLOCK_MAX]; + int ret; + + memset(buf, 0, I2C_SMBUS_BLOCK_MAX); + I don't think this is needed. + if (!i2c_check_functionality(client->adapter, +I2C_FUNC_SMBUS_READ_BYTE_DATA + | I2C_FUNC_SMBUS_READ_WORD_DATA + | I2C_FUNC_SMBUS_READ_BLOCK_DATA)) + return -ENODEV; + + ret = i2c_smbus_read_block_data(client, PMBUS_MFR_ID, buf); + if (ret < 0) { + dev_err(>dev, "Failed to read PMBUS_MFR_ID\n"); + return ret; + } + + if (strncmp(buf, DPS650AB_MFR_ID, strlen(DPS650AB_MFR_ID))) { It might make sense to verify the length of the returned string as well, unless strings such as "DELTAX" are accepted on purpose. But even if that is the case I would prefer an added check of "ret < strlen(DPS650AB_MFR_ID)" over the memset() above + dev_err(>dev, "DPS650AB_MFR_ID unrecognised\n"); + return -ENODEV; + } + + ret = i2c_smbus_read_block_data(client, PMBUS_MFR_MODEL, buf); + if (ret < 0) { + dev_err(>dev, "Failed to read PMBUS_MFR_MODEL\n"); + return ret; + } + + if (strncmp(buf, DPS650AB_MFR_MODEL, strlen(DPS650AB_MFR_MODEL))) { ... to ensure that nothing bad slips through here. As currently written, the code accepts a MFR_ID of "DELTA50AB-16 A" combined with a MFR_MODEL "DPS-6" as valid. + dev_err(>dev, "DPS650AB_MFR_MODEL unrecognised\n"); It would be a nice touch to actually list the unrecognized model. Otherwise the message is quite useless. Same for MFR_ID. 
+ return -ENODEV; + } + + info = devm_kzalloc(&client->dev, sizeof(*info), GFP_KERNEL); + if (!info) + return -ENOMEM; + + info->pages = 1; + info->format[PSC_VOLTAGE_IN] = linear; + info->format[PSC_VOLTAGE_OUT] = linear; + info->format[PSC_CURRENT_IN] = linear; + info->format[PSC_CURRENT_OUT] = linear; + info->format[PSC_POWER] = linear; + info->format[PSC_TEMPERATURE] = linear; + + info->func[0] = PMBUS_HAVE_VIN + | PMBUS_HAVE_IIN | PMBUS_HAVE_PIN + | PMBUS_HAVE_STATUS_INPUT + | PMBUS_HAVE_POUT | PMBUS_HAVE_FAN12 + | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT + | PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT + | PMBUS_HAVE_TEMP | PMBUS_HAVE_TEMP2 + | PMBUS_HAVE_STATUS_TEMP; + info->func[1] = info->func[0]; + + return pmbus_do_probe(client, id, info); +} +
Re: [PATCH] dmaengine: qcom_hidma: Check for driver register failure
On 28-12-18, 14:11, Aditya Pakki wrote: > While initializing the driver, the function platform_driver_register can > fail and return an error. Consistent with other invocations, this patch > returns the error upstream. Applied, thanks -- ~Vinod
Re: [PATCH] dma/mv_xor: Fix a missing check in mv_xor_channel_add
On 24-12-18, 11:41, Aditya Pakki wrote: > dma_async_device_register() may fail and return an error. The capabilities > checked in mv_xor_channel_add() are not complete. The fix handles the > error by freeing the resources. Applied after fixing subsystem tag, thanks -- ~Vinod
Re: [PATCH] dmaengine: stm32-mdma: Add a check on read_u32_array
On 28-12-18, 13:26, Aditya Pakki wrote: > In stm32_mdma_probe, after reading the property "st,ahb-addr-masks", the > second call is not checked for failure. In this time-of-check to > time-of-use case, the "count" error is sent upstream. Applied, thanks -- ~Vinod
[PATCH] mm: kmemleak: Turn kmemleak_lock to spin lock and RCU primitives
From: He Zhe It's not necessary to keep consistency between readers and writers of kmemleak_lock. RCU is more proper for this case. And in order to gain better performance, we turn the reader locks to RCU read locks and writer locks to normal spin locks. "time echo scan > /sys/kernel/debug/kmemleak" is improved from around 1.010s to 0.475s, without lock debug options, tested on Intel Corporation Broadwell Client platform/Basking Ridge. spin_lock_nested is replaced with the irqsave version since the original outside irqsave lock is gone. Otherwise we might have the following potential deadlock, reported by lockdep. WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 4.20.0-standard #1 Not tainted - kmemleak/163 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: 8d7de78e (&(&object->lock)->rlock/1){+.+.}, at: scan_block+0xc4/0x1e0 and this task is already holding: 9178399c (&(&object->lock)->rlock){..-.}, at: scan_gray_list+0xec/0x180 which would create a new lock dependency: (&(&object->lock)->rlock){..-.} -> (&(&object->lock)->rlock/1){+.+.} but this new dependency connects a HARDIRQ-irq-safe lock: (&(&object->lock)->rlock){-.-.} snip CPU0 / CPU1 lock(&(&object->lock)->rlock/1); local_irq_disable(); lock(&(&object->lock)->rlock); lock(&(&object->lock)->rlock/1); lock(&(&object->lock)->rlock); Signed-off-by: He Zhe Cc: catalin.mari...@arm.com --- mm/kmemleak.c | 38 -- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/mm/kmemleak.c b/mm/kmemleak.c index f9d9dc2..ef9ea00 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -26,7 +26,7 @@ * * The following locks and mutexes are used by kmemleak: * * - * - kmemleak_lock (rwlock): protects the object_list modifications and + * - kmemleak_lock (spinlock): protects the object_list modifications and * accesses to the object_tree_root. The object_list is the main list * holding the metadata (struct kmemleak_object) for the allocated memory * blocks. 
The object_tree_root is a red black tree used to look-up @@ -199,7 +199,7 @@ static LIST_HEAD(gray_list); /* search tree for object boundaries */ static struct rb_root object_tree_root = RB_ROOT; /* rw_lock protecting the access to object_list and object_tree_root */ -static DEFINE_RWLOCK(kmemleak_lock); +static DEFINE_SPINLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; @@ -515,9 +515,7 @@ static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias) struct kmemleak_object *object; rcu_read_lock(); - read_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); - read_unlock_irqrestore(&kmemleak_lock, flags); /* check whether the object is still available */ if (object && !get_object(object)) @@ -537,13 +535,13 @@ static struct kmemleak_object *find_and_remove_object(unsigned long ptr, int ali unsigned long flags; struct kmemleak_object *object; - write_lock_irqsave(&kmemleak_lock, flags); + spin_lock_irqsave(&kmemleak_lock, flags); object = lookup_object(ptr, alias); if (object) { rb_erase(&object->rb_node, &object_tree_root); list_del_rcu(&object->object_list); } - write_unlock_irqrestore(&kmemleak_lock, flags); + spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @@ -617,7 +615,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, /* kernel backtrace */ object->trace_len = __save_stack_trace(object->trace); - write_lock_irqsave(&kmemleak_lock, flags); + spin_lock_irqsave(&kmemleak_lock, flags); min_addr = min(min_addr, ptr); max_addr = max(max_addr, ptr + size); @@ -648,7 +646,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size, list_add_tail_rcu(&object->object_list, &object_list); out: - write_unlock_irqrestore(&kmemleak_lock, flags); + spin_unlock_irqrestore(&kmemleak_lock, flags); return object; } @@ -1334,7 +1332,7 @@ static void scan_block(void *_start, void *_end, unsigned long *end = _end - (BYTES_PER_POINTER - 1); unsigned long flags; - read_lock_irqsave(&kmemleak_lock, flags); + rcu_read_lock(); for (ptr = start; ptr < 
end; ptr++) { struct kmemleak_object *object; unsigned long pointer; @@ -1350,14 +1348,8 @@ static void scan_block(void *_start, void *_end, if (pointer < min_addr || pointer >= max_addr) continue; - /* -* No need for get_object() here since we hold kmemleak_lock. -* object->use_count cannot be dropped to 0 while the object -* is still present in object_tree_root and object_list -* (with
Re: [PATCH v5 06/11] platform/x86: apple-gmux: Make PCI dependency explicit
On Fri, Jan 4, 2019 at 4:12 PM Andy Shevchenko wrote: > > On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya wrote: > > > > After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without > > CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were > > satisfied implicitly through dependencies on CONFIG_ACPI have to be > > specified directly. This driver depends on the PCI infrastructure but > > the dependency has not been explicitly called out. > > > > Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI > > set") > > Signed-off-by: Sinan Kaya > > --- > > drivers/platform/x86/Kconfig | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > > index 7afb96cb1cd6..89f4b86244a7 100644 > > --- a/drivers/platform/x86/Kconfig > > +++ b/drivers/platform/x86/Kconfig > > @@ -1136,6 +1136,7 @@ config SAMSUNG_Q10 > > config APPLE_GMUX > > tristate "Apple Gmux Driver" > > > depends on ACPI > > + depends on PCI > > Can you stick with the same pattern you have used for the rest in this series? > I.e. > depends on ACPI && PCI After satisfying it Reviewed-by: Andy Shevchenko Acked-by: Andy Shevchenko > > > > depends on PNP > > depends on BACKLIGHT_CLASS_DEVICE > > depends on BACKLIGHT_APPLE=n || BACKLIGHT_APPLE > > -- > > 2.19.0 > > > > > -- > With Best Regards, > Andy Shevchenko -- With Best Regards, Andy Shevchenko
Re: [PATCH v5 05/11] platform/x86: intel_pmc: Make PCI dependency explicit
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya wrote: > > After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without > CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were > satisfied implicitly through dependencies on CONFIG_ACPI have to be > specified directly. Code relies on PCI for execution. Specify this > in the Kconfig. > Unfortunately, this is not how it should be fixed. First of all, we (Heikki, Sathya, me — all in Cc list) agreed that this driver should be refactored and unified with intel_scu_ipc. Then, it has two drivers inside, and PCI is just glue for it, which is optional. Thus, it needs to be split accordingly. > Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI > set") > Signed-off-by: Sinan Kaya > --- > drivers/platform/x86/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > index b36ea14b41ad..7afb96cb1cd6 100644 > --- a/drivers/platform/x86/Kconfig > +++ b/drivers/platform/x86/Kconfig > @@ -1174,7 +1174,7 @@ config INTEL_SMARTCONNECT > > config INTEL_PMC_IPC > tristate "Intel PMC IPC Driver" > - depends on ACPI > + depends on ACPI && PCI > ---help--- > This driver provides support for PMC control on some Intel platforms. > The PMC is an ARC processor which defines IPC commands for > communication > -- > 2.19.0 > -- With Best Regards, Andy Shevchenko
[PATCH] hv_balloon: avoid touching uninitialized struct page during tail onlining
Hyper-V memory hotplug protocol has 2M granularity and in Linux x86 we use 128M. To deal with it we implement partial section onlining by registering custom page onlining callback (hv_online_page()). Later, when more memory arrives we try to online the 'tail' (see hv_bring_pgs_online()). It was found that in some cases this 'tail' onlining causes issues: BUG: Bad page state in process kworker/0:2 pfn:109e3a page:e08344278e80 count:0 mapcount:1 mapping: index:0x0 flags: 0xf8000() raw: 000f8000 dead0100 dead0200 raw: page dumped because: nonzero mapcount ... Workqueue: events hot_add_req [hv_balloon] Call Trace: dump_stack+0x5c/0x80 bad_page.cold.112+0x7f/0xb2 free_pcppages_bulk+0x4b8/0x690 free_unref_page+0x54/0x70 hv_page_online_one+0x5c/0x80 [hv_balloon] hot_add_req.cold.24+0x182/0x835 [hv_balloon] ... Turns out that we now have deferred struct page initialization for memory hotplug so e.g. memory_block_action() in drivers/base/memory.c does pages_correctly_probed() check and in that check it avoids inspecting struct pages and checks sections instead. But in Hyper-V balloon driver we do PageReserved(pfn_to_page()) check and this is now wrong. Switch to checking online_section_nr() instead. Signed-off-by: Vitaly Kuznetsov --- drivers/hv/hv_balloon.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 5301fef16c31..7c6349a50ef1 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -888,12 +888,14 @@ static unsigned long handle_pg_range(unsigned long pg_start, pfn_cnt -= pgs_ol; /* * Check if the corresponding memory block is already -* online by checking its last previously backed page. -* In case it is we need to bring rest (which was not -* backed previously) online too. +* online. It is possible to observe struct pages still +* being uninitialized here so check section instead. 
+* In case the section is online we need to bring the +* rest of pfns (which were not backed previously) +* online too. */ if (start_pfn > has->start_pfn && - !PageReserved(pfn_to_page(start_pfn - 1))) + online_section_nr(pfn_to_section_nr(start_pfn))) hv_bring_pgs_online(has, start_pfn, pgs_ol); } -- 2.20.1
Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state
On Fri, Jan 04, 2019 at 02:08:32PM +, Dave Martin wrote: > On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote: > > On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote: > > > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote: > > > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote: > > > > > There is a lot of variation in the Arm ecosystem. Because of this, > > > > > there exist possible cases where the kernel cannot authoritatively > > > > > determine if a machine is vulnerable. > > > > > > > > Really? Why not? What keeps you from "knowing" this? Can't the > > > > developer of the chip tell you? > > > > > > There tends to be a few cases, possibly incomplete white/black lists, > > > > Then fix the lists :) > > > > > firmware that isn't responding correctly, or the user didn't build in the > > > code to check the mitigation (possibly because its an embedded system and > > > they know its not vulnerable?). > > > > If the firmware doesn't respond, that would imply it is vulnerable :) > > > > And if the code isn't built in, again, it's vulnerable. > > > > > I would hope that it is an exceptional case. > > > > Then have the default be vulnerable, don't give people false hope. > > > > > > > Rather than guess the vulnerability status in cases where > > > > > the mitigation is disabled or the firmware isn't responding > > > > > correctly, we need to display an "Unknown" state. > > > > > > > > Shouldn't "Unknown" really be the same thing as "Vulnerable"? A user > > > > should treat it the same way, "Unknown" makes it feel like "maybe I can > > > > just ignore this and hope I really am safe", which is not a good idea at > > > > all. > > > > > > I tend to agree its not clear what to do with "unknown". > > > > > > OTOH, I think there is a hesitation to declare something vulnerable when > > > it > > > isn't. 
Meltdown for example, is fairly rare given that it currently only > > affects a few arm parts, so declaring someone vulnerable when they likely > > aren't is going to be just as difficult to explain. > > If you know it is rare, then you know how to properly detect it so > "unknown" is not needed, correct? > > Again, "unknown" is not going to help anyone out here, please don't do > it. > Thinking about it, "unknown" is actually the common case. > Kernels that predate the sysfs vulnerabilities interface effectively > report this for all vulnerabilities by omitting the sysfs entries > entirely. > Current kernels also don't know anything about future vulnerabilities > that may be added in sysfs later on (but which may nevertheless be > discovered subsequently to affect current hardware). > So, can we simply omit the sysfs entries for which we can't provide a > good answer? As you say, we already do this for older systems. But don't add new logic to explicitly not create the files just because we "can not figure it out". For those systems, I would default to "vulnerable" as I think that's what we do today, right? thanks, greg k-h
RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests
Hi Felipe, Resending... Since I am waiting on your suggestion, thought of giving a reminder. Thanks, Anurag Kumar Vulisha >-Original Message- >From: Anurag Kumar Vulisha >Sent: Wednesday, December 12, 2018 8:41 PM >To: 'Alan Stern' ; Felipe Balbi >Cc: Greg Kroah-Hartman ; Shuah Khan >; Johan Hovold ; Jaejoong Kim >; Benjamin Herrenschmidt ; >Roger Quadros ; Manu Gautam ; >martin.peter...@oracle.com; Bart Van Assche ; Mike >Christie ; Matthew Wilcox ; Colin Ian >King ; linux-...@vger.kernel.org; linux- >ker...@vger.kernel.org; v.anuragku...@gmail.com; Thinh Nguyen >; Tejas Joglekar ; Ajay >Yugalkishore Pandey >Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb >requests > > >Hi Felipe, > >>-Original Message- >>From: Alan Stern [mailto:st...@rowland.harvard.edu] >>Sent: Friday, December 07, 2018 10:40 PM >>To: Felipe Balbi >>Cc: Anurag Kumar Vulisha ; Greg Kroah-Hartman >>; Shuah Khan ; Johan Hovold >>; Jaejoong Kim ; Benjamin >>Herrenschmidt ; Roger Quadros ; >Manu >>Gautam ; martin.peter...@oracle.com; Bart Van >>Assche ; Mike Christie ; Matthew >>Wilcox ; Colin Ian King ; >>linux- >>u...@vger.kernel.org; linux-kernel@vger.kernel.org; v.anuragku...@gmail.com; >>Thinh Nguyen ; Tejas Joglekar >>; Ajay Yugalkishore Pandey >>Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb >>requests >> >>On Fri, 7 Dec 2018, Felipe Balbi wrote: >> >>> >>> hi, >>> >>> Anurag Kumar Vulisha writes: >>> >>Does the data book suggest a value for the timeout? >>> >> >>> > >>> > No, the databook doesn't mention about the timeout value >>> > >>> >>> >At this point, it seems that the generic approach will be messier than >>> >>> >having >>every >>> >>> >controller driver implement its own fix. At least, that's how it >>> >>> >appears to me. >>> >>> Why, if the UDC implementation will, anyway, be a timer? >> >>It's mostly a question of what happens when the timer expires. 
(After >>all, starting a timer is just as easy to do in a UDC driver as it is in >>the core.) When the timer expires, what can the core do? >> >>Don't say it can cancel the offending request and resubmit it. That >>leads to the sort of trouble we discussed earlier in this thread. In >>particular, we don't want the class driver's completion routine to be >>called when the cancel occurs. Furthermore, this leads to a race: >>Suppose the class driver decides to cancel the request at the same time >>as the core does a cancel and resubmit. Then the class driver's cancel >>could get lost and the request would remain on the UDC's queue. >> >>What you really want to do is issue the appropriate stop and restart >>commands to the hardware, while leaving the request logically active >>the entire time. The UDC core can't do this, but a UDC driver can. >> > >I agree with Alan's comment as it looks like there may be some corner cases >where the issue may occur with dequeue approach. Are you okay if the timeout >handler gets moved to the dwc3 driver (the timers still added into udc layer)? >Please let us know your suggestion on this > >Thanks, >Anurag Kumar Vulisha > >>> >>(Especially if dwc3 is the only driver affected.) >>> > >>> > As discussed above, the issue may happen with other gadgets too. As I got >>> > divide >>opinions >>> > on this implementation and both the implementations looks fine to me, I am >little >>confused >>> > on which should be implemented. >>> > >>> > @Felipe: Do you agree with Alan's implementation? Please let us know your >>suggestion >>> > on this. >>> >>> I still think a generic timer is a better solution since it has other uses. >> >>Putting a struct timer into struct usb_request is okay with me, but I >>wouldn't go any farther than that. >> >>> >>Since the purpose of the timeout is to detect a deadlock caused by a >>> >>hardware bug, I suggest a fixed and relatively short timeout value such >>> >>as one second. 
Cancelling and requeuing a few requests at 1-second >>> >>intervals shouldn't add very much overhead. >>> >>> I wouldn't call this a HW bug though. This is just how the UDC >>> behaves. There are N streams and host can move data in any stream at any >>> time. This means that host & gadget _can_ disagree on what stream to >>> start next. >> >>But the USB 3 spec says what should happen when the host and gadget >>disagree in this way, doesn't it? And it doesn't say that they should >>deadlock. :-) Or have I misread the spec? >> >>> One way to avoid this would be to never pre-start any streams and always >>> rely on XferNotReady, but that would mean greatly reduced throughput for >>> streams. >> >>It would be good if there was some way to actively detect the problem >>instead of passively waiting for a timer to expire. >> >>Alan Stern
Re: [PATCH v5 04/11] platform/x86: intel_ips: make PCI dependency explicit
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya wrote: > > After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without > CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were > satisfied implicitly through dependencies on CONFIG_ACPI have to be > specified directly. The IPS driver is a PCI device driver, but this has > not been mentioned anywhere in Kconfig. > Acked-by: Andy Shevchenko > Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI > set") > Signed-off-by: Sinan Kaya > --- > drivers/platform/x86/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > index e3b62c2ee8d1..b36ea14b41ad 100644 > --- a/drivers/platform/x86/Kconfig > +++ b/drivers/platform/x86/Kconfig > @@ -1009,7 +1009,7 @@ config INTEL_MFLD_THERMAL > > config INTEL_IPS > tristate "Intel Intelligent Power Sharing" > - depends on ACPI > + depends on ACPI && PCI > ---help--- > Intel Calpella platforms support dynamic power sharing between the > CPU and GPU, maximizing performance in a given TDP. This driver, > -- > 2.19.0 > -- With Best Regards, Andy Shevchenko
Re: [PATCH v5] arm64: implement ftrace with regs
On Mon, Dec 17, 2018 at 09:52:04AM +0530, Amit Daniel Kachhap wrote: > There is no error message or crash but no useful output like below, > > /sys/kernel/tracing # echo wake_up_process > set_graph_function > /sys/kernel/tracing # echo function_graph > current_tracer > /sys/kernel/tracing # cat trace > # tracer: function_graph > # > # CPU DURATION FUNCTION CALLS > # | | | | | | | It turned out the graph caller works perfectly, I only broke the filtering. Fixed now in v6; thanks a lot for testing! Torsten
Re: [PATCH v5 06/11] platform/x86: apple-gmux: Make PCI dependency explicit
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya wrote: > > After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without > CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were > satisfied implicitly through dependencies on CONFIG_ACPI have to be > specified directly. This driver depends on the PCI infrastructure but > the dependency has not been explicitly called out. > > Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI > set") > Signed-off-by: Sinan Kaya > --- > drivers/platform/x86/Kconfig | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > index 7afb96cb1cd6..89f4b86244a7 100644 > --- a/drivers/platform/x86/Kconfig > +++ b/drivers/platform/x86/Kconfig > @@ -1136,6 +1136,7 @@ config SAMSUNG_Q10 > config APPLE_GMUX > tristate "Apple Gmux Driver" > depends on ACPI > + depends on PCI Can you stick with the same pattern you have used for the rest in this series? I.e. depends on ACPI && PCI > depends on PNP > depends on BACKLIGHT_CLASS_DEVICE > depends on BACKLIGHT_APPLE=n || BACKLIGHT_APPLE > -- > 2.19.0 > -- With Best Regards, Andy Shevchenko
[PATCH v6] arm64: implement ftrace with regs
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning of each function. Replace the first NOP thus generated with a quick LR saver (move it to scratch reg x9), so the 2nd replacement insn, the call to ftrace, does not clobber the value. Ftrace will then generate the standard stack frames. Note that patchable-function-entry in GCC disables IPA-RA, which means ABI register calling conventions are obeyed *and* scratch registers such as x9 are available. Introduce and handle an ftrace_regs_trampoline for module PLTs, right after ftrace_trampoline, and double the size of this special section. Signed-off-by: Torsten Duwe --- This patch applies on 4.20 with the additional changes bdb85cd1d20669dfae813555dddb745ad09323ba (arm64/module: switch to ADRP/ADD sequences for PLT entries) and 7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680 (arm64: ftrace: always pass instrumented pc in x0) along with their respective series, or alternatively on Linus' master, which already has these. changes since v5: * fix mentioned pc in x0 to hold the start address of the call site, not the return address or the branch address. This resolves the problem found by Amit. 
--- arch/arm64/Kconfig|2 arch/arm64/Makefile |4 + arch/arm64/include/asm/assembler.h|1 arch/arm64/include/asm/ftrace.h | 13 +++ arch/arm64/include/asm/module.h |3 arch/arm64/kernel/Makefile|6 - arch/arm64/kernel/entry-ftrace.S | 131 ++ arch/arm64/kernel/ftrace.c| 125 arch/arm64/kernel/module-plts.c |3 arch/arm64/kernel/module.c|2 drivers/firmware/efi/libstub/Makefile |3 include/asm-generic/vmlinux.lds.h |1 include/linux/compiler_types.h|4 + 13 files changed, 262 insertions(+), 36 deletions(-) --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -131,6 +131,8 @@ config ARM64 select HAVE_DEBUG_KMEMLEAK select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE + select HAVE_DYNAMIC_FTRACE_WITH_REGS \ + if $(cc-option,-fpatchable-function-entry=2) select HAVE_EFFICIENT_UNALIGNED_ACCESS select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_TRACER --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -79,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y) KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/arm64/kernel/module.lds endif +ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y) + CC_FLAGS_FTRACE := -fpatchable-function-entry=2 +endif + # Default value head-y := arch/arm64/kernel/head.o --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -17,6 +17,19 @@ #define MCOUNT_ADDR((unsigned long)_mcount) #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE +/* + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning + * of each function, with the second NOP actually calling ftrace. In contrary + * to a classic _mcount call, the call instruction to be modified is thus + * the second one, and not the only one. 
+ */ +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS +#define ARCH_SUPPORTS_FTRACE_OPS 1 +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE +#else +#define REC_IP_BRANCH_OFFSET 0 +#endif + #ifndef __ASSEMBLY__ #include --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$( AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET) CFLAGS_armv8_deprecated.o := -I$(src) -CFLAGS_REMOVE_ftrace.o = -pg -CFLAGS_REMOVE_insn.o = -pg -CFLAGS_REMOVE_return_address.o = -pg +CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE) +CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE) # Object file lists. arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o \ --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -13,7 +13,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__K # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly # disable the stackleak plugin -cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \ +cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\ + ,$(KBUILD_CFLAGS)) -fpie \ $(DISABLE_STACKLEAK_PLUGIN) cflags-$(CONFIG_ARM) := $(subst -pg,,$(KBUILD_CFLAGS)) \ -fno-builtin -fpic \ --- a/arch/arm64/kernel/entry-ftrace.S +++ b/arch/arm64/kernel/entry-ftrace.S @@ -13,6 +13,8 @@ #include #include #include +#include +#include /* * Gcc with -pg will put the following code in the beginning of each function: @@ -122,6 +124,7 @@ skip_ftrace_call: // } ENDPROC(_mcount) #else /* CONFIG_DYNAMIC_FTRACE */ +#ifndef
Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state
On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote: > > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote: > > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote: > > > > There is a lot of variation in the Arm ecosystem. Because of this, > > > > there exist possible cases where the kernel cannot authoritatively > > > > determine if a machine is vulnerable. > > > > > > Really? Why not? What keeps you from "knowing" this? Can't the > > > developer of the chip tell you? > > > > There tends to be a few cases, possibly incomplete white/black lists, > > Then fix the lists :) > > > firmware that isn't responding correctly, or the user didn't build in the > > code to check the mitigation (possibly because its an embedded system and > > they know its not vulnerable?). > > If the firmware doesn't respond, that would imply it is vulnerable :) > > And if the code isn't built in, again, it's vulnerable. > > > I would hope that it is an exceptional case. > > Then have the default be vulnerable, don't give people false hope. > > > > > Rather than guess the vulnerability status in cases where > > > > the mitigation is disabled or the firmware isn't responding > > > > correctly, we need to display an "Unknown" state. > > > > > > Shouldn't "Unknown" really be the same thing as "Vulnerable"? A user > > > should treat it the same way, "Unknown" makes it feel like "maybe I can > > > just ignore this and hope I really am safe", which is not a good idea at > > > all. > > > > I tend to agree its not clear what to do with "unknown". > > > > OTOH, I think there is a hesitation to declare something vulnerable when it > > isn't. Meltdown for example, is fairly rare given that it currently only > > affects a few arm parts, so declaring someone vulnerable when they likely > > aren't is going to be just as difficult to explain. 
> > If you know it is rare, then you know how to properly detect it so > "unknown" is not needed, correct? > > Again, "unknown" is not going to help anyone out here, please don't do > it. Thinking about it, "unknown" is actually the common case. Kernels that predate the sysfs vulnerabilities interface effectively report this for all vulnerabilities by omitting the sysfs entries entirely. Current kernels also don't know anything about future vulnerabilities that may be added in sysfs later on (but which may nevertheless be discovered subsequently to affect current hardware). So, can we simply omit the sysfs entries for which we can't provide a good answer? IMHO the kernel should make best efforts to provide answers for every vulnerability that it knows about, so the checks should not be Kconfig-dependent without a good reason. There will be cases where whitelists/blacklists are the only source of answers, and we are ultimately reliant on vendors to provide that information. Upstream Linux is likely to lag, there's not much we can do about that. Cheers ---Dave
Applied "ASoC: xlnx: add pcm formatter platform driver" to the asoc tree
The patch ASoC: xlnx: add pcm formatter platform driver has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 6f6c3c36f0917be24587eeba818ab4fdfcb5465a Mon Sep 17 00:00:00 2001 From: Maruthi Srinivas Bayyavarapu Date: Fri, 21 Dec 2018 14:27:28 +0530 Subject: [PATCH] ASoC: xlnx: add pcm formatter platform driver The audio formatter PL IP supports DMA of two streams - mm2s and s2mm for playback and capture respectively. Apart from DMA, IP also does conversions like PCM to AES and viceversa. This patch adds DMA component driver for the IP. Signed-off-by: Maruthi Srinivas Bayyavarapu Signed-off-by: Mark Brown --- sound/soc/xilinx/xlnx_formatter_pcm.c | 565 ++ 1 file changed, 565 insertions(+) create mode 100644 sound/soc/xilinx/xlnx_formatter_pcm.c diff --git a/sound/soc/xilinx/xlnx_formatter_pcm.c b/sound/soc/xilinx/xlnx_formatter_pcm.c new file mode 100644 index ..f7235f7664d7 --- /dev/null +++ b/sound/soc/xilinx/xlnx_formatter_pcm.c @@ -0,0 +1,565 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Xilinx ASoC audio formatter support +// +// Copyright (C) 2018 Xilinx, Inc. 
+// +// Author: Maruthi Srinivas Bayyavarapu + +#include +#include +#include +#include +#include +#include + +#include +#include + +#define DRV_NAME "xlnx_formatter_pcm" + +#define XLNX_S2MM_OFFSET 0 +#define XLNX_MM2S_OFFSET 0x100 + +#define XLNX_AUD_CORE_CONFIG 0x4 +#define XLNX_AUD_CTRL 0x10 +#define XLNX_AUD_STS 0x14 + +#define AUD_CTRL_RESET_MASKBIT(1) +#define AUD_CFG_MM2S_MASK BIT(15) +#define AUD_CFG_S2MM_MASK BIT(31) + +#define XLNX_AUD_FS_MULTIPLIER 0x18 +#define XLNX_AUD_PERIOD_CONFIG 0x1C +#define XLNX_AUD_BUFF_ADDR_LSB 0x20 +#define XLNX_AUD_BUFF_ADDR_MSB 0x24 +#define XLNX_AUD_XFER_COUNT0x28 +#define XLNX_AUD_CH_STS_START 0x2C +#define XLNX_BYTES_PER_CH 0x44 + +#define AUD_STS_IOC_IRQ_MASK BIT(31) +#define AUD_STS_CH_STS_MASKBIT(29) +#define AUD_CTRL_IOC_IRQ_MASK BIT(13) +#define AUD_CTRL_TOUT_IRQ_MASK BIT(14) +#define AUD_CTRL_DMA_EN_MASK BIT(0) + +#define CFG_MM2S_CH_MASK GENMASK(11, 8) +#define CFG_MM2S_CH_SHIFT 8 +#define CFG_MM2S_XFER_MASK GENMASK(14, 13) +#define CFG_MM2S_XFER_SHIFT13 +#define CFG_MM2S_PKG_MASK BIT(12) + +#define CFG_S2MM_CH_MASK GENMASK(27, 24) +#define CFG_S2MM_CH_SHIFT 24 +#define CFG_S2MM_XFER_MASK GENMASK(30, 29) +#define CFG_S2MM_XFER_SHIFT29 +#define CFG_S2MM_PKG_MASK BIT(28) + +#define AUD_CTRL_DATA_WIDTH_SHIFT 16 +#define AUD_CTRL_ACTIVE_CH_SHIFT 19 +#define PERIOD_CFG_PERIODS_SHIFT 16 + +#define PERIODS_MIN2 +#define PERIODS_MAX6 +#define PERIOD_BYTES_MIN 192 +#define PERIOD_BYTES_MAX (50 * 1024) + +enum bit_depth { + BIT_DEPTH_8, + BIT_DEPTH_16, + BIT_DEPTH_20, + BIT_DEPTH_24, + BIT_DEPTH_32, +}; + +struct xlnx_pcm_drv_data { + void __iomem *mmio; + bool s2mm_presence; + bool mm2s_presence; + unsigned int s2mm_irq; + unsigned int mm2s_irq; + struct snd_pcm_substream *play_stream; + struct snd_pcm_substream *capture_stream; + struct clk *axi_clk; +}; + +/* + * struct xlnx_pcm_stream_param - stream configuration + * @mmio: base address offset + * @interleaved: audio channels arrangement in buffer + * @xfer_mode: data 
formatting mode during transfer + * @ch_limit: Maximum channels supported + * @buffer_size: stream ring buffer size + */ +struct xlnx_pcm_stream_param { + void __iomem *mmio; + bool interleaved; + u32 xfer_mode; + u32 ch_limit; + u64 buffer_size; +}; + +static const struct snd_pcm_hardware xlnx_pcm_hardware = { + .info = SNDRV_PCM_INFO_INTERLEAVED | SNDRV_PCM_INFO_BLOCK_TRANSFER | + SNDRV_PCM_INFO_BATCH | SNDRV_PCM_INFO_PAUSE | + SNDRV_PCM_INFO_RESUME, + .formats = SNDRV_PCM_FMTBIT_S8 | SNDRV_PCM_FMTBIT_S16_LE | + SNDRV_PCM_FMTBIT_S24_LE, + .channels_min = 2, + .channels_max = 2, + .rates = SNDRV_PCM_RATE_8000_192000, + .rate_min = 8000, + .rate_max = 192000,
Applied "ASoC: rockchip: fix platform_no_drv_owner.cocci warnings" to the asoc tree
The patch ASoC: rockchip: fix platform_no_drv_owner.cocci warnings has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git Thanks, Mark >From e1de3d237b5083fcd0da6fcf600848a4cef9cc67 Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Tue, 25 Dec 2018 02:20:36 + Subject: [PATCH] ASoC: rockchip: fix platform_no_drv_owner.cocci warnings Remove .owner field if calls are used which set it automatically Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: YueHaibing Signed-off-by: Mark Brown --- sound/soc/codecs/rk3328_codec.c | 1 - 1 file changed, 1 deletion(-) diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c index f3442a2283ea..24f8f86d58e9 100644 --- a/sound/soc/codecs/rk3328_codec.c +++ b/sound/soc/codecs/rk3328_codec.c @@ -508,7 +508,6 @@ MODULE_DEVICE_TABLE(of, rk3328_codec_of_match); static struct platform_driver rk3328_codec_driver = { .driver = { .name = "rk3328-codec", - .owner = THIS_MODULE, .of_match_table = of_match_ptr(rk3328_codec_of_match), }, .probe = rk3328_platform_probe, -- 2.20.1
Applied "ASoC: rockchip: support ACODEC for rk3328" to the asoc tree
The patch ASoC: rockchip: support ACODEC for rk3328 has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git Thanks, Mark >From c32759035ad246d3e4c65d23a07f9e6ba32caeaf Mon Sep 17 00:00:00 2001 From: Katsuhiro Suzuki Date: Fri, 21 Dec 2018 00:36:35 +0900 Subject: [PATCH] ASoC: rockchip: support ACODEC for rk3328 This patch adds support for the audio CODEC core of rk3328. Rockchip does not publish a detailed specification of this core, but the driver source code is available on their GitHub repository: https://github.com/rockchip-linux/kernel So I ported this code to linux-next and added some trivial fixes. 
Signed-off-by: Katsuhiro Suzuki Signed-off-by: Mark Brown --- .../bindings/sound/rockchip,rk3328-codec.txt | 23 + sound/soc/codecs/Kconfig | 5 + sound/soc/codecs/Makefile | 2 + sound/soc/codecs/rk3328_codec.c | 517 ++ sound/soc/codecs/rk3328_codec.h | 210 +++ 5 files changed, 757 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt create mode 100644 sound/soc/codecs/rk3328_codec.c create mode 100644 sound/soc/codecs/rk3328_codec.h diff --git a/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt b/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt new file mode 100644 index ..2469588c7ccb --- /dev/null +++ b/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt @@ -0,0 +1,23 @@ +* Rockchip Rk3328 internal codec + +Required properties: + +- compatible: "rockchip,rk3328-codec" +- reg: physical base address of the controller and length of memory mapped + region. +- rockchip,grf: the phandle of the syscon node for GRF register. +- clocks: a list of phandle + clock-specifer pairs, one for each entry in clock-names. +- clock-names: should be "pclk". +- spk-depop-time-ms: speak depop time msec. 
+ +Example for rk3328 internal codec: + +codec: codec@ff41 { + compatible = "rockchip,rk3328-codec"; + reg = <0x0 0xff41 0x0 0x1000>; + rockchip,grf = <>; + clocks = < PCLK_ACODEC>; + clock-names = "pclk"; + spk-depop-time-ms = 100; + status = "disabled"; +}; diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig index d46de3e04ff6..87cb9c51e6f5 100644 --- a/sound/soc/codecs/Kconfig +++ b/sound/soc/codecs/Kconfig @@ -130,6 +130,7 @@ config SND_SOC_ALL_CODECS select SND_SOC_PCM5102A select SND_SOC_PCM512x_I2C if I2C select SND_SOC_PCM512x_SPI if SPI_MASTER + select SND_SOC_RK3328 select SND_SOC_RT274 if I2C select SND_SOC_RT286 if I2C select SND_SOC_RT298 if I2C @@ -806,6 +807,10 @@ config SND_SOC_PCM512x_SPI select SND_SOC_PCM512x select REGMAP_SPI +config SND_SOC_RK3328 + tristate "Rockchip RK3328 audio CODEC" + select REGMAP_MMIO + config SND_SOC_RL6231 tristate default y if SND_SOC_RT5514=y diff --git a/sound/soc/codecs/Makefile b/sound/soc/codecs/Makefile index fbe36e6177b0..9bb3346fab2f 100644 --- a/sound/soc/codecs/Makefile +++ b/sound/soc/codecs/Makefile @@ -133,6 +133,7 @@ snd-soc-pcm5102a-objs := pcm5102a.o snd-soc-pcm512x-objs := pcm512x.o snd-soc-pcm512x-i2c-objs := pcm512x-i2c.o snd-soc-pcm512x-spi-objs := pcm512x-spi.o +snd-soc-rk3328-objs := rk3328_codec.o snd-soc-rl6231-objs := rl6231.o snd-soc-rl6347a-objs := rl6347a.o snd-soc-rt1305-objs := rt1305.o @@ -400,6 +401,7 @@ obj-$(CONFIG_SND_SOC_PCM5102A) += snd-soc-pcm5102a.o obj-$(CONFIG_SND_SOC_PCM512x) += snd-soc-pcm512x.o obj-$(CONFIG_SND_SOC_PCM512x_I2C) += snd-soc-pcm512x-i2c.o obj-$(CONFIG_SND_SOC_PCM512x_SPI) += snd-soc-pcm512x-spi.o +obj-$(CONFIG_SND_SOC_RK3328) += snd-soc-rk3328.o obj-$(CONFIG_SND_SOC_RL6231) += snd-soc-rl6231.o obj-$(CONFIG_SND_SOC_RL6347A) += snd-soc-rl6347a.o obj-$(CONFIG_SND_SOC_RT1305) += snd-soc-rt1305.o diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c new file mode 100644 index ..71f3fc2d970c --- /dev/null +++ 
b/sound/soc/codecs/rk3328_codec.c @@ -0,0 +1,517 @@ +// SPDX-License-Identifier: GPL-2.0
Applied "ASoC: rockchip: add workaround for silence of rk3288 ACODEC" to the asoc tree
The patch ASoC: rockchip: add workaround for silence of rk3288 ACODEC has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git Thanks, Mark >From f5758544d98c8bec7793aeea28928f5e8bd45d47 Mon Sep 17 00:00:00 2001 From: Katsuhiro Suzuki Date: Fri, 21 Dec 2018 00:36:36 +0900 Subject: [PATCH] ASoC: rockchip: add workaround for silence of rk3288 ACODEC This patch adds reset and precharge in the shutdown path of the PCM device. The ACODEC goes silent if we change Fs to 44.1kHz from 48kHz. This workaround seems to work, but I don't know whether it is the correct sequence for the ACODEC. 
Signed-off-by: Katsuhiro Suzuki Signed-off-by: Mark Brown --- sound/soc/codecs/rk3328_codec.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c index 71f3fc2d970c..f3442a2283ea 100644 --- a/sound/soc/codecs/rk3328_codec.c +++ b/sound/soc/codecs/rk3328_codec.c @@ -261,9 +261,12 @@ static int rk3328_codec_close_playback(struct rk3328_codec_priv *rk3328) mdelay(1); } + /* Workaround for silence when changed Fs 48 -> 44.1kHz */ + rk3328_codec_reset(rk3328); + regmap_update_bits(rk3328->regmap, DAC_PRECHARGE_CTRL, DAC_CHARGE_CURRENT_ALL_MASK, - DAC_CHARGE_CURRENT_I); + DAC_CHARGE_CURRENT_ALL_ON); return 0; } -- 2.20.1
Re: [PATCH] tty/n_hdlc: fix sleep in !TASK_RUNNING state warning
> On Jan 4, 2019, at 2:23 AM, Tetsuo Handa > wrote: > > But why not clarify what the appropriate sanity checks are? > ... > want a cleanup for scripts/checkpatch.pl . These are good goals. I avoid purely cosmetic patches. I do not object to cosmetic patches from others that do not change behavior. The checks that concern you deal with changing tty line disciplines. Dealing with line discipline changes has been an ongoing issue since n_hdlc was derived from other line disciplines 20 years ago, with major overhauls along the way. It is complex: driver layers shifting during operation while dealing properly with opens, closes, hangups, and sleeping operations. Patches have been added to the latest unreleased kernel to address line discipline changes; it is still evolving. Why are the existing line discipline checks in n_hdlc where they are? Because that's how they evolved from where they started to accommodate these changes. There are not many and their function is known: has the line discipline changed at that point? I know that is not satisfying, but coming up with a definitive comment saying a check is absolutely required in one place and not in another requires more insight into the long history of a moving target than I have. Without that insight I would not alter existing checks in code that is not causing problems.
Re: [PATCH] Variable "val" in function rt274_i2c_probe() could be uninitialized
On Thu, Jan 03, 2019 at 01:59:12PM -0800, Yizhuo wrote: > Inside function rt274_i2c_probe(), if the regmap_read() function > returns -EINVAL, then the local variable "val" is left uninitialized > but is then used in an if statement. This is potentially unsafe. Please use subject lines matching the style for the subsystem. This makes it easier for people to identify relevant patches.
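The bug being reported boils down to consuming an output parameter before checking the call's return code. A minimal stand-alone sketch of the safe pattern — the mock read function and all names here are illustrative, not the real regmap API:

```c
#include <errno.h>

/* Stand-in for regmap_read(): fills *val only on success. */
static int fake_regmap_read(int ok, unsigned int *val)
{
	if (!ok)
		return -EINVAL;	/* *val is left untouched, like the bug described */
	*val = 0x42;
	return 0;
}

/* Correct pattern: consume the output only after checking the return code. */
static int probe_id(int ok, unsigned int *id)
{
	unsigned int val;
	int ret = fake_regmap_read(ok, &val);

	if (ret)
		return ret;	/* never look at val on failure */
	*id = val;
	return 0;
}
```

The if statement in the real driver would then sit after the error check, where `val` is guaranteed to hold a defined value.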
[PATCH 1/2] mailbox: mailbox-test: fix debugfs in multi-instances
Create one debug entry directory per instance to support the multi instantiation. Signed-off-by: Ludovic Barre Signed-off-by: Fabien Dessenne --- drivers/mailbox/mailbox-test.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c index 58bfafc..4e2bfba 100644 --- a/drivers/mailbox/mailbox-test.c +++ b/drivers/mailbox/mailbox-test.c @@ -31,7 +31,6 @@ (MBOX_MAX_MSG_LEN / MBOX_BYTES_PER_LINE)) static bool mbox_data_ready; -static struct dentry *root_debugfs_dir; struct mbox_test_device { struct device *dev; @@ -45,6 +44,7 @@ struct mbox_test_device { spinlock_t lock; wait_queue_head_t waitq; struct fasync_struct*async_queue; + struct dentry *root_debugfs_dir; }; static ssize_t mbox_test_signal_write(struct file *filp, @@ -262,16 +262,16 @@ static int mbox_test_add_debugfs(struct platform_device *pdev, if (!debugfs_initialized()) return 0; - root_debugfs_dir = debugfs_create_dir("mailbox", NULL); - if (!root_debugfs_dir) { + tdev->root_debugfs_dir = debugfs_create_dir(dev_name(>dev), NULL); + if (!tdev->root_debugfs_dir) { dev_err(>dev, "Failed to create Mailbox debugfs\n"); return -EINVAL; } - debugfs_create_file("message", 0600, root_debugfs_dir, + debugfs_create_file("message", 0600, tdev->root_debugfs_dir, tdev, _test_message_ops); - debugfs_create_file("signal", 0200, root_debugfs_dir, + debugfs_create_file("signal", 0200, tdev->root_debugfs_dir, tdev, _test_signal_ops); return 0; @@ -416,7 +416,7 @@ static int mbox_test_remove(struct platform_device *pdev) { struct mbox_test_device *tdev = platform_get_drvdata(pdev); - debugfs_remove_recursive(root_debugfs_dir); + debugfs_remove_recursive(tdev->root_debugfs_dir); if (tdev->tx_channel) mbox_free_channel(tdev->tx_channel); -- 2.7.4
[PATCH 2/2] mailbox: stm32-ipcc: remove useless device_init_wakeup call
If the "wakeup-source" property does not exist there is no need to call device_init_wakeup("false") at probe. Signed-off-by: Fabien Dessenne --- drivers/mailbox/stm32-ipcc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/mailbox/stm32-ipcc.c b/drivers/mailbox/stm32-ipcc.c index 7ca8a14..210fe50 100644 --- a/drivers/mailbox/stm32-ipcc.c +++ b/drivers/mailbox/stm32-ipcc.c @@ -276,8 +276,6 @@ static int stm32_ipcc_probe(struct platform_device *pdev) dev_err(dev, "Failed to set wake up irq\n"); goto err_init_wkp; } - } else { - device_init_wakeup(dev, false); } /* mailbox controller */ -- 2.7.4
[PATCH 1/2] mailbox: stm32-ipcc: do not enable wakeup source by default
By default do not enable the wakeup source. This lets the userspace application decide whether the wakeup source shall be enabled or not. Signed-off-by: Fabien Dessenne --- drivers/mailbox/stm32-ipcc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mailbox/stm32-ipcc.c b/drivers/mailbox/stm32-ipcc.c index a338bd4..7ca8a14 100644 --- a/drivers/mailbox/stm32-ipcc.c +++ b/drivers/mailbox/stm32-ipcc.c @@ -270,7 +270,7 @@ static int stm32_ipcc_probe(struct platform_device *pdev) goto err_clk; } - device_init_wakeup(dev, true); + device_set_wakeup_capable(dev, true); ret = dev_pm_set_dedicated_wake_irq(dev, ipcc->wkp); if (ret) { dev_err(dev, "Failed to set wake up irq\n"); -- 2.7.4
Re: NULL pointer dereference when writing fuzzed data to /dev/uhid
> I wanted to tell you that I started investigating the other private > report you sent us, but couldn't find the time to properly come up with a > fix as the fuzzed data is hard to discriminate from valid data. Oops, I thought I was "over securing" these issues and everyone ignored them since I heard the default policy here is to not send reports privately without a good reason, so I re-evaluated them and sent them publicly... OTOH these ones seem to be not too severe: at first glance they require physical or root access, and even if they don't, *these ones* are only NULL dereferences. Best regards Anatoly Fri, 4 Jan 2019 at 16:25, Benjamin Tissoires : > > Hi Anatoly, > > > On Fri, Jan 4, 2019 at 1:32 PM Anatoly Trosinenko > wrote: > > > > Hello, > > > > When writing the attached file to /dev/uhid, a NULL dereference occurs > > in the kernel. As I understand it, the problem is not UHID-specific, but is > > related to the HID subsystem. > Thanks for the report. > I wanted to tell you that I started investigating the other private > report you sent us, but couldn't find the time to properly come up with a > fix as the fuzzed data is hard to discriminate from valid data. > > A couple of notes though: > - writing to uhid needs to be done by root. Any distribution that > doesn't enforce that is doomed to have several security issues > - we could somehow reproduce those fuzzed data on a USB or Bluetooth > connection, but that would require physical access to the device, so > you are doomed also > - lastly, IIRC, there were some attempts by the ChromeOS team to allow > access to the HID stack from Chrome plugins; I don't know if this > is able to generate the issues. 
> > On the specifics reported here: > > > > > How to reproduce: > > 1) Checkout the fresh master branch of the Linux kernel (tested on > > commit 96d4f267e) > > 2) Compile it with the attached config (kvm-xfstests capable) > > 3) Take one of reproducers and execute > > cat /vtmp/repro > /dev/uhid > > > > What happens: > > > > For chicony.bin: > > > > root@kvm-xfstests:~# cat /vtmp/chicony.bin > /dev/uhid > > [ 19.072703] BUG: unable to handle kernel NULL pointer dereference > > at 0002 > > [ 19.073371] #PF error: [normal kernel read fault] > > [ 19.073755] PGD 800078b2c067 P4D 800078b2c067 PUD 0 > > [ 19.074223] Oops: [#1] SMP PTI > > [ 19.074809] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted > > 4.20.0-xfstests-10979-g96d4f267e40 #1 > > [ 19.075965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS 1.11.1-1ubuntu1 04/01/2014 > > [ 19.077599] Workqueue: events uhid_device_add_worker > > [ 19.078019] RIP: 0010:ch_switch12_report_fixup+0x13/0x70 > > This driver expects the device to be connected on USB, and you are > triggering the oops because you are on uhid. > I am chasing the USB dependencies in most drivers, but this is a hard > task to do when I do not have access to the actual devices. 
> > I guess one way of fixing that is to add a check for the actual > transport driver during probe: > hid_is_using_ll_driver(hdev, _hid_driver) > > Patches are welcome :) > > > [ 19.078462] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f > > 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48 > > 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76 > > f1 80 > > [ 19.080103] RSP: 0018:a1d880367c48 EFLAGS: 00010286 > > [ 19.080541] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: > > > > [ 19.081133] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: > > 9b653a6fa000 > > [ 19.081780] RBP: 9b653d27b180 R08: 00064992eed0 R09: > > > > [ 19.082409] R10: R11: R12: > > 9b653a6fa000 > > [ 19.083017] R13: 83f14510 R14: 83f14440 R15: > > > > [ 19.083619] FS: () GS:9b653fc0() > > knlGS: > > [ 19.084362] CS: 0010 DS: ES: CR0: 80050033 > > [ 19.085164] CR2: 0002 CR3: 788b8004 CR4: > > 00360ef0 > > [ 19.085789] Call Trace: > > [ 19.086011] hid_open_report+0x81/0x2c0 > > [ 19.086341] hid_device_probe+0x135/0x160 > > [ 19.086754] ? __driver_attach+0x110/0x110 > > [ 19.087109] really_probe+0xe0/0x390 > > [ 19.087411] ? __driver_attach+0x110/0x110 > > [ 19.087782] bus_for_each_drv+0x78/0xc0 > > [ 19.088134] __device_attach+0xcc/0x130 > > [ 19.088477] bus_probe_device+0x9f/0xb0 > > [ 19.088832] device_add+0x422/0x680 > > [ 19.089144] ? __debugfs_create_file+0xb5/0xf0 > > [ 19.089536] hid_add_device+0xec/0x280 > > [ 19.089880] uhid_device_add_worker+0x15/0x60 > > [ 19.090270] process_one_work+0x238/0x5d0 > > [ 19.090627] worker_thread+0x3d/0x390 > > [ 19.090959] ? process_one_work+0x5d0/0x5d0 > > [ 19.091331] kthread+0x121/0x140 > > [ 19.096732] ? __kthread_create_on_node+0x1a0/0x1a0 > > [ 19.097164]
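The probe-time guard suggested above (`hid_is_using_ll_driver(hdev, &usb_hid_driver)`) is essentially a transport-driver pointer comparison. A toy model with stand-in types — everything here is illustrative; the real definitions live in the kernel's HID headers, and the real helper takes the actual `struct hid_ll_driver` instances:

```c
#include <errno.h>

/* Minimal stand-ins for the kernel types involved (assumptions, not real API). */
struct hid_ll_driver { const char *name; };
struct hid_device { struct hid_ll_driver *ll_driver; };

static struct hid_ll_driver usb_hid_driver = { "usb-hid" };
static struct hid_ll_driver uhid_hid_driver = { "uhid" };

/* The real kernel helper is also just a pointer comparison. */
static int hid_is_using_ll_driver(const struct hid_device *hdev,
				  const struct hid_ll_driver *driver)
{
	return hdev->ll_driver == driver;
}

/* Hypothetical probe guard for a driver that only supports USB transport:
 * devices backed by uhid (as in the fuzzed report) are rejected early,
 * before any USB-specific report fixup can dereference missing state. */
static int ch_probe(struct hid_device *hdev)
{
	if (!hid_is_using_ll_driver(hdev, &usb_hid_driver))
		return -ENODEV;
	return 0;
}
```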
[PATCH 2/2] mailbox: mailbox-test: fix null pointer if no mmio
Fix null pointer issue if resource_size is called with no ioresource. Signed-off-by: Ludovic Barre Signed-off-by: Fabien Dessenne --- drivers/mailbox/mailbox-test.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c index 4e2bfba..4e4ac4b 100644 --- a/drivers/mailbox/mailbox-test.c +++ b/drivers/mailbox/mailbox-test.c @@ -363,22 +363,24 @@ static int mbox_test_probe(struct platform_device *pdev) /* It's okay for MMIO to be NULL */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - size = resource_size(res); tdev->tx_mmio = devm_ioremap_resource(>dev, res); - if (PTR_ERR(tdev->tx_mmio) == -EBUSY) + if (PTR_ERR(tdev->tx_mmio) == -EBUSY) { /* if reserved area in SRAM, try just ioremap */ + size = resource_size(res); tdev->tx_mmio = devm_ioremap(>dev, res->start, size); - else if (IS_ERR(tdev->tx_mmio)) + } else if (IS_ERR(tdev->tx_mmio)) { tdev->tx_mmio = NULL; + } /* If specified, second reg entry is Rx MMIO */ res = platform_get_resource(pdev, IORESOURCE_MEM, 1); - size = resource_size(res); tdev->rx_mmio = devm_ioremap_resource(>dev, res); - if (PTR_ERR(tdev->rx_mmio) == -EBUSY) + if (PTR_ERR(tdev->rx_mmio) == -EBUSY) { + size = resource_size(res); tdev->rx_mmio = devm_ioremap(>dev, res->start, size); - else if (IS_ERR(tdev->rx_mmio)) + } else if (IS_ERR(tdev->rx_mmio)) { tdev->rx_mmio = tdev->tx_mmio; + } tdev->tx_channel = mbox_test_request_channel(pdev, "tx"); tdev->rx_channel = mbox_test_request_channel(pdev, "rx"); -- 2.7.4
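The crash this patch avoids comes from passing the NULL that `platform_get_resource()` may return straight into `resource_size()`, which dereferences it unconditionally. A small model of the hazard and the guarded pattern — the types here are stand-ins, not the real `<linux/ioport.h>` definitions:

```c
#include <stddef.h>

/* Stand-in for struct resource; the real one has more fields. */
struct resource { unsigned long start, end; };

/* Models the kernel's resource_size(): dereferences res unconditionally,
 * so calling it with NULL is exactly the bug being fixed. */
static unsigned long resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}

/* Safe pattern from the patch: only compute the size once the resource
 * is known to exist (i.e. on the -EBUSY fallback path, after the lookup
 * has been checked). */
static unsigned long safe_size(const struct resource *res)
{
	if (!res)
		return 0;
	return resource_size(res);
}
```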
Re: [PATCH v4 00/10] steal tasks to improve CPU utilization
On 07-Dec-18 3:09 AM, Steve Sistare wrote: > > When a CPU has no more CFS tasks to run, and idle_balance() fails to > find a task, then attempt to steal a task from an overloaded CPU in the > same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently > identify candidates. To minimize search time, steal the first migratable > task that is found when the bitmap is traversed. For fairness, search > for migratable tasks on an overloaded CPU in order of next to run. > > This simple stealing yields a higher CPU utilization than idle_balance() > alone, because the search is cheap, so it may be called every time the CPU > is about to go idle. idle_balance() does more work because it searches > widely for the busiest queue, so to limit its CPU consumption, it declines > to search if the system is too busy. Simple stealing does not offload the > globally busiest queue, but it is much better than running nothing at all. > > The bitmap of overloaded CPUs is a new type of sparse bitmap, designed to > reduce cache contention vs the usual bitmap when many threads concurrently > set, clear, and visit elements. > > Patch 1 defines the sparsemask type and its operations. > > Patches 2, 3, and 4 implement the bitmap of overloaded CPUs. > > Patches 5 and 6 refactor existing code for a cleaner merge of later >patches. > > Patches 7 and 8 implement task stealing using the overloaded CPUs bitmap. > > Patch 9 disables stealing on systems with more than 2 NUMA nodes for the > time being because of performance regressions that are not due to stealing > per-se. See the patch description for details. > > Patch 10 adds schedstats for comparing the new behavior to the old, and >provided as a convenience for developers only, not for integration. > > The patch series is based on kernel 4.20.0-rc1. It compiles, boots, and > runs with/without each of CONFIG_SCHED_SMT, CONFIG_SMP, CONFIG_SCHED_DEBUG, > and CONFIG_PREEMPT. 
It runs without error with CONFIG_DEBUG_PREEMPT + CONFIG_SLUB_DEBUG + CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_MUTEXES + CONFIG_DEBUG_SPINLOCK + CONFIG_DEBUG_ATOMIC_SLEEP. CPU hot plug and CPU bandwidth control were tested.
>
> Stealing improves utilization with only a modest CPU overhead in scheduler
> code. In the following experiment, hackbench is run with varying numbers
> of groups (40 tasks per group), and the delta in /proc/schedstat is shown
> for each run, averaged per CPU, augmented with these non-standard stats:
>
>   %find - percent of time spent in old and new functions that search for
>     idle CPUs and tasks to steal and set the overloaded CPUs bitmap.
>
>   steal - number of times a task is stolen from another CPU.
>
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> hackbench process 10
> sched_wakeup_granularity_ns=1500
>
> baseline
> grps  time    %busy  slice  sched    idle    wake    %find  steal
> 1      8.084  75.02   0.10  105476   46291   59183   0.31       0
> 2     13.892  85.33   0.10  190225   70958  119264   0.45       0
> 3     19.668  89.04   0.10  263896   87047  176850   0.49       0
> 4     25.279  91.28   0.10  322171   94691  227474   0.51       0
> 8     47.832  94.86   0.09  630636  144141  486322   0.56       0
>
> new
> grps  time    %busy  slice  sched    idle    wake    %find  steal  %speedup
> 1      5.938  96.80   0.24   31255    7190   24061   0.63    7433      36.1
> 2     11.491  99.23   0.16   74097    4578   69512   0.84   19463      20.9
> 3     16.987  99.66   0.15  115824    1985  113826   0.77   24707      15.8
> 4     22.504  99.80   0.14  167188    2385  164786   0.75   29353      12.3
> 8     44.441  99.86   0.11  389153    1616  387401   0.67   38190       7.6
>
> Elapsed time improves by 8 to 36%, and CPU busy utilization is up
> by 5 to 22% hitting 99% for 2 or more groups (80 or more tasks).
> The cost is at most 0.4% more find time.
>
> Additional performance results follow. A negative "speedup" is a
> regression. Note: for all hackbench runs, sched_wakeup_granularity_ns
> is set to 15 msec. Otherwise, preemptions increase at higher loads and
> distort the comparison between baseline and new. 
> -- 1 Socket Results --
>
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Average of 10 runs of: hackbench process 10
>
>            --- base --    --- new ---
>   groups    time %stdev    time %stdev  %speedup
>        1   8.008    0.1   5.905    0.2      35.6
>        2  13.814    0.2  11.438    0.1      20.7
>        3  19.488    0.2  16.919    0.1      15.1
>        4  25.059    0.1  22.409    0.1      11.8
>        8  47.478    0.1  44.221    0.1       7.3
>
> X6-2: 1 socket * 22 cores * 2 hyperthreads = 44 CPUs
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Average of 10 runs of: hackbench process 10
>
>            --- base --    --- new ---
>   groups    time %stdev
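The stealing scheme the cover letter describes — mark overloaded CPUs in a bitmap, and let a CPU going idle take the first migratable task it finds while scanning that bitmap — can be modeled in a few lines. This is a toy sketch only: a plain `uint64_t` stands in for the series' cache-friendly sparsemask, and per-CPU counters stand in for real runqueues.

```c
#include <stdint.h>

#define NCPUS 8

static uint64_t overload_mask;		/* bit n set => CPU n is overloaded */
static int extra_tasks[NCPUS];		/* surplus runnable tasks per CPU   */

static void set_overloaded(int cpu, int overloaded)
{
	if (overloaded)
		overload_mask |= 1ULL << cpu;
	else
		overload_mask &= ~(1ULL << cpu);
}

/* Called when dst_cpu is about to go idle. Returns the CPU stolen from,
 * or -1 if no overloaded CPU had a migratable task. The search stops at
 * the first hit, which is what keeps it cheap enough to run every time. */
static int try_steal(int dst_cpu)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (cpu == dst_cpu || !(overload_mask & (1ULL << cpu)))
			continue;
		if (extra_tasks[cpu] > 0) {	/* first migratable task wins */
			extra_tasks[cpu]--;
			extra_tasks[dst_cpu]++;
			if (extra_tasks[cpu] == 0)
				set_overloaded(cpu, 0);
			return cpu;
		}
	}
	return -1;
}
```

The real series additionally restricts the scan to the same LLC and walks the donor CPU's tasks in next-to-run order for fairness; none of that machinery is modeled here.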
Re: [PATCH v4 1/2] spi: Add Renesas R-Car Gen3 RPC SPI controller driver
Hello! On 01/03/2019 09:35 AM, masonccy...@mxic.com.tw wrote: [...] >> >> > +#define RPC_CMNCR_MOIIO3(val) (((val) & 0x3) << 22) >> >> > +#define RPC_CMNCR_MOIIO2(val) (((val) & 0x3) << 20) >> >> > +#define RPC_CMNCR_MOIIO1(val) (((val) & 0x3) << 18) >> >> > +#define RPC_CMNCR_MOIIO0(val) (((val) & 0x3) << 16) >> >> > +#define RPC_CMNCR_MOIIO_HIZ (RPC_CMNCR_MOIIO0(3) | >> >> RPC_CMNCR_MOIIO1(3) | \ >> >> > + RPC_CMNCR_MOIIO2(3) | RPC_CMNCR_MOIIO3(3)) >> >> > +#define RPC_CMNCR_IO3FV(val) (((val) & 0x3) << 14) >> >> > +#define RPC_CMNCR_IO2FV(val) (((val) & 0x3) << 12) >> >> >> >>Like I said, the above 2 aren't documented in the manual v1.00... >> > >> > okay, add a description as: >> > /* RPC_CMNCR_IO3FV/IO2FV are undocumented bit, but must be set */ >> > #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14) >> > #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12) >> > #define RPC_CMNCR_IO0FV(val)(((val) & 0x3) << 8) >> > #define RPC_CMNCR_IOFV_HIZ (RPC_CMNCR_IO0FV(3) | RPC_CMNCR_IO2FV(3) | >> > \ >> > RPC_CMNCR_IO3FV(3)) >> > >> > is it ok? >> >>Yes. But would have been enough if you just commented with // on >> the same line -- >> it seems these are legal now... > > on the same line is over 80 char, > #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14) // undocumented bit, > but must be set > #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12) // undocumented bit, > but must be set > > or just > #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14) // undocumented bit > #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12) // undocumented bit > is it ok ? The second variant would be enough. [...] 
>> >> > + ret = rpc_spi_set_freq(rpc, desc->mem->spi->max_speed_hz); >> >> > + if (ret) >> >> > + return ret; >> >> > + >> >> > + rpc_spi_mem_set_prep_op_cfg(desc->mem->spi, >> >> > +>info.op_tmpl, , ); >> >> > + >> >> > + regmap_write(rpc->regmap, RPC_CMNCR, RPC_CMNCR_SFDE | >> >> > + RPC_CMNCR_MOIIO_HIZ | RPC_CMNCR_IOFV_HIZ | >> >> > + RPC_CMNCR_BSZ(0)); >> >> >> >>Why not set this in the probing time and only set/clear the MD bit? >> >> >> > >> > same above! >> > Make sure the value in these register are setting correctly >> > before RPC starting a SPI transfer. >> >>You can set it once and only change the bits you need to change >> afterwards. >> What's wrong with it? >> > > if so, it will patch to: > -- > regmap_read(rpc->regmap, RPC_CMNCR, ); > data &= ~RPC_CMNCR_MD; > regmap_write(rpc->regmap, RPC_CMNCR, data); > -- > Do you think this way is better ? No, this one is better: regmap_update_bits(rpc->regmap, RPC_CMNCR, RPC_CMNCR_MD, 0); > maybe this is better, > write(read(rpc->regs + RPC_CMNCR) & ~RPC_CMNCR_MD, > rpc->regs + RPC_CMNCR); It's effectively the same code as your 1st variant... [...] >> >> > +static void rpc_spi_transfer_setup(struct rpc_spi *rpc, >> >> > + struct spi_message *msg) >> >> > +{ >> >> [...] >> >> > + for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) { >> >> > + if (xfer[i].rx_buf) { >> >> > + rpc->smenr |= >> >> > +RPC_SMENR_SPIDE(rpc_bits_set(xfer[i].len)) | >> >> > +RPC_SMENR_SPIDB >> >> > +(ilog2((unsigned int)xfer[i].rx_nbits)); >> >> >> >>Mhm, I would indent this contination line by 1 extra tab... >> >> >> >> > + } else if (xfer[i].tx_buf) { >> >> > + rpc->smenr |= >> >> > +RPC_SMENR_SPIDE(rpc_bits_set(xfer[i].len)) | >> >> > +RPC_SMENR_SPIDB >> >> > +(ilog2((unsigned int)xfer[i].tx_nbits)); >> >> >> >>And this one... >> > >> > like this ? 
>> > >> > for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) { >> > if (xfer[i].rx_buf) { >> > rpc->smenr |= >> > RPC_SMENR_SPIDE(rpc_bits_set(xfer >> [i].len)) | >> > RPC_SMENR_SPIDB( >> > ilog2((unsigned int)xfer >> [i].rx_nbits)); >> > } else if (xfer[i].tx_buf) { >> > rpc->smenr |= >> > RPC_SMENR_SPIDE(rpc_bits_set(xfer >> [i].len)) | >> > RPC_SMENR_SPIDB( >> > ilog2((unsigned int)xfer >> [i].tx_nbits)); >> >>I didn't mean you need to leave ( on the first line, can be left >> on the new >> line, as before. >> > > how about this style ? > - > for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) { > if (xfer[i].rx_buf) { >
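For reference, the `regmap_update_bits()` call recommended in the review above is a masked read-modify-write. A stand-alone model of its effect on a toy register file — not the real regmap implementation, which also handles locking, caching, and I/O:

```c
#include <stdint.h>

/* Toy register file standing in for the memory-mapped RPC registers. */
static uint32_t regs[256];

/* Models regmap_update_bits(map, reg, mask, val): a read-modify-write
 * that only touches the bits selected by mask, leaving the rest intact. */
static void update_bits(uint32_t reg, uint32_t mask, uint32_t val)
{
	regs[reg] = (regs[reg] & ~mask) | (val & mask);
}
```

So `regmap_update_bits(rpc->regmap, RPC_CMNCR, RPC_CMNCR_MD, 0)` clears only the MD bit, replacing the open-coded read/AND/write sequence with one self-describing call.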
Re: [PATCH 00/10] steal tasks to improve CPU utilization
On 22-Oct-18 8:40 PM, Steve Sistare wrote: > > When a CPU has no more CFS tasks to run, and idle_balance() fails to > find a task, then attempt to steal a task from an overloaded CPU in the > same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently > identify candidates. To minimize search time, steal the first migratable > task that is found when the bitmap is traversed. For fairness, search > for migratable tasks on an overloaded CPU in order of next to run. > > This simple stealing yields a higher CPU utilization than idle_balance() > alone, because the search is cheap, so it may be called every time the CPU > is about to go idle. idle_balance() does more work because it searches > widely for the busiest queue, so to limit its CPU consumption, it declines > to search if the system is too busy. Simple stealing does not offload the > globally busiest queue, but it is much better than running nothing at all. > > The bitmap of overloaded CPUs is a new type of sparse bitmap, designed to > reduce cache contention vs the usual bitmap when many threads concurrently > set, clear, and visit elements. > > Patch 1 defines the sparsemask type and its operations. > > Patches 2, 3, and 4 implement the bitmap of overloaded CPUs. > > Patches 5 and 6 refactor existing code for a cleaner merge of later >patches. > > Patches 7 and 8 implement task stealing using the overloaded CPUs bitmap. > > Patch 9 disables stealing on systems with more than 2 NUMA nodes for the > time being because of performance regressions that are not due to stealing > per-se. See the patch description for details. > > Patch 10 adds schedstats for comparing the new behavior to the old, and >provided as a convenience for developers only, not for integration. > > The patch series is based on kernel 4.19.0-rc7. It compiles, boots, and > runs with/without each of CONFIG_SCHED_SMT, CONFIG_SMP, CONFIG_SCHED_DEBUG, > and CONFIG_PREEMPT. 
It runs without error with CONFIG_DEBUG_PREEMPT + CONFIG_SLUB_DEBUG + CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_MUTEXES + CONFIG_DEBUG_SPINLOCK + CONFIG_DEBUG_ATOMIC_SLEEP. CPU hot plug and CPU bandwidth control were tested.
>
> Stealing improves utilization with only a modest CPU overhead in scheduler
> code. In the following experiment, hackbench is run with varying numbers
> of groups (40 tasks per group), and the delta in /proc/schedstat is shown
> for each run, averaged per CPU, augmented with these non-standard stats:
>
>   %find - percent of time spent in old and new functions that search for
>     idle CPUs and tasks to steal and set the overloaded CPUs bitmap.
>
>   steal - number of times a task is stolen from another CPU.
>
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> hackbench process 10
> sched_wakeup_granularity_ns=1500
>
> baseline
> grps  time    %busy  slice  sched    idle    wake    %find  steal
> 1      8.084  75.02   0.10  105476   46291   59183   0.31       0
> 2     13.892  85.33   0.10  190225   70958  119264   0.45       0
> 3     19.668  89.04   0.10  263896   87047  176850   0.49       0
> 4     25.279  91.28   0.10  322171   94691  227474   0.51       0
> 8     47.832  94.86   0.09  630636  144141  486322   0.56       0
>
> new
> grps  time    %busy  slice  sched    idle    wake    %find  steal  %speedup
> 1      5.938  96.80   0.24   31255    7190   24061   0.63    7433      36.1
> 2     11.491  99.23   0.16   74097    4578   69512   0.84   19463      20.9
> 3     16.987  99.66   0.15  115824    1985  113826   0.77   24707      15.8
> 4     22.504  99.80   0.14  167188    2385  164786   0.75   29353      12.3
> 8     44.441  99.86   0.11  389153    1616  387401   0.67   38190       7.6
>
> Elapsed time improves by 8 to 36%, and CPU busy utilization is up
> by 5 to 22% hitting 99% for 2 or more groups (80 or more tasks).
> The cost is at most 0.4% more find time.
>
> Additional performance results follow. A negative "speedup" is a
> regression. Note: for all hackbench runs, sched_wakeup_granularity_ns
> is set to 15 msec. Otherwise, preemptions increase at higher loads and
> distort the comparison between baseline and new. 
>
> -- 1 Socket Results --
>
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Average of 10 runs of: hackbench process 10
>
>         --- base --    --- new ---
> groups    time  %stdev   time  %stdev  %speedup
>      1   8.008     0.1  5.905     0.2      35.6
>      2  13.814     0.2 11.438     0.1      20.7
>      3  19.488     0.2 16.919     0.1      15.1
>      4  25.059     0.1 22.409     0.1      11.8
>      8  47.478     0.1 44.221     0.1       7.3
>
> X6-2: 1 socket * 22 cores * 2 hyperthreads = 44 CPUs
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Average of 10 runs of: hackbench process 10
>
>         --- base --    --- new ---
> groups    time  %stdev
Re: [PATCH] irqchip: madera: Drop GPIO includes
On 03/01/19 10:26, Linus Walleij wrote: This irqchip does not use anything GPIO-related so drop the GPIO includes. An older version did some stuff with GPIOs but when that code was removed this include obviously was forgotten. Cc: Richard Fitzgerald Signed-off-by: Linus Walleij --- drivers/irqchip/irq-madera.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/irqchip/irq-madera.c b/drivers/irqchip/irq-madera.c index e9256dee1a45..8b81271c823c 100644 --- a/drivers/irqchip/irq-madera.c +++ b/drivers/irqchip/irq-madera.c @@ -7,7 +7,6 @@ */ #include -#include #include #include #include @@ -16,7 +15,6 @@ #include #include #include -#include #include #include #include Acked-by: Richard Fitzgerald
Re: [PATCH] ARM: mach-s3c24xx: Fix boolean expressions in osiris_dvs_notify
On Thu, 3 Jan 2019 at 21:46, Gustavo A. R. Silva wrote: > > Fix boolean expressions by using logical AND operator '&&' > instead of bitwise operator '&'. > > This issue was detected with the help of Coccinelle. > > Fixes: 4fa084af28ca ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.") > Cc: sta...@vger.kernel.org > Signed-off-by: Gustavo A. R. Silva > --- > arch/arm/mach-s3c24xx/mach-osiris-dvs.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c > b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c > index 058ce73137e8..ccbd7b7965ca 100644 > --- a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c > +++ b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c > @@ -65,16 +65,16 @@ static int osiris_dvs_notify(struct notifier_block *nb, > > switch (val) { > case CPUFREQ_PRECHANGE: > - if (old_dvs & !new_dvs || > - cur_dvs & !new_dvs) { > + if (old_dvs && !new_dvs || > + cur_dvs && !new_dvs) { Technically the old code will work fine because all variables are bools (so 0 or 1). Therefore I am not sure whether this should be ported to stable. Anyway the patch is itself okay and I will take it after merge window. Thanks! Best regards, Krzysztof
Re: Column for keywords?
On Fri, Jan 4, 2019 at 2:01 PM Dmitry Vyukov wrote: > > On Wed, May 16, 2018 at 6:50 PM Tetsuo Handa > wrote: > > > > Dmitry Vyukov wrote: > > > On Tue, May 15, 2018 at 10:59 PM, Tetsuo Handa > > > wrote: > > > > Dmitry Vyukov wrote: > > > >> I've filed https://github.com/google/syzkaller/issues/608 to not lose > > > >> track of this. > > > > > > > > Thanks. Since the time lag between a patch was proposed and that patch > > > > is > > > > applied to a git tree tends to become long, duplicated works like > > > > https://www.spinics.net/lists/linux-fsdevel/msg125240.html and > > > > http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056...@i-love.sakura.ne.jp > > > > are already occurring. > > > > > > This is bad. > > > > > > > Therefore, it is important that the state of the bug (e.g. > > > > bisected, cause identified, patch proposed) is visible from the table. > > > > > > What do you think about the last section of: > > > https://groups.google.com/d/msg/syzkaller-bugs/nw7BIW9V2wk/NE0P_Au4AQAJ > > > there is already a way to say "there is a pending fix for this". > > > > That lacks a way to annotate "there is a pending fix for this, but the fix > > is not yet applied to any git tree". I mean not only "git trees which syzbot > > is checking" but also "git trees which are publicly visible". > > > > (Also, if we can later correct the patch using "#syz fix:" in case the patch > > title was renamed, it is not clear how to specify multiple patches using > > "#syz fix:" when a patch which meant to fix the reported problem contained > > a regression or was incomplete and thus fixup patch followed shortly. An > > example is commit 5f3e3b85cc0a5eae and commit ef95a90ae6f4f219 in > > "WARNING: kmalloc bug in memdup_user (2)". I've tried > > > > #syz fix: RDMA/ucma: Correct option size check using optlen > > #syz fix: RDMA/ucma: ucma_context reference leak in error path > > > > but only the former patch was recorded.) 
> > > > > > > > But one problem with manual tagging is how to make everybody update > > > these tags. If only few people do it, it can still lead to duplicate > > > work. And it's not syzbot-specific. Can happen with just any bug > > > report on kernel mailing lists. Traditionally it's solved with bug > > > tracking systems and assigning bugs when a developer starts working on > > > it. But kernel does not have a working bug tracker. > > > > > > One simple thing we can do is make syzbot poll more trees to discover > > > Reported-by tags faster. This will automatically update status on > > > dashboard to "fix pending". I've filed > > > https://github.com/google/syzkaller/issues/610 for this. Ideally, we > > > would intercept all mailed patches, but it's hard with kernel > > > development process because there is no system that tracks all pending > > > patches. > > > > > > > The problem is that the pending fix won't be applied to any git tree. > > It depends on when reviewers and maintainers can find time for > > reviewing/committing the fix. Scanning all git trees unlikely helps. > > > > the criteria is that you are "reasonably sure that the commit will > > reach upstream under this title", for whatever reason > > > > won't apply to not yet reviewed patches. What I want is a way to specify > > "a patch was proposed but the patch is not yet reviewed/tested/applied". > > > > Generally, progresses are not recorded frequently enough to avoid duplicated > > works. I want to check not only "fix pending" stage but also e.g. "problem > > guessed", "bisected", "cause identified", "patch proposed", "patch reviewed" > > stages from the top page's table. > > 1. This sounds very much like general bug tracking system. We > specifically didn't want to go down the slippery slope of implementing > yet another bug tracking system. > 2. This problem is not specific to syzbot in any way (just like lost > bug reports). 
Kernel developers waste time on duplicate work for other > bug reports too. > So I think (1) we need a bug tracking system, (2) use that system for > syzbot to solve this local problem. +Ted who also says that it is not possible to make sense out of current state of kernel bug reports (e.g. what are open bugs for ext4 sorted by priority). +Doug who says the same re rdma_cm subsystem. Both said this in the context of syzbot, but I fail to see how this is any syzbot-specific. This highlights the more broad problem with kernel development process.
Re: NULL pointer dereference when writing fuzzed data to /dev/uhid
Hi Anatoly, On Fri, Jan 4, 2019 at 1:32 PM Anatoly Trosinenko wrote: > > Hello, > > When writing the attached file to /dev/uhid, a NULL dereference occurs > in kernel. As I understand, the problem is not UHID-specific, but is > related to HID subsystem. Thanks for the report. I wanted to tell you that I started investigating the other private report you sent us, but couldn't find the time to properly come with a fix as the fuzzed data is hard to discriminate from valid data. A couple of notes though: - writing to uhid needs to be done by root. Any distribution that doesn't enforce that is doomed to have several security issues - we could somehow reproduce those fuzzed data on a USB or Bluetooth connection, but that would require physical access to the device, so you are doomed also - last IIRC, there was some attempts by the ChromeOS team to allow access to the HID stack from the Chrome plugins, I don't know if this is able to generate the issues. On the specifics reported here: > > How to reproduce: > 1) Checkout the fresh master branch of the Linux kernel (tested on > commit 96d4f267e) > 2) Compile it with the attached config (kvm-xfstests capable) > 3) Take one of reproducers and execute > cat /vtmp/repro > /dev/uhid > > What happens: > > For chicony.bin: > > root@kvm-xfstests:~# cat /vtmp/chicony.bin > /dev/uhid > [ 19.072703] BUG: unable to handle kernel NULL pointer dereference > at 0002 > [ 19.073371] #PF error: [normal kernel read fault] > [ 19.073755] PGD 800078b2c067 P4D 800078b2c067 PUD 0 > [ 19.074223] Oops: [#1] SMP PTI > [ 19.074809] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted > 4.20.0-xfstests-10979-g96d4f267e40 #1 > [ 19.075965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.11.1-1ubuntu1 04/01/2014 > [ 19.077599] Workqueue: events uhid_device_add_worker > [ 19.078019] RIP: 0010:ch_switch12_report_fixup+0x13/0x70 This driver expects the device to be connected on USB, and you are triggering the oops because you are on uhid. 
I am chasing the USB dependencies in most drivers, but this is a hard task to do when I do not have access to the actual devices. I guess one way of fixing that is to add a check for the actual transport driver during probe: hid_is_using_ll_driver(hdev, _hid_driver) Patches are welcome :) > [ 19.078462] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f > 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48 > 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76 > f1 80 > [ 19.080103] RSP: 0018:a1d880367c48 EFLAGS: 00010286 > [ 19.080541] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: > > [ 19.081133] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: > 9b653a6fa000 > [ 19.081780] RBP: 9b653d27b180 R08: 00064992eed0 R09: > > [ 19.082409] R10: R11: R12: > 9b653a6fa000 > [ 19.083017] R13: 83f14510 R14: 83f14440 R15: > > [ 19.083619] FS: () GS:9b653fc0() > knlGS: > [ 19.084362] CS: 0010 DS: ES: CR0: 80050033 > [ 19.085164] CR2: 0002 CR3: 788b8004 CR4: > 00360ef0 > [ 19.085789] Call Trace: > [ 19.086011] hid_open_report+0x81/0x2c0 > [ 19.086341] hid_device_probe+0x135/0x160 > [ 19.086754] ? __driver_attach+0x110/0x110 > [ 19.087109] really_probe+0xe0/0x390 > [ 19.087411] ? __driver_attach+0x110/0x110 > [ 19.087782] bus_for_each_drv+0x78/0xc0 > [ 19.088134] __device_attach+0xcc/0x130 > [ 19.088477] bus_probe_device+0x9f/0xb0 > [ 19.088832] device_add+0x422/0x680 > [ 19.089144] ? __debugfs_create_file+0xb5/0xf0 > [ 19.089536] hid_add_device+0xec/0x280 > [ 19.089880] uhid_device_add_worker+0x15/0x60 > [ 19.090270] process_one_work+0x238/0x5d0 > [ 19.090627] worker_thread+0x3d/0x390 > [ 19.090959] ? process_one_work+0x5d0/0x5d0 > [ 19.091331] kthread+0x121/0x140 > [ 19.096732] ? 
__kthread_create_on_node+0x1a0/0x1a0 > [ 19.097164] ret_from_fork+0x3a/0x50 > [ 19.097483] CR2: 0002 > [ 19.097779] ---[ end trace 1b547acaae113039 ]--- > [ 19.098186] RIP: 0010:ch_switch12_report_fixup+0x13/0x70 > [ 19.098621] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f > 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48 > 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76 > f1 80 > [ 19.100251] RSP: 0018:a1d880367c48 EFLAGS: 00010286 > [ 19.100707] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: > > [ 19.101321] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: > 9b653a6fa000 > [ 19.102448] RBP: 9b653d27b180 R08: 00064992eed0 R09: > > [ 19.103029] R10: R11: R12: > 9b653a6fa000 > [ 19.103601] R13: 83f14510 R14: 83f14440
Re: [PATCH] netfilter: account ebt_table_info to kmemcg
On Thu 03-01-19 12:52:54, Shakeel Butt wrote: > On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko wrote: > > > > On Sun 30-12-18 19:59:53, Shakeel Butt wrote: > > > On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko wrote: > > > > > > > > On Sun 30-12-18 08:45:13, Michal Hocko wrote: > > > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote: > > > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko > > > > > > wrote: > > > > > > > > > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote: > > > > > > > > Michal Hocko wrote: > > > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote: > > > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and > > > > > > > > > > the underlying > > > > > > > > > > memory is already accounted to kmemcg. Do the same for > > > > > > > > > > ebtables. The > > > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able > > > > > > > > > > to OOM the > > > > > > > > > > whole system from a restricted memcg, a potential DoS. > > > > > > > > > > > > > > > > > > What is the lifetime of these objects? Are they bound to any > > > > > > > > > process? > > > > > > > > > > > > > > > > No, they are not. > > > > > > > > They are free'd only when userspace requests it or the netns is > > > > > > > > destroyed. > > > > > > > > > > > > > > Then this is problematic, because the oom killer is not able to > > > > > > > guarantee the hard limit and so the excessive memory consumption > > > > > > > cannot > > > > > > > be really contained. As a result the memcg will be basically > > > > > > > useless > > > > > > > until somebody tears down the charged objects by other means. The > > > > > > > memcg > > > > > > > oom killer will surely kill all the existing tasks in the cgroup > > > > > > > and > > > > > > > this could somehow reduce the problem. Maybe this is sufficient > > > > > > > for > > > > > > > some usecases but that should be properly analyzed and described > > > > > > > in the > > > > > > > changelog. 
> > > > > > > > > > > > > > > > > > > Can you explain why you think the memcg hard limit will not be > > > > > > enforced? From what I understand, the memcg oom-killer will kill the > > > > > > allocating processes as you have mentioned. We do force charging for > > > > > > very limited conditions but here the memcg oom-killer will take care > > > > > > of > > > > > > > > > > I was talking about the force charge part. Depending on a specific > > > > > allocation and its life time this can gradually get us over hard limit > > > > > without any bound theoretically. > > > > > > > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when > > > > the current task is killed"") there is no way to bail out from the > > > > vmalloc allocation loop so if the request is really large then the memcg > > > > oom will not help. Is that a problem here? > > > > > > > > > > Yes, I think it will be an issue here. > > > > > > > Maybe it is time to revisit fatal_signal_pending check. > > > > > > Yes, we will need something to handle the memcg OOM. I will think more > > > on that front or if you have any ideas, please do propose. > > > > I can see three options here: > > - do not force charge on memcg oom or introduce a limited charge > > overflow (reserves basically). > > - revert the revert and reintroduce the fatal_signal_pending > > check into vmalloc > > - be more specific and check tsk_is_oom_victim in vmalloc and > > fail > > > > I think for the long term solution we might need something similar to > memcg oom reserves (1) but for quick fix I think we can do the > combination of (2) and (3). Johannes argued that fatal_signal_pending is too general check for vmalloc. I would argue that we already break out of some operations on fatal signals. tsk_is_oom_victim is more subtle but much more targeted on the other hand. I do not have any strong preference to be honest but I agree that some limited reserves would be the best solution long term. 
I just do not have any idea how to scale those reserves to be meaningful. -- Michal Hocko SUSE Labs
Re: [PATCH v5 10/15] KVM: s390: add functions to (un)register GISC with GISA
On Wed, 2 Jan 2019 18:29:00 +0100 Pierre Morel wrote: > On 19/12/2018 20:17, Michael Mueller wrote: > > Add the IAM (Interruption Alert Mask) to the architecture specific > > kvm struct. This mask in the GISA is used to define for which ISC > > a GIB alert can be issued. > > > > The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister() > > are used to (un)register a GISC (guest ISC) with a virtual machine and > > its GISA. > > > > Upon successful completion, kvm_s390_gisc_register() returns the > > ISC to be used for GIB alert interruptions. A negative return code > > indicates an error during registration. > > > > Theses functions will be used by other adapter types like AP and PCI to > > request pass-through interruption support. > > > > Signed-off-by: Michael Mueller > > --- > > arch/s390/include/asm/kvm_host.h | 9 ++ > > arch/s390/kvm/interrupt.c| 66 > > > > 2 files changed, 75 insertions(+) > > > > +int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc) > > +{ > > + if (!kvm->arch.gib_in_use) > > + return -ENODEV; > > + if (gisc > MAX_ISC) > > + return -ERANGE; > > + > > + spin_lock(>arch.iam_ref_lock); > > + if (kvm->arch.iam_ref_count[gisc] == 0) > > + kvm->arch.iam |= 0x80 >> gisc; > > + kvm->arch.iam_ref_count[gisc]++; > > + if (kvm->arch.iam_ref_count[gisc] == 1) > > + set_iam(kvm->arch.gisa, kvm->arch.iam); > > testing the set_iam return value? > Even it should be fine if the caller works correctly, this is done > before GISA is ever used. My feeling is that checking the return code is a good idea, even if it Should Never Fail(tm). 
> > > + spin_unlock(>arch.iam_ref_lock); > > + > > + return gib->nisc; > > +} > > +EXPORT_SYMBOL_GPL(kvm_s390_gisc_register); > > + > > +int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc) > > +{ > > + int rc = 0; > > + > > + if (!kvm->arch.gib_in_use) > > + return -ENODEV; > > + if (gisc > MAX_ISC) > > + return -ERANGE; > > + > > + spin_lock(>arch.iam_ref_lock); > > + if (kvm->arch.iam_ref_count[gisc] == 0) { > > + rc = -EINVAL; > > + goto out; > > + } > > + kvm->arch.iam_ref_count[gisc]--; > > + if (kvm->arch.iam_ref_count[gisc] == 0) { > > + kvm->arch.iam &= ~(0x80 >> gisc); > > + set_iam(kvm->arch.gisa, kvm->arch.iam); Any chance of this function failing here? If yes, would there be any implications? > > + } > > +out: > > + spin_unlock(>arch.iam_ref_lock); > > + > > + return rc; > > +} > > +EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister); > > + > > void kvm_s390_gib_destroy(void) > > { > > if (!gib) > > > >
Re: [PATCH v4] selftests: add headers_install to lib.mk
Hi Anders, On Wed, Aug 8, 2018 at 12:46 PM Anders Roxell wrote: > On Tue, 7 Aug 2018 at 04:33, Masahiro Yamada > wrote: > > 2018-08-07 2:03 GMT+09:00 Shuah Khan : > > > On 07/25/2018 10:08 AM, Anders Roxell wrote: > > >> On Tue, 24 Jul 2018 at 19:11, Shuah Khan wrote: > > >>> On 07/23/2018 02:49 PM, Anders Roxell wrote: > > On Thu, 7 Jun 2018 at 13:09, Anders Roxell > > wrote: > > > > > > If the kernel headers aren't installed we can't build all the tests. > > > Add a new make target rule 'khdr' in the file lib.mk to generate the > > > kernel headers and that gets include for every test-dir Makefile that > > > includes lib.mk If the testdir in turn have its own sub-dirs the > > > top_srcdir needs to be set to the linux-rootdir to be able to generate > > > the kernel headers. > > > > > > Signed-off-by: Anders Roxell > > > Reviewed-by: Fathi Boudra > > > --- > > > Makefile | 14 > > > +- > > > scripts/subarch.include| 13 + > > > tools/testing/selftests/android/Makefile | 2 +- > > > tools/testing/selftests/android/ion/Makefile | 2 ++ > > > tools/testing/selftests/futex/functional/Makefile | 1 + > > > tools/testing/selftests/gpio/Makefile | 7 ++- > > > tools/testing/selftests/kvm/Makefile | 7 ++- > > > tools/testing/selftests/lib.mk | 12 > > > tools/testing/selftests/net/Makefile | 1 + > > > .../selftests/networking/timestamping/Makefile | 1 + > > > tools/testing/selftests/vm/Makefile| 4 > > > 11 files changed, 36 insertions(+), 28 deletions(-) > > > create mode 100644 scripts/subarch.include > > > > > > diff --git a/Makefile b/Makefile > > > index 6b9aea95ae3a..8050072300fa 100644 > > > --- a/Makefile > > > +++ b/Makefile > > > @@ -286,19 +286,7 @@ KERNELRELEASE = $(shell cat > > > include/config/kernel.release 2> /dev/null) > > > KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if > > > $(SUBLEVEL),.$(SUBLEVEL)))$(EXTRAVERSION) > > > export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION > > > > > > -# SUBARCH tells the usermode build what the 
underlying arch is. > > > That is set > > > -# first, and if a usermode build is happening, the "ARCH=um" on the > > > command > > > -# line overrides the setting of ARCH below. If a native build is > > > happening, > > > -# then ARCH is assigned, getting whatever value it gets normally, and > > > -# SUBARCH is subsequently ignored. > > > - > > > -SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \ > > > - -e s/sun4u/sparc64/ \ > > > - -e s/arm.*/arm/ -e s/sa110/arm/ \ > > > - -e s/s390x/s390/ -e > > > s/parisc64/parisc/ \ > > > - -e s/ppc.*/powerpc/ -e > > > s/mips.*/mips/ \ > > > - -e s/sh[234].*/sh/ -e > > > s/aarch64.*/arm64/ \ > > > - -e s/riscv.*/riscv/) > > > +include scripts/subarch.include > > >>> > > >>> What is the reason for this SUBARCH block move to to > > >>> scripts/subarch.include? > > >>> Is this necessary for adding headers install dependency to lib.mk? > > >> > > >> This is needed to create headers for cross build. > > > > > > I am sorry for the delay on this patch. I am going to get this into 4.19. > > > If anybody has objections, please let me. > > > > > > Anders! Will be able to rebase the patch and send me the latest. I think > > > I have Acks from kvm, android, and vm so far. > > > > I may be missing something about the tools/ directory, > > but why isn't it like this? > > > > > > kselftest: headers_install > > $(Q)$(MAKE) -C $(srctree)/tools/testing/selftests run_tests > > That wont work if you cross compile, since you wont run the tests > where you build them. So why does a similar dependency work for samples in the main Makefile (even listed twice!)? Documentation/ samples/: headers_install samples: headers_install Or perhaps this also doesn't work? > Then you can argue that we should break it up in a build and run rule. 
> However, my understanding how people just build and run the the tests > for that particular subsystem that they develop for, like: > make -C tools/testing/selftests// I'm still a bit puzzled: how can you "build and run" the tests when cross-compiling, as you don't run the tests where you build them? Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I
Re: [PATCH] drm/tegra: hdmi: Fix audio to work with any pixel clock rate
On Wed, Dec 14, 2016 at 06:20:39PM +0100, Alban Bedel wrote: > The audio setting implementation was limited to a few specific pixel > clocks. This prevented HDMI audio from working on several test devices > as they need a pixel clock that is not supported by this implementation. > > Fix this by implementing the algorithm provided in the TRM using fixed > point arithmetic. This allows the driver to cope with any sane pixel > clock rate. > > Signed-off-by: Alban Bedel > --- > drivers/gpu/drm/tegra/hdmi.c | 163 > ++- > 1 file changed, 54 insertions(+), 109 deletions(-) I had completely forgotten about this one, but then ran into this issue myself yesterday and only then I started remembering this patch. It did apply almost cleanly and still works perfectly. I made a couple of minor modifications (see below) and applied this for v4.22. > diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c > index cda0491ed6bf..b6078d6604b1 100644 > --- a/drivers/gpu/drm/tegra/hdmi.c > +++ b/drivers/gpu/drm/tegra/hdmi.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include Turned this into #include which I think is the more canonical way to get at do_div() nowadays. 
> > #include > #include > @@ -112,68 +113,11 @@ static inline void tegra_hdmi_writel(struct tegra_hdmi > *hdmi, u32 value, > } > > struct tegra_hdmi_audio_config { > - unsigned int pclk; > unsigned int n; > unsigned int cts; > unsigned int aval; > }; > > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_32k[] = { > - { 2520, 4096, 25200, 24000 }, > - { 2700, 4096, 27000, 24000 }, > - { 7425, 4096, 74250, 24000 }, > - { 14850, 4096, 148500, 24000 }, > - { 0,0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_44_1k[] = { > - { 2520, 5880, 26250, 25000 }, > - { 2700, 5880, 28125, 25000 }, > - { 7425, 4704, 61875, 2 }, > - { 14850, 4704, 123750, 2 }, > - { 0,0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_48k[] = { > - { 2520, 6144, 25200, 24000 }, > - { 2700, 6144, 27000, 24000 }, > - { 7425, 6144, 74250, 24000 }, > - { 14850, 6144, 148500, 24000 }, > - { 0,0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_88_2k[] = { > - { 2520, 11760, 26250, 25000 }, > - { 2700, 11760, 28125, 25000 }, > - { 7425, 9408, 61875, 2 }, > - { 14850, 9408, 123750, 2 }, > - { 0, 0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_96k[] = { > - { 2520, 12288, 25200, 24000 }, > - { 2700, 12288, 27000, 24000 }, > - { 7425, 12288, 74250, 24000 }, > - { 14850, 12288, 148500, 24000 }, > - { 0, 0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_176_4k[] = { > - { 2520, 23520, 26250, 25000 }, > - { 2700, 23520, 28125, 25000 }, > - { 7425, 18816, 61875, 2 }, > - { 14850, 18816, 123750, 2 }, > - { 0, 0, 0, 0 }, > -}; > - > -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_192k[] = { > - { 2520, 24576, 25200, 24000 }, > - { 2700, 24576, 27000, 24000 }, > - { 7425, 24576, 74250, 24000 }, > - { 14850, 24576, 148500, 24000 }, > - { 0, 0, 0, 0 }, > -}; > - > static const struct tmds_config tegra20_tmds_config[] = 
{ > { /* slow pixel clock modes */ > .pclk = 2700, > @@ -411,52 +355,49 @@ static const struct tmds_config tegra124_tmds_config[] > = { > }, > }; > > -static const struct tegra_hdmi_audio_config * > -tegra_hdmi_get_audio_config(unsigned int sample_rate, unsigned int pclk) > +static int > +tegra_hdmi_get_audio_config(unsigned int audio_freq, unsigned int pix_clock, > + struct tegra_hdmi_audio_config *config) > { > - const struct tegra_hdmi_audio_config *table; > - > - switch (sample_rate) { > - case 32000: > - table = tegra_hdmi_audio_32k; > - break; > - > - case 44100: > - table = tegra_hdmi_audio_44_1k; > - break; > - > - case 48000: > - table = tegra_hdmi_audio_48k; > - break; > - > - case 88200: > - table = tegra_hdmi_audio_88_2k; > - break; > - > - case 96000: > - table = tegra_hdmi_audio_96k; > - break; > - > - case 176400: > - table = tegra_hdmi_audio_176_4k; > - break; > - > - case 192000: > - table = tegra_hdmi_audio_192k; > - break; > - > - default: >
[PATCH v2 2/2] arm64/sve: Disentangle from
Currently, provides common definitions for describing SVE context structures that are also used by the ptrace definitions in . For this reason, a #include of was added in ptrace.h, but it turns out that this can interact badly with userspace code that tries to include ptrace.h on top of the libc headers (which may provide their own shadow definitions for sigcontext.h). To make the headers easier for userspace to consume, this patch bounces the common definitions into an __SVE_* namespace and moves them to a backend header that can be included by the other headers as appropriate. This should allow ptrace.h to be used alongside libc's sigcontext.h (if any) without ill effects. This should make the situation unambiguous: is the header to include for the sigframe-specific definitions, while is the header to include for ptrace-specific definitions. To avoid conflicting with existing usage, remains the canonical way to get the common definitions for SVE_VQ_MIN, sve_vq_from_vl() etc., both in userspace and in the kernel: relying on these being defined as a side effect of including just was never intended to be safe.

Signed-off-by: Dave Martin
---
Changes since v1:

 * Remove check for inappropriate inclusion of sve_context.h. A comment is added instead to document this usage restriction.

 * Include in sve_context.h for __u32 (thanks, kbuild test robot)

Annoyingly, the __u32 cast is not really needed in expected use, but it is possible that userspace is accidentally relying on it.
--- arch/arm64/include/uapi/asm/ptrace.h | 39 ++--- arch/arm64/include/uapi/asm/sigcontext.h | 56 +++ arch/arm64/include/uapi/asm/sve_context.h | 53 + 3 files changed, 99 insertions(+), 49 deletions(-) create mode 100644 arch/arm64/include/uapi/asm/sve_context.h diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 65ef8b0..cff79c5 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -23,7 +23,7 @@ #include #include -#include +#include /* @@ -130,9 +130,9 @@ struct user_sve_header { */ /* Offset from the start of struct user_sve_header to the register data */ -#define SVE_PT_REGS_OFFSET \ - ((sizeof(struct user_sve_header) + (SVE_VQ_BYTES - 1)) \ - / SVE_VQ_BYTES * SVE_VQ_BYTES) +#define SVE_PT_REGS_OFFSET \ + ((sizeof(struct user_sve_header) + (__SVE_VQ_BYTES - 1))\ + / __SVE_VQ_BYTES * __SVE_VQ_BYTES) /* * The register data content and layout depends on the value of the @@ -178,39 +178,36 @@ struct user_sve_header { * Additional data might be appended in the future. 
*/ -#define SVE_PT_SVE_ZREG_SIZE(vq) SVE_SIG_ZREG_SIZE(vq) -#define SVE_PT_SVE_PREG_SIZE(vq) SVE_SIG_PREG_SIZE(vq) -#define SVE_PT_SVE_FFR_SIZE(vq)SVE_SIG_FFR_SIZE(vq) +#define SVE_PT_SVE_ZREG_SIZE(vq) __SVE_ZREG_SIZE(vq) +#define SVE_PT_SVE_PREG_SIZE(vq) __SVE_PREG_SIZE(vq) +#define SVE_PT_SVE_FFR_SIZE(vq)__SVE_FFR_SIZE(vq) #define SVE_PT_SVE_FPSR_SIZE sizeof(__u32) #define SVE_PT_SVE_FPCR_SIZE sizeof(__u32) -#define __SVE_SIG_TO_PT(offset) \ - ((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET) - #define SVE_PT_SVE_OFFSET SVE_PT_REGS_OFFSET #define SVE_PT_SVE_ZREGS_OFFSET \ - __SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET) + (SVE_PT_REGS_OFFSET + __SVE_ZREGS_OFFSET) #define SVE_PT_SVE_ZREG_OFFSET(vq, n) \ - __SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n)) + (SVE_PT_REGS_OFFSET + __SVE_ZREG_OFFSET(vq, n)) #define SVE_PT_SVE_ZREGS_SIZE(vq) \ - (SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET) + (SVE_PT_SVE_ZREG_OFFSET(vq, __SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET) #define SVE_PT_SVE_PREGS_OFFSET(vq) \ - __SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq)) + (SVE_PT_REGS_OFFSET + __SVE_PREGS_OFFSET(vq)) #define SVE_PT_SVE_PREG_OFFSET(vq, n) \ - __SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n)) + (SVE_PT_REGS_OFFSET + __SVE_PREG_OFFSET(vq, n)) #define SVE_PT_SVE_PREGS_SIZE(vq) \ - (SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \ + (SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \ SVE_PT_SVE_PREGS_OFFSET(vq)) #define SVE_PT_SVE_FFR_OFFSET(vq) \ - __SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq)) + (SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq)) #define SVE_PT_SVE_FPSR_OFFSET(vq) \ ((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \ - (SVE_VQ_BYTES - 1)) \ - / SVE_VQ_BYTES * SVE_VQ_BYTES) + (__SVE_VQ_BYTES - 1)) \ + / __SVE_VQ_BYTES * __SVE_VQ_BYTES) #define
[PATCH v2 1/2] arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
SVE_PT_REGS_OFFSET is supposed to indicate the offset for skipping over
the ptrace NT_ARM_SVE header (struct user_sve_header) to the start of
the SVE register data proper.

However, currently SVE_PT_REGS_OFFSET is defined in terms of struct
sve_context, which is wrong: that structure describes the SVE header in
the signal frame, not in the ptrace regset.

This patch fixes the definition to use the ptrace header structure
struct user_sve_header instead.

By good fortune, the two structures are the same size anyway, so there
is no functional or ABI change.

Signed-off-by: Dave Martin
---
 arch/arm64/include/uapi/asm/ptrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index a36227f..65ef8b0 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -131,7 +131,7 @@ struct user_sve_header {

 /* Offset from the start of struct user_sve_header to the register data */
 #define SVE_PT_REGS_OFFSET \
-	((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1)) \
+	((sizeof(struct user_sve_header) + (SVE_VQ_BYTES - 1)) \
 		/ SVE_VQ_BYTES * SVE_VQ_BYTES)

 /*
--
2.1.4
[PATCH v2 0/2] arm64/sve: UAPI: Disentangle ptrace.h from sigcontext.h
This patch refactors the UAPI header definitions for the Arm SVE
extension to avoid multiple-definition problems when userspace mixes its
own sigcontext.h definitions with the kernel's ptrace.h (which is
apparently routine). A common backend header is created to hold the
shared definitions, suitably namespaced.

The extra header guard in sve_context.h was considered redundant and is
now gone, so patch 1 (kbuild: install_headers.sh: Strip _UAPI from
#if-defined() guards) from the previous version of the series has been
dropped.

Dave Martin (2):
  arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
  arm64/sve: Disentangle ptrace.h from sigcontext.h

 arch/arm64/include/uapi/asm/ptrace.h      | 39 ++---
 arch/arm64/include/uapi/asm/sigcontext.h  | 56 +++
 arch/arm64/include/uapi/asm/sve_context.h | 53 +
 3 files changed, 99 insertions(+), 49 deletions(-)
 create mode 100644 arch/arm64/include/uapi/asm/sve_context.h

--
2.1.4
Re: [PATCH v3] mm/page_owner: fix for deferred struct page init
On Thu 03-01-19 17:22:29, Qian Cai wrote: > On 1/3/19 3:22 PM, Michal Hocko wrote: > > On Thu 03-01-19 14:53:47, Qian Cai wrote: > >> On 1/3/19 2:07 PM, Michal Hocko wrote> So can we make the revert with an > >> explanation that the patch was wrong? > >>> If we want to make hacks to catch more objects to be tracked then it > >>> would be great to have some numbers in hands. > >> > >> Well, those numbers are subject to change depends on future start_kernel() > >> order. Right now, there are many functions could be caught earlier by page > >> owner. > >> > >>kmemleak_init(); > > [...] > >>sched_init_smp(); > > > > The kernel source dump will not tell us much of course. A ball park > > number whether we are talking about dozen, hundreds or thousands of > > allocations would tell us something at least, doesn't it. > > > > Handwaving that it might help us some is not particurarly useful. We are > > already losing some allocations already. Does it matter? Well, that > > depends, sometimes we do want to catch an owner of particular page and > > it is sad to find nothing. But how many times have you or somebody else > > encountered that in practice. That is exactly a useful information to > > judge an ugly ifdefery in the code. See my point? > > Here is the number without DEFERRED_STRUCT_PAGE_INIT. 
> > == page_ext_init() after page_alloc_init_late() == > Node 0, zone DMA: page owner found early allocated 0 pages > Node 0, zone DMA32: page owner found early allocated 7009 pages > Node 0, zone Normal: page owner found early allocated 85827 pages > Node 4, zone Normal: page owner found early allocated 75063 pages > > == page_ext_init() before kmemleak_init() == > Node 0, zone DMA: page owner found early allocated 0 pages > Node 0, zone DMA32: page owner found early allocated 6654 pages > Node 0, zone Normal: page owner found early allocated 41907 pages > Node 4, zone Normal: page owner found early allocated 41356 pages > > So, it told us that it will miss tens of thousands of early page allocation > call > sites. This is an answer for the first part of the question (how much). The second is _do_we_care_? -- Michal Hocko SUSE Labs
(Audio/Video): The Truth about Linux GPLv2 and license rescission (revocation).
Information regarding the rights of the Linux programmers concerning rescission (revocation) of their granted license to use their code. Video: http://www.liveleak.com/view?t=9O5vz_1546606404 ( Audio Only: ) (Part1: http://www.liveleak.com/view?t=s3Sr9_1546605652 ) (Part2: http://www.liveleak.com/view?t=aOkfS_1546605889 )
Re: Column for keywords?
On Wed, May 16, 2018 at 6:50 PM Tetsuo Handa wrote: > > Dmitry Vyukov wrote: > > On Tue, May 15, 2018 at 10:59 PM, Tetsuo Handa > > wrote: > > > Dmitry Vyukov wrote: > > >> I've filed https://github.com/google/syzkaller/issues/608 to not lose > > >> track of this. > > > > > > Thanks. Since the time lag between a patch was proposed and that patch is > > > applied to a git tree tends to become long, duplicated works like > > > https://www.spinics.net/lists/linux-fsdevel/msg125240.html and > > > http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056...@i-love.sakura.ne.jp > > > are already occurring. > > > > This is bad. > > > > > Therefore, it is important that the state of the bug (e.g. > > > bisected, cause identified, patch proposed) is visible from the table. > > > > What do you think about the last section of: > > https://groups.google.com/d/msg/syzkaller-bugs/nw7BIW9V2wk/NE0P_Au4AQAJ > > there is already a way to say "there is a pending fix for this". > > That lacks a way to annotate "there is a pending fix for this, but the fix > is not yet applied to any git tree". I mean not only "git trees which syzbot > is checking" but also "git trees which are publicly visible". > > (Also, if we can later correct the patch using "#syz fix:" in case the patch > title was renamed, it is not clear how to specify multiple patches using > "#syz fix:" when a patch which meant to fix the reported problem contained > a regression or was incomplete and thus fixup patch followed shortly. An > example is commit 5f3e3b85cc0a5eae and commit ef95a90ae6f4f219 in > "WARNING: kmalloc bug in memdup_user (2)". I've tried > > #syz fix: RDMA/ucma: Correct option size check using optlen > #syz fix: RDMA/ucma: ucma_context reference leak in error path > > but only the former patch was recorded.) > > > > > But one problem with manual tagging is how to make everybody update > > these tags. If only few people do it, it can still lead to duplicate > > work. And it's not syzbot-specific. 
Can happen with just any bug > > report on kernel mailing lists. Traditionally it's solved with bug > > tracking systems and assigning bugs when a developer starts working on > > it. But kernel does not have a working bug tracker. > > > > One simple thing we can do is make syzbot poll more trees to discover > > Reported-by tags faster. This will automatically update status on > > dashboard to "fix pending". I've filed > > https://github.com/google/syzkaller/issues/610 for this. Ideally, we > > would intercept all mailed patches, but it's hard with kernel > > development process because there is no system that tracks all pending > > patches. > > > > The problem is that the pending fix won't be applied to any git tree. > It depends on when reviewers and maintainers can find time for > reviewing/committing the fix. Scanning all git trees unlikely helps. > > the criteria is that you are "reasonably sure that the commit will > reach upstream under this title", for whatever reason > > won't apply to not yet reviewed patches. What I want is a way to specify > "a patch was proposed but the patch is not yet reviewed/tested/applied". > > Generally, progresses are not recorded frequently enough to avoid duplicated > works. I want to check not only "fix pending" stage but also e.g. "problem > guessed", "bisected", "cause identified", "patch proposed", "patch reviewed" > stages from the top page's table. 1. This sounds very much like general bug tracking system. We specifically didn't want to go down the slippery slope of implementing yet another bug tracking system. 2. This problem is not specific to syzbot in any way (just like lost bug reports). Kernel developers waste time on duplicate work for other bug reports too. So I think (1) we need a bug tracking system, (2) use that system for syzbot to solve this local problem.
direct.c:undefined reference to `arch_dma_alloc'
Hi Christoph,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144
commit: 68c608345cc569bcfa1c1b2add4c00c343ecf933 swiotlb: remove dma_mark_clean
date:   3 weeks ago
config: ia64-zx1_defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 68c608345cc569bcfa1c1b2add4c00c343ecf933
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=ia64

All errors (new ones prefixed by >>):

   kernel/dma/mapping.o: In function `dma_common_get_sgtable':
   mapping.c:(.text+0x10d2): undefined reference to `arch_dma_coherent_to_pfn'
   kernel/dma/mapping.o: In function `dma_common_mmap':
   mapping.c:(.text+0x1452): undefined reference to `arch_dma_coherent_to_pfn'
   kernel/dma/direct.o: In function `dma_direct_alloc':
>> direct.c:(.text+0x1162): undefined reference to `arch_dma_alloc'
   kernel/dma/direct.o: In function `dma_direct_free':
>> direct.c:(.text+0x1352): undefined reference to `arch_dma_free'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[GIT PULL] Please pull powerpc/linux.git powerpc-4.21-2 tag
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Linus,

Please pull some powerpc fixes for 4.21:

The following changes since commit 96d4f267e40f9509e8a66e2b39e8b95655617693:

  Remove 'type' argument from access_ok() function (2019-01-03 18:57:57 -0800)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.21-2

for you to fetch changes up to 074400a7be61250d9f0ccec07d5c35ffec4d8d22:

  powerpc: Drop use of 'type' from access_ok() (2019-01-04 23:07:59 +1100)

- ----------------------------------------------------------------
powerpc fixes for 4.21 #2

A fix for the recent access_ok() change, which broke the build. We recently
added a use of type in order to squash a warning elsewhere about type being
unused.

A handful of other minor build fixes, and one defconfig update.

Thanks to:
  Christian Lamparter, Christophe Leroy, Diana Craciun, Mathieu Malaterre.

- ----------------------------------------------------------------
Christian Lamparter (1):
      powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage

Diana Craciun (1):
      powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'

Mathieu Malaterre (1):
      powerpc: Drop use of 'type' from access_ok()

Michael Ellerman (4):
      powerpc/4xx/ocm: Fix phys_addr_t printf warnings
      powerpc/configs: Add PPC4xx_OCM to ppc40x_defconfig
      KVM: PPC: Book3S HV: radix: Fix uninitialized var build error
      Merge branch 'master' into fixes

 arch/powerpc/configs/ppc40x_defconfig  |  1 +
 arch/powerpc/include/asm/uaccess.h     |  2 +-
 arch/powerpc/kernel/head_booke.h       | 18 --
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/powerpc/platforms/4xx/ocm.c       | 10 +-
 5 files changed, 20 insertions(+), 13 deletions(-)

-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJcL1eOAAoJEFHr6jzI4aWA86wP/3V5zlUNtsyXpcCx60PNIMya
6z6oWVKoM92nMIyfWuoxKBEbylSyixFSxOsKoiVJDyigW5WWNjsPQ1KOUD2OVn0V
OQhj2aDXwUtxdlQ5dK0H9lodsUKUEw1/saqcgi2OS8iDDa8kJIEemDe29Omf5DyE
U2RKl++nhAfK8BLfMRCAIqL+TVZq9Zy+zNwbVtsmj7lAlw+Csb4KV3YvlJ7b6pjD
Ntm6O3X8sy7GtJdMpeBDOMTbWuvIQpefVH135jS1IjJZbrafZK6a4gQ9cs5mx68d
+/nyzkcjy83eRGkv3QY5Y+vEa0wkFdBb6Nf0MH63xCPVYVjom6pRUGL+RPldi2GO
bmieDXavlicX9y+BCQSKASTzNo2uR2heF31aCnr6bsOSeT8ZNGbIwgHuGGqrzN+8
qLQb00hA0uKlbNgPpNoyYO5K3khntR348HfH0Uwlm3Nttvn1TEh87c9L19FcdlMw
SlA6kpJ6328FkEv5FUPu7v7764dPgOQNImPKnsYdhbUxhHnDMx33LzsUEQQcYWpM
3nsKQMW1yqMH2BG8u9LQtZErFanX+RNm81OsP38F8nvGe1RJkoq7R7MSooZ9tzs2
Ymf8W+KNofKmXVAPh7AyKQzpy0a/sEqAXmtTPBf7bsdwU5PQB0+LMf0yzyZpEp2U
3eFcseqDoO32ScDZQUYt
=Y+yv
-----END PGP SIGNATURE-----
Re: Remove 'type' argument from access_ok() function
On Fri, 2019-01-04 at 09:27:58 UTC, Mathieu Malaterre wrote:
> In commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
> access_ok()") an attempt was made to remove a warning by referencing the
> variable `type`, however in commit 96d4f267e40f ("Remove 'type' argument
> from access_ok() function") the variable `type` has been removed.
>
> Revert commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
> access_ok()") to fix the following compilation error:
>
>   arch/powerpc/include/asm/uaccess.h:66:32: error: 'type' undeclared
>   (first use in this function)
>
> Signed-off-by: Mathieu Malaterre

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/074400a7be61250d9f0ccec07d5c35

cheers
[REGRESSION, BISECTED] pci: nvme device with HMB fails on arm64
Hello Christoph, As I have mentioned to you after Xmas, I think your dma-mapping series breaks NVMe drivers that use HMB on arm64 (RK3399 NanoPC T4 board with Toshiba RC100 SSD in this case). The observed behaviour is that the modprobe of the nvme module will hang due to a kernel crash, which I've sent a patch for to the mm subsystem ("mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap"), but that behaviour is triggered by the failure of the NVMe drive to access host memory buffers (error 24578, flags 0x1). Now that your series has been merged into Linus' tree, I've bisected it to this: bfd56cd605219d90b210a5377fca31a644efe95c is the first bad commit commit bfd56cd605219d90b210a5377fca31a644efe95c Author: Christoph Hellwig Date: Sun Nov 4 17:38:39 2018 +0100 dma-mapping: support highmem in the generic remap allocator By using __dma_direct_alloc_pages we can deal entirely with struct page instead of having to derive a kernel virtual address. Signed-off-by: Christoph Hellwig Reviewed-by: Robin Murphy :04 04 565ae62f55f04c11da2471bd59d1b0328273992d 40047c9ecf715f6f7e8293b335c1f16dd511a0e0 M kernel Note that the bisect was done on the mainline tree without any additional patches, as applying those makes the bisect process think that the culprit is the merging of the tracing changes. I haven't tried to revert the patch as it is clear that the following patch depends on it and also because during the bisect process one of the steps generated a kernel that failed to boot as it was missing your patch 9ab91e7c5c51 ("arm64: default to the direct mapping in get_arch_dma_ops"). I've marked that step as bad, as it was related to the series, but I might have been wrong there. 
The full bisect log is this:

$ git bisect log
git bisect start
# good: [00c569b567c7f1f0da6162868fd02a9f29411805] Merge tag 'locks-v4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
git bisect good 00c569b567c7f1f0da6162868fd02a9f29411805
# bad: [645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad 645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144
# bad: [02061181d3a9ccfe15ef6bc15fa56283acc47620] Merge tag 'staging-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 02061181d3a9ccfe15ef6bc15fa56283acc47620
# bad: [f346b0becb1bc62e45495f9cdbae3eef35d0b635] Merge branch 'akpm' (patches from Andrew)
git bisect bad f346b0becb1bc62e45495f9cdbae3eef35d0b635
# bad: [938edb8a31b976c9a92eb0cd4ff481e93f76c1f1] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad 938edb8a31b976c9a92eb0cd4ff481e93f76c1f1
# good: [00203ba40d40d7f33857416adfb18adaf0e40123] kyber: use sbitmap add_wait_queue/list_del wait helpers
git bisect good 00203ba40d40d7f33857416adfb18adaf0e40123
# good: [735bcc77e6ba83e464665cea9041072190ede37e] scsi: hisi_sas: Fix warnings detected by sparse
git bisect good 735bcc77e6ba83e464665cea9041072190ede37e
# bad: [af7ddd8a627c62a835524b3f5b471edbbbcce025] Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping
git bisect bad af7ddd8a627c62a835524b3f5b471edbbbcce025
# bad: [8ddbe5943c0b1259b5ddb6dc1729863433fc256c] dma-mapping: move dma_cache_sync out of line
git bisect bad 8ddbe5943c0b1259b5ddb6dc1729863433fc256c
# bad: [887712a0a5b31e0cf28087f6445de431b4722e52] x86/calgary: remove the mapping_error dma_map_ops method
git bisect bad 887712a0a5b31e0cf28087f6445de431b4722e52
# bad: [b0cbeae4944924640bf550b75487729a20204c14] dma-direct: remove the mapping_error dma_map_ops method
git bisect bad b0cbeae4944924640bf550b75487729a20204c14
# bad:
[bfd56cd605219d90b210a5377fca31a644efe95c] dma-mapping: support highmem in the generic remap allocator
git bisect bad bfd56cd605219d90b210a5377fca31a644efe95c
# good: [704f2c20eaa566f6906e8812b6e2115889bd753d] dma-direct: reject highmem pages from dma_alloc_from_contiguous
git bisect good 704f2c20eaa566f6906e8812b6e2115889bd753d
# good: [0c3b3171ceccb8830c2bb5adff1b4e9b204c1450] dma-mapping: move the arm64 noncoherent alloc/free support to common code
git bisect good 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450
# first bad commit: [bfd56cd605219d90b210a5377fca31a644efe95c] dma-mapping: support highmem in the generic remap allocator

Does anyone have any suggestions on what I might try as a fix?

Best regards,
Liviu

--
	"With enough courage, you can do without a reputation"
						-- Rhett Butler
Re: [PATCH] perf stat: Poll for monitored tasks being alive in fork mode
On Fri, Jan 04, 2019 at 10:28:17AM +0800, Jin Yao wrote:
> The following test shows that stat keeps running even if there is no
> longer any task to monitor (mgen exits at ~5s).
>
> perf stat -e cycles -p `pgrep mgen` -I1000 -- sleep 10
>     time             counts        unit events
>     1.000148916      1,308,365,864      cycles
>     2.000379171      1,297,269,875      cycles
>     3.000556719      1,297,187,078      cycles
>     4.000914241        761,261,827      cycles
>     5.001306091                         cycles
>     6.001676881                         cycles
>     7.002046336                         cycles
>     8.002405651                         cycles
>     9.002766625                         cycles
>    10.001395827                         cycles
>
> We'd better finish stat immediately if there is no longer any task to
> monitor.
>
> After:
>
> perf stat -e cycles -p `pgrep mgen` -I1000 -- sleep 10
>     time             counts        unit events
>     1.000180062      1,236,592,661      cycles
>     2.000421539      1,223,733,572      cycles
>     3.000609910      1,297,047,663      cycles
>     4.000807545      1,297,215,816      cycles
>     5.001001578      1,297,208,032      cycles
>     6.001390345        582,343,659      cycles
> sleep: Terminated
>
> Now stat exits immediately when the monitored tasks end.
>
> Signed-off-by: Jin Yao
> ---
>  tools/perf/builtin-stat.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 63a3afc..71f3bc8 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -553,6 +553,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>
>  	if (interval || timeout) {
>  		while (!waitpid(child_pid, &status, WNOHANG)) {
> +			if (!is_target_alive(&target,
> +					     evsel_list->threads) &&
> +			    (child_pid != -1)) {
> +				kill(child_pid, SIGTERM);
> +				break;
> +			}
> +
>  			nanosleep(&ts, NULL);
>  			if (timeout)
>  				break;
> --
> 2.7.4
>

Do we need that child_pid check? We just returned from waitpid, so we
should be OK; the check only makes the race window smaller. Could we
just do:

	if (!is_target_alive(&target, evsel_list->threads)) {
		kill(child_pid, SIGTERM);
		break;
	}

Also, I'm not sure we should do this only under a new option, as it
might break people's scripts.. thoughts?

jirka
[PATCH 25/25] mm, compaction: Do not direct compact remote memory
Remote compaction is expensive and possibly counter-productive. Locality
is expected to often have better performance characteristics than remote
high-order pages. For small allocations, it's expected that locality is
generally required or fallbacks are possible. For larger allocations such
as THP, they are forbidden at the time of writing but if __GFP_THISNODE
is ever removed, then it would still be preferable to fallback to small
local base pages over remote THP in the general case. kcompactd is still
woken via kswapd so compaction happens eventually.

While this patch potentially has both positive and negative effects, it
is best to avoid the possibility of remote compaction given the cost
relative to any potential benefit.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index ae70be023b21..cc17f0c01811 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,6 +2348,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}

+		/*
+		 * Do not compact remote memory. It's expensive and high-order
+		 * small allocations are expected to prefer or require local
+		 * memory. Similarly, larger requests such as THP can fallback
+		 * to base pages in preference to remote huge pages if
+		 * __GFP_THISNODE is not specified
+		 */
+		if (zone_to_nid(zone) != zone_to_nid(ac->preferred_zoneref->zone))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, prio,
 				alloc_flags, ac_classzone_idx(ac), capture);
 		rc = max(status, rc);
--
2.16.4
[PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target
As compaction proceeds and creates high-order blocks, the free list search gets less efficient as the larger blocks are used as compaction targets. Eventually, the larger blocks will be behind the migration scanner for partially migrated pageblocks and the search fails. This patch round-robins what orders are searched so that larger blocks can be ignored and find smaller blocks that can be used as migration targets. The overall impact was small on 1-socket but it avoids corner cases where the migration/free scanners meet prematurely or situations where many of the pageblocks encountered by the free scanner are almost full instead of being properly packed. Previous testing had indicated that without this patch there were occasional large spikes in the free scanner without this patch. By co-incidence, the 2-socket results showed a 54% reduction in the free scanner but will not be universally true. Signed-off-by: Mel Gorman --- mm/compaction.c | 33 ++--- mm/internal.h | 3 ++- 2 files changed, 32 insertions(+), 4 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 6c5552c6d8f9..652e249168b1 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1154,6 +1154,24 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long set_pageblock_skip(page); } +/* Search orders in round-robin fashion */ +static int next_search_order(struct compact_control *cc, int order) +{ + order--; + if (order < 0) + order = cc->order - 1; + + /* Search wrapped around? 
*/ + if (order == cc->search_order) { + cc->search_order--; + if (cc->search_order < 0) + cc->search_order = cc->order - 1; + return -1; + } + + return order; +} + static unsigned long fast_isolate_freepages(struct compact_control *cc) { @@ -1186,9 +1204,15 @@ fast_isolate_freepages(struct compact_control *cc) if (WARN_ON_ONCE(min_pfn > low_pfn)) low_pfn = min_pfn; - for (order = cc->order - 1; -order >= 0 && !page; -order--) { + /* +* Search starts from the last successful isolation order or the next +* order to search after a previous failure +*/ + cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order); + + for (order = cc->search_order; +!page && order >= 0; +order = next_search_order(cc, order)) { struct free_area *area = >zone->free_area[order]; struct list_head *freelist; struct page *freepage; @@ -1211,6 +1235,7 @@ fast_isolate_freepages(struct compact_control *cc) if (pfn >= low_pfn) { cc->fast_search_fail = 0; + cc->search_order = order; page = freepage; break; } @@ -2146,6 +2171,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, .total_migrate_scanned = 0, .total_free_scanned = 0, .order = order, + .search_order = order, .gfp_mask = gfp_mask, .zone = zone, .mode = (prio == COMPACT_PRIO_ASYNC) ? 
@@ -2385,6 +2411,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) struct zone *zone; struct compact_control cc = { .order = pgdat->kcompactd_max_order, + .search_order = pgdat->kcompactd_max_order, .total_migrate_scanned = 0, .total_free_scanned = 0, .classzone_idx = pgdat->kcompactd_classzone_idx, diff --git a/mm/internal.h b/mm/internal.h index e5ca2a10b8ad..d028abd8a8f3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -191,7 +191,8 @@ struct compact_control { struct zone *zone; unsigned long total_migrate_scanned; unsigned long total_free_scanned; - unsigned int fast_search_fail; /* failures to use free list searches */ + unsigned short fast_search_fail;/* failures to use free list searches */ + unsigned short search_order;/* order to start a fast search at */ const gfp_t gfp_mask; /* gfp mask of a direct compactor */ int order; /* order a direct compactor needs */ int migratetype;/* migratetype of direct compactor */ -- 2.16.4
[PATCH 24/25] mm, compaction: Capture a page under direct compaction
Compaction is inherently race-prone as a suitable page freed during compaction can be allocated by any parallel task. This patch uses a capture_control structure to isolate a page immediately when it is freed by a direct compactor in the slow path of the page allocator. The intent is to avoid redundant scanning. 4.20.0 4.20.0 selective-v2r15 capture-v2r15 Amean fault-both-1 0.00 ( 0.00%)0.00 * 0.00%* Amean fault-both-3 2624.85 ( 0.00%) 2594.49 ( 1.16%) Amean fault-both-5 3842.66 ( 0.00%) 4088.32 ( -6.39%) Amean fault-both-7 5459.47 ( 0.00%) 5936.54 ( -8.74%) Amean fault-both-12 9276.60 ( 0.00%)10160.85 ( -9.53%) Amean fault-both-1814030.73 ( 0.00%)13908.92 ( 0.87%) Amean fault-both-2413298.10 ( 0.00%)16819.86 * -26.48%* Amean fault-both-3017648.62 ( 0.00%)17901.74 ( -1.43%) Amean fault-both-3219161.67 ( 0.00%)18621.32 ( 2.82%) Latency is only moderately affected but the devil is in the details. A closer examination indicates that base page fault latency is much reduced but latency of huge pages is increased as it takes creater care to succeed. Part of the "problem" is that allocation success rates are close to 100% even when under pressure and compaction gets harder 4.20.0 4.20.0 selective-v2r15 capture-v2r15 Percentage huge-1 0.00 ( 0.00%)0.00 ( 0.00%) Percentage huge-399.95 ( 0.00%) 99.98 ( 0.03%) Percentage huge-598.83 ( 0.00%) 98.01 ( -0.84%) Percentage huge-796.78 ( 0.00%) 98.30 ( 1.58%) Percentage huge-12 98.85 ( 0.00%) 97.76 ( -1.10%) Percentage huge-18 97.52 ( 0.00%) 99.05 ( 1.57%) Percentage huge-24 97.07 ( 0.00%) 99.34 ( 2.35%) Percentage huge-30 96.59 ( 0.00%) 99.08 ( 2.58%) Percentage huge-32 95.94 ( 0.00%) 99.03 ( 3.22%) And scan rates are reduced as expected by 10% for the migration scanner and 37% for the free scanner indicating that there is less redundant work. Compaction migrate scanned20338945.0018133661.00 Compaction free scanned 12590377.00 7986174.00 The impact on 2-socket is much larger albeit not presented. 
Under a different workload that fragments heavily, the allocation latency is reduced by 26% while the success rate goes from 63% to 80% Signed-off-by: Mel Gorman --- include/linux/compaction.h | 3 ++- include/linux/sched.h | 4 kernel/sched/core.c| 3 +++ mm/compaction.c| 31 +++-- mm/internal.h | 9 +++ mm/page_alloc.c| 58 ++ 6 files changed, 96 insertions(+), 12 deletions(-) diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 68250a57aace..b0d530cf46d1 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -95,7 +95,8 @@ extern int sysctl_compact_unevictable_allowed; extern int fragmentation_index(struct zone *zone, unsigned int order); extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, - const struct alloc_context *ac, enum compact_priority prio); + const struct alloc_context *ac, enum compact_priority prio, + struct page **page); extern void reset_isolation_suitable(pg_data_t *pgdat); extern enum compact_result compaction_suitable(struct zone *zone, int order, unsigned int alloc_flags, int classzone_idx); diff --git a/include/linux/sched.h b/include/linux/sched.h index 89541d248893..f5ac0cf9cc32 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -47,6 +47,7 @@ struct pid_namespace; struct pipe_inode_info; struct rcu_node; struct reclaim_state; +struct capture_control; struct robust_list_head; struct sched_attr; struct sched_param; @@ -964,6 +965,9 @@ struct task_struct { struct io_context *io_context; +#ifdef CONFIG_COMPACTION + struct capture_control *capture_control; +#endif /* Ptrace state: */ unsigned long ptrace_message; kernel_siginfo_t*last_siginfo; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f66920173370..ef478b0daa45 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) INIT_HLIST_HEAD(>preempt_notifiers); #endif +#ifdef 
CONFIG_COMPACTION + p->capture_control = NULL; +#endif init_numa_balancing(clone_flags, p); } diff --git a/mm/compaction.c b/mm/compaction.c index 7f316e1a7275..ae70be023b21 100644
[PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints
Pageblock hints are cleared when compaction restarts or kswapd makes enough progress that it can sleep but it's over-eager in that the bit is cleared for migration sources with no LRU pages and migration targets with no free pages. As pageblock skip hint flushes are relatively rare and out-of-band with respect to kswapd, this patch makes a few more expensive checks to see if it's appropriate to even clear the bit. Every pageblock that is not cleared will avoid 512 pages being scanned unnecessarily on x86-64. The impact is variable with different workloads showing small differences in latency, success rates and scan rates. This is expected as clearing the hints is not that common but doing a small amount of work out-of-band to avoid a large amount of work in-band later is generally a good thing. Signed-off-by: Mel Gorman --- include/linux/mmzone.h | 2 + mm/compaction.c| 119 + 2 files changed, 102 insertions(+), 19 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index cc4a507d7ca4..faa1e6523f49 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -480,6 +480,8 @@ struct zone { unsigned long compact_cached_free_pfn; /* pfn where async and sync compaction migration scanner should start */ unsigned long compact_cached_migrate_pfn[2]; + unsigned long compact_init_migrate_pfn; + unsigned long compact_init_free_pfn; #endif #ifdef CONFIG_COMPACTION diff --git a/mm/compaction.c b/mm/compaction.c index cc532e81a7b7..7f316e1a7275 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -231,6 +231,62 @@ static bool pageblock_skip_persistent(struct page *page) return false; } +static bool +__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source, + bool check_target) +{ + struct page *page = pfn_to_online_page(pfn); + struct page *end_page; + + if (!page) + return false; + if (zone != page_zone(page)) + return false; + if (pageblock_skip_persistent(page)) + return false; + + /* +* If skip is already cleared do no 
further checking once the +* restart points have been set. +*/ + if (check_source && check_target && !get_pageblock_skip(page)) + return true; + + /* +* If clearing skip for the target scanner, do not select a +* non-movable pageblock as the starting point. +*/ + if (!check_source && check_target && + get_pageblock_migratetype(page) != MIGRATE_MOVABLE) + return false; + + /* +* Only clear the hint if a sample indicates there is either a +* free page or an LRU page in the block. One or other condition +* is necessary for the block to be a migration source/target. +*/ + page = pfn_to_page(pageblock_start_pfn(pfn)); + if (zone != page_zone(page)) + return false; + end_page = page + pageblock_nr_pages; + + do { + if (check_source && PageLRU(page)) { + clear_pageblock_skip(page); + return true; + } + + if (check_target && PageBuddy(page)) { + clear_pageblock_skip(page); + return true; + } + + page += (1 << PAGE_ALLOC_COSTLY_ORDER); + } while (page < end_page); + + return false; +} + /* * This function is called to clear all cached information on pageblocks that * should be skipped for page isolation when the migrate and free page scanner @@ -238,30 +294,54 @@ static bool pageblock_skip_persistent(struct page *page) */ static void __reset_isolation_suitable(struct zone *zone) { - unsigned long start_pfn = zone->zone_start_pfn; - unsigned long end_pfn = zone_end_pfn(zone); - unsigned long pfn; + unsigned long migrate_pfn = zone->zone_start_pfn; + unsigned long free_pfn = zone_end_pfn(zone); + unsigned long reset_migrate = free_pfn; + unsigned long reset_free = migrate_pfn; + bool source_set = false; + bool free_set = false; - zone->compact_blockskip_flush = false; + if (!zone->compact_blockskip_flush) + return; - /* Walk the zone and mark every pageblock as suitable for isolation */ - for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { - struct page *page; + zone->compact_blockskip_flush = false; + /* +* Walk the zone and update pageblock skip information. 
Source looks +* for PageLRU while target looks for PageBuddy. When the scanner +* is found, both PageBuddy and PageLRU are checked as the pageblock +* is suitable as both source and target. +*/ +
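The sampling approach in __reset_isolation_pfn can be illustrated with a userspace sketch (the `sim_page` type and `should_clear_skip` name are invented; the pageblock size of 512 pages matches the x86-64 figure quoted above, and `PAGE_ALLOC_COSTLY_ORDER` is assumed to be 3): rather than examining every page, sample the block at a fixed stride and only clear the skip hint if a sampled page could be a migration source (LRU) or target (free).

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512		/* pageblock size on x86-64 */
#define SAMPLE_STRIDE (1 << 3)		/* 1 << PAGE_ALLOC_COSTLY_ORDER */

/* Hypothetical per-page state standing in for PageLRU/PageBuddy. */
struct sim_page { bool lru; bool buddy; };

/*
 * Sample the pageblock instead of scanning every page: clearing the
 * skip hint is only worthwhile if at least one sampled page is a
 * candidate source (LRU) or target (free). Returns true when the
 * skip bit should be cleared.
 */
static bool should_clear_skip(const struct sim_page *block,
			      bool check_source, bool check_target)
{
	int i;

	for (i = 0; i < PAGEBLOCK_NR_PAGES; i += SAMPLE_STRIDE) {
		if (check_source && block[i].lru)
			return true;
		if (check_target && block[i].buddy)
			return true;
	}
	return false;	/* no candidates seen; keep the skip hint set */
}
```

The trade-off is exactly the one the changelog names: a little extra out-of-band work at reset time saves up to 512 in-band page scans per retained hint.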
[PATCH 22/25] mm, compaction: Sample pageblocks for free pages
Once fast searching finishes, there is a possibility that the linear scanner is scanning full blocks found by the fast scanner earlier. This patch uses an adaptive stride to sample pageblocks for free pages. The more consecutive full pageblocks encountered, the larger the stride until a pageblock with free pages is found. The scanners might meet slightly sooner but it is an acceptable risk given that the search of the free lists may still encounter the pages and adjust the cached PFN of the free scanner accordingly. In terms of latency and success rates, the impact is not obvious but the free scan rate is reduced by 87% on a 1-socket machine and 92% on a 2-socket machine. It's also the first time in the series where the number of pages scanned by the migration scanner is greater than the free scanner due to the increased search efficiency. Signed-off-by: Mel Gorman --- mm/compaction.c | 27 +-- 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 652e249168b1..cc532e81a7b7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -441,6 +441,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, unsigned long *start_pfn, unsigned long end_pfn, struct list_head *freelist, + unsigned int stride, bool strict) { int nr_scanned = 0, total_isolated = 0; @@ -450,10 +451,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, unsigned long blockpfn = *start_pfn; unsigned int order; + /* Strict mode is for isolation, speed is secondary */ + if (strict) + stride = 1; + cursor = pfn_to_page(blockpfn); /* Isolate free pages. 
*/ - for (; blockpfn < end_pfn; blockpfn++, cursor++) { + for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) { int isolated; struct page *page = cursor; @@ -624,7 +629,7 @@ isolate_freepages_range(struct compact_control *cc, break; isolated = isolate_freepages_block(cc, &isolate_start_pfn, - block_end_pfn, &freelist, true); + block_end_pfn, &freelist, 0, true); /* * In strict mode, isolate_freepages_block() returns 0 if @@ -1139,7 +1144,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long /* Scan before */ if (start_pfn != pfn) { - isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false); + isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false); if (cc->nr_freepages >= cc->nr_migratepages) return; } @@ -1147,7 +1152,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long /* Scan after */ start_pfn = pfn + nr_isolated; if (start_pfn != end_pfn) - isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false); + isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false); /* Skip this pageblock in the future as it's full or nearly full */ if (cc->nr_freepages < cc->nr_migratepages) @@ -1333,7 +1338,9 @@ static void isolate_freepages(struct compact_control *cc) unsigned long isolate_start_pfn; /* exact pfn we start at */ unsigned long block_end_pfn;/* end of current pageblock */ unsigned long low_pfn; /* lowest pfn scanner is able to scan */ + unsigned long nr_isolated; struct list_head *freelist = &cc->freepages; + unsigned int stride; /* Try a small search of the free lists for a candidate */ isolate_start_pfn = fast_isolate_freepages(cc); @@ -1356,6 +1363,7 @@ static void isolate_freepages(struct compact_control *cc) block_end_pfn = min(block_start_pfn + pageblock_nr_pages, zone_end_pfn(zone)); low_pfn = pageblock_end_pfn(cc->migrate_pfn); + stride = cc->mode == MIGRATE_ASYNC ? 
COMPACT_CLUSTER_MAX : 1; /* * Isolate free pages until enough are available to migrate the @@ -1387,8 +1395,8 @@ static void isolate_freepages(struct compact_control *cc) continue; /* Found a block suitable for isolating free pages from. */ - isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn, - freelist, false); + nr_isolated = isolate_freepages_block(cc, &isolate_start_pfn, + block_end_pfn, freelist, stride, false); /* Update the skip hint if the full pageblock was scanned */ if (isolate_start_pfn == block_end_pfn) @@
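The stride adaptation described above can be sketched in isolation as a userspace helper (the function name `next_stride` is invented for illustration; `COMPACT_CLUSTER_MAX` is 32 in current kernels): a pageblock that yields no free pages doubles the stride up to the cap, and any successful isolation resets it to 1 so nearly-free regions are sampled page by page again.

```c
#include <assert.h>

#define COMPACT_CLUSTER_MAX 32	/* upper bound on the sampling stride */

/*
 * Adapt the free scanner's stride: no isolation in the previous
 * pageblock doubles the stride (capped), while any isolation resets
 * it so the scanner examines promising regions at full resolution.
 */
static unsigned int next_stride(unsigned int stride,
				unsigned long nr_isolated)
{
	if (nr_isolated)
		return 1;
	stride <<= 1;
	return stride < COMPACT_CLUSTER_MAX ? stride : COMPACT_CLUSTER_MAX;
}
```

The more consecutive full pageblocks are seen, the faster the scanner moves, which is where the 87-92% reduction in free scan rates comes from.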
[PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner
The fast isolation of pages can move the scanner faster than is necessary depending on the contents of the free list. This patch will only allow the fast isolation to initialise the scanner and advance it slowly. The primary means of moving the scanner forward is via the linear scanner to reduce the likelihood the migration source/target scanners meet prematurely triggering a rescan. 4.20.0 4.20.0 noresched-v2r15 slowfree-v2r15 Amean fault-both-1 0.00 ( 0.00%)0.00 * 0.00%* Amean fault-both-3 2736.50 ( 0.00%) 2512.53 ( 8.18%) Amean fault-both-5 4133.70 ( 0.00%) 4159.43 ( -0.62%) Amean fault-both-7 5738.61 ( 0.00%) 5950.15 ( -3.69%) Amean fault-both-12 9392.82 ( 0.00%) 8674.38 ( 7.65%) Amean fault-both-1813257.15 ( 0.00%)12850.79 ( 3.07%) Amean fault-both-2416859.44 ( 0.00%)17242.86 ( -2.27%) Amean fault-both-3016249.30 ( 0.00%)19404.18 * -19.42%* Amean fault-both-3214904.71 ( 0.00%)16200.79 ( -8.70%) The impact to latency, success rates and scan rates is marginal but avoiding unnecessary restarts is important. It helps later patches that are more careful about how pageblocks are treated as earlier iterations of those patches hit corner cases where the restarts were punishing and very visible. Signed-off-by: Mel Gorman --- mm/compaction.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 75eb0d40d4d7..6c5552c6d8f9 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -324,10 +324,9 @@ static void update_cached_migrate(struct compact_control *cc, unsigned long pfn) * future. The information is later cleared by __reset_isolation_suitable(). 
*/ static void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated) + struct page *page, unsigned long pfn) { struct zone *zone = cc->zone; - unsigned long pfn; if (cc->no_set_skip_hint) return; @@ -335,13 +334,8 @@ static void update_pageblock_skip(struct compact_control *cc, if (!page) return; - if (nr_isolated) - return; - set_pageblock_skip(page); - pfn = page_to_pfn(page); - /* Update where async and sync compaction should restart */ if (pfn < zone->compact_cached_free_pfn) zone->compact_cached_free_pfn = pfn; @@ -359,7 +353,7 @@ static inline bool pageblock_skip_persistent(struct page *page) } static inline void update_pageblock_skip(struct compact_control *cc, - struct page *page, unsigned long nr_isolated) + struct page *page, unsigned long pfn) { } @@ -450,7 +444,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, bool strict) { int nr_scanned = 0, total_isolated = 0; - struct page *cursor, *valid_page = NULL; + struct page *cursor; unsigned long flags = 0; bool locked = false; unsigned long blockpfn = *start_pfn; @@ -477,9 +471,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (!pfn_valid_within(blockpfn)) goto isolate_fail; - if (!valid_page) - valid_page = page; - /* * For compound pages such as THP and hugetlbfs, we can save * potentially a lot of iterations if we skip them at once. 
@@ -576,10 +567,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (strict && blockpfn < end_pfn) total_isolated = 0; - /* Update the pageblock-skip if the whole pageblock was scanned */ - if (blockpfn == end_pfn) - update_pageblock_skip(cc, valid_page, total_isolated); - cc->total_free_scanned += nr_scanned; if (total_isolated) count_compact_events(COMPACTISOLATED, total_isolated); @@ -1295,8 +1282,10 @@ fast_isolate_freepages(struct compact_control *cc) } } - if (highest && highest > cc->zone->compact_cached_free_pfn) + if (highest && highest >= cc->zone->compact_cached_free_pfn) { + highest -= pageblock_nr_pages; cc->zone->compact_cached_free_pfn = highest; + } cc->total_free_scanned += nr_scanned; if (!page) @@ -1376,6 +1365,10 @@ static void isolate_freepages(struct compact_control *cc) isolate_freepages_block(cc, _start_pfn, block_end_pfn, freelist, false); + /* Update the skip hint if the full pageblock was scanned */ +
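The effect of keying update_pageblock_skip on a PFN rather than an isolation count can be sketched as follows (a simplified model: `sim_zone` and `update_skip_hint` are invented names, and showing both scanner directions in one helper is an assumption for illustration — the patch itself only shows the free-scanner update): the cached restart points are dragged over the fully scanned block so the relevant scanner does not revisit it.

```c
#include <assert.h>

/* Stand-in for the zone's cached compaction restart points. */
struct sim_zone {
	unsigned long cached_free_pfn;		/* free scanner moves down */
	unsigned long cached_migrate_pfn;	/* migrate scanner moves up */
};

/*
 * After a pageblock has been fully scanned, pull the cached restart
 * PFNs past it: the free scanner restarts no higher than pfn and the
 * migration scanner restarts no lower.
 */
static void update_skip_hint(struct sim_zone *zone, unsigned long pfn)
{
	if (pfn < zone->cached_free_pfn)
		zone->cached_free_pfn = pfn;
	if (pfn > zone->cached_migrate_pfn)
		zone->cached_migrate_pfn = pfn;
}
```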
[PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention
Scanning on large machines can take a considerable length of time and eventually need to be rescheduled. This is treated as an abort event but that's not appropriate as the attempt is likely to be retried after making numerous checks and taking another cycle through the page allocator. This patch will check the need to reschedule if necessary but continue the scanning. The main benefit is reduced scanning when compaction is taking a long time or the machine is over-saturated. It also avoids an unnecessary exit of compaction that ends up being retried by the page allocator in the outer loop. 4.20.0 4.20.0 synccached-v2r15noresched-v2r15 Amean fault-both-3 2655.55 ( 0.00%) 2736.50 ( -3.05%) Amean fault-both-5 4580.67 ( 0.00%) 4133.70 ( 9.76%) Amean fault-both-7 5740.50 ( 0.00%) 5738.61 ( 0.03%) Amean fault-both-12 9237.55 ( 0.00%) 9392.82 ( -1.68%) Amean fault-both-1812899.51 ( 0.00%)13257.15 ( -2.77%) Amean fault-both-2416342.47 ( 0.00%)16859.44 ( -3.16%) Amean fault-both-3020394.26 ( 0.00%)16249.30 * 20.32%* Amean fault-both-3217450.76 ( 0.00%)14904.71 * 14.59%* Signed-off-by: Mel Gorman --- mm/compaction.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 1a41a2dbff24..75eb0d40d4d7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -398,19 +398,11 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, return true; } -/* - * Aside from avoiding lock contention, compaction also periodically checks - * need_resched() and records async compaction as contended if necessary. - */ +/* Avoid soft-lockups due to long scan times */ static inline void compact_check_resched(struct compact_control *cc) { - /* async compaction aborts if contended */ - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - + if (need_resched()) cond_resched(); - } } /* -- 2.16.4
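The before/after behaviour of the hunk above reduces to a small decision function, sketched here in userspace (`old_should_abort`/`new_should_abort` and the `SIM_` enum are invented names): previously a pending reschedule aborted async compaction; now it only triggers a cond_resched() and scanning continues.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_migrate_mode { SIM_MIGRATE_ASYNC, SIM_MIGRATE_SYNC };

/* Old behaviour: need_resched() counted as contention for async. */
static bool old_should_abort(enum sim_migrate_mode mode, bool need_resched)
{
	return need_resched && mode == SIM_MIGRATE_ASYNC;
}

/* New behaviour: reschedule if needed, but never abort the scan. */
static bool new_should_abort(enum sim_migrate_mode mode, bool need_resched)
{
	(void)mode;
	if (need_resched) {
		/* cond_resched() would run here; scanning continues */
	}
	return false;
}
```

The point of the patch is visible in the contrast: the abort path disappears, so a retried compaction no longer repeats its setup work just because the scanner yielded the CPU.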
[PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched
With incremental changes, compact_should_abort no longer makes any documented sense. Rename to compact_check_resched and update the associated comments. There is no benefit other than reducing redundant code and making the intent slightly clearer. It could potentially be merged with earlier patches but it just makes the review slightly harder. Signed-off-by: Mel Gorman --- mm/compaction.c | 61 ++--- 1 file changed, 23 insertions(+), 38 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index be27e4fa1b40..1a41a2dbff24 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -398,6 +398,21 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, return true; } +/* + * Aside from avoiding lock contention, compaction also periodically checks + * need_resched() and records async compaction as contended if necessary. + */ +static inline void compact_check_resched(struct compact_control *cc) +{ + /* async compaction aborts if contended */ + if (need_resched()) { + if (cc->mode == MIGRATE_ASYNC) + cc->contended = true; + + cond_resched(); + } +} + /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. The lock should be periodically unlocked to avoid @@ -426,33 +441,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock, return true; } - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - cond_resched(); - } - - return false; -} - -/* - * Aside from avoiding lock contention, compaction also periodically checks - * need_resched() and either schedules in sync compaction or aborts async - * compaction. This is similar to what compact_unlock_should_abort() does, but - * is used where no lock is concerned. - * - * Returns false when no scheduling was needed, or sync compaction scheduled. - * Returns true when async compaction should abort. 
- */ -static inline bool compact_should_abort(struct compact_control *cc) -{ - /* async compaction aborts if contended */ - if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) - cc->contended = true; - - cond_resched(); - } + compact_check_resched(cc); return false; } @@ -750,8 +739,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, return 0; } - if (compact_should_abort(cc)) - return 0; + compact_check_resched(cc); if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) { skip_on_failure = true; @@ -1374,12 +1362,10 @@ static void isolate_freepages(struct compact_control *cc) isolate_start_pfn = block_start_pfn) { /* * This can iterate a massively long zone without finding any -* suitable migration targets, so periodically check if we need -* to schedule, or even abort async compaction. +* suitable migration targets, so periodically check resched. */ - if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)) - && compact_should_abort(cc)) - break; + if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) + compact_check_resched(cc); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); @@ -1673,11 +1659,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* * This can potentially iterate a massively long zone with * many pageblocks unsuitable, so periodically check if we -* need to schedule, or even abort async compaction. +* need to schedule. */ - if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)) - && compact_should_abort(cc)) - break; + if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))) + compact_check_resched(cc); page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone); -- 2.16.4
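The call sites above gate the resched check on a PFN modulus so it fires once per `SWAP_CLUSTER_MAX` pageblocks rather than on every block. A userspace sketch of that cadence (the function name `count_resched_checks` is invented; `SWAP_CLUSTER_MAX` of 32 and 512-page pageblocks are the usual x86-64 values):

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32
#define PAGEBLOCK_NR_PAGES 512

/*
 * Walk a PFN range one pageblock at a time and count how often the
 * periodic check fires: once every SWAP_CLUSTER_MAX pageblocks,
 * mirroring !(pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)).
 */
static int count_resched_checks(unsigned long start_pfn,
				unsigned long end_pfn)
{
	unsigned long pfn;
	int checks = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGEBLOCK_NR_PAGES) {
		if (!(pfn % (SWAP_CLUSTER_MAX * PAGEBLOCK_NR_PAGES)))
			checks++;	/* compact_check_resched(cc) here */
	}
	return checks;
}
```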
[PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks
Migrate has separate cached PFNs for ASYNC and SYNC* migration on the basis that some migrations will fail in ASYNC mode. However, if the cached PFNs match at the start of scanning and pageblocks are skipped due to having no isolation candidates, then the sync state does not matter. This patch keeps matching cached PFNs in sync until a pageblock with isolation candidates is found. The actual benefit is marginal given that the sync scanner following the async scanner will often skip a number of pageblocks but it's useless work. Any benefit depends heavily on whether the scanners restarted recently so overall the reduction in scan rates is a mere 2.8% which is borderline noise. Signed-off-by: Mel Gorman --- mm/compaction.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/mm/compaction.c b/mm/compaction.c index 921720f7a416..be27e4fa1b40 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1967,6 +1967,7 @@ static enum compact_result compact_zone(struct compact_control *cc) unsigned long end_pfn = zone_end_pfn(cc->zone); unsigned long last_migrated_pfn; const bool sync = cc->mode != MIGRATE_ASYNC; + bool update_cached; unsigned long a, b, c; cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask); @@ -2019,6 +2020,17 @@ static enum compact_result compact_zone(struct compact_control *cc) last_migrated_pfn = 0; + /* +* Migrate has separate cached PFNs for ASYNC and SYNC* migration on +* the basis that some migrations will fail in ASYNC mode. However, +* if the cached PFNs match and pageblocks are skipped due to having +* no isolation candidates, then the sync state does not matter. +* Until a pageblock with isolation candidates is found, keep the +* cached PFNs in sync to avoid revisiting the same blocks. 
+*/ + update_cached = !sync && + cc->zone->compact_cached_migrate_pfn[0] == cc->zone->compact_cached_migrate_pfn[1]; + trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync); @@ -2050,6 +2062,11 @@ static enum compact_result compact_zone(struct compact_control *cc) last_migrated_pfn = 0; goto out; case ISOLATE_NONE: + if (update_cached) { + cc->zone->compact_cached_migrate_pfn[1] = + cc->zone->compact_cached_migrate_pfn[0]; + } + /* * We haven't isolated and migrated anything, but * there might still be unflushed migrations from @@ -2057,6 +2074,7 @@ static enum compact_result compact_zone(struct compact_control *cc) */ goto check_drain; case ISOLATE_SUCCESS: + update_cached = false; last_migrated_pfn = start_pfn; ; } -- 2.16.4
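The update_cached state machine above can be modelled in a few lines (a userspace sketch; `sim_zone` and `record_block_result` are invented names): while only empty pageblocks are being skipped, the async cached PFN is mirrored into the sync slot, and the mirroring stops permanently once any isolation succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* [0] = ASYNC cached migrate PFN, [1] = SYNC* cached migrate PFN */
struct sim_zone { unsigned long cached_migrate_pfn[2]; };

/*
 * Keep the sync cached PFN in step with the async one while blocks
 * are skipped for having no isolation candidates; once a block
 * yields candidates, async failures become meaningful to a sync
 * scan and the PFNs are allowed to diverge.
 */
static void record_block_result(struct sim_zone *zone, bool *update_cached,
				bool isolated_something)
{
	if (isolated_something) {
		*update_cached = false;		/* ISOLATE_SUCCESS */
		return;
	}
	if (*update_cached)			/* ISOLATE_NONE */
		zone->cached_migrate_pfn[1] = zone->cached_migrate_pfn[0];
}
```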
RE: [PATCH] fsl/fman: avoid sleeping in atomic context while adding an address
Hi, this is a duplicate of this other patch addressing the issue: commit 0d9c9a238faf925823bde866182c663b6d734f2e Author: Scott Wood Date: Thu Dec 27 18:29:09 2018 -0600 fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address() Thank you, Madalin > -Original Message- > From: Yi Wang > Sent: Friday, January 4, 2019 7:50 AM > To: Madalin-cristian Bucur > Cc: da...@davemloft.net; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; xue.zhih...@zte.com.cn; wang.y...@zte.com.cn; > huang.jun...@zte.com.cn; Junhua Huang > Subject: [PATCH] fsl/fman: avoid sleeping in atomic context while adding > an address > > From: Junhua Huang > > dev_set_rx_mode will call function pointer mac_dev->add_hash_mac_addr > while holding spin_lock_bh. The function pointer points to > memac_add_hash_mac_address when ethernet type is fman-memac, > which will kmalloc use GFP_KERNEL flag. > > / # ifconfig eth2 192.168.1.168 > [ 576.604544] BUG: sleeping function called from invalid context at > mm/slab.h:393 > [ 576.610587] in_atomic(): 1, irqs_disabled(): 0, pid: 2751, name: > ifconfig > [ 576.616105] 2 locks held by ifconfig/2751: > [ 576.618916] #0:(rtnl_mutex)at: [] > .rtnl_lock+0x1c/0x30 > [ 576.625523] #1:(_xmit_ETHER)at: [] > .dev_set_rx_mode+0x24/0x54 > [ 576.632745] CPU: 5 PID: 2751 Comm: ifconfig Tainted: G W 4.9.115- > rt93-EMBSYS-@332 #3 > [ 576.642942] Call Trace: > [ 576.644085] [c0007499b440] [c0a09eb4] > .dump_stack+0xe0/0x14c (unreliable) > [ 576.650642] [c0007499b4d0] [c0076f3c] > .___might_sleep+0x1ac/0x278 > [ 576.656493] [c0007499b560] [c01aad6c] > .kmem_cache_alloc+0x144/0x28c > [ 576.662518] [c0007499b620] [c0634c18] > .memac_add_hash_mac_address+0x100/0x194 > [ 576.669416] [c0007499b6b0] [c0630b54] > .set_multi+0x1bc/0x20c > [ 576.674829] [c0007499b760] [c063718c] > .dpaa_set_rx_mode+0x84/0x104 > [ 576.680765] [c0007499b7e0] [c07f9394] > .__dev_set_rx_mode+0x64/0xdc > [ 576.686701] [c0007499b870] [c07f943c] > .dev_set_rx_mode+0x30/0x54 > [ 576.692464] 
[c0007499b8f0] [c07f98f4] > .__dev_change_flags+0x98/0x1c4 > [ 576.698573] [c0007499b980] [c07f9a4c] > .dev_change_flags+0x2c/0x80 > [ 576.704429] [c0007499ba10] [c08cde28] > .devinet_ioctl+0x624/0x8e0 > [ 576.710191] [c0007499bb00] [c08d1678] > .inet_ioctl+0x1f4/0x250 > [ 576.715697] [c0007499bb70] [c07c89d8] > .sock_do_ioctl+0x50/0xa8 > [ 576.721284] [c0007499bc00] [c07c94e8] > .sock_ioctl+0x2b8/0x39c > [ 576.726786] [c0007499bca0] [c01e2500] > .do_vfs_ioctl+0xc8/0x8b4 > [ 576.732373] [c0007499bd90] [c01e2d44] .SyS_ioctl+0x58/0xa4 > [ 576.737612] [c0007499be30] [c7a4] > system_call+0x38/0x108 > > Signed-off-by: Junhua Huang > --- > drivers/net/ethernet/freescale/fman/fman_memac.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c > b/drivers/net/ethernet/freescale/fman/fman_memac.c > index 71a5ded..21dd557 100644 > --- a/drivers/net/ethernet/freescale/fman/fman_memac.c > +++ b/drivers/net/ethernet/freescale/fman/fman_memac.c > @@ -923,7 +923,7 @@ int memac_add_hash_mac_address(struct fman_mac *memac, > enet_addr_t *eth_addr) > hash = get_mac_addr_hash_code(addr) & HASH_CTRL_ADDR_MASK; > > /* Create element to be added to the driver hash table */ > - hash_entry = kmalloc(sizeof(*hash_entry), GFP_KERNEL); > + hash_entry = kmalloc(sizeof(*hash_entry), GFP_ATOMIC); > if (!hash_entry) > return -ENOMEM; > hash_entry->addr = addr; > -- > 2.15.2
[PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner
When scanning for sources or targets, PageCompound is checked for huge pages as they can be skipped quickly but it happens relatively late after a lot of setup and checking. This patch short-cuts the check to make it earlier. It might still change when the lock is acquired but this has less overhead overall. The free scanner advances but the migration scanner does not. Typically the free scanner encounters more movable blocks that change state over the lifetime of the system and also tends to scan more aggressively as it's actively filling its portion of the physical address space with data. This could change in the future but for the moment, this worked better in practice and incurred fewer scan restarts. The impact on latency and allocation success rates is marginal but the free scan rates are reduced by 32% and system CPU usage is reduced by 2.6%. The 2-socket results are not materially different. Signed-off-by: Mel Gorman --- mm/compaction.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 608d274f9880..921720f7a416 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct compact_control *cc, { int block_mt; + if (pageblock_skip_persistent(page)) + return false; + if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction) return true; @@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, continue; /* -* For async compaction, also only scan in MOVABLE blocks. -* Async compaction is optimistic to see if the minimum amount -* of work satisfies the allocation. +* For async compaction, also only scan in MOVABLE blocks +* without huge pages. Async compaction is optimistic to see +* if the minimum amount of work satisfies the allocation. +* The cached PFN is updated as it's possible that all +* remaining blocks between source and target are suitable +* and the compaction scanners fail to meet. 
*/ - if (!suitable_migration_source(cc, page)) + if (!suitable_migration_source(cc, page)) { + update_cached_migrate(cc, block_end_pfn); continue; + } /* Perform the isolation */ low_pfn = isolate_migratepages_block(cc, low_pfn, -- 2.16.4
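The value of short-cutting the check is purely about ordering: reject on the cheap condition before paying for the expensive one. A userspace sketch with a counter makes this visible (`sim_block`, `suitable_source` and the `expensive_checks` counter are invented; the counter stands in for the get_pageblock_migratetype() lookup):

```c
#include <assert.h>
#include <stdbool.h>

struct sim_block { bool huge; bool movable; };

static int expensive_checks;	/* migratetype lookups performed */

/*
 * Check the cheap persistent-skip/huge condition first so that the
 * more expensive migratetype lookup is never reached for blocks
 * that can be rejected immediately.
 */
static bool suitable_source(const struct sim_block *b, bool async_direct)
{
	if (b->huge)			/* cheap early rejection */
		return false;
	if (!async_direct)
		return true;
	expensive_checks++;		/* get_pageblock_migratetype() */
	return b->movable;
}
```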
[PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times
Pageblocks are marked for skip when no pages are isolated after a scan. However, it's possible to hit corner cases where the migration scanner gets stuck near the boundary between the source and target scanner. Due to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that are migrated can be reallocated before the pageblock is complete. The pageblock is not necessarily skipped so it can be rescanned multiple times. Similarly, a pageblock with some dirty/writeback pages may fail to isolate and be rescanned until writeback completes which is wasteful. This patch tracks if a pageblock is being rescanned. If so, then the entire pageblock will be migrated as one operation. This narrows the race window during which pages can be reallocated during migration. Secondly, if there are pages that cannot be isolated then the pageblock will still be fully scanned and marked for skipping. On the second rescan, the pageblock skip is set and the migration scanner makes progress. 4.20.0 4.20.0 finishscan-v2r15 norescan-v2r15 Amean fault-both-3 3729.80 ( 0.00%) 2872.13 * 23.00%* Amean fault-both-5 5148.49 ( 0.00%) 4330.56 * 15.89%* Amean fault-both-7 7393.24 ( 0.00%) 6496.63 ( 12.13%) Amean fault-both-1211709.32 ( 0.00%)10280.59 ( 12.20%) Amean fault-both-1816626.82 ( 0.00%)11079.19 * 33.37%* Amean fault-both-2419944.34 ( 0.00%)17207.80 * 13.72%* Amean fault-both-3023435.53 ( 0.00%)17736.13 * 24.32%* Amean fault-both-3223948.70 ( 0.00%)18509.41 * 22.71%* 4.20.0 4.20.0 finishscan-v2r15 norescan-v2r15 Percentage huge-1 0.00 ( 0.00%)0.00 ( 0.00%) Percentage huge-388.39 ( 0.00%) 96.87 ( 9.60%) Percentage huge-592.07 ( 0.00%) 94.63 ( 2.77%) Percentage huge-791.96 ( 0.00%) 93.83 ( 2.03%) Percentage huge-12 93.38 ( 0.00%) 92.65 ( -0.78%) Percentage huge-18 91.89 ( 0.00%) 93.66 ( 1.94%) Percentage huge-24 91.37 ( 0.00%) 93.15 ( 1.95%) Percentage huge-30 92.77 ( 0.00%) 93.16 ( 0.42%) Percentage huge-32 87.97 ( 0.00%) 92.58 ( 5.24%) The fault latency reduction is large and while the 
THP allocation success rate is only slightly higher, it's already high at this point of the series. Compaction migrate scanned 60718343.00 31772603.00 Compaction free scanned 933061894.00 63267928.00 Migration scan rates are reduced by 48% and free scan rates are also reduced as the same migration source block is not being selected multiple times. The corner case where migration scan rates go through the roof due to a dirty/writeback pageblock located at the boundary of the migration/free scanner did not happen in this case. When it does happen, the scan rates multiply by factors measured in the hundreds and would be misleading to present. Signed-off-by: Mel Gorman --- mm/compaction.c | 32 ++-- mm/internal.h | 1 + 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9438f0564ed5..9c2cc7955446 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -959,8 +959,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, cc->nr_migratepages++; nr_isolated++; - /* Avoid isolating too much */ - if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) { + /* +* Avoid isolating too much unless this block is being +* rescanned (e.g. dirty/writeback pages, parallel allocation). +*/ + if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) { ++low_pfn; break; } @@ -1007,11 +1010,14 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, spin_unlock_irqrestore(zone_lru_lock(zone), flags); /* -* Updated the cached scanner pfn if the pageblock was scanned -* without isolating a page. The pageblock may not be marked -* skipped already if there were no LRU pages in the block. +* Updated the cached scanner pfn once the pageblock has been scanned +* Pages will either be migrated in which case there is no point +* scanning in the near future or migration failed in which case the +* failure reason may persist. 
The block is marked for skipping if +* there were no pages isolated in the block or if the block is +* rescanned twice in a row. */ - if (low_pfn == end_pfn && !nr_isolated) { + if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
[PATCH 15/25] mm, compaction: Finish pageblock scanning on contention
Async migration aborts on spinlock contention but contention can be high when there are multiple compaction attempts and kswapd is active. The consequence is that the migration scanners move forward uselessly while still contending on locks for longer while leaving suitable migration sources behind. This patch will acquire the lock but track when contention occurs. When it does, the current pageblock will finish as compaction may succeed for that block and then abort. This will have a variable impact on latency as in some cases useless scanning is avoided (reduces latency) but a lock will be contended (increase latency) or a single contended pageblock is scanned that would otherwise have been skipped (increase latency). 4.20.0 4.20.0 norescan-v2r15finishcontend-v2r15 Amean fault-both-1 0.00 ( 0.00%)0.00 * 0.00%* Amean fault-both-3 2872.13 ( 0.00%) 2973.08 ( -3.51%) Amean fault-both-5 4330.56 ( 0.00%) 3870.19 ( 10.63%) Amean fault-both-7 6496.63 ( 0.00%) 6580.50 ( -1.29%) Amean fault-both-1210280.59 ( 0.00%) 9527.40 ( 7.33%) Amean fault-both-1811079.19 ( 0.00%)13395.86 * -20.91%* Amean fault-both-2417207.80 ( 0.00%)14936.94 * 13.20%* Amean fault-both-3017736.13 ( 0.00%)16748.46 ( 5.57%) Amean fault-both-3218509.41 ( 0.00%)18521.30 ( -0.06%) 4.20.0 4.20.0 norescan-v2r15finishcontend-v2r15 Percentage huge-1 0.00 ( 0.00%)0.00 ( 0.00%) Percentage huge-396.87 ( 0.00%) 97.57 ( 0.72%) Percentage huge-594.63 ( 0.00%) 96.88 ( 2.39%) Percentage huge-793.83 ( 0.00%) 95.47 ( 1.74%) Percentage huge-12 92.65 ( 0.00%) 98.64 ( 6.47%) Percentage huge-18 93.66 ( 0.00%) 98.33 ( 4.98%) Percentage huge-24 93.15 ( 0.00%) 98.88 ( 6.15%) Percentage huge-30 93.16 ( 0.00%) 97.09 ( 4.21%) Percentage huge-32 92.58 ( 0.00%) 96.20 ( 3.92%) As expected, a variable impact on latency while allocation success rates are slightly higher. 
System CPU usage is reduced by about 10% but scan rate impact is mixed Compaction migrate scanned 31772603 19980216 Compaction free scanned 63267928 120381828 Migration scan rates are reduced 37% which is expected as a pageblock is used by the async scanner instead of skipped but the free scanning is increased. This can be partially accounted for by the increased success rate but also by the fact that the scanners do not meet for longer when pageblocks are actually used. Overall this is justified and completing a pageblock scan is very important for later patches. Signed-off-by: Mel Gorman --- mm/compaction.c | 95 +++-- 1 file changed, 39 insertions(+), 56 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9c2cc7955446..608d274f9880 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -376,24 +376,25 @@ static bool test_and_set_skip(struct compact_control *cc, struct page *page, /* * Compaction requires the taking of some coarse locks that are potentially - * very heavily contended. For async compaction, back out if the lock cannot - * be taken immediately. For sync compaction, spin on the lock if needed. + * very heavily contended. For async compaction, trylock and record if the + * lock is contended. The lock will still be acquired but compaction will + * abort when the current block is finished regardless of success rate. + * Sync compaction acquires the lock. * - * Returns true if the lock is held - * Returns false if the lock is not held and compaction should abort + * Always returns true which makes it easier to track lock state in callers. 
*/ -static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags, +static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags, struct compact_control *cc) { - if (cc->mode == MIGRATE_ASYNC) { - if (!spin_trylock_irqsave(lock, *flags)) { - cc->contended = true; - return false; - } - } else { - spin_lock_irqsave(lock, *flags); + /* Track if the lock is contended in async mode */ + if (cc->mode == MIGRATE_ASYNC && !cc->contended) { + if (spin_trylock_irqsave(lock, *flags)) + return true; + + cc->contended = true; } + spin_lock_irqsave(lock, *flags); return true; } @@ -426,10 +427,8 @@ static bool compact_unlock_should_abort(spinlock_t *lock, } if (need_resched()) { - if (cc->mode == MIGRATE_ASYNC) { +
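The reworked locking discipline can be modelled without kernel spinlocks (a userspace sketch: `sim_lock`, `sim_trylock` and `compact_lock` are invented stand-ins, and "acquiring" a held lock is simply recorded rather than blocking): async mode trylocks and records contention on failure, but still ends up holding the lock so the current pageblock can be finished before the scan aborts.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a spinlock; held == true means another context has it. */
struct sim_lock { bool held; };

static bool sim_trylock(struct sim_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/*
 * Async compaction tries the lock and records contention on failure,
 * then acquires the lock anyway; the contended flag causes the scan
 * to abort only after the current pageblock completes. Always
 * returns true, matching the simplified kernel contract.
 */
static bool compact_lock(struct sim_lock *lock, bool async, bool *contended)
{
	if (async && !*contended) {
		if (sim_trylock(lock))
			return true;
		*contended = true;	/* abort after this pageblock */
	}
	lock->held = true;		/* a real spin_lock would block here */
	return true;
}
```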
[PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target
Similar to the migration scanner, this patch uses the free lists to quickly locate a migration target. The search is different in that lower orders will be searched for a suitable high PFN if necessary but the search is still bound. This is justified on the grounds that the free scanner typically scans linearly much more than the migration scanner. If a free page is found, it is isolated and compaction continues if enough pages were isolated. For SYNC* scanning, the full pageblock is scanned for any remaining free pages so that is can be marked for skipping in the near future. 1-socket thpfioscale 4.20.0 4.20.0 isolmig-v2r15 findfree-v2r15 Amean fault-both-1 0.00 ( 0.00%)0.00 * 0.00%* Amean fault-both-3 3066.68 ( 0.00%) 2884.51 ( 5.94%) Amean fault-both-5 4298.49 ( 0.00%) 4419.70 ( -2.82%) Amean fault-both-7 5986.99 ( 0.00%) 6039.04 ( -0.87%) Amean fault-both-12 9324.85 ( 0.00%) 9992.34 ( -7.16%) Amean fault-both-1813350.05 ( 0.00%)12690.05 ( 4.94%) Amean fault-both-2413491.77 ( 0.00%)14393.93 ( -6.69%) Amean fault-both-3015630.86 ( 0.00%)16894.08 ( -8.08%) Amean fault-both-3217428.50 ( 0.00%)17813.68 ( -2.21%) The impact on latency is variable but the search is optimistic and sensitive to the exact system state. Success rates are similar but the major impact is to the rate of scanning 4.20.0-rc6 4.20.0-rc6 isolmig-v1r4findfree-v1r8 Compaction migrate scanned2551648828324352 Compaction free scanned 8760332156131065 The free scan rates are reduced by 35%. The 2-socket reductions for the free scanner are more dramatic which is a likely reflection that the machine has more memory. 
Signed-off-by: Mel Gorman
---
 mm/compaction.c | 203 ++--
 1 file changed, 198 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 24e3a9db4b70..9438f0564ed5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1136,7 +1136,7 @@ static inline bool compact_scanners_met(struct compact_control *cc)

 /* Reorder the free list to reduce repeated future searches */
 static void
-move_freelist_tail(struct list_head *freelist, struct page *freepage)
+move_freelist_head(struct list_head *freelist, struct page *freepage)
 {
 	LIST_HEAD(sublist);

@@ -1147,6 +1147,193 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
 	}
 }

+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_before(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
+static void
+fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+{
+	unsigned long start_pfn, end_pfn;
+	struct page *page = pfn_to_page(pfn);
+
+	/* Do not search around if there are enough pages already */
+	if (cc->nr_freepages >= cc->nr_migratepages)
+		return;
+
+	/* Minimise scanning during async compaction */
+	if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
+		return;
+
+	/* Pageblock boundaries */
+	start_pfn = pageblock_start_pfn(pfn);
+	end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
+
+	/* Scan before */
+	if (start_pfn != pfn) {
+		isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+		if (cc->nr_freepages >= cc->nr_migratepages)
+			return;
+	}
+
+	/* Scan after */
+	start_pfn = pfn + nr_isolated;
+	if (start_pfn != end_pfn)
+		isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+
+	/* Skip this pageblock in the future as it's full or nearly full */
+	if (cc->nr_freepages < cc->nr_migratepages)
+		set_pageblock_skip(page);
+}
+
+static unsigned long
+fast_isolate_freepages(struct compact_control *cc)
+{
+	unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
+	unsigned int order_scanned = 0, nr_scanned = 0;
+	unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+	unsigned long nr_isolated = 0;
+	unsigned long distance;
+	struct page *page = NULL;
+	bool scan_start = false;
+	int order;
+
+	/*
+	 * If starting the scan, use a deeper search and use the highest
+	 * PFN found if a suitable one is not found.
+	 */
+	if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+		limit = pageblock_nr_pages >> 1;
+		scan_start = true;
+	}
+
[PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source
The migration scanner is a linear scan of a zone with a potentially large
search space. Furthermore, many pageblocks are unusable such as those filled
with reserved pages or partially filled with pages that cannot migrate.
These still get scanned in the common case of allocating a THP and the cost
accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It prefers
picking a block with a larger number of free pages already, on the basis
that there are fewer pages to migrate to free the entire block. The lowest
PFN found during searches is tracked as the basis of the start for the
linear search after the first search of the free list fails. After the
search, the free list is shuffled so that the next search will not encounter
the same page. If the search fails then the subsequent searches will be
shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation, then the linear scanner is still used. It
is somewhat pointless to use the list search in those cases. Small free
pages must be used for the search and there is no guarantee that contiguous
movable pages are located within that block.
                                    4.20.0                 4.20.0
                            failfast-v2r15          findmig-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3833.72 (   0.00%)     3505.69 (   8.56%)
Amean     fault-both-5      4967.15 (   0.00%)     5794.13 * -16.65%*
Amean     fault-both-7      7139.19 (   0.00%)     7663.09 (  -7.34%)
Amean     fault-both-12    11326.30 (   0.00%)    10983.36 (   3.03%)
Amean     fault-both-18    16270.70 (   0.00%)    13602.71 *  16.40%*
Amean     fault-both-24    19839.65 (   0.00%)    16145.77 *  18.62%*
Amean     fault-both-30    21707.05 (   0.00%)    19753.82 (   9.00%)
Amean     fault-both-32    21968.16 (   0.00%)    20616.16 (   6.15%)

                                    4.20.0                 4.20.0
                            failfast-v2r15          findmig-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        84.62 (   0.00%)       90.58 (   7.05%)
Percentage huge-5        88.43 (   0.00%)       91.34 (   3.29%)
Percentage huge-7        88.33 (   0.00%)       92.21 (   4.39%)
Percentage huge-12       88.74 (   0.00%)       92.48 (   4.21%)
Percentage huge-18       86.52 (   0.00%)       91.65 (   5.93%)
Percentage huge-24       86.42 (   0.00%)       90.23 (   4.41%)
Percentage huge-30       86.67 (   0.00%)       90.17 (   4.04%)
Percentage huge-32       86.00 (   0.00%)       89.72 (   4.32%)

This shows an improvement in allocation latencies and a slight increase in
allocation success rates. While not presented, there was a 13% reduction in
migration scanning and a 10% reduction in system CPU usage. A 2-socket
machine showed similar benefits.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 179 +++-
 mm/internal.h   |   2 +
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1050,6 +1050,12 @@ static bool suitable_migration_target(struct compact_control *cc,
 	return false;
 }

+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
 /*
  * Test whether the free scanner has reached the same or lower pageblock than
  * the migration scanner, and compaction should thus terminate.
@@ -1060,6 +1066,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
 		<= (cc->migrate_pfn >> pageblock_order);
 }

+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_position(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1217,6 +1236,160 @@ typedef enum {
  */
 int sysctl_compact_unevictable_allowed __read_mostly = 1;

+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+	if (cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	if (!cc->fast_start_pfn)
+		cc->fast_start_pfn = pfn;
+
+	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
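The shrinking search depth driven by `fast_search_fail` can be checked in isolation. A minimal sketch follows; the kernel's `COMPACT_CLUSTER_MAX` is `SWAP_CLUSTER_MAX`, which is 32 on common configurations, so treat that constant as an assumption here.

```c
#include <assert.h>

#define COMPACT_CLUSTER_MAX 32	/* assumed value of SWAP_CLUSTER_MAX */

/*
 * Mirror of freelist_scan_limit(): every consecutive failure of the
 * fast free-list search halves how many list entries the next attempt
 * may examine, bottoming out at 1, so persistent failures cheaply fall
 * back to the linear scanner.
 */
static unsigned int freelist_scan_limit(unsigned int fast_search_fail)
{
	return (COMPACT_CLUSTER_MAX >> fast_search_fail) + 1;
}
```

With no failures the limit is 33 entries, after one failure 17, and from the sixth failure onwards a single entry, so a persistently failing fast search costs almost nothing.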
[PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance
Due to either a fast search of the free list or a linear scan, it is possible
for multiple compaction instances to pick the same pageblock for migration.
This is lucky for one scanner and means increased scanning for all the
others. It also creates a race between requests over which one first
allocates the resulting free block.

This patch tests and updates the pageblock skip for the migration scanner
carefully. When isolating a block, it will check and skip if the block is
already in use. Once the zone lock is acquired, it will be rechecked so that
only one scanner can set the pageblock skip for exclusive use. Any scanner
contending will continue with a linear scan. The skip bit is still set if no
pages can be isolated in a range. While this may result in redundant
scanning, it avoids unnecessarily acquiring the zone lock when there are no
suitable migration sources.

1-socket thpscale
                                    4.20.0                 4.20.0
                             findmig-v2r15          isolmig-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3505.69 (   0.00%)     3066.68 *  12.52%*
Amean     fault-both-5      5794.13 (   0.00%)     4298.49 *  25.81%*
Amean     fault-both-7      7663.09 (   0.00%)     5986.99 *  21.87%*
Amean     fault-both-12    10983.36 (   0.00%)     9324.85 (  15.10%)
Amean     fault-both-18    13602.71 (   0.00%)    13350.05 (   1.86%)
Amean     fault-both-24    16145.77 (   0.00%)    13491.77 *  16.44%*
Amean     fault-both-30    19753.82 (   0.00%)    15630.86 *  20.87%*
Amean     fault-both-32    20616.16 (   0.00%)    17428.50 *  15.46%*

This is the first patch that shows a significant reduction in latency as
multiple compaction scanners do not operate on the same blocks.
There is a small increase in the success rate

                                4.20.0-rc6             4.20.0-rc6
                              findmig-v1r4           isolmig-v1r4
Percentage huge-3        90.58 (   0.00%)       95.84 (   5.81%)
Percentage huge-5        91.34 (   0.00%)       94.19 (   3.12%)
Percentage huge-7        92.21 (   0.00%)       93.78 (   1.71%)
Percentage huge-12       92.48 (   0.00%)       94.33 (   2.00%)
Percentage huge-18       91.65 (   0.00%)       94.15 (   2.72%)
Percentage huge-24       90.23 (   0.00%)       94.23 (   4.43%)
Percentage huge-30       90.17 (   0.00%)       95.17 (   5.54%)
Percentage huge-32       89.72 (   0.00%)       93.59 (   4.32%)

Compaction migrate scanned      54168306       25516488
Compaction free scanned        800530954       87603321

Migration scan rates are reduced by 52%.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 126
 1 file changed, 99 insertions(+), 27 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 137e32e8a2f5..24e3a9db4b70 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,13 +279,52 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 	}
 }

+/*
+ * Sets the pageblock skip bit if it was clear. Note that this is a hint as
+ * locks are not required for read/writers. Returns true if it was already set.
+ */
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+							unsigned long pfn)
+{
+	bool skip;
+
+	/* Do no update if skip hint is being ignored */
+	if (cc->ignore_skip_hint)
+		return false;
+
+	if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+		return false;
+
+	skip = get_pageblock_skip(page);
+	if (!skip && !cc->no_set_skip_hint)
+		set_pageblock_skip(page);
+
+	return skip;
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+	struct zone *zone = cc->zone;
+
+	pfn = pageblock_end_pfn(pfn);
+
+	/* Set for isolation rather than compaction */
+	if (cc->no_set_skip_hint)
+		return;
+
+	if (pfn > zone->compact_cached_migrate_pfn[0])
+		zone->compact_cached_migrate_pfn[0] = pfn;
+	if (cc->mode != MIGRATE_ASYNC &&
+	    pfn > zone->compact_cached_migrate_pfn[1])
+		zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
 	struct zone *zone = cc->zone;
 	unsigned long pfn;
@@ -304,16 +343,8 @@ static void update_pageblock_skip(struct compact_control *cc,
 	pfn = page_to_pfn(page);

 	/* Update where async and sync compaction should restart */
-	if (migrate_scanner) {
-		if (pfn
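The claim-a-block discipline above reduces to a test-and-set on the pageblock skip bit: the first scanner to arrive sees the old value clear and claims the block, any later scanner sees it set and moves on. A userspace sketch with a plain bitmap; block numbering and bitmap size are made up, and like the kernel hint this is not atomic, since the real code relies on the zone lock for the exclusive case.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_BLOCKS 128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long skip_bits[(NR_BLOCKS + BITS_PER_WORD - 1) / BITS_PER_WORD];

/*
 * Return the old skip bit and set it: of several racing scanners only
 * the first (which sees 'false') claims the block for exclusive use;
 * the rest see 'true' and continue with the linear scan instead.
 */
static bool test_and_set_skip_sketch(unsigned int block)
{
	unsigned long mask = 1UL << (block % BITS_PER_WORD);
	bool old = skip_bits[block / BITS_PER_WORD] & mask;

	skip_bits[block / BITS_PER_WORD] |= mask;
	return old;
}
```

The first call for a given block returns false (claimed); every repeat returns true, which in the patch is the signal for a contending scanner to fall back to the linear scan.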
[PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages
Reserved pages are set at boot time, tend to be clustered and almost never
become unreserved. When isolating pages for either migration sources or
targets, skip the entire pageblock if one PageReserved page is encountered,
on the grounds that it is highly probable the entire pageblock is reserved.

The performance impact is relative to the number of reserved pages in the
system and their location so it'll be variable but intuitively it should
make sense. If the memblock allocator was ever changed to spread reserved
pages throughout the address space then this patch would be impaired but it
would also be considered a bug given that such a change would ruin
fragmentation.

On both 1-socket and 2-socket machines, scan rates are reduced slightly on
workloads that intensively allocate THP while the system is fragmented.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 16
 1 file changed, 16 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3afa4e9188b6..94d1e5b062ea 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -484,6 +484,15 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 			goto isolate_fail;
 		}

+		/*
+		 * A reserved page is never freed and tend to be clustered in
+		 * the same pageblock. Skip the block.
+		 */
+		if (PageReserved(page)) {
+			blockpfn = end_pfn;
+			break;
+		}
+
 		if (!PageBuddy(page))
 			goto isolate_fail;

@@ -827,6 +836,13 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_success;
 		}

+		/*
+		 * A reserved page is never freed and tend to be
+		 * clustered in the same pageblocks. Skip the block.
+		 */
+		if (PageReserved(page))
+			low_pfn = end_pfn;
+
 		goto isolate_fail;
 	}
-- 
2.16.4
[PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction
When pageblocks get fragmented, watermarks are artificially boosted to
reclaim pages and avoid further fragmentation events. However, compaction is
often either fragmentation-neutral or moving movable pages away from
unmovable/reclaimable pages. As the true watermarks are preserved, allow
compaction to ignore the boost factor.

The expected impact is very slight as the main benefit is that compaction is
slightly more likely to succeed when the system has been fragmented very
recently. On both 1-socket and 2-socket machines for THP-intensive
allocation during fragmentation, the success rate was increased by less than
1%, which is marginal. However, detailed tracing indicated that failures of
migration due to a premature ENOMEM triggered by watermark checks were
eliminated.

Signed-off-by: Mel Gorman
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 57ba9d1da519..05c9a81d54ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2958,7 +2958,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		 * watermark, because we already know our high-order page
 		 * exists.
 		 */
-		watermark = min_wmark_pages(zone) + (1UL << order);
+		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
 		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
 			return 0;

-- 
2.16.4
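The one-line change swaps the boosted watermark (what `min_wmark_pages()` returns once a boost is active) for the raw minimum. The arithmetic can be sketched as follows; the struct, field names and numbers below are illustrative, not the kernel's.

```c
#include <stdbool.h>

/* Illustrative zone state; values are made up */
struct zone_sketch {
	unsigned long wmark_min;	/* the true minimum watermark */
	unsigned long boost;		/* temporary fragmentation boost */
	unsigned long free_pages;
};

/*
 * Sketch of the __isolate_free_page() check: isolating 1 << order pages
 * must leave the zone above a watermark.  Before the patch the boosted
 * minimum was used; after it, only the true minimum, so a recent
 * fragmentation event cannot fail compaction with a premature ENOMEM.
 */
static bool can_isolate(const struct zone_sketch *z, unsigned int order,
			bool ignore_boost)
{
	unsigned long wmark = z->wmark_min + (ignore_boost ? 0 : z->boost);

	return z->free_pages >= wmark + (1UL << order);
}
```

With 1600 free pages, a minimum watermark of 1000 and a boost of 200, isolating an order-9 page (512 pages) passes against the true watermark (1000 + 512 = 1512) but fails against the boosted one (1200 + 512 = 1712), which is exactly the spurious failure the patch removes.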
[PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock
When compaction is finishing, it uses a flag to ensure the pageblock is
complete, but it makes sense to always complete migration of a pageblock.
Minimally, skip information is based on a pageblock and partially scanned
pageblocks may incur more scanning in the future. The pageblock skip
handling also becomes more strict later in the series and the hint is more
useful if a complete pageblock was always scanned.

This potentially impacts latency as more scanning is done but it's not a
consistent win or loss as the scanning is not always a high percentage of
the pageblock and sometimes it is offset by future reductions in scanning.
Hence, the results are not presented this time due to a misleading mix of
gains/losses without any clear pattern. However, full scanning of the
pageblock is important for later patches.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c | 19 ---
 mm/internal.h   |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 94d1e5b062ea..8bf2090231a3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1347,16 +1347,14 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	if (is_via_compact_memory(cc->order))
 		return COMPACT_CONTINUE;

-	if (cc->finishing_block) {
-		/*
-		 * We have finished the pageblock, but better check again that
-		 * we really succeeded.
-		 */
-		if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
-			cc->finishing_block = false;
-		else
-			return COMPACT_CONTINUE;
-	}
+	/*
+	 * Always finish scanning a pageblock to reduce the possibility of
+	 * fallbacks in the future. This is particularly important when
+	 * migration source is unmovable/reclaimable but it's not worth
+	 * special casing.
+	 */
+	if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+		return COMPACT_CONTINUE;

 	/* Direct compactor: Is a suitable page free?
 	 */
 	for (order = cc->order; order < MAX_ORDER; order++) {
@@ -1398,7 +1396,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 				return COMPACT_SUCCESS;
 			}

-			cc->finishing_block = true;
 			return COMPACT_CONTINUE;
 		}
 	}
diff --git a/mm/internal.h b/mm/internal.h
index c6f794ad21a9..edb4029f64c8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,7 +202,6 @@ struct compact_control {
 	bool direct_compaction;		/* False from kcompactd or /proc/... */
 	bool whole_zone;		/* Whole zone should/has been scanned */
 	bool contended;			/* Signal lock or sched contention */
-	bool finishing_block;		/* Finishing current pageblock */
 };

 unsigned long
-- 
2.16.4
[PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages
release_pages() is a simpler version of free_unref_page_list() but it tracks
the highest PFN for caching the restart point of the compaction free
scanner. This patch optionally tracks the highest PFN in the core helper and
converts compaction to use it. The performance impact is limited but it
should reduce lock contention slightly in some cases. The main benefit is
removing some partially duplicated code.

Signed-off-by: Mel Gorman
---
 include/linux/gfp.h |  7 ++-
 mm/compaction.c     | 12 +++-
 mm/page_alloc.c     | 10 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5f5e25fd6149..9e58799b730f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -543,7 +543,12 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_unref_page(struct page *page);
-extern void free_unref_page_list(struct list_head *list);
+extern void __free_page_list(struct list_head *list, bool dropref, unsigned long *highest_pfn);
+
+static inline void free_unref_page_list(struct list_head *list)
+{
+	return __free_page_list(list, false, NULL);
+}

 struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/mm/compaction.c b/mm/compaction.c
index 8bf2090231a3..8f0ce44dba41 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -52,16 +52,10 @@ static inline void count_compact_events(enum vm_event_item item, long delta)

 static unsigned long release_freepages(struct list_head *freelist)
 {
-	struct page *page, *next;
-	unsigned long high_pfn = 0;
+	unsigned long high_pfn;

-	list_for_each_entry_safe(page, next, freelist, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		list_del(&page->lru);
-		__free_page(page);
-		if (pfn > high_pfn)
-			high_pfn = pfn;
-	}
+	__free_page_list(freelist, true, &high_pfn);
+	INIT_LIST_HEAD(freelist);
 	return high_pfn;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cde5dac6229a..57ba9d1da519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2876,18 +2876,26 @@ void free_unref_page(struct page *page)
 /*
  * Free a list of 0-order pages
  */
-void free_unref_page_list(struct list_head *list)
+void __free_page_list(struct list_head *list, bool dropref,
+				unsigned long *highest_pfn)
 {
 	struct page *page, *next;
 	unsigned long flags, pfn;
 	int batch_count = 0;

+	if (highest_pfn)
+		*highest_pfn = 0;
+
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
+		if (dropref)
+			WARN_ON_ONCE(!put_page_testzero(page));
 		pfn = page_to_pfn(page);
 		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		set_page_private(page, pfn);
+		if (highest_pfn && pfn > *highest_pfn)
+			*highest_pfn = pfn;
 	}

 	local_irq_save(flags);
-- 
2.16.4
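The shape of the new helper — a single pass that frees a list while optionally recording the highest PFN — can be sketched without `struct page`. Plain PFN numbers stand in for pages here and the function name is illustrative.

```c
#include <stddef.h>

/*
 * One pass both "frees" entries and records the highest PFN seen,
 * which is what lets release_freepages() collapse into a single call:
 * the compaction free scanner caches that PFN as its restart point.
 * Callers that don't need the PFN pass NULL, matching the
 * free_unref_page_list() wrapper in the patch.
 */
static void free_list_track(const unsigned long *pfns, size_t n,
			    unsigned long *highest_pfn)
{
	size_t i;

	if (highest_pfn)
		*highest_pfn = 0;

	for (i = 0; i < n; i++) {
		/* ...the page at pfns[i] would be freed here... */
		if (highest_pfn && pfns[i] > *highest_pfn)
			*highest_pfn = pfns[i];
	}
}
```

Note the helper unconditionally resets `*highest_pfn` first, as the patch does, so a caller's stale value can never leak through when the list is empty.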
[PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler
Pages with no migration handler use a fallback handler which sometimes works
and sometimes persistently retries. A historical example was blockdev pages
but there are others such as odd refcounting when page->private is used.
These are retried multiple times, which is wasteful during compaction, so
this patch will fail migration faster unless the caller specifies
MIGRATE_SYNC. This is not expected to help THP allocation success rates but
it did reduce latencies very slightly in some cases.

1-socket thpfioscale
                                    4.20.0                 4.20.0
                          noreserved-v2r15         failfast-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3839.67 (   0.00%)     3833.72 (   0.15%)
Amean     fault-both-5      5177.47 (   0.00%)     4967.15 (   4.06%)
Amean     fault-both-7      7245.03 (   0.00%)     7139.19 (   1.46%)
Amean     fault-both-12    11534.89 (   0.00%)    11326.30 (   1.81%)
Amean     fault-both-18    16241.10 (   0.00%)    16270.70 (  -0.18%)
Amean     fault-both-24    19075.91 (   0.00%)    19839.65 (  -4.00%)
Amean     fault-both-30    22712.11 (   0.00%)    21707.05 (   4.43%)
Amean     fault-both-32    21692.92 (   0.00%)    21968.16 (  -1.27%)

The 2-socket results are not materially different. Scan rates are similar as
expected.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5d1839a9148d..547cc1f3f3bb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -916,7 +916,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 	 */
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
-		return -EAGAIN;
+		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;

 	return migrate_page(mapping, newpage, page, mode);
 }
-- 
2.16.4
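The behavioural change is just the error code handed back to migration's retry loop: -EAGAIN keeps the page on the retry list while -EBUSY fails it immediately. A sketch of that decision; the enum mirrors the kernel's migrate modes and errno values come from <errno.h>.

```c
#include <errno.h>

/* Mirrors the kernel's migrate modes for illustration */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/*
 * Sketch of the changed return in fallback_migrate_page() when a page's
 * private data cannot be released: only fully synchronous migration
 * keeps retrying (-EAGAIN); async and sync-light callers fail the page
 * straight away (-EBUSY), which is what shortens compaction stalls.
 */
static int fallback_result(enum migrate_mode mode)
{
	return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
}
```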
[PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances
A zone parameter is passed into a number of top-level compaction functions
despite the fact that it's already in compact_control. This is harmless but
it did need an audit to check if zone actually ever changes meaningfully.
This patch removes the parameter in a number of top-level functions. The
change could be much deeper but this was enough to briefly clarify the flow.

No functional change.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 54 ++
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index fb4d9f52ed56..7acb43f07303 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
 	return order == -1;
 }

-static enum compact_result __compact_finished(struct zone *zone,
-					      struct compact_control *cc)
+static enum compact_result __compact_finished(struct compact_control *cc)
 {
 	unsigned int order;
 	const int migratetype = cc->migratetype;
@@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (compact_scanners_met(cc)) {
 		/* Let the next compaction start anew. */
-		reset_cached_positions(zone);
+		reset_cached_positions(cc->zone);

 		/*
 		 * Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 		 * based on an allocation request.
 		 */
 		if (cc->direct_compaction)
-			zone->compact_blockskip_flush = true;
+			cc->zone->compact_blockskip_flush = true;

 		if (cc->whole_zone)
 			return COMPACT_COMPLETE;
@@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 	/* Direct compactor: Is a suitable page free?
 	 */
 	for (order = cc->order; order < MAX_ORDER; order++) {
-		struct free_area *area = &zone->free_area[order];
+		struct free_area *area = &cc->zone->free_area[order];
 		bool can_steal;

 		/* Job done if page is free of the right migratetype */
@@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
 	return COMPACT_NO_SUITABLE_PAGE;
 }

-static enum compact_result compact_finished(struct zone *zone,
-					    struct compact_control *cc)
+static enum compact_result compact_finished(struct compact_control *cc)
 {
 	int ret;

-	ret = __compact_finished(zone, cc);
-	trace_mm_compaction_finished(zone, cc->order, ret);
+	ret = __compact_finished(cc);
+	trace_mm_compaction_finished(cc->zone, cc->order, ret);
 	if (ret == COMPACT_NO_SUITABLE_PAGE)
 		ret = COMPACT_CONTINUE;

@@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }

-static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
+static enum compact_result compact_zone(struct compact_control *cc)
 {
 	enum compact_result ret;
-	unsigned long start_pfn = zone->zone_start_pfn;
-	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long start_pfn = cc->zone->zone_start_pfn;
+	unsigned long end_pfn = zone_end_pfn(cc->zone);
 	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;

 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
-	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
							cc->classzone_idx);
 	/* Compaction is likely to fail */
 	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
@@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	 * Clear pageblock skip if there were failures recently and compaction
 	 * is about to be retried after being deferred.
 	 */
-	if (compaction_restarting(zone, cc->order))
-		__reset_isolation_suitable(zone);
+	if (compaction_restarting(cc->zone, cc->order))
+		__reset_isolation_suitable(cc->zone);

 	/*
 	 * Setup to move all movable pages to the end of the zone. Used cached
@@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
 	} else {
-		cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
-		cc->free_pfn = zone->compact_cached_free_pfn;
+
[PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages
It's non-obvious from the function name that high-order free pages are split
into order-0 pages. Fix it.

Signed-off-by: Mel Gorman
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7acb43f07303..3afa4e9188b6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
 	return high_pfn;
 }

-static void map_pages(struct list_head *list)
+static void split_map_pages(struct list_head *list)
 {
 	unsigned int i, order, nr_pages;
 	struct page *page, *next;
@@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
 	}

 	/* __isolate_free_page() does not map the pages */
-	map_pages(&freelist);
+	split_map_pages(&freelist);

 	if (pfn < end_pfn) {
 		/* Loop terminated early, cleanup. */
@@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
 	}

 	/* __isolate_free_page() does not map the pages */
-	map_pages(freelist);
+	split_map_pages(freelist);

 	/*
 	 * Record where the free scanner will restart next time. Either we
-- 
2.16.4
[PATCH 02/25] mm, compaction: Rearrange compact_control
compact_control spans two cache lines with write-intensive fields on both.
Rearrange so that the most write-intensive fields are in the same cache
line. This has a negligible impact on the overall performance of compaction
and is more a tidying exercise than anything else.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5ddf5d3771a0..9437ba5791db 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,14 +183,14 @@ extern int user_min_free_kbytes;
 struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
-	struct zone *zone;
 	unsigned int nr_freepages;	/* Number of isolated free pages */
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
-	unsigned long total_migrate_scanned;
-	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+	struct zone *zone;
+	unsigned long total_migrate_scanned;
+	unsigned long total_free_scanned;
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */
-- 
2.16.4
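Whether the hot fields actually share a line can be checked with offsetof() on a cut-down mirror of the struct. The field set below is abridged and a 64-byte cache line is assumed; two raw pointers stand in for each `struct list_head`.

```c
#include <stddef.h>

/*
 * Abridged mirror of the reordered compact_control: the scanner state
 * written on almost every iteration is grouped at the front; the
 * read-mostly zone pointer and the statistics counters follow.
 */
struct cc_sketch {
	void *freepages_next, *freepages_prev;		/* list_head */
	void *migratepages_next, *migratepages_prev;	/* list_head */
	unsigned int nr_freepages;
	unsigned int nr_migratepages;
	unsigned long free_pfn;
	unsigned long migrate_pfn;
	unsigned long last_migrated_pfn;
	/* ---- colder fields, pushed onto the next cache line ---- */
	void *zone;
	unsigned long total_migrate_scanned;
	unsigned long total_free_scanned;
};
```

On an LP64 build every write-intensive field up to `last_migrated_pfn` (offset 56) lands in the first 64 bytes, which is the grouping the patch aims for in the real struct.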
[PATCH 00/25] Increase success rates and reduce latency of compaction v2
This series reduces scan rates and increases success rates of compaction,
primarily by using the free lists to shorten scans, better controlling skip
information and whether multiple scanners can target the same block, and
capturing pageblocks before being stolen by parallel requests. The series
is based on the 4.21/5.0 merge window after Andrew's tree had been merged.
It's known to rebase cleanly.

Primarily I'm using thpscale to measure the impact of the series. The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries to
allocate THP. It re-executes for different numbers of threads. From a
fragmentation perspective, the workload is relatively benign but it does
stress compaction.

The overall impact on latencies for a 1-socket machine is

                                  baseline               patches
Amean     fault-both-3      5362.80 (   0.00%)     4446.89 *  17.08%*
Amean     fault-both-5      9488.75 (   0.00%)     5660.86 *  40.34%*
Amean     fault-both-7     11909.86 (   0.00%)     8549.63 *  28.21%*
Amean     fault-both-12    16185.09 (   0.00%)    11508.36 *  28.90%*
Amean     fault-both-18    12057.72 (   0.00%)    19013.48 * -57.69%*
Amean     fault-both-24    23939.95 (   0.00%)    19676.16 *  17.81%*
Amean     fault-both-30    26606.14 (   0.00%)    27363.23 (  -2.85%)
Amean     fault-both-32    31677.12 (   0.00%)    23154.09 *  26.91%*

While there is a glitch at the 18-thread mark, it's known that the base page
allocation latency was much lower and huge pages were taking longer,
partially due to a high allocation success rate.

The allocation success rates are much improved

                                  baseline               patches
Percentage huge-3        70.93 (   0.00%)       98.30 (  38.60%)
Percentage huge-5        56.02 (   0.00%)       83.36 (  48.81%)
Percentage huge-7        60.98 (   0.00%)       89.04 (  46.01%)
Percentage huge-12       73.02 (   0.00%)       94.36 (  29.23%)
Percentage huge-18       94.37 (   0.00%)       95.87 (   1.58%)
Percentage huge-24       84.95 (   0.00%)       97.41 (  14.67%)
Percentage huge-30       83.63 (   0.00%)       96.69 (  15.62%)
Percentage huge-32       81.69 (   0.00%)       96.10 (  17.65%)

That's a nearly perfect allocation success rate.
The biggest impact is on the scan rates

Compaction migrate scanned     106520811       26934599
Compaction free scanned       4180735040       26584944

The number of pages scanned for migration was reduced by 74% and the free
scanner was reduced by 99.36%. So much less work in exchange for lower
latency and better success rates.

The series was also evaluated using a workload that heavily fragments memory
but the benefits there are also significant, albeit not presented.

It was commented that we should be rethinking scanning entirely and to a
large extent I agree. However, to achieve that you need a lot of this series
in place first so it's best to make the linear scanners as good as possible
before ripping them out.

 include/linux/compaction.h |    3 +-
 include/linux/gfp.h        |    7 +-
 include/linux/mmzone.h     |    2 +
 include/linux/sched.h      |    4 +
 kernel/sched/core.c        |    3 +
 mm/compaction.c            | 1031 ++--
 mm/internal.h              |   23 +-
 mm/migrate.c               |    2 +-
 mm/page_alloc.c            |   70 ++-
 9 files changed, 908 insertions(+), 237 deletions(-)

-- 
2.16.4
[PATCH 01/25] mm, compaction: Shrink compact_control
The isolate and migrate scanners should never isolate more than a pageblock
of pages so unsigned int is sufficient, saving 8 bytes on a 64-bit build.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index f4a7bb02decf..5ddf5d3771a0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -184,8 +184,8 @@ struct compact_control {
 	struct list_head freepages;	/* List of free pages to migrate to */
 	struct list_head migratepages;	/* List of pages being migrated */
 	struct zone *zone;
-	unsigned long nr_freepages;	/* Number of isolated free pages */
-	unsigned long nr_migratepages;	/* Number of pages to migrate */
+	unsigned int nr_freepages;	/* Number of isolated free pages */
+	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
 	unsigned long free_pfn;		/* isolate_freepages search base */
-- 
2.16.4
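The claimed 8-byte saving is easy to check with sizeof on a pair of cut-down structs. This assumes an LP64 ABI (8-byte unsigned long, 4-byte unsigned int, as on x86-64); on a 32-bit build both types are 4 bytes and nothing is saved, which is why the commit message says "on a 64-bit build".

```c
/* Before the patch: two 8-byte counters (on LP64) */
struct cc_counters_before {
	unsigned long nr_freepages;
	unsigned long nr_migratepages;
};

/* After: two 4-byte counters -- still far more range than the
 * COMPACT_CLUSTER_MAX / pageblock-sized batches ever need */
struct cc_counters_after {
	unsigned int nr_freepages;
	unsigned int nr_migratepages;
};
```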
[PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control
The last_migrated_pfn field is a bit dubious as to whether it really helps, but either way the information from it can be inferred without increasing the size of compact_control, so remove the field.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
---
 mm/compaction.c | 25 +
 mm/internal.h   |  1 -
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..fb4d9f52ed56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		cc->nr_migratepages++;
 		nr_isolated++;
 
-		/*
-		 * Record where we could have freed pages by migration and not
-		 * yet flushed them to buddy allocator.
-		 * - this is the lowest page that was isolated and likely be
-		 * then freed by migration.
-		 */
-		if (!cc->last_migrated_pfn)
-			cc->last_migrated_pfn = low_pfn;
-
 		/* Avoid isolating too much */
 		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
 			++low_pfn;
@@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		}
 		putback_movable_pages(&cc->migratepages);
 		cc->nr_migratepages = 0;
-		cc->last_migrated_pfn = 0;
 		nr_isolated = 0;
 	}
@@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	enum compact_result ret;
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
+	unsigned long last_migrated_pfn;
 	const bool sync = cc->mode != MIGRATE_ASYNC;
 
 	cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		cc->whole_zone = true;
 	}
 
-	cc->last_migrated_pfn = 0;
+	last_migrated_pfn = 0;
 
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn, sync);
 
@@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
+		unsigned long start_pfn = cc->migrate_pfn;
 
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
 			ret = COMPACT_CONTENDED;
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
+			last_migrated_pfn = 0;
 			goto out;
 		case ISOLATE_NONE:
 			/*
@@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			 */
 			goto check_drain;
 		case ISOLATE_SUCCESS:
+			last_migrated_pfn = start_pfn;
 			;
 		}
@@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 				cc->migrate_pfn = block_end_pfn(
 						cc->migrate_pfn - 1, cc->order);
 				/* Draining pcplists is useless in this case */
-				cc->last_migrated_pfn = 0;
-
+				last_migrated_pfn = 0;
 			}
 		}
@@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		 * compact_finished() can detect immediately if allocation
 		 * would succeed.
 		 */
-		if (cc->order > 0 && cc->last_migrated_pfn) {
+		if (cc->order > 0 && last_migrated_pfn) {
 			int cpu;
 			unsigned long current_block_start =
 				block_start_pfn(cc->migrate_pfn, cc->order);
 
-			if (cc->last_migrated_pfn < current_block_start) {
+			if (last_migrated_pfn < current_block_start) {
 				cpu = get_cpu();
 				lru_add_drain_cpu(cpu);
 				drain_local_pages(zone);
 				put_cpu();
 				/* No more flushing until we migrate again */
-				cc->last_migrated_pfn = 0;
+				last_migrated_pfn = 0;
 			}
 		}

diff --git a/mm/internal.h b/mm/internal.h
index
[PATCH 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out
This appears to trigger a firmware bug and causes severe problems with rtl8723ae PCI devices. When the power save mode is activated for longer periods of time the firmware stops to receive any packets. This problem was exposed by commit 873ffe154ae0 ("rtlwifi: Fix logic error in enter/exit power-save mode"). Previously the power save mode was only active rarely and only for a short time so that the problem was not noticeable. Signed-off-by: Bernd Edlinger --- .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c| 20 .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h| 1 - .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c| 56 +- 3 files changed, 1 insertion(+), 76 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c index bf9859f..77833fb 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c @@ -268,26 +268,6 @@ void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw, (u8 *)_cmdbuf); } -void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode) -{ - struct rtl_priv *rtlpriv = rtl_priv(hw); - u8 u1_h2c_set_pwrmode[3] = { 0 }; - struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw)); - - RT_TRACE(rtlpriv, COMP_POWER, DBG_LOUD, "FW LPS mode = %d\n", mode); - - SET_H2CCMD_PWRMODE_PARM_MODE(u1_h2c_set_pwrmode, mode); - SET_H2CCMD_PWRMODE_PARM_SMART_PS(u1_h2c_set_pwrmode, - (rtlpriv->mac80211.p2p) ? 
ppsc->smart_ps : 1); - SET_H2CCMD_PWRMODE_PARM_BCN_PASS_TIME(u1_h2c_set_pwrmode, - ppsc->reg_max_lps_awakeintvl); - - RT_PRINT_DATA(rtlpriv, COMP_CMD, DBG_DMESG, - "rtl8723e_set_fw_rsvdpagepkt(): u1_h2c_set_pwrmode\n", - u1_h2c_set_pwrmode, 3); - rtl8723e_fill_h2c_cmd(hw, H2C_SETPWRMODE, 3, u1_h2c_set_pwrmode); -} - #define BEACON_PG 0 /* ->1 */ #define PSPOLL_PG 2 #define NULL_PG 3 diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h index 2e668fc..8618b82 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h @@ -55,7 +55,6 @@ void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw, u8 element_id, u32 cmd_len, u8 *p_cmdbuffer); -void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode); void rtl8723e_set_fw_rsvdpagepkt(struct ieee80211_hw *hw, bool b_dl_finished); void rtl8723e_set_fw_joinbss_report_cmd(struct ieee80211_hw *hw, u8 mstatus); void rtl8723e_set_p2p_ps_offload_cmd(struct ieee80211_hw *hw, u8 p2p_ps_state); diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c index f783e4a..f0eb356 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c @@ -408,29 +408,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 variable, u8 *val) case HW_VAR_WPA_CONFIG: rtl_write_byte(rtlpriv, REG_SECCFG, *((u8 *)val)); break; - case HW_VAR_SET_RPWM:{ - u8 rpwm_val; - - rpwm_val = rtl_read_byte(rtlpriv, REG_PCIE_HRPWM); - udelay(1); - - if (rpwm_val & BIT(7)) { - rtl_write_byte(rtlpriv, REG_PCIE_HRPWM, - (*(u8 *)val)); - } else { - rtl_write_byte(rtlpriv, REG_PCIE_HRPWM, - ((*(u8 *)val) | BIT(7))); - } - - break; - } case HW_VAR_H2C_FW_PWRMODE:{ - u8 psmode = (*(u8 *)val); - - if (psmode != FW_PS_ACTIVE_MODE) - rtl8723e_dm_rf_saving(hw, true); - - rtl8723e_set_fw_pwrmode_cmd(hw, (*(u8 *)val)); break; } 
case HW_VAR_FW_PSMODE_STATUS: @@ -513,39 +491,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 variable, u8 *val) break; } case HW_VAR_FW_LPS_ACTION:{ - bool b_enter_fwlps = *((bool *)val); - u8 rpwm_val, fw_pwrmode; - bool fw_current_inps; - - if (b_enter_fwlps) { - rpwm_val = 0x02;/* RF off */ - fw_current_inps = true; - rtlpriv->cfg->ops->set_hw_reg(hw, - HW_VAR_FW_PSMODE_STATUS, -
Re: [PATCH] scsi/dpt_i2o.c: Use dma_zalloc_coherent
On Wed, Dec 19, 2018 at 6:41 PM Sabyasachi Gupta wrote: > > On Sat, Dec 1, 2018 at 6:45 PM Sabyasachi Gupta > wrote: > > > > On Sat, Nov 17, 2018 at 6:17 PM Sabyasachi Gupta > > wrote: > > > > > > Replaced dma_alloc_coherent + memset with dma_zalloc_coherent > > > > > > Signed-off-by: Sabyasachi Gupta > > > > Any comment on this patch? > > Any comment on this patch? > Any comment on this patch? > > > > > --- > > > drivers/scsi/dpt_i2o.c | 12 > > > 1 file changed, 4 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c > > > index 37de8fb..056383a 100644 > > > --- a/drivers/scsi/dpt_i2o.c > > > +++ b/drivers/scsi/dpt_i2o.c > > > @@ -1370,13 +1370,12 @@ static s32 adpt_i2o_reset_hba(adpt_hba* pHba) > > > schedule_timeout_uninterruptible(1); > > > } while (m == EMPTY_QUEUE); > > > > > > - status = dma_alloc_coherent(>pDev->dev, 4, , > > > GFP_KERNEL); > > > + status = dma_zalloc_coherent(>pDev->dev, 4, , > > > GFP_KERNEL); > > > if(status == NULL) { > > > adpt_send_nop(pHba, m); > > > printk(KERN_ERR"IOP reset failed - no free memory.\n"); > > > return -ENOMEM; > > > } > > > - memset(status,0,4); > > > > > > msg[0]=EIGHT_WORD_MSG_SIZE|SGL_OFFSET_0; > > > msg[1]=I2O_CMD_ADAPTER_RESET<<24|HOST_TID<<12|ADAPTER_TID; > > > @@ -2836,14 +2835,13 @@ static s32 adpt_i2o_init_outbound_q(adpt_hba* > > > pHba) > > > > > > msg=(u32 __iomem *)(pHba->msg_addr_virt+m); > > > > > > - status = dma_alloc_coherent(>pDev->dev, 4, , > > > GFP_KERNEL); > > > + status = dma_zalloc_coherent(>pDev->dev, 4, , > > > GFP_KERNEL); > > > if (!status) { > > > adpt_send_nop(pHba, m); > > > printk(KERN_WARNING"%s: IOP reset failed - no free > > > memory.\n", > > > pHba->name); > > > return -ENOMEM; > > > } > > > - memset(status, 0, 4); > > > > > > writel(EIGHT_WORD_MSG_SIZE| SGL_OFFSET_6, [0]); > > > writel(I2O_CMD_OUTBOUND_INIT<<24 | HOST_TID<<12 | ADAPTER_TID, > > > [1]); > > > @@ -2890,14 +2888,13 @@ static s32 adpt_i2o_init_outbound_q(adpt_hba* > > 
> pHba) > > > pHba->reply_pool, pHba->reply_pool_pa); > > > } > > > > > > - pHba->reply_pool = dma_alloc_coherent(>pDev->dev, > > > + pHba->reply_pool = dma_zalloc_coherent(>pDev->dev, > > > pHba->reply_fifo_size * REPLY_FRAME_SIZE > > > * 4, > > > >reply_pool_pa, GFP_KERNEL); > > > if (!pHba->reply_pool) { > > > printk(KERN_ERR "%s: Could not allocate reply pool\n", > > > pHba->name); > > > return -ENOMEM; > > > } > > > - memset(pHba->reply_pool, 0 , pHba->reply_fifo_size * > > > REPLY_FRAME_SIZE * 4); > > > > > > for(i = 0; i < pHba->reply_fifo_size; i++) { > > > writel(pHba->reply_pool_pa + (i * REPLY_FRAME_SIZE * 4), > > > @@ -3126,13 +3123,12 @@ static int adpt_i2o_build_sys_table(void) > > > sys_tbl_len = sizeof(struct i2o_sys_tbl) + // Header + IOPs > > > (hba_count) * sizeof(struct > > > i2o_sys_tbl_entry); > > > > > > - sys_tbl = dma_alloc_coherent(>pDev->dev, > > > + sys_tbl = dma_zalloc_coherent(>pDev->dev, > > > sys_tbl_len, _tbl_pa, GFP_KERNEL); > > > if (!sys_tbl) { > > > printk(KERN_WARNING "SysTab Set failed. Out of > > > memory.\n"); > > > return -ENOMEM; > > > } > > > - memset(sys_tbl, 0, sys_tbl_len); > > > > > > sys_tbl->num_entries = hba_count; > > > sys_tbl->version = I2OVERSION; > > > -- > > > 2.7.4 > > >
[PATCH 2/4] rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input
gain control when no beacon was received in the connected state. This avoids sporadic "Connection to AP ... lost" errors.

Signed-off-by: Bernd Edlinger
---
 drivers/net/wireless/realtek/rtlwifi/core.c         | 2 ++
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 cur_igvalue)
 	dm_digtable->bt30_cur_igi = 0x32;
 	dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
 	dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+	dm_digtable->pre_cck_fa_state = 0;
+	dm_digtable->cur_cck_fa_state = 0;
 }
 EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct ieee80211_hw *hw)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct dig_t *dm_digtable = &rtlpriv->dm_digtable;
+	struct rtl_mac *mac = rtl_mac(rtlpriv);
 	long rssi_val_min = 0;
 
+	if (mac->link_state == MAC80211_LINKED &&
+	    mac->opmode == NL80211_IFTYPE_STATION &&
+	    rtlpriv->link_info.bcn_rx_inperiod == 0)
+		return 0;
+
 	if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
 	    (dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
 		if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct ieee80211_hw *hw)
 	} else {
 		rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
 		rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+		dm_digtable->pre_cck_fa_state = 0;
+		dm_digtable->cur_cck_fa_state = 0;
 	}
 	dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1
[PATCH 4/4] rtlwifi: Move the clearing of
rtlpriv->link_info.num_rx_inperiod in rtl_watchdog_wq_callback a few lines down. This is necessary since it is still used in the "AP off" detection code block. Moved clearing of rtlpriv->link_info.num_tx_inperiod as well for consistency.

Signed-off-by: Bernd Edlinger
---
 drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
 			;
 	}
 
-	rtlpriv->link_info.num_rx_inperiod = 0;
-	rtlpriv->link_info.num_tx_inperiod = 0;
 	for (tid = 0; tid <= 7; tid++)
 		rtlpriv->link_info.tidtx_inperiod[tid] = 0;
 
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
 		rtlpriv->btcoexist.btc_info.in_4way = false;
 	}
 
+	rtlpriv->link_info.num_rx_inperiod = 0;
+	rtlpriv->link_info.num_tx_inperiod = 0;
 	rtlpriv->link_info.bcn_rx_inperiod = 0;
 
 	/* <6> scan list */
-- 
1.9.1
Re: [PATCH 2/2] virtio: document virtio_config_ops restrictions
On Thu, 3 Jan 2019 18:28:49 +0100 Halil Pasic wrote: > On Thu, 3 Jan 2019 17:08:04 +0100 > Cornelia Huck wrote: > > > Some transports (e.g. virtio-ccw) implement virtio operations that > > seem to be a simple read/write as something more involved that > > cannot be done from an atomic context. > > > > Give at least a hint about that. > > > > Signed-off-by: Cornelia Huck > > --- > > include/linux/virtio_config.h | 5 + > > 1 file changed, 5 insertions(+) > > > > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h > > index 7087ef946ba7..987b6491b946 100644 > > --- a/include/linux/virtio_config.h > > +++ b/include/linux/virtio_config.h > > @@ -12,6 +12,11 @@ struct irq_affinity; > > > > /** > > * virtio_config_ops - operations for configuring a virtio device > > + * Note: Do not assume that a transport implements all of the operations > > + * getting/setting a value as a simple read/write! Generally > > speaking, > > + * any of @get/@set, @get_status/@set_status, or @get_features/ > > + * @finalize_features are NOT safe to be called from an atomic > > + * context. > > I think the only exception is @bus_name (and maybe @generation, I don't > know) because it does not have to 'speak' with the hypervisor. If a > transport operation has to 'speak' with the hypervisor, we do it by > making it interpret a channel program. That means not safe to be called > form atomic context. Or am I missing something? I explicitly singled out the listed callbacks because they read/write a value and there might be more to them than meets the eye. I would assume that nobody expects calling e.g. find_vqs (allocating queues) from an atomic context to be a good idea :) Maybe I should do s/Generally speaking/In particular/ ? That said, it's only a hint; we should add might_sleep as well to interfaces where it makes sense.
[PATCH 0/4] rtlwifi: Fix issues with rtl8723ae
Currently the rtl8723ae driver is broken (since v4.7). Connection to the AP is lost very often, especially when the signal level is not very good.

The main issue is that the power save mode is basically not working and seems to trigger a firmware bug, so I had to remove that code.

While debugging the driver I found a couple of related issues, for instance that the signal level in dm.undec_sm_pwdb is no longer accurate (it may even be much too high) when no more packets are received, and that setting the input gain to maximum increases the likelihood of receiving something.

The patch was tested with the rtl8723ae PCI card in my laptop against a FRITZ!Box 7590 AP -- the WiFi connection now works very reliably for me.

Bernd Edlinger (4):
  rtlwifi: rtl8723ae: Take the FW LPS mode handling out
  rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input gain control
    when no beacon was received in the connected state.
  rtlwifi: rtl8723ae: Re-introduce rtl8723e_dm_refresh_rate_adaptive_mask
  rtlwifi: Move the clearing of rtlpriv->link_info.num_rx_inperiod in
    rtl_watchdog_wq_callback a few lines down.

 drivers/net/wireless/realtek/rtlwifi/base.c        |  4 +-
 drivers/net/wireless/realtek/rtlwifi/core.c        |  2 +
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c    | 95 +-
 .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c    | 20 -
 .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h    |  1 -
 .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c    | 56 +
 6 files changed, 98 insertions(+), 80 deletions(-)
-- 
1.9.1
Re: [PATCH] scsi/mvsas/mv_init.c: Use dma_zalloc_coherent
On Wed, Dec 19, 2018 at 6:49 PM Sabyasachi Gupta wrote: > > On Sat, Dec 1, 2018 at 6:40 PM Sabyasachi Gupta > wrote: > > > > On Wed, Nov 21, 2018 at 7:18 PM Sabyasachi Gupta > > wrote: > > > > > > Replace dma_alloc_coherent + memset with dma_zalloc_coherent > > > > > > Signed-off-by: Sabyasachi Gupta > > > > Any comment on this patch? > > Any comment on this patch? Any comment on this patch? > > > > > > --- > > > drivers/scsi/mvsas/mv_init.c | 12 > > > 1 file changed, 4 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c > > > index 3ac3437..495bddb 100644 > > > --- a/drivers/scsi/mvsas/mv_init.c > > > +++ b/drivers/scsi/mvsas/mv_init.c > > > @@ -253,33 +253,29 @@ static int mvs_alloc(struct mvs_info *mvi, struct > > > Scsi_Host *shost) > > > /* > > > * alloc and init our DMA areas > > > */ > > > - mvi->tx = dma_alloc_coherent(mvi->dev, > > > + mvi->tx = dma_zalloc_coherent(mvi->dev, > > > sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ, > > > >tx_dma, GFP_KERNEL); > > > if (!mvi->tx) > > > goto err_out; > > > - memset(mvi->tx, 0, sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ); > > > - mvi->rx_fis = dma_alloc_coherent(mvi->dev, MVS_RX_FISL_SZ, > > > + mvi->rx_fis = dma_zalloc_coherent(mvi->dev, MVS_RX_FISL_SZ, > > > >rx_fis_dma, GFP_KERNEL); > > > if (!mvi->rx_fis) > > > goto err_out; > > > - memset(mvi->rx_fis, 0, MVS_RX_FISL_SZ); > > > > > > - mvi->rx = dma_alloc_coherent(mvi->dev, > > > + mvi->rx = dma_zalloc_coherent(mvi->dev, > > > sizeof(*mvi->rx) * (MVS_RX_RING_SZ + > > > 1), > > > >rx_dma, GFP_KERNEL); > > > if (!mvi->rx) > > > goto err_out; > > > - memset(mvi->rx, 0, sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1)); > > > mvi->rx[0] = cpu_to_le32(0xfff); > > > mvi->rx_cons = 0xfff; > > > > > > - mvi->slot = dma_alloc_coherent(mvi->dev, > > > + mvi->slot = dma_zalloc_coherent(mvi->dev, > > >sizeof(*mvi->slot) * slot_nr, > > >>slot_dma, GFP_KERNEL); > > > if (!mvi->slot) > > > goto err_out; > > > - memset(mvi->slot, 0, 
sizeof(*mvi->slot) * slot_nr); > > > > > > mvi->bulk_buffer = dma_alloc_coherent(mvi->dev, > > >TRASH_BUCKET_SIZE, > > > -- > > > 2.7.4 > > >
[PATCH 3/4] rtlwifi: rtl8723ae: Re-introduce
rtl8723e_dm_refresh_rate_adaptive_mask

This function was present in a previous version of the code base; it works just fine for me -- as long as it is not using stale data.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger
---
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 87 +-
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw *hw)
 void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
-	struct rate_adaptive *p_ra = &(rtlpriv->ra);
+	struct rate_adaptive *p_ra = &rtlpriv->ra;
 
 	p_ra->ratr_state = DM_RATR_STA_INIT;
 	p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
 
 }
 
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+	struct rtl_priv *rtlpriv = rtl_priv(hw);
+	struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+	struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+	struct rate_adaptive *p_ra = &rtlpriv->ra;
+	u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+	struct ieee80211_sta *sta = NULL;
+
+	if (is_hal_stop(rtlhal)) {
+		RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+			 " driver is going to unload\n");
+		return;
+	}
+
+	if (!rtlpriv->dm.useramask) {
+		RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+			 " driver does not control rate adaptive mask\n");
+		return;
+	}
+
+	if (mac->link_state == MAC80211_LINKED &&
+	    mac->opmode == NL80211_IFTYPE_STATION) {
+		switch (p_ra->pre_ratr_state) {
+		case DM_RATR_STA_HIGH:
+			high_rssithresh_for_ra = 50;
+			low_rssithresh_for_ra = 20;
+			break;
+		case DM_RATR_STA_MIDDLE:
+			high_rssithresh_for_ra = 55;
+			low_rssithresh_for_ra = 20;
+			break;
+		case DM_RATR_STA_LOW:
+			high_rssithresh_for_ra = 60;
+			low_rssithresh_for_ra = 25;
+			break;
+		default:
+			high_rssithresh_for_ra = 50;
+			low_rssithresh_for_ra = 20;
+			break;
+		}
+
+		if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+			switch (p_ra->pre_ratr_state) {
+			case DM_RATR_STA_HIGH:
+			default:
+				p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+				break;
+			case DM_RATR_STA_MIDDLE:
+			case DM_RATR_STA_LOW:
+				p_ra->ratr_state = DM_RATR_STA_LOW;
+				break;
+			}
+		else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+			p_ra->ratr_state = DM_RATR_STA_HIGH;
+		else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+			p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+		else
+			p_ra->ratr_state = DM_RATR_STA_LOW;
+
+		if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+			RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+				 "RSSI = %ld\n",
+				 rtlpriv->dm.undec_sm_pwdb);
+			RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+				 "RSSI_LEVEL = %d\n", p_ra->ratr_state);
+			RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+				 "PreState = %d, CurState = %d\n",
+				 p_ra->pre_ratr_state, p_ra->ratr_state);
+
+			rcu_read_lock();
+			sta = rtl_find_sta(hw, mac->bssid);
+			if (sta)
+				rtlpriv->cfg->ops->update_rate_tbl(hw, sta,
+							p_ra->ratr_state,
+							true);
+			rcu_read_unlock();
+
+			p_ra->pre_ratr_state = p_ra->ratr_state;
+		}
+	}
+}
+
 void rtl8723e_dm_rf_saving(struct ieee80211_hw *hw, u8 bforce_in_normal)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
@@ -834,7 +917,7 @@ void rtl8723e_dm_watchdog(struct ieee80211_hw *hw)
Re: [PATCH RFC lora-next 2/4] net: lora: sx1301: add minimal to get AGC working prior to tx work
On 29/12/2018 09:58, Andreas Färber wrote: Hi Ben, Am 19.12.18 um 16:56 schrieb Ben Whitten: As part of initialisation when opening the lora device after loading the AGC firmware we need to satisfy its startup procedure which involves a few steps; Loading a 16 entry lookup table. For this I have hard coded the laird ETSI certified table for use on the RG186-M2 (EU) cards, this will need investigation on how other devices load calibration data. Isn't calibration performed before this firmware is initialized? I left out reading the values back from firmware previously, but that should get implemented. In the userspace implementation, do you get these from a config file or did you modify the reference code to hardcode them? The calibration you refer to is the IQ offset calibration, as far as I can tell this is a separate thing from the power table the chip uses. In the user space implementation these values are placed in the config file. If these are different calibration values from the ones returned by firmware, then a DT property would be an easy way to get hardware-specific data into the driver. However, same as with your clk config, that makes us dependent on DT, which we shouldn't be for ACPI and USB. If we consider it configuration data rather than an immutable fact, then we would need a netlink command to set them. Perhaps we can provide both mechanisms, a DT power table and a mechanism to set via netlink prior to opening the interface. As these power settings are hardware dependent and certified for our card by FCC and ETSI I would prefer to include in DT. In any case, we have some other vendors on this list, so hopefully someone can comment. :) Selecting the correct channel to transmit on. Currently always 0 for the reference design. Similarly: DT or netlink depending on whether fixed, and fall back to 0 as default. Agreed, so on the DT would it be appropriate to have a handle in the sx1301 node with a link to the radio which can transmit, eg. 
tx = < 0>; Allowing for both to be transmitters if that if a hardware choice. Then ending the AGC init procedure and seeing that it has come up. Signed-off-by: Ben Whitten --- drivers/net/lora/sx1301.c | 254 +- drivers/net/lora/sx1301.h | 16 +++ 2 files changed, 268 insertions(+), 2 deletions(-) Many thanks for working on this! Some nits inline. diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c index e75df93b96d8..0c7b6d0b31af 100644 --- a/drivers/net/lora/sx1301.c +++ b/drivers/net/lora/sx1301.c @@ -24,6 +24,121 @@ #include "sx1301.h" +static struct sx1301_tx_gain_lut tx_gain_lut[] = { + { + .dig_gain = 0, + .pa_gain = 0, + .dac_gain = 3, + .mix_gain = 8, + .rf_power = -3, + }, + { + .dig_gain = 0, + .pa_gain = 0, + .dac_gain = 3, + .mix_gain = 9, + .rf_power = 0, + }, + { + .dig_gain = 0, + .pa_gain = 0, + .dac_gain = 3, + .mix_gain = 12, + .rf_power = 3, + }, + { + .dig_gain = 0, + .pa_gain = 0, + .dac_gain = 3, + .mix_gain = 13, + .rf_power = 4, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 8, + .rf_power = 6, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 9, + .rf_power = 9, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 10, + .rf_power = 10, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 11, + .rf_power = 12, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 12, + .rf_power = 13, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 13, + .rf_power = 14, + }, + { + .dig_gain = 0, + .pa_gain = 1, + .dac_gain = 3, + .mix_gain = 15, + .rf_power = 16, + }, + { + .dig_gain = 0, + .pa_gain = 2, + .dac_gain = 3, + .mix_gain = 10, + .rf_power = 19, + }, + { + .dig_gain = 0, + .pa_gain = 2, + .dac_gain = 3, + .mix_gain = 11, + .rf_power = 21, + }, + { +
CFS scheduler: spin_lock usage causes deadlock when smp_apic_timer_interrupt occurs
Dear Ingo and Peter,

I would like to report a possible bug in the CFS scheduler that causes a deadlock. We suspect this bug has caused intermittent yet highly persistent system freezes on our quad-core SMP systems. We noticed the problem on 4.1.17 preempt-rt, but we suspect the problematic code is not linked to the preempt-rt patch and is also present in the latest 4.20 kernel.

The problem concerns the use of spin_lock to lock cfs_b in a situation where the spin lock is also taken from an interrupt handler:

- __run_hrtimer (in kernel/time/hrtimer.c) calls fn(timer) with IRQs enabled. This can call sched_cfs_period_timer() (in kernel/sched/fair.c), which locks cfs_b.
- the hard IRQ smp_apic_timer_interrupt can then occur. It can call ttwu_queue(), which grabs the spin lock for its CPU run queue and can then try to enqueue a task via the CFS scheduler.
- this can call check_enqueue_throttle(), which can call assign_cfs_rq_runtime(), which tries to obtain the cfs_b lock. It is now blocked.

The cfs_b lock uses spin_lock and so was not intended for use inside a hard IRQ, but the CFS scheduler does just that when it uses a hrtimer interrupt to wake up and enqueue work. Our initial impression is that cfs_b needs to be locked using spin_lock_irqsave.

My colleague Mike Pearce submitted a bug report on Bugzilla 3 weeks ago:

https://bugzilla.kernel.org/show_bug.cgi?id=201993

We would appreciate any feedback.

Kind regards,
Tom