Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers
On 11/03/2021 13:00, Jason Gunthorpe wrote:
On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:

btw can the id list have only vendor ids and not have device ids?

The PCI matcher is quite flexible, see the other patch from Max for the igd

ah cool, do this for NVIDIA GPUs then please, I just discovered another P9 system sold with NVIDIA T4s which is not in your list.

I think it will make things easier down the road if you maintain an exact list

Then why don't you do the exact list for Intel IGD? The commit log does not explain this detail.

But best practice is to be as narrow as possible as I hope this will eventually impact module autoloading and other details.

The amount of device-specific knowledge is too little to tie it up to device ids; it is a generic PCI driver with quirks. We do not have separate drivers for the hardware which requires quirks.

It provides its own capability structure exposed to userspace, that is absolutely not a "quirk"

And how do you hope this should impact autoloading? I would like to autoload the most specific vfio driver for the target hardware. Is there an idea how it is going to work? For example, the Intel IGD driver and vfio-pci-igd - how should the system pick one? If there is no MODULE_DEVICE_TABLE in vfio-pci-xxx, is the user supposed to try binding all vfio-pci-xxx drivers until some binds?

If you someday need to support new GPU HW that needs a different VFIO driver then you are really stuck because things become indeterminate if there are two devices claiming the ID. We don't have the concept of "best match"; driver core works on exact match.

--
Alexey
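The disagreement above is about match granularity in a driver's PCI ID table. As a plain-C illustration (the struct and the device IDs here are simplified stand-ins, not the kernel's actual definitions), an exact list binds only the enumerated devices, while a PCI_ANY_ID wildcard claims everything from the vendor:

```c
#include <assert.h>

#define PCI_ANY_ID 0xffffffffu

/* Minimal model of the kernel's struct pci_device_id matching. */
struct pci_device_id {
	unsigned int vendor, device;
};

static int id_match(const struct pci_device_id *id,
		    unsigned int vendor, unsigned int device)
{
	return (id->vendor == PCI_ANY_ID || id->vendor == vendor) &&
	       (id->device == PCI_ANY_ID || id->device == device);
}

/* Exact list: only enumerated devices bind (IDs are illustrative). */
static const struct pci_device_id exact[] = {
	{ 0x10de, 0x1db1 },	/* one enumerated GPU */
	{ 0x10de, 0x1eb8 },	/* another enumerated GPU */
};

/* Vendor-wide wildcard: any device from the vendor binds. */
static const struct pci_device_id wildcard[] = {
	{ 0x10de, PCI_ANY_ID },
};
```

With an exact table, a newly shipped device (like the T4 system Alexey mentions) simply does not match until the list is updated; with a wildcard, two drivers claiming the same vendor become indeterminate, which is Jason's point about the driver core having no "best match".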
Re: [PATCH v5 04/10] scsi: ufshpb: Make eviction depends on region's reads
Hi Avri,

On 2021-03-02 21:24, Avri Altman wrote:

In host mode, eviction is considered an extreme measure. Verify that the entering region has enough reads, and the exiting region has far fewer reads.

Signed-off-by: Avri Altman
---
 drivers/scsi/ufs/ufshpb.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index a8f8d13af21a..6f4fd22eaf2f 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -17,6 +17,7 @@
 #include "../sd.h"

 #define ACTIVATION_THRESHOLD 4 /* 4 IOs */
+#define EVICTION_THRESHOLD (ACTIVATION_THRESHOLD << 6) /* 256 IOs */

Same here, can this be added as a sysfs entry?

Thanks,
Can Guo.

 /* memory management */
 static struct kmem_cache *ufshpb_mctx_cache;
@@ -1050,6 +1051,13 @@ static struct ufshpb_region *ufshpb_victim_lru_info(struct ufshpb_lu *hpb)
 		if (ufshpb_check_srgns_issue_state(hpb, rgn))
 			continue;

+		/*
+		 * in host control mode, verify that the exiting region
+		 * has less reads
+		 */
+		if (hpb->is_hcm && rgn->reads > (EVICTION_THRESHOLD >> 1))
+			continue;
+
 		victim_rgn = rgn;
 		break;
 	}
@@ -1235,7 +1243,7 @@ static int ufshpb_issue_map_req(struct ufshpb_lu *hpb,
 static int ufshpb_add_region(struct ufshpb_lu *hpb, struct ufshpb_region *rgn)
 {
-	struct ufshpb_region *victim_rgn;
+	struct ufshpb_region *victim_rgn = NULL;
 	struct victim_select_info *lru_info = &hpb->lru_info;
 	unsigned long flags;
 	int ret = 0;
@@ -1263,6 +1271,16 @@ static int ufshpb_add_region(struct ufshpb_lu *hpb, struct ufshpb_region *rgn)
 			 * because the device could detect this region
 			 * by not issuing HPB_READ
 			 */
+
+			/*
+			 * in host control mode, verify that the entering
+			 * region has enough reads
+			 */
+			if (hpb->is_hcm && rgn->reads < EVICTION_THRESHOLD) {
+				ret = -EACCES;
+				goto out;
+			}
+
 			victim_rgn = ufshpb_victim_lru_info(hpb);
 			if (!victim_rgn) {
 				dev_warn(&hpb->sdev_ufs_lu->sdev_dev,
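The two checks the patch adds can be read as a pair of predicates on the region's read counter. A minimal host-side sketch (threshold values taken from the diff; the helper names are invented for illustration):

```c
#include <assert.h>

#define ACTIVATION_THRESHOLD	4			/* 4 IOs */
#define EVICTION_THRESHOLD	(ACTIVATION_THRESHOLD << 6)	/* 256 IOs */

/* An entering region must have accumulated "enough" reads
 * (ufshpb_add_region returns -EACCES otherwise). */
static int may_enter(int reads)
{
	return reads >= EVICTION_THRESHOLD;
}

/* An LRU victim must have "much less" reads - at most half the
 * eviction threshold (ufshpb_victim_lru_info skips it otherwise). */
static int may_evict(int reads)
{
	return reads <= (EVICTION_THRESHOLD >> 1);
}
```

The gap between the two bounds (256 to enter, at most 128 to be evicted) is what makes eviction "an extreme measure": a region cannot displace another of comparable activity.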
Re: [PATCH v4 1/3] bus: mhi: core: Introduce internal register poll helper function
Hi Bhaumik,

On Thu, 11 Mar 2021 at 00:31, Bhaumik Bhatt wrote:
>
> Introduce helper function to allow MHI core driver to poll for
> a value in a register field. This helps reach a common path to
> read and poll register values along with a retry time interval.
>
> Signed-off-by: Bhaumik Bhatt
> ---
>  drivers/bus/mhi/core/internal.h |  3 +++
>  drivers/bus/mhi/core/main.c     | 23 +++
>  2 files changed, 26 insertions(+)
>
> diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
> index 6f80ec3..005286b 100644
> --- a/drivers/bus/mhi/core/internal.h
> +++ b/drivers/bus/mhi/core/internal.h
> @@ -643,6 +643,9 @@ int __must_check mhi_read_reg(struct mhi_controller *mhi_cntrl,
>  int __must_check mhi_read_reg_field(struct mhi_controller *mhi_cntrl,
>  				    void __iomem *base, u32 offset, u32 mask,
>  				    u32 shift, u32 *out);
> +int __must_check mhi_poll_reg_field(struct mhi_controller *mhi_cntrl,
> +				    void __iomem *base, u32 offset, u32 mask,
> +				    u32 shift, u32 val, u32 delayus);
>  void mhi_write_reg(struct mhi_controller *mhi_cntrl, void __iomem *base,
>  		   u32 offset, u32 val);
>  void mhi_write_reg_field(struct mhi_controller *mhi_cntrl, void __iomem *base,
> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
> index 4e0131b..7c7f41a 100644
> --- a/drivers/bus/mhi/core/main.c
> +++ b/drivers/bus/mhi/core/main.c
> @@ -4,6 +4,7 @@
>   *
>   */
>
> +#include <linux/delay.h>
>  #include
>  #include
>  #include
> @@ -37,6 +38,28 @@ int __must_check mhi_read_reg_field(struct mhi_controller *mhi_cntrl,
>  	return 0;
>  }
>
> +int __must_check mhi_poll_reg_field(struct mhi_controller *mhi_cntrl,
> +				    void __iomem *base, u32 offset,
> +				    u32 mask, u32 shift, u32 val, u32 delayus)
> +{
> +	int ret;
> +	u32 out, retry = (mhi_cntrl->timeout_ms * 1000) / delayus;
> +
> +	while (retry--) {
> +		ret = mhi_read_reg_field(mhi_cntrl, base, offset, mask, shift,
> +					 &out);
> +		if (ret)
> +			return ret;
> +
> +		if (out == val)
> +			return 0;
> +
> +		udelay(delayus);

Have you read my previous comment? Do you really want to risk hogging the CPU for several seconds? We know that some devices take several seconds to start/boot. Why not use an msleep variant here?

Regards,
Loic
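Loic's concern is the busy-wait inside the retry loop. The control flow of the helper can be sketched in self-contained C (register access stubbed out with a fake register; the kernel's udelay()/msleep() choice is reduced to a comment, since that is exactly the point under review):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for mhi_read_reg_field(): reads a fake
 * register and always succeeds. */
static unsigned int fake_reg;

static int read_field(unsigned int *out)
{
	*out = fake_reg;
	return 0;
}

/* Sketch of the patch's polling scheme: the timeout in milliseconds is
 * divided by the per-iteration delay to get a retry budget. */
static int poll_field(unsigned int val, unsigned int timeout_ms,
		      unsigned int delayus)
{
	unsigned int out, retry = (timeout_ms * 1000) / delayus;

	while (retry--) {
		int ret = read_field(&out);

		if (ret)
			return ret;
		if (out == val)
			return 0;

		/* wait ~delayus here: udelay() busy-waits and hogs the
		 * CPU; msleep()/usleep_range() would yield instead,
		 * which is what the review suggests. */
	}
	return -ETIMEDOUT;
}
```

With, say, timeout_ms = 8000 and delayus = 1000, the udelay() version spins the CPU for up to eight seconds, which is why a sleeping wait is preferable for slow-booting devices.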
Re: [PATCH v5 03/10] scsi: ufshpb: Add region's reads counter
Hi Avri,

On 2021-03-02 21:24, Avri Altman wrote:

In host control mode, reads are the major source of activation trials. Keep track of those read counters, for both active as well as inactive regions.

We reset the read counter upon write - we are only interested in "clean" reads. Less intuitive, however, is that we also reset it upon the region's deactivation. Region deactivation is often due to the fact that eviction took place: a region becomes active at the expense of another. This happens when the max-active-regions limit has been crossed. If we don't reset the counter, we will trigger a lot of thrashing of the HPB database, since a few reads (or even one) to the region that was deactivated will trigger a re-activation trial.

Keep those counters normalized, as we are using those reads as a comparative score, to make various decisions. If during consecutive normalizations an active region has exhausted its reads - deactivate it.

Signed-off-by: Avri Altman
---
 drivers/scsi/ufs/ufshpb.c | 102 --
 drivers/scsi/ufs/ufshpb.h |   5 ++
 2 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index 044fec9854a0..a8f8d13af21a 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -16,6 +16,8 @@
 #include "ufshpb.h"
 #include "../sd.h"

+#define ACTIVATION_THRESHOLD 4 /* 4 IOs */

Can this param be added as a sysfs entry?

Thanks,
Can Guo

+
 /* memory management */
 static struct kmem_cache *ufshpb_mctx_cache;
 static mempool_t *ufshpb_mctx_pool;
@@ -554,6 +556,21 @@ static int ufshpb_issue_pre_req(struct ufshpb_lu *hpb, struct scsi_cmnd *cmd,
 	return ret;
 }

+static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int rgn_idx,
+				      int srgn_idx)
+{
+	struct ufshpb_region *rgn;
+	struct ufshpb_subregion *srgn;
+
+	rgn = hpb->rgn_tbl + rgn_idx;
+	srgn = rgn->srgn_tbl + srgn_idx;
+
+	list_del_init(&rgn->list_inact_rgn);
+
+	if (list_empty(&srgn->list_act_srgn))
+		list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
+}
+
 /*
  * This function will set up HPB read command using host-side L2P map data.
  */
@@ -600,12 +617,44 @@ int ufshpb_prep(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 		ufshpb_set_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
 				     transfer_len);
 		spin_unlock_irqrestore(&hpb->rgn_state_lock, flags);
+
+		if (hpb->is_hcm) {
+			spin_lock_irqsave(&rgn->rgn_lock, flags);
+			rgn->reads = 0;
+			spin_unlock_irqrestore(&rgn->rgn_lock, flags);
+		}
+
 		return 0;
 	}

 	if (!ufshpb_is_support_chunk(hpb, transfer_len))
 		return 0;

+	if (hpb->is_hcm) {
+		bool activate = false;
+		/*
+		 * in host control mode, reads are the main source for
+		 * activation trials.
+		 */
+		spin_lock_irqsave(&rgn->rgn_lock, flags);
+		rgn->reads++;
+		if (rgn->reads == ACTIVATION_THRESHOLD)
+			activate = true;
+		spin_unlock_irqrestore(&rgn->rgn_lock, flags);
+		if (activate) {
+			spin_lock_irqsave(&hpb->rsp_list_lock, flags);
+			ufshpb_update_active_info(hpb, rgn_idx, srgn_idx);
+			hpb->stats.rb_active_cnt++;
+			spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
+			dev_dbg(&hpb->sdev_ufs_lu->sdev_dev,
+				"activate region %d-%d\n", rgn_idx, srgn_idx);
+		}
+
+		/* keep those counters normalized */
+		if (rgn->reads > hpb->entries_per_srgn)
+			schedule_work(&hpb->ufshpb_normalization_work);
+	}
+
 	spin_lock_irqsave(&hpb->rgn_state_lock, flags);
 	if (ufshpb_test_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
 				  transfer_len)) {
@@ -745,21 +794,6 @@ static int ufshpb_clear_dirty_bitmap(struct ufshpb_lu *hpb,
 	return 0;
 }

-static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int rgn_idx,
-				      int srgn_idx)
-{
-	struct ufshpb_region *rgn;
-	struct ufshpb_subregion *srgn;
-
-	rgn = hpb->rgn_tbl + rgn_idx;
-	srgn = rgn->srgn_tbl + srgn_idx;
-
-	list_del_init(&rgn->list_inact_rgn);
-
-	if (list_empty(&srgn->list_act_srgn))
-		list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
-}
-
 static void ufshpb_update_inactive_info(struct ufshpb_lu *hpb, int rgn_idx)
 {
 	struct ufshpb_region *rgn;
@@ -1079,6 +1113,14 @@ static void __ufshpb_evict_region(struct ufshpb_lu *hpb,

 	ufshpb_cleanup_lru_info(lru_info, rgn);

+	if (hpb->is_hcm) {
+		unsigned long flags;
+
Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached
Hi

On 10.03.21 at 20:02, Paul Cercueil wrote:

Hi Thomas,

On Mon, 8 Mar 2021 at 9:41, Thomas Zimmermann wrote:

Hi Paul,

having individual functions for each mode only makes sense if the decision is at compile time. But in patch 5, you're working around your earlier design by introducing in-driver helpers that select the correct CMA function. In SHMEM helpers we have the flag map_wc in the GEM structure that selects the pages' caching mode (WC vs uncached).

Re-reading this, it should rather be WC and cached.

I think CMA should use this design as well. Have a map_noncoherent flag in the CMA GEM object and set it from the driver's implementation of gem_create_object.

Is that a new addition?

That severely reduces the patchset size, which is perfect.

It was added a few months ago, around the time you sent the first version of the patches at hand. Originally, SHMEM used write combining by default, and several drivers used default mapping flags instead (so usually cached). IIRC I streamlined everything and flipped the default. Most drivers can use cached mappings and only some require WC. Recently there was also a patchset that added support for cached video RAM (or something like that). So at some point we could store these flags in drm_gem_object. For now, I'd just put a flag into drm_gem_cma_object.

I'll send a V3 then.

Great!

Best regards
Thomas

Cheers,
-Paul

And in the long run, we could try to consolidate all drivers/helpers mapping flags in struct drm_gem_object.

Am 07.03.21 um 21:28 schrieb Paul Cercueil:

Rework of my previous patchset which added support for GEM buffers backed by non-coherent memory to the ingenic-drm driver.

Having GEM buffers backed by non-coherent memory is interesting in the particular case where it is faster to render to a non-coherent buffer then sync the data cache, than to render to a write-combine buffer, and (by extension) much faster than using a shadow buffer. This is true for instance on some Ingenic SoCs, where even simple blits (e.g. memcpy) are about three times faster using this method.

For the record, the previous patchset was accepted for 5.10 then had to be reverted, as it conflicted with some changes made to the DMA API. This new patchset is pretty different as it adds the functionality to the DRM core.

The first three patches add variants to existing functions but with the "non-coherent memory" twist, exported as GPL symbols. The fourth patch adds a function to be used with the damage helpers. Finally, the last patch adds support for non-coherent GEM buffers to the ingenic-drm driver. The functionality is enabled through a module parameter, and is disabled by default.

Cheers,
-Paul

Paul Cercueil (5):
  drm: Add and export function drm_gem_cma_create_noncoherent
  drm: Add and export function drm_gem_cma_dumb_create_noncoherent
  drm: Add and export function drm_gem_cma_mmap_noncoherent
  drm: Add and export function drm_gem_cma_sync_data
  drm/ingenic: Add option to alloc cached GEM buffers

 drivers/gpu/drm/drm_gem_cma_helper.c      | 223 +++---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 -
 drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
 drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
 include/drm/drm_gem_cma_helper.h          |  13 ++
 5 files changed, 273 insertions(+), 30 deletions(-)

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes
On 3/8/21 2:25 PM, Mike Rapoport wrote:
> Hi Anshuman,
>
> On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
>
> I wonder whether arm64 would still need to free parts of its memmap after
> the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
> worth the memory savings anymore?

free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled. I am not sure whether there still might be some platforms or boards which would benefit from this. Hence let's just keep this unchanged for now.

The arm64 pfn_valid() special case was primarily because of MEMBLOCK_NOMAP tagged memory areas, which are reserved by the firmware.

>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather let's
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
>
> The whole point of 4f5b0c178996 was to let arc and m68k free the unused
> memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
> whatever implementation there will be for arm/arm64, please keep arc and
> m68k functionally intact.

Okay. Will protect free_unused_memmap() with the HAVE_EARLY_SECTION_MEMMAP_HOLES config as well.
diff --git a/mm/memblock.c b/mm/memblock.c
index d9fa2e62ab7a..11b624e94127 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1927,8 +1927,11 @@ static void __init free_unused_memmap(void)
 	unsigned long start, end, prev_end = 0;
 	int i;

-	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
-	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+		return;
+
+	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
+	    !IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
 		return;
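The gating Anshuman proposes can be sanity-checked in plain C, with the Kconfig symbols reduced to booleans (a model for illustration only, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch early-return condition: free_unused_memmap() runs only
 * when the new HOLES option is set and VMEMMAP is not. */
static bool old_skips(bool holes, bool vmemmap)
{
	return !holes || vmemmap;
}

/* Post-patch condition: VMEMMAP still bails out, but either the HOLES
 * option or HAVE_ARCH_PFN_VALID keeps the function alive - which is
 * how arc and m68k (covered via HAVE_ARCH_PFN_VALID since commit
 * 4f5b0c178996) stay functionally intact, as Mike asked. */
static bool new_skips(bool holes, bool arch_pfn_valid, bool vmemmap)
{
	if (vmemmap)
		return true;
	return !holes && !arch_pfn_valid;
}
```

The interesting case is a platform with HAVE_ARCH_PFN_VALID but without the new HOLES option: the earlier revision of the patch would have skipped it, while the amended condition still frees the unused memmap there.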
[question] Panic in dax_writeback_one
Hi

I wrote a driver to simulate memory as a block device (like a ramdisk), and I hope the memory used would not be observed by the kernel, because the struct page array would take a lot of memory.

When I mount ext2 with dax on my ramdisk, a panic happens on fsync.

Call trace:
dax_writeback_one+0x330/0x3e4
dax_writeback_mapping_range+0x15c/0x318
ext2_dax_writepages+0x38/0x44

static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
		struct address_space *mapping, void *entry)

	dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);

The pfn is returned by the driver. In my case, the pfn does not have a struct page, so pfn_to_page(pfn) returns a wrong address.

I noticed the following changes may be related to my problem:

1. Before 3fe0791c295cfd3cd735de7a32cc0780949c009f (dax: store pfns in the radix), the radix tree stored sectors, and the address to be flushed was obtained from the driver by passing the sector. That commit assumes that all pfns have a struct page.

2. CONFIG_FS_DAX_LIMITED (selected by DCSSBLK [=n] && BLK_DEV [=y] && S390 && BLOCK [=y]) was added to avoid accessing the struct page of a pfn.

Does anyone have any idea about my problem?

--
Regards
Chen Jun
Re: [PATCH] cxl: don't manipulate the mm.mm_users field directly
On 10/03/2021 18:44, Laurent Dufour wrote:

It is better to rely on the API provided by the MM layer instead of directly manipulating the mm_users field.

Signed-off-by: Laurent Dufour
---

Thanks!
Acked-by: Frederic Barrat

 drivers/misc/cxl/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 01153b74334a..60c829113299 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -200,7 +200,7 @@ static struct mm_struct *get_mem_context(struct cxl_context *ctx)
 	if (ctx->mm == NULL)
 		return NULL;

-	if (!atomic_inc_not_zero(&ctx->mm->mm_users))
+	if (!mmget_not_zero(ctx->mm))
 		return NULL;

 	return ctx->mm;
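The change is purely cosmetic: in the kernel, mmget_not_zero() is a thin wrapper around atomic_inc_not_zero(&mm->mm_users). A host-side model of that behavior using C11 atomics (struct and helpers simplified for illustration):

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-in for the kernel's struct mm_struct. */
struct mm_struct {
	atomic_int mm_users;
};

/* Increment the counter unless it is zero; returns 1 on success.
 * Mirrors the kernel primitive's contract: a reference may only be
 * taken on an mm that is not already on its way to destruction. */
static int atomic_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return 1;
	}
	return 0;
}

/* What the patch switches to: same semantics, named API. */
static int mmget_not_zero(struct mm_struct *mm)
{
	return atomic_inc_not_zero(&mm->mm_users);
}
```

Using the named helper keeps callers insulated from future changes to how the MM layer tracks its users, which is the point of Laurent's patch.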
Re: [PATCH 4.19 00/39] 4.19.180-rc1 review
On Wed, 10 Mar 2021 at 18:56, wrote: > > From: Greg Kroah-Hartman > > This is the start of the stable review cycle for the 4.19.180 release. > There are 39 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri, 12 Mar 2021 13:23:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.180-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.19.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing Summary kernel: 4.19.180-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.19.y git commit: fffeea4063954b09866c112dad21a991ba52a914 git describe: v4.19.179-40-gfffeea406395 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19.179-40-gfffeea406395 No regressions (compared to build v4.19.179) No fixes (compared to build v4.19.179) Ran 50816 total tests in the following environments and test suites. 
Environments -- - arm - arm64 - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - juno-r2-compat - juno-r2-kasan - mips - nxp-ls2088 - nxp-ls2088-64k_page_size - qemu-arm64-clang - qemu-arm64-kasan - qemu-x86_64-clang - qemu-x86_64-kasan - qemu_arm - qemu_arm64 - qemu_arm64-compat - qemu_i386 - qemu_x86_64 - qemu_x86_64-compat - s390 - sparc - x15 - arm - x86_64 - x86-kasan - x86_64 Test Suites --- * build * linux-log-parser * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-kvm * kselftest-lkdtm * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * ltp-containers-tests * ltp-controllers-tests * ltp-cve-tests * ltp-dio-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * v4l2-compliance * fwts * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-zram * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests 
* ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-math-tests * network-basic-tests * perf * kselftest-kexec * kselftest-vm * kselftest-x86 * ltp-open-posix-tests * kvm-unit-tests * rcutorture * ssuite * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- -- Linaro LKFT https://lkft.linaro.org
RE: [PATCH 2/2] iio:magnetometer: Support for ST magnetic sensors
Hi Andy,

Thanks a lot for your input. I just extended the existing driver in patch V2 for the sensor.

Best Regards

**
Li Qingwu (Terry)
Senior Embedded Software Engineer
Leica Geosystems (Shanghai) Co., Limited
(Tel): +86 21 61061036 (FAX): +86 21 61061008 (Mobile): +86 187 0185 9600
E-mail: qing-wu...@leica-geosystems.com.cn
Http: www.leica-geosystems.com.cn
**

-----Original Message-----
From: Andy Shevchenko
Sent: Tuesday, March 9, 2021 7:01 PM
To: Jonathan Cameron
Cc: LI Qingwu; Lars-Peter Clausen; Peter Meerwald; Rob Herring; Denis Ciocca; linux-iio; devicetree; Linux Kernel Mailing List; TERTYCHNYI Grygorii; ZHIZHIKIN Andrey; Lorenzo Bianconi
Subject: Re: [PATCH 2/2] iio:magnetometer: Support for ST magnetic sensors

On Sat, Mar 6, 2021 at 6:44 PM Jonathan Cameron wrote:
> On Fri, 5 Mar 2021 07:05:36 +
> LI Qingwu wrote:
>
> > Add support for STMicroelectronics digital magnetic sensors,
> > LSM303AH, LSM303AGR, LIS2MDL, ISM303DAC, IIS2MDC.
> >
> > The patch was tested with an IIS2MDC instrument.
> >
> > Signed-off-by: LI Qingwu
>
> Hi,
>
> Given that at least two parts in here are supported by the existing
> driver in iio/magnetometers/st_magn_*.c (lsm303agr), can you confirm
> that it doesn't make sense to simply extend that driver to support the
> other parts? This is particularly true when the WHO_AM_I register
> reads 0x40 for all these parts.
>
> I've done a fairly superficial review whilst here, but please check
> you can't just add the relevant entries to the existing driver.

I even hadn't looked into the code, because this one needs a very good justification why it's a new driver rather than an extension of the existing one.

--
With Best Regards,
Andy Shevchenko
Re: Errant readings on LM81 with T2080 SoC
On 3/10/21 1:48 PM, Chris Packham wrote: > > On 10/03/21 6:06 pm, Guenter Roeck wrote: >> On 3/9/21 6:19 PM, Chris Packham wrote: >>> On 9/03/21 9:27 am, Chris Packham wrote: On 8/03/21 5:59 pm, Guenter Roeck wrote: > Other than that, the only other real idea I have would be to monitor > the i2c bus. I am in the fortunate position of being able to go into the office and even happen to have the expensive scope at the moment. Now I just need to find a tame HW engineer so I don't burn myself trying to attach the probes. >>> One thing I see on the scope is that when there is a CPU load there >>> appears to be some clock stretching going on (SCL is held low some >>> times). I don't see it without the CPU load. It's hard to correlate a >>> clock stretching event with a bad read or error but it is one area where >>> the SMBUS spec has a maximum that might cause the device to give up waiting. >>> >> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without >> that it is possible that the hot loops at the beginning and end of >> each operation mess up the driver and cause it to sleep longer >> than intended. Did you try usleep_range() ? > > I've been running with and without CONFIG_PREEMPT. The failures happen > with both. > > I did try usleep_range() and still saw failures. > Bummer. What is really weird is that you see clock stretching under CPU load. Normally clock stretching is triggered by the device, not by the host. I wonder if there are some timing differences before the clock stretching happens. Anyway, I just sent a set of three patches to the list; maybe you can give it a try. The patches convert the driver to the with_info API and drop local caching. The code is module tested with the register dumps I have available for adm9240 and lm81, but it would be great to get test coverage on real hardware. 
I don't really expect it to solve your problem, but it does reduce and modify the load on the chip (because registers are no longer read in bursts), so it may have some positive impact. >> On a side note, can you send me a register dump for the lm81 ? >> It would be useful for my module test code. > > Here you go this is from a largely unconfigured LM81 > Thanks, that helped a lot! Guenter
Re: [PATCH RESEND][next] rtl8xxxu: Fix fall-through warnings for Clang
On 3/11/21 01:00, Kalle Valo wrote:
> Kees Cook writes:
>
>> On Fri, Mar 05, 2021 at 03:40:33PM +0200, Kalle Valo wrote:
>>> "Gustavo A. R. Silva" writes:
>>>
>>>> In preparation to enable -Wimplicit-fallthrough for Clang, fix
>>>> multiple warnings by replacing /* fall through */ comments with the
>>>> new pseudo-keyword macro fallthrough; instead of letting the code
>>>> fall through to the next case.
>>>>
>>>> Notice that Clang doesn't recognize /* fall through */ comments as
>>>> implicit fall-through markings.
>>>>
>>>> Link: https://github.com/KSPP/linux/issues/115
>>>> Signed-off-by: Gustavo A. R. Silva
>>>
>>> It's not cool that you ignore the comments you got in [1], then after a
>>> while mark the patch as "RESEND" and not even include a changelog why it
>>> was resent.
>>>
>>> [1] https://patchwork.kernel.org/project/linux-wireless/patch/d522f387b2d0dde774785c7169c1f25aa529989d.1605896060.git.gustavo...@kernel.org/
>>
>> Hm, this conversation looks like a miscommunication, mainly? I see
>> Gustavo, as requested by many others[1], replacing the fallthrough
>> comments with the "fallthrough" statement. (This is more than just a
>> "Clang doesn't parse comments" issue.)
>
> v1 was clearly rejected by Jes, so sending a new version without any
> changelog or comments is not the way to go. The changelog should at least
> have had "v1 was rejected but I'm resending this again because <insert
> reason here>" or something like that to make it clear what's happening.

Why is the fact that I replied to that original thread with the message below being ignored?

"Just notice that the idea behind this and the following patch is exactly the same:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=3f95e92c8a8516b745594049dfccc8c5f8895eea

I could resend this same patch with a different changelog text, but I don't think such a thing is necessary. However, if people prefer that approach, just let me know and I can do it.

Thanks
--
Gustavo"

Why did no one reply to what I was proposing at the time? It seems to me that the person who was ignored was actually me, and not the other way around. :/

--
Gustavo

>
>> This could be a tree-wide patch and not bother you, but Greg KH has
>> generally advised us to send these changes broken out. Anyway, this
>> change still needs to land, so what would be the preferred path? I think
>> Gustavo could just carry it for Linus to merge without bothering you if
>> that'd be preferred?
>
> I agree with Greg. Please don't do cleanups like this via another tree
> as that just creates more work due to conflicts between the trees, which
> is a lot more annoying to deal with than applying few patches. But when
> submitting patches please follow the rules, don't cut corners.
>
> Jes, I don't like 'fallthrough' either and prefer the original comment,
> but the ship has sailed on this one. Maybe we should just take it?
>
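For reference, the mechanical change under discussion looks like this. A minimal self-contained sketch: the kernel's fallthrough pseudo-keyword is (roughly) an alias for the compiler attribute used here, which both GCC and Clang accept as an explicit fall-through marker, unlike a comment:

```c
#include <assert.h>

/* Approximation of the kernel's pseudo-keyword from compiler_attributes.h;
 * Clang's -Wimplicit-fallthrough only honors the attribute, not the old
 * "fall through" comment style. */
#define fallthrough __attribute__((__fallthrough__))

static int classify(int c)
{
	int score = 0;

	switch (c) {
	case 2:
		score += 10;
		fallthrough;	/* deliberate: case 2 also adds case 1's score */
	case 1:
		score += 1;
		break;
	default:
		break;
	}
	return score;
}
```

The behavior is identical to the comment-annotated version; the only difference is that the compiler can now distinguish an intentional fall-through from a forgotten break.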
module refcount issues in the liquidio driver
Hi all, I just stumbled over the odd handling of module refcounts in the liquidio driver. The big red flag is the call to module_refcount in liquidio_watchdog, which will do the wrong thing for any external module refcount, like a userspace open. But more importantly the whole concept of acquiring module refcounts from inside the driver is pretty bogus. What problem does this try to solve?
RE: [PATCH v3 1/3] scsi: ufshcd: use a function to calculate versions
> Hi Avri,
>
> On 10/03/2021 4:34 pm, Avri Altman wrote:
> >> @@ -9298,10 +9291,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
> >>         /* Get UFS version supported by the controller */
> >>         hba->ufs_version = ufshcd_get_ufs_version(hba);
> >>
> >> -       if ((hba->ufs_version != UFSHCI_VERSION_10) &&
> >> -           (hba->ufs_version != UFSHCI_VERSION_11) &&
> >> -           (hba->ufs_version != UFSHCI_VERSION_20) &&
> >> -           (hba->ufs_version != UFSHCI_VERSION_21))
> >> +       if (hba->ufs_version < ufshci_version(1, 0))
> >>                 dev_err(hba->dev, "invalid UFS version 0x%x\n",
> >>                         hba->ufs_version);
> >
> > Here you replace the specific allowable values with an expression
> > that doesn't really reflect those values.
>
> I took this approach based on feedback from previous patches:
> https://lore.kernel.org/linux-scsi/d1b23943b6b3ae6c1f6af041cc592...@codeaurora.org/
> https://lkml.org/lkml/2020/4/25/159
>
> Patch 3 of this series removes this check entirely, as it is neither
> accurate nor useful.

I noticed that.

> The driver does not fail when printing this error, nor is the list of
> "valid" UFS versions here kept up to date. I struggle to see a situation
> in which that error message would actually be helpful. Responses to
> previous patches (above) that added UFS 3.0 to the list have all
> suggested that removing this check is a more sensible approach.

OK.

Thanks,
Avri
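The comparison in the patch works because the series' helper packs (major, minor) into the UFSHCI 2.x register layout, making versions totally ordered. A standalone sketch of that packing (a host-side copy for illustration; the kernel helper operates on u32):

```c
#include <assert.h>

/* Pack a UFSHCI version as the VER register encodes it: major in
 * bits 11:8, minor in bits 7:4.  E.g. 2.1 -> 0x210.  With every
 * version in this form, "is at least X.Y" is a plain '<' / '>='. */
static unsigned int ufshci_version(unsigned int major, unsigned int minor)
{
	return (major << 8) | (minor << 4);
}
```

This is why the old enumerated whitelist could be replaced: once legacy register encodings are normalized into this form, a single range check covers every current and future version, instead of a list that has to be extended for each new spec revision.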
Re: [PATCH 5.4 00/24] 5.4.105-rc1 review
On Wed, 10 Mar 2021 at 18:55, wrote: > > From: Greg Kroah-Hartman > > This is the start of the stable review cycle for the 5.4.105 release. > There are 24 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri, 12 Mar 2021 13:23:09 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.105-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.4.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing Summary kernel: 5.4.105-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.4.y git commit: 62f8f08c9d2fbc6c5692d90f64dd70e3a8edd986 git describe: v5.4.104-25-g62f8f08c9d2f Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.104-25-g62f8f08c9d2f No regressions (compared to build v5.4.104) No fixes (compared to build v5.4.104) Ran 53992 total tests in the following environments and test suites. 
Environments -- - arc - arm - arm64 - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - juno-r2-compat - juno-r2-kasan - mips - nxp-ls2088 - nxp-ls2088-64k_page_size - parisc - powerpc - qemu-arm-clang - qemu-arm64-clang - qemu-arm64-kasan - qemu-x86_64-clang - qemu-x86_64-kasan - qemu-x86_64-kcsan - qemu_arm - qemu_arm64 - qemu_arm64-compat - qemu_i386 - qemu_x86_64 - qemu_x86_64-compat - riscv - s390 - sh - sparc - x15 - x86 - x86-kasan - x86_64 Test Suites --- * build * linux-log-parser * igt-gpu-tools * install-android-platform-tools-r2600 * kselftest- * kselftest-android * kselftest-bpf * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-lkdtm * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fs-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * perf * fwts * kselftest-lib * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-zram * libhugetlbfs * ltp-commands-tests * ltp-controllers-tests * ltp-fcntl-locktests-tests * 
ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-ipc-tests * ltp-math-tests * network-basic-tests * v4l2-compliance * kselftest-kexec * kselftest-vm * kselftest-x86 * ltp-open-posix-tests * kvm-unit-tests * rcutorture * kselftest-vsyscall-mode-native- * ssuite -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH V3 1/6] vDPA/ifcvf: get_vendor_id returns a device specific vendor id
On 3/11/2021 2:13 PM, Jason Wang wrote: On 2021/3/11 12:21 PM, Zhu Lingshan wrote: On 3/11/2021 11:23 AM, Jason Wang wrote: On 2021/3/10 5:00 PM, Zhu Lingshan wrote: In this commit, ifcvf_get_vendor_id() will return a device specific vendor id of the probed pci device rather than a hard-coded value. Signed-off-by: Zhu Lingshan --- drivers/vdpa/ifcvf/ifcvf_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index fa1af301cf55..e501ee07de17 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -324,7 +324,10 @@ static u32 ifcvf_vdpa_get_device_id(struct vdpa_device *vdpa_dev) static u32 ifcvf_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev) { - return IFCVF_SUBSYS_VENDOR_ID; + struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev); + struct pci_dev *pdev = adapter->pdev; + + return pdev->subsystem_vendor; } While at this, I wonder if we can do something similar in get_device_id() if it could be simply deduced with some simple math from the pci device id? Thanks Hi Jason, IMHO, this implementation is just some memory read ops, I think other implementations may not save many cpu cycles, an if costs at least three cpu cycles. Thanks! Well, I meant whether you can deduce the virtio device id from pdev->subsystem_device. Thanks Oh, sure, I get you static u32 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
Re: [External] Re: [PATCH v18 9/9] mm: hugetlb: optimize the code with the help of the compiler
On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko wrote: > > On Mon 08-03-21 18:28:07, Muchun Song wrote: > > When the "struct page size" crosses page boundaries we cannot > > make use of this feature. Let free_vmemmap_pages_per_hpage() > > return zero if that is the case, most of the functions can be > > optimized away. > > I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param > already? Right. > Why do we need any runtime checks? If the size of the struct page is not a power of 2, the compiler can see that is_hugetlb_free_vmemmap_enabled() always returns false, so the code guarded by it can be optimized away. E.g. if (is_hugetlb_free_vmemmap_enabled()) /* do something */ The compiler can drop "/* do something */" directly, because it knows is_hugetlb_free_vmemmap_enabled() always returns false. Thanks. > > > Signed-off-by: Muchun Song > > Reviewed-by: Miaohe Lin > > Reviewed-by: Oscar Salvador > > Tested-by: Chen Huang > > Tested-by: Bodeddula Balasubramaniam > > --- > > include/linux/hugetlb.h | 3 ++- > > mm/hugetlb_vmemmap.c| 7 +++ > > mm/hugetlb_vmemmap.h| 6 ++ > > 3 files changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > index c70421e26189..333dd0479fc2 100644 > > --- a/include/linux/hugetlb.h > > +++ b/include/linux/hugetlb.h > > @@ -880,7 +880,8 @@ extern bool hugetlb_free_vmemmap_enabled; > > > > static inline bool is_hugetlb_free_vmemmap_enabled(void) > > { > > - return hugetlb_free_vmemmap_enabled; > > + return hugetlb_free_vmemmap_enabled && > > +is_power_of_2(sizeof(struct page)); > > } > > #else > > static inline bool is_hugetlb_free_vmemmap_enabled(void) > > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c > > index 33e42678abe3..1ba1ef45c48c 100644 > > --- a/mm/hugetlb_vmemmap.c > > +++ b/mm/hugetlb_vmemmap.c > > @@ -265,6 +265,13 @@ void __init hugetlb_vmemmap_init(struct hstate *h) > > BUILD_BUG_ON(__NR_USED_SUBPAGE >= > >RESERVE_VMEMMAP_SIZE / sizeof(struct 
page)); > > > > + /* > > + * The compiler can help us to optimize this function to null > > + * when the size of the struct page is not power of 2. > > + */ > > + if (!is_power_of_2(sizeof(struct page))) > > + return; > > + > > if (!hugetlb_free_vmemmap_enabled) > > return; > > > > diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h > > index cb2bef8f9e73..29aaaf7b741e 100644 > > --- a/mm/hugetlb_vmemmap.h > > +++ b/mm/hugetlb_vmemmap.h > > @@ -21,6 +21,12 @@ void hugetlb_vmemmap_init(struct hstate *h); > > */ > > static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h) > > { > > + /* > > + * This check aims to let the compiler help us optimize the code as > > + * much as possible. > > + */ > > + if (!is_power_of_2(sizeof(struct page))) > > + return 0; > > return h->nr_free_vmemmap_pages; > > } > > #else > > -- > > 2.11.0 > > > > -- > Michal Hocko > SUSE Labs
Re: [PATCH v1 1/3] binder: BINDER_FREEZE ioctl
On Wed, Mar 10, 2021 at 02:52:49PM -0800, Li Li wrote: > if (target_proc) { > binder_inner_proc_lock(target_proc); > + target_proc->outstanding_txns--; > + WARN_ON(target_proc->outstanding_txns < 0); WARN_* is a huge crutch, please just handle stuff like this properly and if you really need to, warn userspace (but what can they do about it?) You also just rebooted all systems that have panic-on-warn set, so if this can be triggered by userspace, you caused a DoS of things :( So please remove all of the WARN_ON() you add in this patch series to properly handle the error conditions and deal with them correctly. And if these were here just for debugging, hopefully the code works properly now and you do not need debugging anymore so they can all just be dropped. thanks, greg k-h
[PATCH 0/3] Add RTC support for PMIC PMK8350
Convert pm8xxx bindings to yaml and add pmk8350 rtc binding. satya priya (3): rtc: pm8xxx: Add RTC support for PMIC PMK8350 dt-bindings: mfd: Convert pm8xxx bindings to yaml dt-bindings: mfd: Add compatible for pmk8350 rtc .../devicetree/bindings/mfd/qcom-pm8xxx.txt| 99 --- .../devicetree/bindings/mfd/qcom-pm8xxx.yaml | 109 + drivers/rtc/rtc-pm8xxx.c | 11 +++ 3 files changed, 120 insertions(+), 99 deletions(-) delete mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt create mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 1/3] rtc: pm8xxx: Add RTC support for PMIC PMK8350
Add the compatible string for PMIC PMK8350. Signed-off-by: satya priya --- drivers/rtc/rtc-pm8xxx.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/rtc/rtc-pm8xxx.c b/drivers/rtc/rtc-pm8xxx.c index eb20659..29a1c65 100644 --- a/drivers/rtc/rtc-pm8xxx.c +++ b/drivers/rtc/rtc-pm8xxx.c @@ -445,6 +445,16 @@ static const struct pm8xxx_rtc_regs pm8941_regs = { .alarm_en = BIT(7), }; +static const struct pm8xxx_rtc_regs pmk8350_regs = { + .ctrl = 0x6146, + .write = 0x6140, + .read = 0x6148, + .alarm_rw = 0x6240, + .alarm_ctrl = 0x6246, + .alarm_ctrl2= 0x6248, + .alarm_en = BIT(7), +}; + /* * Hardcoded RTC bases until IORESOURCE_REG mapping is figured out */ @@ -453,6 +463,7 @@ static const struct of_device_id pm8xxx_id_table[] = { { .compatible = "qcom,pm8018-rtc", .data = _regs }, { .compatible = "qcom,pm8058-rtc", .data = _regs }, { .compatible = "qcom,pm8941-rtc", .data = _regs }, + { .compatible = "qcom,pmk8350-rtc", .data = _regs }, { }, }; MODULE_DEVICE_TABLE(of, pm8xxx_id_table); -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH 2/3] dt-bindings: mfd: Convert pm8xxx bindings to yaml
Convert pm8xxx rtc bindings from .txt to .yaml format. Signed-off-by: satya priya --- .../devicetree/bindings/mfd/qcom-pm8xxx.txt| 99 --- .../devicetree/bindings/mfd/qcom-pm8xxx.yaml | 108 + 2 files changed, 108 insertions(+), 99 deletions(-) delete mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt create mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt deleted file mode 100644 index 9e5eba4..000 --- a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt +++ /dev/null @@ -1,99 +0,0 @@ -Qualcomm PM8xxx PMIC multi-function devices - -The PM8xxx family of Power Management ICs are used to provide regulated -voltages and other various functionality to Qualcomm SoCs. - -= PROPERTIES - -- compatible: - Usage: required - Value type: - Definition: must be one of: - "qcom,pm8058" - "qcom,pm8821" - "qcom,pm8921" - -- #address-cells: - Usage: required - Value type: - Definition: must be 1 - -- #size-cells: - Usage: required - Value type: - Definition: must be 0 - -- interrupts: - Usage: required - Value type: - Definition: specifies the interrupt that indicates a subdevice - has generated an interrupt (summary interrupt). The - format of the specifier is defined by the binding document - describing the node's interrupt parent. - -- #interrupt-cells: - Usage: required - Value type : - Definition: must be 2. Specifies the number of cells needed to encode - an interrupt source. The 1st cell contains the interrupt - number. 
The 2nd cell is the trigger type and level flags - encoded as follows: - - 1 = low-to-high edge triggered - 2 = high-to-low edge triggered - 4 = active high level-sensitive - 8 = active low level-sensitive - -- interrupt-controller: - Usage: required - Value type: - Definition: identifies this node as an interrupt controller - -= SUBCOMPONENTS - -The PMIC contains multiple independent functions, each described in a subnode. -The below bindings specify the set of valid subnodes. - -== Real-Time Clock - -- compatible: - Usage: required - Value type: - Definition: must be one of: - "qcom,pm8058-rtc" - "qcom,pm8921-rtc" - "qcom,pm8941-rtc" - "qcom,pm8018-rtc" - -- reg: - Usage: required - Value type: - Definition: single entry specifying the base address of the RTC registers - -- interrupts: - Usage: required - Value type: - Definition: single entry specifying the RTC's alarm interrupt - -- allow-set-time: - Usage: optional - Value type: - Definition: indicates that the setting of RTC time is allowed by - the host CPU - -= EXAMPLE - - pmicintc: pmic@0 { - compatible = "qcom,pm8921"; - interrupts = <104 8>; - #interrupt-cells = <2>; - interrupt-controller; - #address-cells = <1>; - #size-cells = <0>; - - rtc@11d { - compatible = "qcom,pm8921-rtc"; - reg = <0x11d>; - interrupts = <0x27 0>; - }; - }; diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml new file mode 100644 index 000..b4892f1 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml @@ -0,0 +1,108 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mfd/qcom-pm8xxx.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm PM8xxx PMIC multi-function devices + +maintainers: + - Lee Jones + +description: | + The PM8xxx family of Power Management ICs are used to provide regulated + voltages and other various functionality to Qualcomm SoCs. 
+ +properties: + compatible: +enum: + - qcom,pm8058 + - qcom,pm8821 + - qcom,pm8921 + + reg: +maxItems: 1 + + '#address-cells': + const: 1 + + '#size-cells': + const: 0 + + interrupts: +description: | + Specifies the interrupt that indicates a subdevice has generated an + interrupt (summary interrupt). + + '#interrupt-cells': +description: | + Specifies the number of cells needed to encode an interrupt source. + The 1st cell contains the interrupt number. The 2nd cell is the + trigger type. +const: 2 + + interrupt-controller: true +
[PATCH 3/3] dt-bindings: mfd: Add compatible for pmk8350 rtc
Add compatible string for pmk8350 rtc support. Signed-off-by: satya priya --- Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml index b4892f1..676decc 100644 --- a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml +++ b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml @@ -53,6 +53,7 @@ patternProperties: - qcom,pm8921-rtc - qcom,pm8941-rtc - qcom,pm8018-rtc + - qcom,pmk8350-rtc reg: description: Specifies the base address of the RTC registers -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [RFC v2 5/5] clk: socfpga: allow compile testing of Stratix 10 / Agilex clocks
On 10/03/2021 17:48, Arnd Bergmann wrote: > On Wed, Mar 10, 2021 at 9:38 AM Krzysztof Kozlowski > wrote: >> --- a/drivers/clk/socfpga/Kconfig >> +++ b/drivers/clk/socfpga/Kconfig >> @@ -1,6 +1,17 @@ >> # SPDX-License-Identifier: GPL-2.0 >> +config COMMON_CLK_SOCFPGA >> + bool "Intel SoCFPGA family clock support" if COMPILE_TEST && >> !ARCH_SOCFPGA && !ARCH_SOCFPGA64 >> + depends on ARCH_SOCFPGA || ARCH_SOCFPGA64 || COMPILE_TEST >> + default y if ARCH_SOCFPGA || ARCH_SOCFPGA64 > > I think the 'depends on' line here is redundant if you also have the > 'if' line and the default. Yes, you're right. Best regards, Krzysztof
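The simplification Arnd suggests can be sketched as the fragment below: the conditional prompt plus the default already restrict when the symbol is visible and enabled, so the separate 'depends on' line can go. This is an untested sketch with the option names from the v2 patch:

```kconfig
config COMMON_CLK_SOCFPGA
	bool "Intel SoCFPGA family clock support" if COMPILE_TEST && !ARCH_SOCFPGA && !ARCH_SOCFPGA64
	default y if ARCH_SOCFPGA || ARCH_SOCFPGA64
```

With the prompt guarded by COMPILE_TEST and the default tied to the platform symbols, the visibility is the same as with the redundant 'depends on' line.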
Re: [PATCH v9 6/8] media: imx-jpeg: Add V4L2 driver for i.MX8 JPEG Encoder/Decoder
Hi Mirela, On 11/03/2021 01:28, Mirela Rabulea (OSS) wrote: > +static const struct of_device_id mxc_jpeg_match[] = { > + { > + .compatible = "nxp,imx8qxp-jpgdec", > + .data = (void *)MXC_JPEG_DECODE, Don't do this, just say: static const int mxc_decode_mode = MXC_JPEG_DECODE; static const int mxc_encode_mode = MXC_JPEG_ENCODE; and point to that: .data = _decode_mode; > + }, > + { > + .compatible = "nxp,imx8qxp-jpgenc", > + .data = (void *)MXC_JPEG_ENCODE, .data = _encode_mode; > + }, > + { }, > +}; > +static int mxc_jpeg_probe(struct platform_device *pdev) > +{ > + struct mxc_jpeg_dev *jpeg; > + struct device *dev = >dev; > + struct resource *res; > + int dec_irq; > + int ret; > + int mode; > + const struct of_device_id *of_id; > + unsigned int slot; > + > + of_id = of_match_node(mxc_jpeg_match, dev->of_node); > + mode = (int)(u64)of_id->data; and this becomes: mode = *(const int *)of_id->data; This will solve the kernel test robot warning, and for that matter the same gcc warnings I get when I compile. Just post a v9.1 for this patch, everything else looks good. Regards, Hans
[PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
There are two "netif_running" checks in this driver. One is in "lapbeth_xmit" and the other is in "lapbeth_rcv". They serve to make sure that the LAPB APIs called in these functions are called before "lapb_unregister" is called by the "ndo_stop" function. However, these "netif_running" checks are unreliable, because it's possible that immediately after "netif_running" returns true, "ndo_stop" is called (which causes "lapb_unregister" to be called). This patch adds locking to make sure "lapbeth_xmit" and "lapbeth_rcv" can reliably check and ensure the netif is running while doing their work. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He --- drivers/net/wan/lapbether.c | 32 +--- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c index c3372498f4f1..8fda0446ff71 100644 --- a/drivers/net/wan/lapbether.c +++ b/drivers/net/wan/lapbether.c @@ -51,6 +51,8 @@ struct lapbethdev { struct list_headnode; struct net_device *ethdev;/* link to ethernet device */ struct net_device *axdev; /* lapbeth device (lapb#) */ + boolup; + spinlock_t up_lock;/* Protects "up" */ }; static LIST_HEAD(lapbeth_devices); @@ -101,8 +103,9 @@ static int lapbeth_rcv(struct sk_buff *skb, struct net_device *dev, struct packe rcu_read_lock(); lapbeth = lapbeth_get_x25_dev(dev); if (!lapbeth) - goto drop_unlock; - if (!netif_running(lapbeth->axdev)) + goto drop_unlock_rcu; + spin_lock_bh(>up_lock); + if (!lapbeth->up) goto drop_unlock; len = skb->data[0] + skb->data[1] * 256; @@ -117,11 +120,14 @@ static int lapbeth_rcv(struct sk_buff *skb, struct net_device *dev, struct packe goto drop_unlock; } out: + spin_unlock_bh(>up_lock); rcu_read_unlock(); return 0; drop_unlock: kfree_skb(skb); goto out; +drop_unlock_rcu: + rcu_read_unlock(); drop: kfree_skb(skb); return 0; @@ -151,13 +157,11 @@ static int lapbeth_data_indication(struct net_device *dev, struct sk_buff *skb) static netdev_tx_t lapbeth_xmit(struct sk_buff *skb, 
struct net_device *dev) { + struct lapbethdev *lapbeth = netdev_priv(dev); int err; - /* -* Just to be *really* sure not to send anything if the interface -* is down, the ethernet device may have gone. -*/ - if (!netif_running(dev)) + spin_lock_bh(>up_lock); + if (!lapbeth->up) goto drop; /* There should be a pseudo header of 1 byte added by upper layers. @@ -194,6 +198,7 @@ static netdev_tx_t lapbeth_xmit(struct sk_buff *skb, goto drop; } out: + spin_unlock_bh(>up_lock); return NETDEV_TX_OK; drop: kfree_skb(skb); @@ -285,6 +290,7 @@ static const struct lapb_register_struct lapbeth_callbacks = { */ static int lapbeth_open(struct net_device *dev) { + struct lapbethdev *lapbeth = netdev_priv(dev); int err; if ((err = lapb_register(dev, _callbacks)) != LAPB_OK) { @@ -292,13 +298,22 @@ static int lapbeth_open(struct net_device *dev) return -ENODEV; } + spin_lock_bh(>up_lock); + lapbeth->up = true; + spin_unlock_bh(>up_lock); + return 0; } static int lapbeth_close(struct net_device *dev) { + struct lapbethdev *lapbeth = netdev_priv(dev); int err; + spin_lock_bh(>up_lock); + lapbeth->up = false; + spin_unlock_bh(>up_lock); + if ((err = lapb_unregister(dev)) != LAPB_OK) pr_err("lapb_unregister error: %d\n", err); @@ -356,6 +371,9 @@ static int lapbeth_new_device(struct net_device *dev) dev_hold(dev); lapbeth->ethdev = dev; + lapbeth->up = false; + spin_lock_init(>up_lock); + rc = -EIO; if (register_netdevice(ndev)) goto fail; -- 2.27.0
Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp
On Tue, 9 Feb 2021, Naoya Horiguchi wrote: > From: Naoya Horiguchi > > Currently hwpoison code checks PageAnon() for thp and refuses to handle > errors on non-anonymous thps (just for historical reasons). We now > support non-anonymous thp like the shmem one, so this patch enables > handling of shmem thps. Fortunately, we already have can_split_huge_page() > to check if a given thp is splittable, so this patch relies on it. Fortunately? I don't understand. Why call can_split_huge_page() at all, instead of simply trying split_huge_page() directly? And could it do better than -EBUSY when split_huge_page() fails? > > Signed-off-by: Naoya Horiguchi Thanks for trying to add shmem+file THP support, but I think this does not work as intended - Andrew, if Naoya agrees, please drop from mmotm for now, the fixup needed will be more than a line or two. I'm not much into memory-failure myself, but Jue discovered that the SIGBUS never arrives: because split_huge_page() on a shmem or file THP unmaps all its pmds and ptes, and (unlike with anon) leaves them unmapped - in normal circumstances, to be faulted back on demand. So the page_mapped() check in hwpoison_user_mappings() fails, and the intended SIGBUS is not delivered. (Or, is it acceptable that the SIGBUS is not delivered to those who have the huge page mapped: should it get delivered later, to anyone who faults back in the bad 4k?) We believe the tokill list has to be set up earlier, before split_huge_page() is called, then passed in to hwpoison_user_mappings(). Sorry, we don't have a proper patch for that right now, but I expect you can see what needs to be done. But something we found on the way, we do have a patch for: add_to_kill() uses page_address_in_vma(), but that has not been used on file THP tails before - fix appended at the end below, so as not to waste your time on that bit. 
> --- > mm/memory-failure.c | 34 +- > 1 file changed, 9 insertions(+), 25 deletions(-) > > diff --git v5.11-rc6/mm/memory-failure.c v5.11-rc6_patched/mm/memory-failure.c > index e9481632fcd1..8c1575de0507 100644 > --- v5.11-rc6/mm/memory-failure.c > +++ v5.11-rc6_patched/mm/memory-failure.c > @@ -956,20 +956,6 @@ static int __get_hwpoison_page(struct page *page) > { > struct page *head = compound_head(page); > > - if (!PageHuge(head) && PageTransHuge(head)) { > - /* > - * Non anonymous thp exists only in allocation/free time. We > - * can't handle such a case correctly, so let's give it up. > - * This should be better than triggering BUG_ON when kernel > - * tries to touch the "partially handled" page. > - */ > - if (!PageAnon(head)) { > - pr_err("Memory failure: %#lx: non anonymous thp\n", > - page_to_pfn(page)); > - return 0; > - } > - } > - > if (get_page_unless_zero(head)) { > if (head == compound_head(page)) > return 1; > @@ -1197,21 +1183,19 @@ static int identify_page_state(unsigned long pfn, > struct page *p, > > static int try_to_split_thp_page(struct page *page, const char *msg) > { > - lock_page(page); > - if (!PageAnon(page) || unlikely(split_huge_page(page))) { > - unsigned long pfn = page_to_pfn(page); > + struct page *head; > > + lock_page(page); > + head = compound_head(page); > + if (PageTransHuge(head) && can_split_huge_page(head, NULL) && > + !split_huge_page(page)) { > unlock_page(page); > - if (!PageAnon(page)) > - pr_info("%s: %#lx: non anonymous thp\n", msg, pfn); > - else > - pr_info("%s: %#lx: thp split failed\n", msg, pfn); > - put_page(page); > - return -EBUSY; > + return 0; > } > + pr_info("%s: %#lx: thp split failed\n", msg, page_to_pfn(page)); > unlock_page(page); > - > - return 0; > + put_page(page); > + return -EBUSY; > } > > static int memory_failure_hugetlb(unsigned long pfn, int flags) > -- > 2.25.1 [PATCH] mm: fix page_address_in_vma() on file THP tails From: Jue Wang Anon THP tails were already supported, but memory-failure now 
needs to use page_address_in_vma() on file THP tails, which its page->mapping check did not permit: fix it. Signed-off-by: Jue Wang Signed-off-by: Hugh Dickins --- mm/rmap.c |8 1 file changed, 4 insertions(+), 4 deletions(-) --- 5.12-rc2/mm/rmap.c 2021-02-28 16:58:57.950450151 -0800 +++ linux/mm/rmap.c 2021-03-10 20:29:21.591475177 -0800 @@ -717,11 +717,11 @@ unsigned long page_address_in_vma(struct if (!vma->anon_vma || !page__anon_vma || vma->anon_vma->root != page__anon_vma->root) return -EFAULT; - } else if (page->mapping) { - if
Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start()
On Wed, Mar 10, 2021, Jason Gunthorpe wrote: > On Wed, Mar 10, 2021 at 05:20:01PM -0800, Sean Christopherson wrote: > > > > Which I believe is fatal to kvm? These notifiers certainly do not only > > > happen at process exit. > > > > My point about the process dying is that the existing bug that causes > > mmu_notifier_count to become imbalanced is benign only because the process > > is > > being killed, and thus KVM will stop running its vCPUs. > > Are you saying we only call non-blocking invalidate during a process > exit event?? Yes? __oom_reap_task_mm() is the only user of _nonblock(), if that's what you're asking. > > > So, both of the remaining _end users become corrupted with this patch! > > > > I don't follow. mn_hlist_invalidate_range_start() iterates over all > > notifiers, even if a notifier earlier in the chain failed. How will > > KVM become imbalanced? > > Er, ok, that got left in a weird way. There is another "bug" where end > is not supposed to be called if the start failed. Ha, the best kind of feature. :-) > > The existing _end users never fail their _start. If KVM started failing its > > start, then yes, it could get corrupted. > > Well, maybe that is the way out of this now. If we don't permit a > start to fail if there is an end then we have no problem to unwind it > as we can continue to call everything. This can't be backported too > far though, the itree notifier conversions are what made the WARN_ON > safe today. > > Something very approximately like this is closer to my preference: Makes sense. I don't have a strong preference, I'll give this a spin tomorrow. 
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > index 61ee40ed804ee5..6d5cd20f81dadc 100644 > --- a/mm/mmu_notifier.c > +++ b/mm/mmu_notifier.c > @@ -501,10 +501,25 @@ static int mn_hlist_invalidate_range_start( > ""); > WARN_ON(mmu_notifier_range_blockable(range) || > _ret != -EAGAIN); > + /* > + * We call all the notifiers on any EAGAIN, > + * there is no way for a notifier to know if > + * its start method failed, thus a start that > + * does EAGAIN can't also do end. > + */ > + WARN_ON(ops->invalidate_range_end); > ret = _ret; > } > } > } > + > + if (ret) { > + /* Must be non-blocking to get here*/ > + hlist_for_each_entry_rcu (subscription, >list, > + hlist, srcu_read_lock_held()) > + subscription->ops->invalidate_range_end(subscription, > + range); > + } > srcu_read_unlock(, id); > > return ret;
[PATCH] Rectify spelling and grammar
Change 'inaccesable' to 'inaccessible' Change 'detrmine' to 'determine' Delete 'in' grammatically Signed-off-by: Xiaofeng Cao --- drivers/media/radio/radio-si476x.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/media/radio/radio-si476x.c b/drivers/media/radio/radio-si476x.c index 23997425bdb5..b39a68f83c5f 100644 --- a/drivers/media/radio/radio-si476x.c +++ b/drivers/media/radio/radio-si476x.c @@ -152,7 +152,7 @@ static struct v4l2_ctrl_config si476x_ctrls[] = { /* * SI476X during its station seeking(or tuning) process uses several -* parameters to detrmine if "the station" is valid: +* parameters to determine if "the station" is valid: * * - Signal's SNR(in dBuV) must be lower than * #V4L2_CID_SI476X_SNR_THRESHOLD @@ -255,7 +255,7 @@ struct si476x_radio; * * This table holds pointers to functions implementing particular * operations depending on the mode in which the tuner chip was - * configured to start in. If the function is not supported + * configured to start. If the function is not supported * corresponding element is set to #NULL. * * @tune_freq: Tune chip to a specific frequency @@ -917,7 +917,7 @@ static int si476x_radio_s_ctrl(struct v4l2_ctrl *ctrl) case V4L2_CID_RDS_RECEPTION: /* * It looks like RDS related properties are -* inaccesable when tuner is in AM mode, so cache the +* inaccessible when tuner is in AM mode, so cache the * changes */ if (si476x_core_is_in_am_receiver_mode(radio->core)) -- 2.25.1
Re: [PATCH V3 6/6] vDPA/ifcvf: verify mandatory feature bits for vDPA
On 3/11/2021 2:20 PM, Jason Wang wrote: On 2021/3/11 12:16 PM, Zhu Lingshan wrote: On 3/11/2021 11:20 AM, Jason Wang wrote: On 2021/3/10 5:00 PM, Zhu Lingshan wrote: vDPA requires VIRTIO_F_ACCESS_PLATFORM as a must; this commit checks this when setting features. Signed-off-by: Zhu Lingshan --- drivers/vdpa/ifcvf/ifcvf_base.c | 8 drivers/vdpa/ifcvf/ifcvf_base.h | 1 + drivers/vdpa/ifcvf/ifcvf_main.c | 5 + 3 files changed, 14 insertions(+) diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c index ea6a78791c9b..58f47fdce385 100644 --- a/drivers/vdpa/ifcvf/ifcvf_base.c +++ b/drivers/vdpa/ifcvf/ifcvf_base.c @@ -224,6 +224,14 @@ u64 ifcvf_get_features(struct ifcvf_hw *hw) return hw->hw_features; } +int ifcvf_verify_min_features(struct ifcvf_hw *hw) +{ + if (!(hw->hw_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) + return -EINVAL; + + return 0; +} + void ifcvf_read_net_config(struct ifcvf_hw *hw, u64 offset, void *dst, int length) { diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h index dbb8c10aa3b1..91c5735d4dc9 100644 --- a/drivers/vdpa/ifcvf/ifcvf_base.h +++ b/drivers/vdpa/ifcvf/ifcvf_base.h @@ -123,6 +123,7 @@ void io_write64_twopart(u64 val, u32 *lo, u32 *hi); void ifcvf_reset(struct ifcvf_hw *hw); u64 ifcvf_get_features(struct ifcvf_hw *hw); u64 ifcvf_get_hw_features(struct ifcvf_hw *hw); +int ifcvf_verify_min_features(struct ifcvf_hw *hw); u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid); int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u16 num); struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw); diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index 25fb9dfe23f0..f624f202447d 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -179,6 +179,11 @@ static u64 ifcvf_vdpa_get_features(struct vdpa_device *vdpa_dev) static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, u64 features) { struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev); + 
int ret; + + ret = ifcvf_verify_min_features(vf); So this validates device features instead of driver features, which are the ones we really want to check? Thanks Hi Jason, Here we check device feature bits to make sure the device supports ACCESS_PLATFORM. If you want to check device features, you need to do that during probe() and fail the probe if the feature is absent. But I think you won't ship cards without ACCESS_PLATFORM. Yes, there is no reason to ship a card without ACCESS_PLATFORM In get_features(), it will return an intersection of device feature bits and driver-supported feature bits (which include ACCESS_PLATFORM). Other components like QEMU should not set feature bits beyond this intersection. So we can make sure that if ifcvf_verify_min_features() passed, both device and driver support ACCESS_PLATFORM. Are you suggesting checking driver feature bits in ifcvf_verify_min_features() as well? So it really depends on your hardware. If your hardware can always offer ACCESS_PLATFORM, you just need to check driver features. This is how vdpa_sim and mlx5_vdpa work. Yes, we always support ACCESS_PLATFORM, so it is hard coded in the macro IFCVF_SUPPORTED_FEATURES. Now we check whether the device supports this feature bit as a double confirmation; are you suggesting we should check whether ACCESS_PLATFORM & IFCVF_SUPPORTED_FEATURES in set_features() as well? I prefer to check both the device features and IFCVF_SUPPORTED_FEATURES; that is more reliable. Thanks! Thanks Thanks! + if (ret) + return ret; vf->req_features = features; ___ Virtualization mailing list virtualizat...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2] netdevsim: fib: Remove redundant code
Fix the following coccicheck warnings: ./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return "0" on line 889. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- Changes in v2: - Modify the function type to void. drivers/net/netdevsim/fib.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c index 46fb414..3ca0f54 100644 --- a/drivers/net/netdevsim/fib.c +++ b/drivers/net/netdevsim/fib.c @@ -869,10 +869,8 @@ static int nsim_fib6_event(struct nsim_fib_data *data, return err; } -static int nsim_fib_event(struct nsim_fib_event *fib_event) +static void nsim_fib_event(struct nsim_fib_event *fib_event) { - int err = 0; - switch (fib_event->family) { case AF_INET: nsim_fib4_event(fib_event->data, &fib_event->fen_info, @@ -885,8 +883,6 @@ static int nsim_fib_event(struct nsim_fib_event *fib_event) nsim_fib6_event_fini(&fib_event->fib6_event); break; } - - return err; } static int nsim_fib4_prepare_event(struct fib_notifier_info *info, -- 1.8.3.1
[PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid
Since "cycles:u" on a hybrid platform creates two "cycles" events, the second evsel in the evlist also needs initialization. With this patch, root@otcpl-adl-s-2:~# ./perf test 71 71: Convert perf time to TSC: Ok Signed-off-by: Jin Yao --- tools/perf/tests/perf-time-to-tsc.c | 16 1 file changed, 16 insertions(+) diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c index 680c3cffb128..b472205ec8e3 100644 --- a/tools/perf/tests/perf-time-to-tsc.c +++ b/tools/perf/tests/perf-time-to-tsc.c @@ -66,6 +66,11 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe u64 test_tsc, comm1_tsc, comm2_tsc; u64 test_time, comm1_time = 0, comm2_time = 0; struct mmap *md; + bool hybrid = false; + + perf_pmu__scan(NULL); + if (perf_pmu__hybrid_exist()) + hybrid = true; threads = thread_map__new(-1, getpid(), UINT_MAX); CHECK_NOT_NULL__(threads); @@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, int subtest __maybe evsel->core.attr.disabled = 1; evsel->core.attr.enable_on_exec = 0; + /* +* For hybrid "cycles:u", it creates two events. +* Init the second evsel here. +*/ + if (hybrid) { + evsel = evsel__next(evsel); + evsel->core.attr.comm = 1; + evsel->core.attr.disabled = 1; + evsel->core.attr.enable_on_exec = 0; + } + CHECK__(evlist__open(evlist)); CHECK__(evlist__mmap(evlist, UINT_MAX)); -- 2.17.1
[PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
Currently we don't support shadow stat for hybrid. root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1 Performance counter stats for 'system wide': 12,883,109,591 cpu_core/cycles/ 6,405,163,221 cpu_atom/cycles/ 555,553,778 cpu_core/instructions/ 841,158,734 cpu_atom/instructions/ 1.002644773 seconds time elapsed Now there is no shadow stat 'insn per cycle' reported. We will support it later and now just skip the 'perf stat metrics (shadow stat) test'. Signed-off-by: Jin Yao --- tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh index ebebd3596cf9..e6e35fc6c882 100755 --- a/tools/perf/tests/shell/stat+shadow_stat.sh +++ b/tools/perf/tests/shell/stat+shadow_stat.sh @@ -7,6 +7,9 @@ set -e # skip if system-wide mode is forbidden perf stat -a true > /dev/null 2>&1 || exit 2 +# skip if on hybrid platform +perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2 + test_global_aggr() { perf stat -a --no-big-num -e cycles,instructions sleep 1 2>&1 | \ -- 2.17.1
[PATCH v2 27/27] perf Documentation: Document intel-hybrid support
Add documentation and examples to help understand Intel hybrid perf support. Signed-off-by: Jin Yao --- tools/perf/Documentation/intel-hybrid.txt | 228 ++ tools/perf/Documentation/perf-record.txt | 1 + tools/perf/Documentation/perf-stat.txt| 2 + 3 files changed, 231 insertions(+) create mode 100644 tools/perf/Documentation/intel-hybrid.txt diff --git a/tools/perf/Documentation/intel-hybrid.txt b/tools/perf/Documentation/intel-hybrid.txt new file mode 100644 index ..ff641d9ac81b --- /dev/null +++ b/tools/perf/Documentation/intel-hybrid.txt @@ -0,0 +1,228 @@ +Intel hybrid support + +Support for Intel hybrid events within perf tools. + +Some Intel platforms, such as AlderLake, are hybrid platforms consisting +of atom cpus and core cpus, each with a dedicated event list. Some events +are available only on core cpus, some only on atom cpus, and some on both. + +The kernel exports two new cpu pmus via sysfs: +/sys/devices/cpu_core +/sys/devices/cpu_atom + +The 'cpus' files are created under these directories. For example, + +cat /sys/devices/cpu_core/cpus +0-15 + +cat /sys/devices/cpu_atom/cpus +16-23 + +This indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus. + +Quickstart + +List hybrid events +- + +As before, use perf-list to list the symbolic events. + +perf list + +inst_retired.any + [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom] +inst_retired.any + [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core] + +'Unit: xxx' is added to the brief description to indicate which pmu +the event belongs to. The same event name can be supported on +different pmus. + +Enable a hybrid event with a specific pmu +--- + +To enable a core-only event or an atom-only event, the following syntax is supported: + + cpu_core/<event name>/ +or + cpu_atom/<event name>/ + +For example, count the 'cycles' event on core cpus. 
+ + perf stat -e cpu_core/cycles/ + +Create two events for one hardware event automatically +-- + +When creating one event that is available on both atom and core, two +events are created automatically. One is for atom, the other is for +core. Most hardware events and cache events are available on both +cpu_core and cpu_atom. + +Hardware events have pre-defined configs (e.g. 0 for cycles), but on a +hybrid platform the kernel needs to know where the event comes from +(from atom or from core). The original perf event type PERF_TYPE_HARDWARE +can't carry pmu information, so a new type PERF_TYPE_HARDWARE_PMU is +introduced. + +The new attr.config layout for PERF_TYPE_HARDWARE_PMU: + +0xDD000000AA +AA: original hardware event ID +DD: PMU type ID + +Cache events are similar. A new type PERF_TYPE_HW_CACHE_PMU is introduced. + +The new attr.config layout for PERF_TYPE_HW_CACHE_PMU: + +0xDD00CCBBAA +AA: original hardware cache ID +BB: original hardware cache op ID +CC: original hardware cache op result ID +DD: PMU type ID + +The PMU type ID is retrieved from sysfs: + +cat /sys/devices/cpu_atom/type +10 + +cat /sys/devices/cpu_core/type +4 + +When enabling a hardware event without a specified pmu, such as +perf stat -e cycles -a (system-wide in this example), two events +are created automatically. + + +perf_event_attr: + type 6 + size 120 + config 0x400000000 + sample_type IDENTIFIER + read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING + disabled 1 + inherit 1 + exclude_guest 1 + + +and + + +perf_event_attr: + type 6 + size 120 + config 0xa00000000 + sample_type IDENTIFIER + read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING + disabled 1 + inherit 1 + exclude_guest 1 + + +type 6 is PERF_TYPE_HARDWARE_PMU. +The 0x4 in 0x400000000 indicates it's the cpu_core pmu. +The 0xa in 0xa00000000 indicates it's the cpu_atom pmu (the atom pmu type id +is dynamically assigned). + +The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus), +and creates 'cycles' (0xa00000000) on cpu16-cpu23 (atom cpus). 
+ +In the perf-stat result, two events are displayed: + + Performance counter stats for 'system wide': + +12,869,720,529 cpu_core/cycles/ +
[PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test
Add basic hybrid test cases for 'Parse event definition strings' test. root@otcpl-adl-s-2:~# ./perf test 6 6: Parse event definition strings : Ok Signed-off-by: Jin Yao --- tools/perf/tests/parse-events.c | 171 1 file changed, 171 insertions(+) diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c index a7f6661e6112..aec929867020 100644 --- a/tools/perf/tests/parse-events.c +++ b/tools/perf/tests/parse-events.c @@ -1512,6 +1512,123 @@ static int test__all_tracepoints(struct evlist *evlist) return test__checkevent_tracepoint_multi(evlist); } +static int test__hybrid_hw_event_with_pmu(struct evlist *evlist) +{ + struct evsel *evsel = evlist__first(evlist); + + TEST_ASSERT_VAL("wrong number of entries", 1 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config); + return 0; +} + +static int test__hybrid_hw_event(struct evlist *evlist) +{ + struct evsel *evsel1 = evlist__first(evlist); + struct evsel *evsel2 = evlist__last(evlist); + + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel1->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel1->core.attr.config); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel2->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0xa == evsel2->core.attr.config); + return 0; +} + +static int test__hybrid_hw_group_event(struct evlist *evlist) +{ + struct evsel *evsel, *leader; + + evsel = leader = evlist__first(evlist); + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + + evsel = evsel__next(evsel); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU 
== evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x40001 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + return 0; +} + +static int test__hybrid_sw_hw_group_event(struct evlist *evlist) +{ + struct evsel *evsel, *leader; + + evsel = leader = evlist__first(evlist); + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + + evsel = evsel__next(evsel); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + return 0; +} + +static int test__hybrid_hw_sw_group_event(struct evlist *evlist) +{ + struct evsel *evsel, *leader; + + evsel = leader = evlist__first(evlist); + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + + evsel = evsel__next(evsel); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + return 0; +} + +static int test__hybrid_group_modifier1(struct evlist *evlist) +{ + struct evsel *evsel, *leader; + + evsel = leader = evlist__first(evlist); + TEST_ASSERT_VAL("wrong number of entries", 2 == evlist->core.nr_entries); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user); + TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel); + + evsel = 
evsel__next(evsel); + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == evsel->core.attr.type); + TEST_ASSERT_VAL("wrong config", 0x40001 == evsel->core.attr.config); + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader); + TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user); + TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel); + return 0; +} + +static int test__hybrid_raw1(struct evlist *evlist) +{ + struct evsel
[PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
For hybrid, the kernel introduces a new perf type PERF_TYPE_HARDWARE_PMU (6), which is assigned to hybrid hardware events. root@otcpl-adl-s-2:~# ./perf test 17 -vvv ... compare matching [event:base-stat] to [event-6-17179869184-4] [cpu] * 0 [flags] 0|8 8 [type] 0 6 ->FAIL match: [event:base-stat] matches [] event:base-stat does not match, but is optional matched compare matching [event-6-17179869184-4] to [event:base-stat] [cpu] 0 * [flags] 8 0|8 [type] 6 0 ->FAIL match: [event-6-17179869184-4] matches [] expected type=6, got 0 expected config=17179869184, got 0 FAILED './tests/attr/test-stat-C0' - match failure The type matching fails because one type is 0 but the type of the hybrid hardware event is 6. We temporarily skip this test case; supporting it is a TODO for the future. Signed-off-by: Jin Yao --- tools/perf/tests/attr.c | 5 + 1 file changed, 5 insertions(+) diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c index dd39ce9b0277..fc7f74159764 100644 --- a/tools/perf/tests/attr.c +++ b/tools/perf/tests/attr.c @@ -34,6 +34,7 @@ #include "event.h" #include "util.h" #include "tests.h" +#include "pmu-hybrid.h" #define ENV "PERF_TEST_ATTR" @@ -184,6 +185,10 @@ int test__attr(struct test *test __maybe_unused, int subtest __maybe_unused) char path_dir[PATH_MAX]; char *exec_path; + perf_pmu__scan(NULL); + if (perf_pmu__hybrid_exist()) + return 0; + /* First try development tree tests. */ if (!lstat("./tests", &st)) return run_dir("./tests", "./perf"); -- 2.17.1
[PATCH v2 24/27] perf tests: Support 'Session topology' test for hybrid
Force creating one event "cpu_core/cycles/" by default; otherwise, in evlist__valid_sample_type, the check 'if (evlist->core.nr_entries == 1)' would fail. root@otcpl-adl-s-2:~# ./perf test 41 41: Session topology: Ok Signed-off-by: Jin Yao --- tools/perf/tests/topology.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c index 74748ed75b2c..0f6e73baab2d 100644 --- a/tools/perf/tests/topology.c +++ b/tools/perf/tests/topology.c @@ -40,7 +40,15 @@ static int session_write_header(char *path) session = perf_session__new(&data, false, NULL); TEST_ASSERT_VAL("can't get session", !IS_ERR(session)); - session->evlist = evlist__new_default(); + perf_pmu__scan(NULL); + if (!perf_pmu__hybrid_exist()) { + session->evlist = evlist__new_default(); + } else { + struct parse_events_error err; + + session->evlist = evlist__new(); + parse_events(session->evlist, "cpu_core/cycles/", &err); + } TEST_ASSERT_VAL("can't get evlist", session->evlist); perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY); -- 2.17.1
[PATCH v2 23/27] perf tests: Support 'Parse and process metrics' test for hybrid
Some events are not supported on hybrid, so only some cases are picked up for hybrid. root@otcpl-adl-s-2:~# ./perf test 67 67: Parse and process metrics : Ok Signed-off-by: Jin Yao --- tools/perf/tests/parse-metric.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c index 55bf52e588be..149b18f1f96a 100644 --- a/tools/perf/tests/parse-metric.c +++ b/tools/perf/tests/parse-metric.c @@ -370,12 +370,17 @@ static int test_metric_group(void) int test__parse_metric(struct test *test __maybe_unused, int subtest __maybe_unused) { + perf_pmu__scan(NULL); + TEST_ASSERT_VAL("IPC failed", test_ipc() == 0); TEST_ASSERT_VAL("frontend failed", test_frontend() == 0); - TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 0); TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0); TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0); - TEST_ASSERT_VAL("test metric group", test_metric_group() == 0); TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0); - return 0; + + if (!perf_pmu__hybrid_exist()) { + TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 0); + TEST_ASSERT_VAL("test metric group", test_metric_group() == 0); + } + return 0; } -- 2.17.1
[PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
Since for one hw event, two hybrid events are created. For example, evsel->idx evsel__name(evsel) 0 cycles 1 cycles 2 instructions 3 instructions ... So for comparing the evsel name on hybrid, the evsel->idx needs to be divided by 2. root@otcpl-adl-s-2:~# ./perf test 14 14: Roundtrip evsel->name : Ok Signed-off-by: Jin Yao --- tools/perf/tests/evsel-roundtrip-name.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/tools/perf/tests/evsel-roundtrip-name.c b/tools/perf/tests/evsel-roundtrip-name.c index f7f3e5b4c180..2b938a15901e 100644 --- a/tools/perf/tests/evsel-roundtrip-name.c +++ b/tools/perf/tests/evsel-roundtrip-name.c @@ -62,7 +62,8 @@ static int perf_evsel__roundtrip_cache_name_test(void) return ret; } -static int __perf_evsel__name_array_test(const char *names[], int nr_names) +static int __perf_evsel__name_array_test(const char *names[], int nr_names, +int distance) { int i, err; struct evsel *evsel; @@ -82,9 +83,9 @@ static int __perf_evsel__name_array_test(const char *names[], int nr_names) err = 0; evlist__for_each_entry(evlist, evsel) { - if (strcmp(evsel__name(evsel), names[evsel->idx])) { + if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) { --err; - pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->idx]); + pr_debug("%s != %s\n", evsel__name(evsel), names[evsel->idx / distance]); } } @@ -93,18 +94,22 @@ static int __perf_evsel__name_array_test(const char *names[], int nr_names) return err; } -#define perf_evsel__name_array_test(names) \ - __perf_evsel__name_array_test(names, ARRAY_SIZE(names)) +#define perf_evsel__name_array_test(names, distance) \ + __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance) int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, int subtest __maybe_unused) { int err = 0, ret = 0; - err = perf_evsel__name_array_test(evsel__hw_names); + perf_pmu__scan(NULL); + if (perf_pmu__hybrid_exist()) + return perf_evsel__name_array_test(evsel__hw_names, 2); + + 
err = perf_evsel__name_array_test(evsel__hw_names, 1); if (err) ret = err; - err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1); + err = __perf_evsel__name_array_test(evsel__sw_names, PERF_COUNT_SW_DUMMY + 1, 1); if (err) ret = err; -- 2.17.1
[PATCH v2 22/27] perf tests: Support 'Track with sched_switch' test for hybrid
Since "cycles:u" on a hybrid platform creates two "cycles" events, the number of events in the evlist is not what the next test steps expect. Now we just use one event, "cpu_core/cycles:u/", for hybrid. root@otcpl-adl-s-2:~# ./perf test 35 35: Track with sched_switch : Ok Signed-off-by: Jin Yao --- tools/perf/tests/switch-tracking.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c index 3ebaa758df77..13a11ce51a1a 100644 --- a/tools/perf/tests/switch-tracking.c +++ b/tools/perf/tests/switch-tracking.c @@ -340,6 +340,11 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_ struct evsel *switch_evsel, *tracking_evsel; const char *comm; int err = -1; + bool hybrid = false; + + perf_pmu__scan(NULL); + if (perf_pmu__hybrid_exist()) + hybrid = true; threads = thread_map__new(-1, getpid(), UINT_MAX); if (!threads) { @@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test __maybe_unused, int subtest __maybe_ cpu_clocks_evsel = evlist__last(evlist); /* Second event */ - err = parse_events(evlist, "cycles:u", NULL); + if (!hybrid) + err = parse_events(evlist, "cycles:u", NULL); + else + err = parse_events(evlist, "cpu_core/cycles:u/", NULL); if (err) { pr_debug("Failed to parse event cycles:u\n"); goto out_err; -- 2.17.1
[PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group
If a group has events which are from different hybrid PMUs, show a warning. This is to remind the user not to put the core event and atom event into one group. root@ssp-pwrt-002:~# ./perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -- sleep 1 WARNING: Group has events from different hybrid PMUs Performance counter stats for 'sleep 1': cpu_core/cycles/ cpu_atom/cycles/ 1.002585908 seconds time elapsed Signed-off-by: Jin Yao --- tools/perf/builtin-record.c | 3 +++ tools/perf/builtin-stat.c | 7 ++ tools/perf/util/evlist.c| 44 + tools/perf/util/evlist.h| 2 ++ 4 files changed, 56 insertions(+) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 363ea1047148..188a1198cd4b 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -929,6 +929,9 @@ static int record__open(struct record *rec) pos = evlist__reset_weak_group(evlist, pos, true); goto try_again; } + + if (errno == EINVAL && perf_pmu__hybrid_exist()) + evlist__warn_hybrid_group(evlist); rc = -errno; evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg)); ui__error("%s\n", msg); diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 7a732508b2b4..6f780a039db0 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist) struct evsel *evsel, *pos, *leader; char buf[1024]; + if (evlist__hybrid_exist(evlist)) + return; + evlist__for_each_entry(evlist, evsel) { leader = evsel->leader; @@ -726,6 +729,10 @@ enum counter_recovery { static enum counter_recovery stat_handle_error(struct evsel *counter) { char msg[BUFSIZ]; + + if (perf_pmu__hybrid_exist() && errno == EINVAL) + evlist__warn_hybrid_group(evsel_list); + /* * PPC returns ENXIO for HW counters until 2.6.37 * (behavior changed with commit b0a873e). 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f139151b9433..5ec891418cdd 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist) perf_cpu_map__put(evlist->core.all_cpus); evlist->core.all_cpus = perf_cpu_map__empty_new(1); } + +static bool group_hybrid_conflict(struct evsel *leader) +{ + struct evsel *pos, *prev = NULL; + + for_each_group_evsel(pos, leader) { + if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name)) + continue; + + if (prev && strcmp(prev->pmu_name, pos->pmu_name)) + return true; + + prev = pos; + } + + return false; +} + +void evlist__warn_hybrid_group(struct evlist *evlist) +{ + struct evsel *evsel; + + evlist__for_each_entry(evlist, evsel) { + if (evsel__is_group_leader(evsel) && + evsel->core.nr_members > 1 && + group_hybrid_conflict(evsel)) { + WARN_ONCE(1, "WARNING: Group has events from " +"different hybrid PMUs\n"); + return; + } + } +} + +bool evlist__hybrid_exist(struct evlist *evlist) +{ + struct evsel *evsel; + + evlist__for_each_entry(evlist, evsel) { + if (evsel__is_hybrid_event(evsel)) + return true; + } + + return false; +} diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 0da683511d98..33dec3bb5739 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -369,4 +369,6 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int idx); void evlist__invalidate_all_cpus(struct evlist *evlist); bool evlist__has_hybrid_events(struct evlist *evlist); +void evlist__warn_hybrid_group(struct evlist *evlist); +bool evlist__hybrid_exist(struct evlist *evlist); #endif /* __PERF_EVLIST_H */ -- 2.17.1
[PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group
A group mixed with hybrid event and global event is allowed. For example, group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'. e.g. perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a The challenge is their available cpus are not fully matched. For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_core/cycles/' is available on CPU16-CPU23. When getting the group id for group member, we must be very careful because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'. Actually the cpu here is the index of evsel->core.cpus, not the real CPU ID. e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16. Another challenge is for group read. The events in group may be not available on all cpus. For example the leader is a software event and it's available on CPU0-CPU1, but the group member is a hybrid event and it's only available on CPU1. For CPU0, we have only one event, but for CPU1 we have two events. So we need to change the read size according to the real number of events on that cpu. 
Let's see examples, root@ssp-pwrt-002:~# ./perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a -vvv -- sleep 1 Control descriptor is not initialized perf_event_attr: type 1 size 120 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 21 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 perf_event_attr: type 6 size 120 config 0xa sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 16 group_fd 20 flags 0x8 = 28 
sys_perf_event_open: pid -1 cpu 17 group_fd 21 flags 0x8 = 29 sys_perf_event_open: pid -1 cpu 18 group_fd 22 flags 0x8 = 30 sys_perf_event_open: pid -1 cpu 19 group_fd 23 flags 0x8 = 31 sys_perf_event_open: pid -1 cpu 20 group_fd 24 flags 0x8 = 32 sys_perf_event_open: pid -1 cpu 21 group_fd 25 flags 0x8 = 33 sys_perf_event_open: pid -1 cpu 22 group_fd 26 flags 0x8 = 34 sys_perf_event_open: pid -1 cpu 23 group_fd 27 flags 0x8 = 35 cpu-clock: 0: 1002495141 1002496165 1002496165 cpu-clock: 1: 1002494114 1002494741 1002494741 cpu-clock: 2: 1002489755 1002491096 1002491096 cpu-clock: 3: 1002486160 1002487660 1002487660 cpu-clock: 4: 1002482342 1002483461 1002483461 cpu-clock: 5: 1002480405 1002480756 1002480756 cpu-clock: 6: 1002478135 1002478698 1002478698 cpu-clock: 7: 1002476940 1002477251 1002477251 cpu-clock: 8: 1002474281 1002475616 1002475616 cpu-clock: 9: 1002471033 1002471983 1002471983 cpu-clock: 10: 1002468437 1002469553 1002469553 cpu-clock: 11: 1002467503 1002467892 1002467892 cpu-clock: 12: 1002465167 1002466190 1002466190 cpu-clock: 13: 1002463794 1002464109 1002464109
[PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU
For a hybrid system, the perf subsystem doesn't know which PMU the events belong to. So the PMU aware version PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU are introduced. Now define the new output[] entries for these two types. Signed-off-by: Jin Yao --- tools/perf/builtin-script.c | 24 1 file changed, 24 insertions(+) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 5915f19cee55..d0e889e636d5 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -275,6 +275,30 @@ static struct { .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT, }, + [PERF_TYPE_HARDWARE_PMU] = { + .user_set = false, + + .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID | + PERF_OUTPUT_CPU | PERF_OUTPUT_TIME | + PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP | + PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET | + PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD, + + .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT, + }, + + [PERF_TYPE_HW_CACHE_PMU] = { + .user_set = false, + + .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID | + PERF_OUTPUT_CPU | PERF_OUTPUT_TIME | + PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP | + PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET | + PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD, + + .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT, + }, + [OUTPUT_TYPE_SYNTH] = { .user_set = false, -- 2.17.1
Re: [PATCH] ASoC: soc-core: fix DMI handling
On Wed, 10 Mar 2021 20:39:27 +0100, Pierre-Louis Bossart wrote: > > When DMI information is not present, trying to assign the card long > name results in the following warning. > > WARNING KERN tegra-audio-graph-card sound: ASoC: no DMI vendor name! > > The initial solution suggested was to test if the card device is an > ACPI one. This causes a regression visible to userspace on all Intel > platforms, with UCM unable to load card profiles based on DMI > information: the card devices are not necessarily ACPI ones, e.g. when > the parent creates platform devices on Intel devices. > > To fix this problem, this patch exports the existing dmi_available > variable and tests it in the ASoC core. > > Fixes: c014170408bc ("ASoC: soc-core: Prevent warning if no DMI table is > present") > Signed-off-by: Pierre-Louis Bossart Reviewed-by: Takashi Iwai Takashi > --- > drivers/firmware/dmi_scan.c | 1 + > sound/soc/soc-core.c| 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c > index d51ca0428bb8..f191a1f901ac 100644 > --- a/drivers/firmware/dmi_scan.c > +++ b/drivers/firmware/dmi_scan.c > @@ -166,6 +166,7 @@ static int __init dmi_checksum(const u8 *buf, u8 len) > static const char *dmi_ident[DMI_STRING_MAX]; > static LIST_HEAD(dmi_devices); > int dmi_available; > +EXPORT_SYMBOL_GPL(dmi_available); > > /* > * Save a DMI string > diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c > index 16ba54eb8164..c7e4600b2dd4 100644 > --- a/sound/soc/soc-core.c > +++ b/sound/soc/soc-core.c > @@ -1574,7 +1574,7 @@ int snd_soc_set_dmi_name(struct snd_soc_card *card, > const char *flavour) > if (card->long_name) > return 0; /* long name already set by driver or from DMI */ > > - if (!is_acpi_device_node(card->dev->fwnode)) > + if (!dmi_available) > return 0; > > /* make up dmi long name as: vendor-product-version-board */ > -- > 2.25.1 >
[PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event
perf-stat supports several aggregation modes, such as --per-core and --per-socket. A hybrid event, however, may be available only on part of the cpus. So for --per-core we need to filter out the unavailable cores, for --per-socket the unavailable sockets, and so on. Before: root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 1,604,426,524 cpu_core/cycles/ S0-D0-C4 2 1,604,408,224 cpu_core/cycles/ S0-D0-C8 2 1,605,995,644 cpu_core/cycles/ S0-D0-C12 2 1,628,056,554 cpu_core/cycles/ S0-D0-C16 2 1,611,488,734 cpu_core/cycles/ S0-D0-C20 2 1,616,314,761 cpu_core/cycles/ S0-D0-C24 2 1,603,558,295 cpu_core/cycles/ S0-D0-C28 2 1,603,541,128 cpu_core/cycles/ S0-D0-C32 0 cpu_core/cycles/ S0-D0-C33 0 cpu_core/cycles/ S0-D0-C34 0 cpu_core/cycles/ S0-D0-C35 0 cpu_core/cycles/ S0-D0-C36 0 cpu_core/cycles/ S0-D0-C37 0 cpu_core/cycles/ S0-D0-C38 0 cpu_core/cycles/ S0-D0-C39 0 cpu_core/cycles/ After: root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': S0-D0-C0 2 1,621,781,943 cpu_core/cycles/ S0-D0-C4 2 1,621,755,088 cpu_core/cycles/ S0-D0-C8 2 1,604,276,920 cpu_core/cycles/ S0-D0-C12 2 1,603,446,963 cpu_core/cycles/ S0-D0-C16 2 1,604,231,725 cpu_core/cycles/ S0-D0-C20 2 1,603,435,286 cpu_core/cycles/ S0-D0-C24 2 1,603,387,250 cpu_core/cycles/ S0-D0-C28 2 1,604,173,183 cpu_core/cycles/ Signed-off-by: Jin Yao --- tools/perf/util/stat-display.c | 20 1 file changed, 20 insertions(+) diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index ed37d8e7ea1a..2db7c36a03ad 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -634,6 +634,20 @@ static void aggr_cb(struct perf_stat_config *config, } } +static bool aggr_id_hybrid_matched(struct perf_stat_config *config, + struct evsel *counter, struct aggr_cpu_id id) +{ + struct aggr_cpu_id s; + + for (int i = 0; i < 
evsel__nr_cpus(counter); i++) { + s = config->aggr_get_id(config, evsel__cpus(counter), i); + if (cpu_map__compare_aggr_cpu_id(s, id)) + return true; + } + + return false; +} + static void print_counter_aggrdata(struct perf_stat_config *config, struct evsel *counter, int s, char *prefix, bool metric_only, @@ -647,6 +661,12 @@ static void print_counter_aggrdata(struct perf_stat_config *config, double uval; ad.id = id = config->aggr_map->map[s]; + + if (perf_pmu__hybrid_exist() && + !aggr_id_hybrid_matched(config, counter, id)) { + return; + } + ad.val = ad.ena = ad.run = 0; ad.nr = 0; if (!collect_data(config, counter, aggr_cb, &ad)) -- 2.17.1
[PATCH v2 14/27] perf stat: Add default hybrid events
Previously if '-e' is not specified in perf stat, some software events and hardware events are added to evlist by default. root@otcpl-adl-s-2:~# ./perf stat -- ./triad_loop Performance counter stats for './triad_loop': 109.43 msec task-clock#0.993 CPUs utilized 1 context-switches #0.009 K/sec 0 cpu-migrations#0.000 K/sec 105 page-faults #0.960 K/sec 401,161,982 cycles#3.666 GHz 1,601,216,357 instructions #3.99 insn per cycle 200,217,751 branches # 1829.686 M/sec 14,555 branch-misses #0.01% of all branches 0.110176860 seconds time elapsed Among the events, cycles, instructions, branches and branch-misses are hardware events. On a hybrid platform, two events are created for one hardware event: core cycles, atom cycles, core instructions, atom instructions, core branches, atom branches, core branch-misses, atom branch-misses. These events will be added to evlist in order on a hybrid platform if '-e' is not set. Since parse_events() now supports creating two hardware events for one event on a hybrid platform, we can just use parse_events(evlist, "cycles,instructions,branches,branch-misses") to create the default events and add them to evlist. After: root@ssp-pwrt-002:~# ./perf stat -- ./triad_loop Performance counter stats for './triad_loop': 290.77 msec task-clock#0.996 CPUs utilized 25 context-switches #0.086 K/sec 13 cpu-migrations#0.045 K/sec 107 page-faults #0.368 K/sec 449,620,957 cpu_core/cycles/ # 1546.334 M/sec cpu_atom/cycles/ (0.00%) 1,601,499,820 cpu_core/instructions/# 5507.870 M/sec cpu_atom/instructions/ (0.00%) 200,272,310 cpu_core/branches/# 688.776 M/sec cpu_atom/branches/ (0.00%) 15,255 cpu_core/branch-misses/ #0.052 M/sec cpu_atom/branch-misses/ (0.00%) 0.291897676 seconds time elapsed We can see two events are created for one hardware event. The first one is the core event, the second one is the atom event. One note: the shadow stats look a bit different; now it's just 'M/sec'. 
The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats need to be improved in the future if we want to get the original shadow stats. Signed-off-by: Jin Yao --- tools/perf/builtin-stat.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 6c0a21323814..7a732508b2b4 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1162,6 +1162,13 @@ static int parse_stat_cgroups(const struct option *opt, return parse_cgroups(opt, str, unset); } +static int add_default_hybrid_events(struct evlist *evlist) +{ + struct parse_events_error err; + + return parse_events(evlist, "cycles,instructions,branches,branch-misses", &err); +} + static struct option stat_options[] = { OPT_BOOLEAN('T', "transaction", &transaction_run, "hardware transaction statistics"), @@ -1637,6 +1644,12 @@ static int add_default_attributes(void) { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS }, { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES }, +}; + struct perf_event_attr default_sw_attrs[] = { + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK }, + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES }, + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS }, + { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS }, }; /* @@ -1874,6 +1887,15 @@ static int add_default_attributes(void) } if (!evsel_list->core.nr_entries) { + perf_pmu__scan(NULL); + if (perf_pmu__hybrid_exist()) { + if (evlist__add_default_attrs(evsel_list, + default_sw_attrs) < 0) { + return -1; + } + return add_default_hybrid_events(evsel_list); + } + if (target__has_cpu(&target)) default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK; -- 2.17.1
[PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default
When evlist is empty, for example no '-e' specified in perf record, one default 'cycles' event is added to evlist. While on hybrid platform, it needs to create two default 'cycles' events. One is for core, the other is for atom. This patch actually calls evsel__new_cycles() two times to create two 'cycles' events. root@ssp-pwrt-002:~# ./perf record -vv -- sleep 1 ... perf_event_attr: type 6 size 120 config 0x4 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|PERIOD read_format ID disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 precise_ip 3 sample_id_all1 exclude_guest1 mmap21 comm_exec1 ksymbol 1 bpf_event1 sys_perf_event_open: pid 22300 cpu 0 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid 22300 cpu 1 group_fd -1 flags 0x8 = 6 sys_perf_event_open: pid 22300 cpu 2 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid 22300 cpu 3 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid 22300 cpu 4 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid 22300 cpu 5 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid 22300 cpu 6 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid 22300 cpu 7 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid 22300 cpu 8 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid 22300 cpu 9 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid 22300 cpu 10 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid 22300 cpu 11 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid 22300 cpu 12 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid 22300 cpu 13 group_fd -1 flags 0x8 = 19 sys_perf_event_open: pid 22300 cpu 14 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid 22300 cpu 15 group_fd -1 flags 0x8 = 21 perf_event_attr: type 6 size 120 config 0xa { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|ID|PERIOD read_format ID disabled 1 inherit 1 freq 1 enable_on_exec 1 precise_ip 3 sample_id_all1 exclude_guest1 sys_perf_event_open: pid 22300 cpu 16 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid 22300 cpu 17 
group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid 22300 cpu 18 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid 22300 cpu 19 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid 22300 cpu 20 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid 22300 cpu 21 group_fd -1 flags 0x8 = 27 sys_perf_event_open: pid 22300 cpu 22 group_fd -1 flags 0x8 = 28 sys_perf_event_open: pid 22300 cpu 23 group_fd -1 flags 0x8 = 29 ... We can see one core 'cycles' (0x4) is enabled on cpu0-cpu15 and atom 'cycles' (0xa) is enabled on cpu16-cpu23. Signed-off-by: Jin Yao --- tools/perf/builtin-record.c | 10 ++ tools/perf/util/evlist.c| 32 +++- tools/perf/util/evsel.c | 6 +++--- tools/perf/util/evsel.h | 2 +- 4 files changed, 41 insertions(+), 9 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 35465d1db6dd..363ea1047148 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -2786,10 +2786,12 @@ int cmd_record(int argc, const char **argv) if (record.opts.overwrite) record.opts.tail_synthesize = true; - if (rec->evlist->core.nr_entries == 0 && - __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) { - pr_err("Not enough memory for event selector list\n"); - goto out; + if (rec->evlist->core.nr_entries == 0) { + perf_pmu__scan(NULL); + if (__evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) { + pr_err("Not enough memory for event selector list\n"); + goto out; + } } if
[PATCH v2 11/27] perf parse-events: Support hardware events inside PMU
On a hybrid platform, some hardware events are only available on a specific pmu. For example, 'L1-dcache-load-misses' is only available on the 'cpu_core' pmu. And even for an event which is available on both pmus, the user may want to enable it on just one. So the following syntax is now supported: cpu_core/<hardware event>/ cpu_core/<cache event>/ cpu_core/<pmu event>/ cpu_atom/<hardware event>/ cpu_atom/<cache event>/ cpu_atom/<pmu event>/ It limits the event to be enabled only on the specified pmu. The patch works as follows: for example, if we use "cpu_core/LLC-loads/", then in parse_events_add_pmu(), term->config is "LLC-loads". We create a new "parse_events_state" with the pmu_name and use parse_events__scanner to scan the term->config (the string "LLC-loads" in this example). parse_events_add_cache() will be called during parsing. The parse_state->pmu_name is used to identify the pmu on which the event is enabled. Let's see examples: root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv -- ./triad_loop Control descriptor is not initialized perf_event_attr: type 6 size 120 config 0x4 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest1 sys_perf_event_open: pid 7267 cpu -1 group_fd -1 flags 0x8 = 3 perf_event_attr: type 7 size 120 config 0x40002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest1 sys_perf_event_open: pid 7267 cpu -1 group_fd -1 flags 0x8 = 4 cycles: 0: 449252097 29724 29724 LLC-loads: 0: 1857 29724 29724 cycles: 449252097 29724 29724 LLC-loads: 1857 29724 29724 Performance counter stats for './triad_loop': 449,252,097 cpu_core/cycles/ 1,857 cpu_core/LLC-loads/ 0.298898415 seconds time elapsed root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv -- taskset -c 16 ./triad_loop Control descriptor is not initialized perf_event_attr: type 6 size 120 config 0xa sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 
inherit 1 enable_on_exec 1 exclude_guest1 sys_perf_event_open: pid 7339 cpu -1 group_fd -1 flags 0x8 = 3 perf_event_attr: type 7 size 120 config 0xa0002 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest1 sys_perf_event_open: pid 7339 cpu -1 group_fd -1 flags 0x8 = 4 cycles: 0: 602020010 343657939 342553275 LLC-loads: 0: 3537 343657939 342553275 cycles: 603961400 343657939 342553275 LLC-loads: 3548 343657939 342553275 Performance counter stats for 'taskset -c 16 ./triad_loop': 603,961,400 cpu_atom/cycles/ (99.68%) 3,548 cpu_atom/LLC-loads/ (99.68%) 0.344904585 seconds time elapsed Signed-off-by: Jin Yao --- tools/perf/util/parse-events.c | 100 +++-- tools/perf/util/parse-events.h | 6 +- tools/perf/util/parse-events.y | 21 ++- 3 files changed, 105 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 09e42245f71a..30435adc7a7b 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -489,7 +489,8 @@ static int create_hybrid_cache_event(struct list_head *list, int *idx, static int
Re: [RFC v2 3/5] arm64: socfpga: rename ARCH_STRATIX10 to ARCH_SOCFPGA64
On 10/03/2021 17:42, Arnd Bergmann wrote: > On Wed, Mar 10, 2021 at 4:54 PM Krzysztof Kozlowski > wrote: >> On 10/03/2021 16:47, Krzysztof Kozlowski wrote: >>> This edac Altera driver is very weird... it uses the same compatible >>> differently depending whether this is 32-bit or 64-bit (e.g. Stratix >>> 10)! On ARMv7 the compatible means for example one IRQ... On ARMv8, we >>> have two. It's quite a new code (2019 from Intel), not some ancient >>> legacy, so it should never have been accepted... >> >> Oh, it's not that horrible as it sounds. They actually have different >> compatibles for edac driver with these differences (e.g. in interrupts). >> They just do not use them and instead check for the basic (common?) >> compatible and architecture... Anyway without testing I am not the >> person to fix the edac driver. > > Ok, This should be fixed properly as you describe, but as a quick hack > it wouldn't be hard to just change the #ifdef to check for CONFIG_64BIT > instead of CONFIG_ARCH_STRATIX10 during the rename of the config > symbol. This would work. The trouble with renaming ARCH_SOCFPGA to ARCH_INTEL_SOCFPGA is that SOCFPGA will still appear in many other Kconfig symbols and even in directory paths. Let me use ARCH_INTEL_SOCFPGA for 64-bit here and do the renaming of 32-bit a little later. Best regards, Krzysztof
[PATCH v2 12/27] perf parse-events: Support hybrid raw events
On a hybrid platform, the same raw event may be available on both the cpu_core pmu and the cpu_atom pmu. So creating two raw events for one event encoding is supported. root@ssp-pwrt-002:~# ./perf stat -e r3c -a -vv -- sleep 1 Control descriptor is not initialized perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 perf_event_attr: type 10 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 21 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 26 
sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 r3c: 0: 807321251 1002093589 1002093589 r3c: 1: 807321699 1002088203 1002088203 r3c: 2: 802163010 1002086701 1002086701 r3c: 3: 802162967 1002080660 1002080660 r3c: 4: 801769096 1002077047 1002077047 r3c: 5: 801766174 1002071197 1002071197 r3c: 6: 804147338 1002065696 1002065696 r3c: 7: 804141152 1002055345 1002055345 r3c: 8: 801743651 1002043364 1002043364 r3c: 9: 801742285 1002036921 1002036921 r3c: 10: 804083297 1002032502 1002032502 r3c: 11: 804084735 1002027992 1002027992 r3c: 12: 804504507 1002026371 1002026371 r3c: 13: 804504679 1002022466 1002022466 r3c: 14: 811424953 1002021767 1002021767 r3c: 15: 811423320 1002021594 1002021594 r3c: 0: 810883154 1002021654 1002021654 r3c: 1: 810881069 1002017334 1002017334 r3c: 2: 810878689 1002014010 1002014010 r3c: 3: 810876654 1002011516 1002011516 r3c: 4: 800488244 1002007858 1002007858 r3c: 5: 800486260 1002003635 1002003635 r3c: 6: 800483374 1002000384 1002000384 r3c: 7: 800481011 1001997122 1001997122 r3c: 12874304114 16032851415 16032851415 r3c: 6445458455 8016073513 8016073513 Performance counter stats for 'system wide': 12,874,304,114 cpu_core/r3c/ 6,445,458,455 cpu_atom/r3c/ 1.002310991 seconds time elapsed It also supports the raw event inside pmu. Syntax is similar: cpu_core// cpu_atom// root@ssp-pwrt-002:~# ./perf stat -e cpu_core/r3c/ -vv -- ./triad_loop Control descriptor is not initialized perf_event_attr: type 4 size 120 config 0x3c sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest1 sys_perf_event_open: pid 12340 cpu -1 group_fd -1 flags 0x8 = 3 cpu_core/r3c/: 0: 449000613
[PATCH v2 10/27] perf parse-events: Create two hybrid cache events
Cache events have pre-defined configs. The kernel needs to know where a cache event comes from (e.g. from the cpu_core pmu or from the cpu_atom pmu). But the perf type 'PERF_TYPE_HW_CACHE' can't carry pmu information. So the kernel introduces a new type 'PERF_TYPE_HW_CACHE_PMU'. The new attr.config layout for PERF_TYPE_HW_CACHE_PMU is 0xDD00CCBBAA AA: hardware cache ID BB: hardware cache op ID CC: hardware cache op result ID DD: PMU type ID As with hardware events, the PMU type ID is retrieved from sysfs. When enabling a hybrid cache event without a specified pmu, such as 'perf stat -e L1-dcache-loads -a', two events are created automatically. One is for atom, the other is for core. root@ssp-pwrt-002:~# ./perf stat -e L1-dcache-loads -vv -a -- sleep 1 Control descriptor is not initialized perf_event_attr: type 7 size 120 config 0x4 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 perf_event_attr: type 7 size 120 config 0xa sample_type IDENTIFIER 
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 21 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 L1-dcache-loads: 0: 13103284 1002535421 1002535421 L1-dcache-loads: 1: 12995797 1002532807 1002532807 L1-dcache-loads: 2: 13428186 1002528572 1002528572 L1-dcache-loads: 3: 12913469 1002517437 1002517437 L1-dcache-loads: 4: 12857843 1002507079 1002507079 L1-dcache-loads: 5: 12812079 1002498279 1002498279 L1-dcache-loads: 6: 12829938 1002490010 1002490010 L1-dcache-loads: 7: 12807085 1002481860 1002481860 L1-dcache-loads: 8: 12907189 1002473181 1002473181 L1-dcache-loads: 9: 12823095 1002465895 1002465895 L1-dcache-loads: 10: 12892770 1002459322 1002459322 L1-dcache-loads: 11: 12789718 1002451607 1002451607 L1-dcache-loads: 12: 12838931 1002442632 1002442632 L1-dcache-loads: 13: 12803756 1002434133 1002434133 L1-dcache-loads: 14: 12840574 1002426060 1002426060 L1-dcache-loads: 15: 12799075 1002415964 1002415964 L1-dcache-loads: 0: 39394457 1002406287 1002406287 L1-dcache-loads: 1: 39372632 1002400502 1002400502 L1-dcache-loads: 2: 39405247 1002394865 1002394865 L1-dcache-loads: 3: 39400547 1002389099 1002389099 L1-dcache-loads: 4: 39410752 1002383106 1002383106 L1-dcache-loads: 5: 39402983 1002375365 1002375365 L1-dcache-loads: 6: 39388775 1002369374 1002369374 L1-dcache-loads: 7: 39408527 1002363344 1002363344 L1-dcache-loads: 206442789 16039660259 16039660259 L1-dcache-loads: 315183920 8019081942 8019081942 Performance counter stats for 'system wide': 206,442,789 cpu_core/L1-dcache-loads/ 
315,183,920 cpu_atom/L1-dcache-loads/ 1.002751663 seconds
[PATCH v2 09/27] perf parse-events: Create two hybrid hardware events
Hardware events have pre-defined configs. The kernel needs to know where the event comes from (e.g. from the cpu_core pmu or from the cpu_atom pmu). But the perf type 'PERF_TYPE_HARDWARE' can't carry pmu information. So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'. The new attr.config layout for PERF_TYPE_HARDWARE_PMU is: 0xDD000000AA AA: original hardware event ID DD: PMU type ID The PMU type ID is retrieved from sysfs. For example, cat /sys/devices/cpu_atom/type 10 cat /sys/devices/cpu_core/type 4 When enabling a hybrid hardware event without a specified pmu, such as 'perf stat -e cycles -a', two events are created automatically. One is for atom, the other is for core. root@ssp-pwrt-002:~# ./perf stat -e cycles -vv -a -- sleep 1 Control descriptor is not initialized perf_event_attr: type 6 size 120 config 0x4 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 = 4 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 = 5 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 = 7 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 = 8 sys_perf_event_open: pid -1 cpu 5 group_fd -1 flags 0x8 = 9 sys_perf_event_open: pid -1 cpu 6 group_fd -1 flags 0x8 = 10 sys_perf_event_open: pid -1 cpu 7 group_fd -1 flags 0x8 = 11 sys_perf_event_open: pid -1 cpu 8 group_fd -1 flags 0x8 = 12 sys_perf_event_open: pid -1 cpu 9 group_fd -1 flags 0x8 = 13 sys_perf_event_open: pid -1 cpu 10 group_fd -1 flags 0x8 = 14 sys_perf_event_open: pid -1 cpu 11 group_fd -1 flags 0x8 = 15 sys_perf_event_open: pid -1 cpu 12 group_fd -1 flags 0x8 = 16 sys_perf_event_open: pid -1 cpu 13 group_fd -1 flags 0x8 = 17 sys_perf_event_open: pid -1 cpu 14 group_fd -1 flags 0x8 = 18 sys_perf_event_open: pid -1 cpu 15 group_fd -1 flags 0x8 = 19 perf_event_attr: type 6 size 120 config 0xa sample_type IDENTIFIER read_format 
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest1 sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8 = 20 sys_perf_event_open: pid -1 cpu 17 group_fd -1 flags 0x8 = 21 sys_perf_event_open: pid -1 cpu 18 group_fd -1 flags 0x8 = 22 sys_perf_event_open: pid -1 cpu 19 group_fd -1 flags 0x8 = 23 sys_perf_event_open: pid -1 cpu 20 group_fd -1 flags 0x8 = 24 sys_perf_event_open: pid -1 cpu 21 group_fd -1 flags 0x8 = 25 sys_perf_event_open: pid -1 cpu 22 group_fd -1 flags 0x8 = 26 sys_perf_event_open: pid -1 cpu 23 group_fd -1 flags 0x8 = 27 cycles: 0: 810754998 1002563650 1002563650 cycles: 1: 810749852 1002559947 1002559947 cycles: 2: 808096005 1002555036 1002555036 cycles: 3: 808090246 1002543496 1002543496 cycles: 4: 800933425 1002536659 1002536659 cycles: 5: 800928573 1002528386 1002528386 cycles: 6: 800924347 1002520527 1002520527 cycles: 7: 800922009 1002513176 1002513176 cycles: 8: 800919624 1002507326 1002507326 cycles: 9: 800917204 1002500663 1002500663 cycles: 10: 802096579 1002494280 1002494280 cycles: 11: 802093770 1002486404 1002486404 cycles: 12: 803284338 1002479491 1002479491 cycles: 13: 803277609 1002469777 1002469777 cycles: 14: 800875902 1002458861 1002458861 cycles: 15: 800873241 1002451350 1002451350 cycles: 0: 800837379 1002444645 1002444645 cycles: 1: 800833400 1002438505 1002438505 cycles: 2: 800829291 1002433698 1002433698 cycles: 3: 800824390 1002427584 1002427584 cycles: 4: 800819360 1002422099 1002422099 cycles: 5: 800814787 1002415845 1002415845 cycles: 6: 800810125 1002410301 1002410301 cycles: 7: 800791893 1002386845 1002386845 cycles: 12855737722 16040169029 16040169029 cycles: 6406560625 8019379522 8019379522 Performance counter stats for 'system wide': 12,855,737,722 cpu_core/cycles/ 6,406,560,625 cpu_atom/cycles/ 1.002774658 seconds time elapsed type 6 is PERF_TYPE_HARDWARE_PMU. 0x4 in 0x4 indicates the cpu_core pmu. 0xa in 0xa indicates the cpu_atom pmu. 
Signed-off-by: Jin Yao --- tools/perf/util/parse-events.c | 73
[PATCH v2 08/27] perf stat: Uniquify hybrid event name
It would be useful to tell the user which pmu the event belongs to. perf-stat already supports the '--no-merge' option, which can print the pmu name after the event name, such as: "cycles [cpu_core]" Now this option is enabled by default on hybrid platforms, but the format is changed to: "cpu_core/cycles/" Signed-off-by: Jin Yao --- tools/perf/builtin-stat.c | 3 +++ tools/perf/util/stat-display.c | 12 ++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 68ecf68699a9..6c0a21323814 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2390,6 +2390,9 @@ int cmd_stat(int argc, const char **argv) evlist__check_cpu_maps(evsel_list); + if (perf_pmu__hybrid_exist()) + stat_config.no_merge = true; + /* * Initialize thread_map with comm names, * so we could print it out on output. diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c index 7f09cdaf5b60..ed37d8e7ea1a 100644 --- a/tools/perf/util/stat-display.c +++ b/tools/perf/util/stat-display.c @@ -526,6 +526,7 @@ static void uniquify_event_name(struct evsel *counter) { char *new_name; char *config; + int ret; if (counter->uniquified_name || !counter->pmu_name || !strncmp(counter->name, counter->pmu_name, @@ -540,8 +541,15 @@ static void uniquify_event_name(struct evsel *counter) counter->name = new_name; } } else { - if (asprintf(&new_name, -"%s [%s]", counter->name, counter->pmu_name) > 0) { + if (perf_pmu__hybrid_exist()) { + ret = asprintf(&new_name, "%s/%s/", + counter->pmu_name, counter->name); + } else { + ret = asprintf(&new_name, "%s [%s]", + counter->name, counter->pmu_name); + } + + if (ret) { free(counter->name); counter->name = new_name; } -- 2.17.1
[PATCH v2 06/27] perf pmu: Add hybrid helper functions
The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu can be used to identify the hybrid platform and return the found hybrid cpu pmu. All the detected hybrid pmus have been saved in the 'perf_pmu__hybrid_pmus' list, so we just need to search this list. perf_pmu__hybrid_type_to_pmu converts the user-specified string to a hybrid pmu name. This is used to support the '--cputype' option in the next patches. perf_pmu__hybrid_exist checks the existence of hybrid pmus. Signed-off-by: Jin Yao --- tools/perf/util/pmu-hybrid.c | 40 tools/perf/util/pmu-hybrid.h | 11 ++ 2 files changed, 51 insertions(+) diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c index 7316bf46e54b..86ba84d9469c 100644 --- a/tools/perf/util/pmu-hybrid.c +++ b/tools/perf/util/pmu-hybrid.c @@ -33,3 +33,43 @@ bool perf_pmu__hybrid_mounted(const char *name) snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name); return file_available(path); } + +struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name) +{ + struct perf_pmu *pmu; + + if (!name) + return NULL; + + perf_pmu__for_each_hybrid_pmu(pmu) { + if (!strcmp(name, pmu->name)) + return pmu; + } + + return NULL; +} + +bool perf_pmu__is_hybrid(const char *name) +{ + return perf_pmu__find_hybrid_pmu(name) != NULL; +} + +char *perf_pmu__hybrid_type_to_pmu(const char *type) +{ + char *pmu_name = NULL; + + if (asprintf(&pmu_name, "cpu_%s", type) < 0) + return NULL; + + if (perf_pmu__is_hybrid(pmu_name)) + return pmu_name; + + /* +* pmu may be not scanned, check the sysfs. 
+*/ + if (perf_pmu__hybrid_mounted(pmu_name)) + return pmu_name; + + free(pmu_name); + return NULL; +} diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h index 35bed3714438..7fb2246e939a 100644 --- a/tools/perf/util/pmu-hybrid.h +++ b/tools/perf/util/pmu-hybrid.h @@ -15,4 +15,15 @@ extern struct list_head perf_pmu__hybrid_pmus; bool perf_pmu__hybrid_mounted(const char *name); +struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name); + +bool perf_pmu__is_hybrid(const char *name); + +char *perf_pmu__hybrid_type_to_pmu(const char *type); + +static inline bool perf_pmu__hybrid_exist(void) +{ + return !list_empty(&perf_pmu__hybrid_pmus); +} + #endif /* __PMU_HYBRID_H */ -- 2.17.1
[PATCH v2 04/27] perf pmu: Save pmu name
On a hybrid platform, an event is available on one pmu (e.g. on cpu_core or on cpu_atom). This patch saves the pmu name to the pmu field of struct perf_pmu_alias. Then we can know which pmu the event is available on. Signed-off-by: Jin Yao --- tools/perf/util/pmu.c | 10 +- tools/perf/util/pmu.h | 1 + 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 54e586bf19a5..45d8db1af8d2 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias) zfree(&newalias->str); zfree(&newalias->metric_expr); zfree(&newalias->metric_name); + zfree(&newalias->pmu); parse_events_terms__purge(&newalias->terms); free(newalias); } @@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias, list_for_each_entry(a, alist, list) { if (!strcasecmp(newalias->name, a->name)) { + if (newalias->pmu && a->pmu && + !strcasecmp(newalias->pmu, a->pmu)) { + continue; + } perf_pmu_update_alias(a, newalias); perf_pmu_free_alias(newalias); return true; @@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name, int num; char newval[256]; char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL, -*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL; +*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL, +*pmu = NULL; if (pe) { long_desc = (char *)pe->long_desc; topic = (char *)pe->topic; unit = (char *)pe->unit; perpkg = (char *)pe->perpkg; metric_expr = (char *)pe->metric_expr; metric_name = (char *)pe->metric_name; deprecated = (char *)pe->deprecated; + pmu = (char *)pe->pmu; } alias = malloc(sizeof(*alias)); @@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name, } alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1; alias->str = strdup(newval); + alias->pmu = pmu ? 
strdup(pmu) : NULL; if (deprecated) alias->deprecated = true; diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h index 8164388478c6..0e724d5b84c6 100644 --- a/tools/perf/util/pmu.h +++ b/tools/perf/util/pmu.h @@ -72,6 +72,7 @@ struct perf_pmu_alias { bool deprecated; char *metric_expr; char *metric_name; + char *pmu; }; struct perf_pmu *perf_pmu__find(const char *name); -- 2.17.1
[PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom"
Some Intel platforms, such as Alderlake, are hybrid: they consist of atom cpus and core cpus. Each cpu type has a dedicated event list. Some events are available on the core cpu, some on the atom cpu. The kernel exports new cpu pmus: cpu_core and cpu_atom. The event in json is added with a new field "Unit" to indicate which pmu the event is available on. For example, one event in cache.json, { "BriefDescription": "Counts the number of load ops retired that", "CollectPEBSRecord": "2", "Counter": "0,1,2,3", "EventCode": "0xd2", "EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO", "PEBScounters": "0,1,2,3", "SampleAfterValue": "103", "UMask": "0x80", "Unit": "cpu_atom" }, The unit "cpu_atom" indicates this event is only available on "cpu_atom". In generated pmu-events.c, we can see: { .name = "mem_load_uops_retired_misc.mmio", .event = "period=103,umask=0x80,event=0xd2", .desc = "Counts the number of load ops retired that. Unit: cpu_atom ", .topic = "cache", .pmu = "cpu_atom", }, But without this patch, the "uncore_" prefix is added before "cpu_atom", such as: .pmu = "uncore_cpu_atom" That would be a wrong pmu. Signed-off-by: Jin Yao --- tools/perf/pmu-events/jevents.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c index e1f3f5c8c550..b1a15f57c9ad 100644 --- a/tools/perf/pmu-events/jevents.c +++ b/tools/perf/pmu-events/jevents.c @@ -285,6 +285,8 @@ static struct map { { "imx8_ddr", "imx8_ddr" }, { "L3PMC", "amd_l3" }, { "DFPMC", "amd_df" }, + { "cpu_core", "cpu_core" }, + { "cpu_atom", "cpu_atom" }, {} }; -- 2.17.1
[PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias
Simplify the arguments of __perf_pmu__new_alias() by passing the whole 'struct pmu_event' pointer. Signed-off-by: Jin Yao --- tools/perf/util/pmu.c | 36 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 44ef28302fc7..54e586bf19a5 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -306,18 +306,25 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias *newalias, } static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name, -char *desc, char *val, -char *long_desc, char *topic, -char *unit, char *perpkg, -char *metric_expr, -char *metric_name, -char *deprecated) +char *desc, char *val, struct pmu_event *pe) { struct parse_events_term *term; struct perf_pmu_alias *alias; int ret; int num; char newval[256]; + char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL, +*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL; + + if (pe) { + long_desc = (char *)pe->long_desc; + topic = (char *)pe->topic; + unit = (char *)pe->unit; + perpkg = (char *)pe->perpkg; + metric_expr = (char *)pe->metric_expr; + metric_name = (char *)pe->metric_name; + deprecated = (char *)pe->deprecated; + } alias = malloc(sizeof(*alias)); if (!alias) @@ -406,8 +413,7 @@ static int perf_pmu__new_alias(struct list_head *list, char *dir, char *name, FI /* Remove trailing newline from sysfs file */ strim(buf); - return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, NULL, -NULL, NULL, NULL, NULL); + return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL); } static inline bool pmu_alias_info_file(char *name) @@ -793,11 +799,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, struct perf_pmu *pmu, /* need type casts to override 'const' */ __perf_pmu__new_alias(head, NULL, (char *)pe->name, (char *)pe->desc, (char *)pe->event, - (char *)pe->long_desc, (char *)pe->topic, - (char *)pe->unit, (char *)pe->perpkg, - (char *)pe->metric_expr, - (char 
*)pe->metric_name, - (char *)pe->deprecated); + pe); } } @@ -864,13 +866,7 @@ static int pmu_add_sys_aliases_iter_fn(struct pmu_event *pe, void *data) (char *)pe->name, (char *)pe->desc, (char *)pe->event, - (char *)pe->long_desc, - (char *)pe->topic, - (char *)pe->unit, - (char *)pe->perpkg, - (char *)pe->metric_expr, - (char *)pe->metric_name, - (char *)pe->deprecated); + pe); } return 0; -- 2.17.1
[PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus
On hybrid platform, atom events can be only enabled on atom CPUs. Core events can be only enabled on core CPUs. So for a hybrid event, it can be only enabled on it's own CPUs. But the problem for current perf is, the cpus for evsel (via PMU sysfs) have been merged to evsel_list->core.all_cpus. It might be all CPUs. So we need to figure out one way to let the hybrid event only use it's own CPUs. The idea is to create a new evlist__invalidate_all_cpus to invalidate the evsel_list->core.all_cpus then evlist__for_each_cpu returns cpu -1 for hybrid evsel. If cpu is -1, hybrid evsel will use it's own cpus. We will see following code piece in patch. if (cpu == -1 && !evlist->thread_mode) evsel__enable_cpus(pos); It lets the event be enabled on event's own cpus. Signed-off-by: Jin Yao --- tools/perf/builtin-stat.c | 37 ++- tools/perf/util/evlist.c | 72 -- tools/perf/util/evlist.h | 4 ++ tools/perf/util/evsel.h| 8 tools/perf/util/python-ext-sources | 2 + 5 files changed, 117 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 2e2e4a8345ea..68ecf68699a9 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -393,6 +393,18 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu) return 0; } +static int read_counter_cpus(struct evsel *counter, struct timespec *rs) +{ + int cpu, nr_cpus, err = 0; + struct perf_cpu_map *cpus = evsel__cpus(counter); + + nr_cpus = cpus ? 
cpus->nr : 1; + for (cpu = 0; cpu < nr_cpus; cpu++) + err = read_counter_cpu(counter, rs, cpu); + + return err; +} + static int read_affinity_counters(struct timespec *rs) { struct evsel *counter; @@ -414,8 +426,14 @@ static int read_affinity_counters(struct timespec *rs) if (evsel__cpu_iter_skip(counter, cpu)) continue; if (!counter->err) { - counter->err = read_counter_cpu(counter, rs, - counter->cpu_iter - 1); + if (cpu == -1 && !evsel_list->thread_mode) { + counter->err = read_counter_cpus(counter, rs); + } else if (evsel_list->thread_mode) { + counter->err = read_counter_cpu(counter, rs, 0); + } else { + counter->err = read_counter_cpu(counter, rs, + counter->cpu_iter - 1); + } } } } @@ -781,6 +799,21 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx) if (group) evlist__set_leader(evsel_list); + /* +* On hybrid platform, the cpus for evsel (via PMU sysfs) have been +* merged to evsel_list->core.all_cpus. We use evlist__invalidate_all_cpus +* to invalidate the evsel_list->core.all_cpus then evlist__for_each_cpu +* returns cpu -1 for hybrid evsel. If cpu is -1, hybrid evsel will +* use it's own cpus. 
+*/ + if (evlist__has_hybrid_events(evsel_list)) { + evlist__invalidate_all_cpus(evsel_list); + if (!target__has_cpu() || + target__has_per_thread()) { + evsel_list->thread_mode = true; + } + } + if (affinity__setup() < 0) return -1; diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 882cd1f721d9..3ee12fcd0c9f 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -381,7 +381,8 @@ bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu) bool evsel__cpu_iter_skip(struct evsel *ev, int cpu) { if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) { - ev->cpu_iter++; + if (cpu != -1) + ev->cpu_iter++; return false; } return true; @@ -410,6 +411,16 @@ static int evlist__is_enabled(struct evlist *evlist) return false; } +static void evsel__disable_cpus(struct evsel *evsel) +{ + int cpu, nr_cpus; + struct perf_cpu_map *cpus = evsel__cpus(evsel); + + nr_cpus = cpus ? cpus->nr : 1; + for (cpu = 0; cpu < nr_cpus; cpu++) + evsel__disable_cpu(evsel, cpu); +} + static void __evlist__disable(struct evlist *evlist, char *evsel_name) { struct evsel *pos; @@ -436,7 +447,12 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name) has_imm = true; if (pos->immediate != imm)
[PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list
We identify the cpu_core pmu and cpu_atom pmu by explicitly checking the following files: For cpu_core, check "/sys/bus/event_source/devices/cpu_core/cpus". For cpu_atom, check "/sys/bus/event_source/devices/cpu_atom/cpus". If the 'cpus' file exists, the pmu exists. But in order not to hardcode "cpu_core" and "cpu_atom", and to keep the code generic, we treat a pmu as hybrid if the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists. All the detected hybrid pmus are linked into a global list 'perf_pmu__hybrid_pmus'; after that we just need to iterate the list with perf_pmu__for_each_hybrid_pmu to get all hybrid pmus. Signed-off-by: Jin Yao --- tools/perf/util/Build| 1 + tools/perf/util/pmu-hybrid.c | 35 +++ tools/perf/util/pmu-hybrid.h | 18 ++ tools/perf/util/pmu.c| 9 - tools/perf/util/pmu.h| 4 5 files changed, 66 insertions(+), 1 deletion(-) create mode 100644 tools/perf/util/pmu-hybrid.c create mode 100644 tools/perf/util/pmu-hybrid.h diff --git a/tools/perf/util/Build b/tools/perf/util/Build index e3e12f9d4733..37a8a63c7195 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -69,6 +69,7 @@ perf-y += parse-events-bison.o perf-y += pmu.o perf-y += pmu-flex.o perf-y += pmu-bison.o +perf-y += pmu-hybrid.o perf-y += trace-event-read.o perf-y += trace-event-info.o perf-y += trace-event-scripting.o diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c new file mode 100644 index ..7316bf46e54b --- /dev/null +++ b/tools/perf/util/pmu-hybrid.c @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "fncache.h" +#include "pmu-hybrid.h" + +LIST_HEAD(perf_pmu__hybrid_pmus); + +bool perf_pmu__hybrid_mounted(const char *name) +{ + char path[PATH_MAX]; + const char *sysfs; + + if (strncmp(name, "cpu_", 4)) + return false; + + sysfs = sysfs__mountpoint(); + if (!sysfs) + return false; 
+ + snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name); + return file_available(path); +} diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h new file mode 100644 index ..35bed3714438 --- /dev/null +++ b/tools/perf/util/pmu-hybrid.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __PMU_HYBRID_H +#define __PMU_HYBRID_H + +#include +#include +#include +#include +#include "pmu.h" + +extern struct list_head perf_pmu__hybrid_pmus; + +#define perf_pmu__for_each_hybrid_pmu(pmu) \ + list_for_each_entry(pmu, _pmu__hybrid_pmus, hybrid_list) + +bool perf_pmu__hybrid_mounted(const char *name); + +#endif /* __PMU_HYBRID_H */ diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c index 45d8db1af8d2..08280a3e45a8 100644 --- a/tools/perf/util/pmu.c +++ b/tools/perf/util/pmu.c @@ -25,6 +25,7 @@ #include "string2.h" #include "strbuf.h" #include "fncache.h" +#include "pmu-hybrid.h" struct perf_pmu perf_pmu__fake; @@ -613,7 +614,6 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path) */ #define SYS_TEMPLATE_ID"./bus/event_source/devices/%s/identifier" #define CPUS_TEMPLATE_UNCORE "%s/bus/event_source/devices/%s/cpumask" -#define CPUS_TEMPLATE_CPU "%s/bus/event_source/devices/%s/cpus" static struct perf_cpu_map *pmu_cpumask(const char *name) { @@ -645,6 +645,9 @@ static bool pmu_is_uncore(const char *name) char path[PATH_MAX]; const char *sysfs; + if (perf_pmu__hybrid_mounted(name)) + return false; + sysfs = sysfs__mountpoint(); snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name); return file_available(path); @@ -946,6 +949,7 @@ static struct perf_pmu *pmu_lookup(const char *name) pmu->is_uncore = pmu_is_uncore(name); if (pmu->is_uncore) pmu->id = pmu_id(name); + pmu->is_hybrid = perf_pmu__hybrid_mounted(name); pmu->max_precise = pmu_max_precise(name); pmu_add_cpu_aliases(, pmu); pmu_add_sys_aliases(, pmu); @@ -957,6 +961,9 @@ static struct perf_pmu *pmu_lookup(const char *name) list_splice(, >aliases); 
list_add_tail(>list, ); + if (pmu->is_hybrid) + list_add_tail(>hybrid_list, _pmu__hybrid_pmus); + pmu->default_config = perf_pmu__get_default_config(pmu); return pmu; diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h index 0e724d5b84c6..3b9b4def6032 100644 --- a/tools/perf/util/pmu.h +++ b/tools/perf/util/pmu.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include "parse-events.h" #include "pmu-events/pmu-events.h" @@ -19,6 +20,7 @@ enum { #define PERF_PMU_FORMAT_BITS 64 #define EVENT_SOURCE_DEVICE_PATH
[PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h
To get the changes in: Liang Kan's patch ("perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU") Kan's patch is in review at the moment, but the following perf tool patches need this interface for hybrid support. This patch can be removed after Kan's patch is upstreamed. Signed-off-by: Jin Yao --- tools/include/uapi/linux/perf_event.h | 26 ++ 1 file changed, 26 insertions(+) diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index ad15e40d7f5d..c0a511eea498 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -33,6 +33,8 @@ enum perf_type_id { PERF_TYPE_HW_CACHE = 3, PERF_TYPE_RAW = 4, PERF_TYPE_BREAKPOINT= 5, + PERF_TYPE_HARDWARE_PMU = 6, + PERF_TYPE_HW_CACHE_PMU = 7, PERF_TYPE_MAX, /* non-ABI */ }; @@ -94,6 +96,30 @@ enum perf_hw_cache_op_result_id { PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */ }; +/* + * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE* + * PERF_TYPE_HARDWARE: 0xAA + * AA: hardware event ID + * PERF_TYPE_HW_CACHE: 0xCCBBAA + * AA: hardware cache ID + * BB: hardware cache op ID + * CC: hardware cache op result ID + * PERF_TYPE_HARDWARE_PMU: 0xDD00AA + * AA: hardware event ID + * DD: PMU type ID + * PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA + * AA: hardware cache ID + * BB: hardware cache op ID + * CC: hardware cache op result ID + * DD: PMU type ID + */ +#define PERF_HW_CACHE_ID_SHIFT 0 +#define PERF_HW_CACHE_OP_ID_SHIFT 8 +#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT 16 +#define PERF_HW_CACHE_EVENT_MASK 0xff + +#define PERF_PMU_TYPE_SHIFT32 + /* * Special "software" events provided by the kernel, even if the hardware * does not support performance events. These events measure various -- 2.17.1
[PATCH v2 00/27] perf tool: AlderLake hybrid support series 1
AlderLake uses a hybrid architecture utilizing Golden Cove cores (core cpu) and Gracemont cores (atom cpu). Each cpu type has a dedicated event list. Some events are available on the core cpu, some events are available on the atom cpu, and some events can be available on both. The kernel exports new pmus "cpu_core" and "cpu_atom" through sysfs: /sys/devices/cpu_core /sys/devices/cpu_atom cat /sys/devices/cpu_core/cpus 0-15 cat /sys/devices/cpu_atom/cpus 16-23 In this example, core cpus are 0-15 and atom cpus are 16-23. To enable a core only event or atom only event: cpu_core/<event>/ or cpu_atom/<event>/ Count the 'cycles' event on core cpus. # perf stat -e cpu_core/cycles/ -a -- sleep 1 Performance counter stats for 'system wide': 12,853,951,349 cpu_core/cycles/ 1.002581249 seconds time elapsed If one event is available on both atom cpu and core cpu, two events are created automatically. # perf stat -e cycles -a -- sleep 1 Performance counter stats for 'system wide': 12,856,467,438 cpu_core/cycles/ 6,404,634,785 cpu_atom/cycles/ 1.002453013 seconds time elapsed Group is supported if the events are from the same pmu, otherwise a warning is displayed. # perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -a -- sleep 1 Performance counter stats for 'system wide': 12,863,866,968 cpu_core/cycles/ 554,795,017 cpu_core/instructions/ 1.002616117 seconds time elapsed # perf stat -e '{cpu_core/cycles/,cpu_atom/instructions/}' -a -- sleep 1 WARNING: Group has events from different hybrid PMUs Performance counter stats for 'system wide': 12,863,344,830 cpu_core/cycles/ cpu_atom/instructions/ 1.002568830 seconds time elapsed Note that, since the whole patchset for AlderLake hybrid support is very large (40+ patches), for simplicity it is split into several patch series. Patch series 1 only supports the basic functionality. Support for perf-c2c/perf-mem/topdown/metrics/topology header etc. will be added in follow-up series. 
v2: --- - Drop kernel patches (Kan posted the series "Add Alder Lake support for perf (kernel)" separately). - Drop the patches for perf-c2c/perf-mem/topdown/metrics/topology header support, which will be added in series 2 or series 3. - Simplify the arguments of __perf_pmu__new_alias() by passing the 'struct pmu_event' pointer. - Check sysfs validity before access. - Use pmu style event name, such as "cpu_core/cycles/". - Move command output two chars to the right. - Move pmu hybrid functions to the newly created pmu-hybrid.c/pmu-hybrid.h. This is to pass the perf test python case. Jin Yao (27): tools headers uapi: Update tools's copy of linux/perf_event.h perf jevents: Support unit value "cpu_core" and "cpu_atom" perf pmu: Simplify arguments of __perf_pmu__new_alias perf pmu: Save pmu name perf pmu: Save detected hybrid pmus to a global pmu list perf pmu: Add hybrid helper functions perf evlist: Hybrid event uses its own cpus perf stat: Uniquify hybrid event name perf parse-events: Create two hybrid hardware events perf parse-events: Create two hybrid cache events perf parse-events: Support hardware events inside PMU perf parse-events: Support hybrid raw events perf evlist: Create two hybrid 'cycles' events by default perf stat: Add default hybrid events perf stat: Filter out unmatched aggregation for hybrid event perf evlist: Warn as events from different hybrid PMUs in a group perf evsel: Adjust hybrid event and global event mixed group perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU perf tests: Add hybrid cases for 'Parse event definition strings' test perf tests: Add hybrid cases for 'Roundtrip evsel->name' test perf tests: Skip 'Setup struct perf_event_attr' test for hybrid perf tests: Support 'Track with sched_switch' test for hybrid perf tests: Support 'Parse and process metrics' test for hybrid perf tests: Support 'Session topology' test for hybrid perf tests: Support 'Convert perf time to TSC' test for hybrid perf tests: Skip 'perf stat 
metrics (shadow stat) test' for hybrid perf Documentation: Document intel-hybrid support tools/include/uapi/linux/perf_event.h | 26 ++ tools/perf/Documentation/intel-hybrid.txt | 228 + tools/perf/Documentation/perf-record.txt | 1 + tools/perf/Documentation/perf-stat.txt | 2 + tools/perf/builtin-record.c| 13 +- tools/perf/builtin-script.c| 24 ++ tools/perf/builtin-stat.c | 69 - tools/perf/pmu-events/jevents.c| 2 + tools/perf/tests/attr.c| 5 + tools/perf/tests/evsel-roundtrip-name.c| 19 +- tools/perf/tests/parse-events.c| 171 + tools/perf/tests/parse-metric.c| 11 +- tools/perf/tests/perf-time-to-tsc.c| 16 ++
[RFC PATCH v2] Bluetooth: hci_qca: Add device_may_wakeup support
Based on the device_may_wakeup status, the BlueZ stack will enable/disable passive scanning with a whitelist in the BT controller while suspending. As the interrupt from the BT SoC is handled by the UART driver, we need to use the device handle of the UART driver to get the device_may_wakeup status. Signed-off-by: Venkata Lakshmi Narayana Gubba --- drivers/bluetooth/hci_qca.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index de36af6..73af901 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -1571,6 +1571,20 @@ static void qca_cmd_timeout(struct hci_dev *hdev) mutex_unlock(>hci_memdump_lock); } +static bool qca_prevent_wake(struct hci_dev *hdev) +{ + struct hci_uart *hu = hci_get_drvdata(hdev); + bool wakeup; + + /* UART driver handles the interrupt from BT SoC.So we need to use +* device handle of UART driver to get the status of device may wakeup. +*/ + wakeup = device_may_wakeup(hu->serdev->ctrl->dev.parent); + bt_dev_dbg(hu->hdev, "wakeup status : %d", wakeup); + + return !wakeup; +} + static int qca_wcn3990_init(struct hci_uart *hu) { struct qca_serdev *qcadev; @@ -1721,6 +1735,7 @@ static int qca_setup(struct hci_uart *hu) qca_debugfs_init(hdev); hu->hdev->hw_error = qca_hw_error; hu->hdev->cmd_timeout = qca_cmd_timeout; + hu->hdev->prevent_wake = qca_prevent_wake; } else if (ret == -ENOENT) { /* No patch/nvm-config found, run with original fw/config */ set_bit(QCA_ROM_FW, >flags); -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v3 0/4] KVM: arm64: Add VLPI migration support on GICv4.1
Hi, Sorry to bother again, I am really hoping a response for this series. :-) Thanks, Shenming On 2021/2/26 16:58, Shenming Lu wrote: > Hi Marc, > > Gentle ping. Does this series need any further modification? Wish you can > pick it up. :-) > > Thanks, > Shenming > > On 2021/1/27 20:13, Shenming Lu wrote: >> Hi Marc, sorry for the late commit. >> >> In GICv4.1, migration has been supported except for (directly-injected) >> VLPI. And GICv4.1 Spec explicitly gives a way to get the VLPI's pending >> state (which was crucially missing in GICv4.0). So we make VLPI migration >> capable on GICv4.1 in this patch set. >> >> In order to support VLPI migration, we need to save and restore all >> required configuration information and pending states of VLPIs. But >> in fact, the configuration information of VLPIs has already been saved >> (or will be reallocated on the dst host...) in vgic(kvm) migration. >> So we only have to migrate the pending states of VLPIs specially. >> >> Below is the related workflow in migration. >> >> On the save path: >> In migration completion: >> pause all vCPUs >> | >> call each VM state change handler: >> pause other devices (just keep from sending interrupts, >> and >> such as VFIO migration protocol has already realized it >> [1]) >> | >> flush ITS tables into guest RAM >> | >> flush RDIST pending tables (also flush VLPI state here) >> | >> ... >> On the resume path: >> load each device's state: >> restore ITS tables (include pending tables) from guest RAM >> | >> for other (PCI) devices (paused), if configured to have VLPIs, >> establish the forwarding paths of their VLPIs (and transfer >> the pending states from kvm's vgic to VPT here) >> >> We have tested this series in VFIO migration, and found some related >> issues in QEMU [2]. 
>> >> Links: >> [1] vfio: UAPI for migration interface for device state: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a8a24f3f6e38103b77cf399c38eb54e1219d00d6 >> [2] vfio: Some fixes and optimizations for VFIO migration: >> https://patchwork.ozlabs.org/cover/1413263/ >> >> History: >> >> v2 -> v3 >> - Add the vgic initialized check to ensure that the allocation and enabling >>of the doorbells have already been done before unmapping the vPEs. >> - Check all get_vlpi_state related conditions in save_pending_tables in one >> place. >> - Nit fixes. >> >> v1 -> v2: >> - Get the VLPI state from the KVM side. >> - Nit fixes. >> >> Thanks, >> Shenming >> >> >> Shenming Lu (3): >> KVM: arm64: GICv4.1: Add function to get VLPI state >> KVM: arm64: GICv4.1: Try to save hw pending state in >> save_pending_tables >> KVM: arm64: GICv4.1: Give a chance to save VLPI's pending state >> >> Zenghui Yu (1): >> KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side >> >> .../virt/kvm/devices/arm-vgic-its.rst | 2 +- >> arch/arm64/kvm/vgic/vgic-its.c| 6 +- >> arch/arm64/kvm/vgic/vgic-v3.c | 61 +-- >> arch/arm64/kvm/vgic/vgic-v4.c | 33 ++ >> arch/arm64/kvm/vgic/vgic.h| 1 + >> 5 files changed, 93 insertions(+), 10 deletions(-) >>
Re: [PATCH v4 4/5] perf stat: Enable iostat mode for x86 platforms
On 2021/3/11 0:19, Alexander Antonov wrote: On 3/9/2021 10:51 AM, liuqi (BA) wrote: Hi Alexander, On 2021/2/3 21:58, Alexander Antonov wrote: This functionality is based on recently introduced sysfs attributes for Intel® Xeon® Scalable processor family (code name Skylake-SP): Commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") Mode is intended to provide four I/O performance metrics in MB per each PCIe root port: - Inbound Read: I/O devices below root port read from the host memory - Inbound Write: I/O devices below root port write to the host memory - Outbound Read: CPU reads from I/O devices below root port - Outbound Write: CPU writes to I/O devices below root port Each metric requires only one uncore event which increments at every 4B transfer in the corresponding direction. The formulas to compute metrics are generic: #EventCount * 4B / (1024 * 1024) Signed-off-by: Alexander Antonov --- tools/perf/Documentation/perf-iostat.txt | 88 ++ tools/perf/Makefile.perf | 5 +- tools/perf/arch/x86/util/Build | 1 + tools/perf/arch/x86/util/iostat.c | 345 +++ tools/perf/command-list.txt | 1 + tools/perf/perf-iostat.sh | 12 + 6 files changed, 451 insertions(+), 1 deletion(-) create mode 100644 tools/perf/Documentation/perf-iostat.txt create mode 100644 tools/perf/perf-iostat.sh diff --git a/tools/perf/Documentation/perf-iostat.txt b/tools/perf/Documentation/perf-iostat.txt new file mode 100644 index ..165176944031 --- /dev/null +++ b/tools/perf/Documentation/perf-iostat.txt @@ -0,0 +1,88 @@ +perf-iostat(1) +=== + +NAME + +perf-iostat - Show I/O performance metrics + +SYNOPSIS + +[verse] +'perf iostat' list +'perf iostat' -- [] + +DESCRIPTION +--- +Mode is intended to provide four I/O performance metrics per each PCIe root port: + +- Inbound Read - I/O devices below root port read from the host memory, in MB + +- Inbound Write - I/O devices below root port write to the host memory, in MB + +- Outbound Read - CPU reads from I/O devices below 
root port, in MB + +- Outbound Write - CPU writes to I/O devices below root port, in MB + +OPTIONS +--- +...:: + Any command you can specify in a shell. + +list:: + List all PCIe root ports. I noticed that the "iostat" command and the cmd_iostat() callback function are not registered in cmd_struct in perf.c. So I think "perf iostat list" perhaps cannot work properly. I also tested this patchset on an x86 platform, and here is the log: root@ubuntu:/home/lq# ./perf iostat list perf: 'iostat' is not a perf-command. See 'perf --help'. root@ubuntu:/home/lq# ./perf stat --iostat ^C Performance counter stats for 'system wide': port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB) :00 0 0 0 0 :80 0 0 0 0 :17 0 0 0 0 :85 0 0 0 0 :3a 0 0 0 0 :ae 0 0 0 0 :5d 0 0 0 0 :d7 0 0 0 0 0.611303832 seconds time elapsed root@ubuntu:/home/lq# ./perf stat --iostat=:17 ^C Performance counter stats for 'system wide': port Inbound Read(MB) Inbound Write(MB) Outbound Read(MB) Outbound Write(MB) :17 0 0 0 0 0.521317572 seconds time elapsed So how does the following 'perf iostat list' work, did I miss something? Thanks, Qi Hello, The 'iostat' mode uses the aliases mechanism in perf, same as 'perf archive', and in this case you don't need to add a function callback into cmd_struct. For example, the command 'perf iostat list' will be converted to 'perf stat --iostat=list'. 
After building the perf tool you should have two shell scripts in tools/perf directory and one of them is executable, for example: # make -C tools/perf # ls -l tools/perf/perf-iostat* -rwxr-xr-x 1 root root 290 Mar 10 18:17 perf-iostat -rw-r--r-- 1 root root 290 Feb 3 15:14 perf-iostat.sh It should be possible to run 'perf iostat' from build directory: # cd tools/perf # ./perf iostat list S0-uncore_iio_0<:00> S1-uncore_iio_0<:80> S0-uncore_iio_1<:17> S1-uncore_iio_1<:85> S0-uncore_iio_2<:3a> S1-uncore_iio_2<:ae> S0-uncore_iio_3<:5d> S1-uncore_iio_3<:d7> Also you can copy 'perf-iostat' to ~/libexec/perf-core/ or just run 'make install' # make install # cp perf /usr/bin/ # ls -lh ~/libexec/perf-core/ total 24K -rwxr-xr-x 1 root root 1.4K Mar 10 18:17 perf-archive
Re: [PATCH RESEND][next] rtl8xxxu: Fix fall-through warnings for Clang
Kees Cook writes: > On Fri, Mar 05, 2021 at 03:40:33PM +0200, Kalle Valo wrote: >> "Gustavo A. R. Silva" writes: >> >> > In preparation to enable -Wimplicit-fallthrough for Clang, fix >> > multiple warnings by replacing /* fall through */ comments with >> > the new pseudo-keyword macro fallthrough; instead of letting the >> > code fall through to the next case. >> > >> > Notice that Clang doesn't recognize /* fall through */ comments as >> > implicit fall-through markings. >> > >> > Link: https://github.com/KSPP/linux/issues/115 >> > Signed-off-by: Gustavo A. R. Silva >> >> It's not cool that you ignore the comments you got in [1], then after a >> while mark the patch as "RESEND" and not even include a changelog why it >> was resent. >> >> [1] >> https://patchwork.kernel.org/project/linux-wireless/patch/d522f387b2d0dde774785c7169c1f25aa529989d.1605896060.git.gustavo...@kernel.org/ > > Hm, this conversation looks like a miscommunication, mainly? I see > Gustavo, as requested by many others[1], replacing the fallthrough > comments with the "fallthrough" statement. (This is more than just a > "Clang doesn't parse comments" issue.) v1 was clearly rejected by Jes, so sending a new version without any changelog or comments is not the way to go. The changelog should at least have had "v1 was rejected but I'm resending this again because " or something like that to make it clear what's happening. > This could be a tree-wide patch and not bother you, but Greg KH has > generally advised us to send these changes broken out. Anyway, this > change still needs to land, so what would be the preferred path? I think > Gustavo could just carry it for Linus to merge without bothering you if > that'd be preferred? I agree with Greg. Please don't do cleanups like this via another tree as that just creates more work due to conflicts between the trees, which is a lot more annoying to deal with than applying a few patches. 
But when submitting patches please follow the rules, don't cut corners. Jes, I don't like 'fallthrough' either and prefer the original comment, but the ship has sailed on this one. Maybe we should just take it? -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
[PATCH v2] perf machine: Assign boolean values to a bool variable
Fix the following coccicheck warnings: ./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in function 'symbol__match_regex' with return type bool. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong --- Changes in v2: - Make code simpler. tools/perf/util/machine.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index b5c2d8b..ba6517b 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -2037,9 +2037,7 @@ int machine__process_event(struct machine *machine, union perf_event *event, static bool symbol__match_regex(struct symbol *sym, regex_t *regex) { - if (!regexec(regex, sym->name, 0, NULL, 0)) - return 1; - return 0; + return !regexec(regex, sym->name, 0, NULL, 0); } static void ip__resolve_ams(struct thread *thread, -- 1.8.3.1
Re: [PATCH v4] usb: gadget: configfs: Fix KASAN use-after-free
On Thu, 2021-03-11 at 14:42 +0800, Macpaul Lin wrote: > From: Jim Lin > > When gadget is disconnected, running sequence is like this. > . composite_disconnect > . Call trace: > usb_string_copy+0xd0/0x128 > gadget_config_name_configuration_store+0x4 > gadget_config_name_attr_store+0x40/0x50 > configfs_write_file+0x198/0x1f4 > vfs_write+0x100/0x220 > SyS_write+0x58/0xa8 > . configfs_composite_unbind > . configfs_composite_bind > > In configfs_composite_bind, it has > "cn->strings.s = cn->configuration;" > > When usb_string_copy is invoked, it would > allocate memory, copy the input string, release the previously pointed-to > memory, and use the newly allocated memory. > > When gadget is connected, host sends down request to get information. > Call trace: > usb_gadget_get_string+0xec/0x168 > lookup_string+0x64/0x98 > composite_setup+0xa34/0x1ee8 > > If gadget is disconnected and connected quickly, in the failed case, > cn->configuration memory has been released by usb_string_copy kfree but > configfs_composite_bind hasn't been run in time to assign the newly allocated > "cn->configuration" pointer to "cn->strings.s". > > When strlen(s->s) in usb_gadget_get_string() is being executed, the dangling > memory is accessed and a "BUG: KASAN: use-after-free" error occurs. > > Signed-off-by: Jim Lin > Signed-off-by: Macpaul Lin > Cc: sta...@vger.kernel.org > --- > Changes in v2: > Changes in v3: > - Change commit description > Changes in v4: > - Fix build error and adapt patch to kernel-5.12-rc1. >Replace definition "MAX_USB_STRING_WITH_NULL_LEN" with >"USB_MAX_STRING_WITH_NULL_LEN". > - Note: Patches v2 and v3 have been verified by >Thadeu Lima de Souza Cascardo >http://spinics.net/lists/kernel/msg3840792.html Dear Cascardo, Would you please help to confirm if you've tested it on Linux PC, Chrome OS, or an Android OS? Thanks! Macpaul Lin >and >Macpaul Lin on Android kernels. >http://lkml.org/lkml/2020/6/11/8 > - The patch is suggested to be applied to LTS versions. 
> > drivers/usb/gadget/configfs.c | 14 ++ > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c > index 0d56f33..15a607c 100644 > --- a/drivers/usb/gadget/configfs.c > +++ b/drivers/usb/gadget/configfs.c > @@ -97,6 +97,8 @@ struct gadget_config_name { > struct list_head list; > }; > > +#define USB_MAX_STRING_WITH_NULL_LEN (USB_MAX_STRING_LEN+1) > + > static int usb_string_copy(const char *s, char **s_copy) > { > int ret; > @@ -106,12 +108,16 @@ static int usb_string_copy(const char *s, char **s_copy) > if (ret > USB_MAX_STRING_LEN) > return -EOVERFLOW; > > - str = kstrdup(s, GFP_KERNEL); > - if (!str) > - return -ENOMEM; > + if (copy) { > + str = copy; > + } else { > + str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL); > + if (!str) > + return -ENOMEM; > + } > + strcpy(str, s); > if (str[ret - 1] == '\n') > str[ret - 1] = '\0'; > - kfree(copy); > *s_copy = str; > return 0; > }
Re: [PATCH 07/11] PM / devfreq: check get_dev_status before start monitor
On 3/10/21 1:56 PM, Dong Aisheng wrote: > On Wed, Mar 10, 2021 at 11:08 AM Chanwoo Choi wrote: >> >> On 3/10/21 11:56 AM, Dong Aisheng wrote: >>> On Wed, Mar 10, 2021 at 12:12 AM Chanwoo Choi wrote: On 21. 3. 10. 오전 12:58, Chanwoo Choi wrote: > On 21. 3. 9. 오후 9:58, Dong Aisheng wrote: >> The devfreq monitor depends on the device to provide load information >> by .get_dev_status() to calculate the next target freq. >> >> And this will cause changing governor to simple ondemand fail >> if device can't support. >> >> Signed-off-by: Dong Aisheng >> --- >> drivers/devfreq/devfreq.c | 10 +++--- >> drivers/devfreq/governor.h| 2 +- >> drivers/devfreq/governor_simpleondemand.c | 3 +-- >> 3 files changed, 9 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c >> index 7231fe6862a2..d1787b6c7d7c 100644 >> --- a/drivers/devfreq/devfreq.c >> +++ b/drivers/devfreq/devfreq.c >> @@ -482,10 +482,13 @@ static void devfreq_monitor(struct work_struct >> *work) >>* to be called from governor in response to DEVFREQ_GOV_START >>* event when device is added to devfreq framework. >>*/ >> -void devfreq_monitor_start(struct devfreq *devfreq) >> +int devfreq_monitor_start(struct devfreq *devfreq) >> { >> if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN)) >> -return; >> +return 0; >> + >> +if (!devfreq->profile->get_dev_status) >> +return -EINVAL; Again, I think that get_dev_status is not used for all governors. So that it cause the governor start fail. Don't check whether .get_dev_status is NULL or not. >>> >>> I'm not quite understand your point. >>> it is used by governor_simpleondemand.c and tegra_devfreq_governor. >>> get_target_freq -> devfreq_update_stats -> get_dev_status >> >> The devfreq can add the new governor by anyone. >> So these functions like devfreq_monitor_* have to support >> the governors and also must support the governor to be added >> in the future. 
> > Yes, but devfreq_monitor_* is only used by polling mode, right? > The governor using it has to implement get_dev_status unless > there's an exception in the future. > > Currently this patch wants to address the issue that the user can switch > to the ondemand governor (polling mode) via sysfs even if the device does > not support it (no get_dev_status implemented). As I commented, I'll fix this issue. If the devfreq driver doesn't implement .get_dev_status, don't show it via available_governors. I think that is the fundamental solution to this issue. So in this version, don't add this conditional statement to this function. And in the next version, please capitalize the first character of the patch title as follows: - PM / devfreq: Check get_dev_status before start monitor > > Regards > Aisheng > >> >>> >>> Without checking, the device can switch to the ondemand governor even if it does not >>> support it. >>> >>> Am I missing something? >>> >>> Regards >>> Aisheng >>> >> switch (devfreq->profile->timer) { >> case DEVFREQ_TIMER_DEFERRABLE: >> @@ -495,12 +498,13 @@ void devfreq_monitor_start(struct devfreq *devfreq) >> INIT_DELAYED_WORK(>work, devfreq_monitor); >> break; >> default: >> -return; >> +return -EINVAL; >> } >> if (devfreq->profile->polling_ms) >> queue_delayed_work(devfreq_wq, >work, >> msecs_to_jiffies(devfreq->profile->polling_ms)); >> +return 0; >> } >> EXPORT_SYMBOL(devfreq_monitor_start); >> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h >> index 5cee3f64fe2b..31af6d072a10 100644 >> --- a/drivers/devfreq/governor.h >> +++ b/drivers/devfreq/governor.h >> @@ -75,7 +75,7 @@ struct devfreq_governor { >> unsigned int event, void *data); >> }; >> -void devfreq_monitor_start(struct devfreq *devfreq); >> +int devfreq_monitor_start(struct devfreq *devfreq); >> void devfreq_monitor_stop(struct devfreq *devfreq); >> void devfreq_monitor_suspend(struct devfreq *devfreq); >> void devfreq_monitor_resume(struct devfreq *devfreq); >> diff --git
a/drivers/devfreq/governor_simpleondemand.c >> b/drivers/devfreq/governor_simpleondemand.c >> index d57b82a2b570..ea287b57cbf3 100644 >> --- a/drivers/devfreq/governor_simpleondemand.c >> +++ b/drivers/devfreq/governor_simpleondemand.c >> @@ -89,8 +89,7 @@ static int devfreq_simple_ondemand_handler(struct >> devfreq *devfreq, >> { >> switch (event) { >> case DEVFREQ_GOV_START: >> -devfreq_monitor_start(devfreq); >> -break; >> +return devfreq_monitor_start(devfreq); >>
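The shape of the check under discussion is simple: a polling governor can only run if the driver supplies a load-reporting callback, so the start path fails early instead of later polling a NULL hook. A minimal sketch with simplified structs (not the kernel's devfreq types):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for struct devfreq_dev_profile. */
struct profile {
	int (*get_dev_status)(void *dev, int *busy, int *total);
};

/* Stand-in for devfreq_monitor_start(): refuse to start polling when
 * the driver cannot report load, as the patch proposes. */
static int monitor_start(const struct profile *p)
{
	if (!p->get_dev_status)
		return -22;	/* -EINVAL: driver can't report load */
	/* ... queue the polling work here ... */
	return 0;
}
```

The thread's counter-argument is that this check belongs at governor-selection time (filtering available_governors) rather than at monitor start.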
Re: [PATCH] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/11 9:47, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents it from issuing discard commands. On low-RAM setups, it is observed that these cached discard commands consume a lot of memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Agreed. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
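The policy switch the patch describes reduces to a threshold comparison: while pending discard commands consume more memory than the configured threshold, trade I/O-awareness for reclaim by forcing discards. A sketch with illustrative names (not the f2fs internals):

```c
#include <assert.h>

enum dpolicy { DPOLICY_BG, DPOLICY_FORCE };

/* Pick the discard policy based on how much memory the cached discard
 * commands are consuming relative to a ram_thresh-style limit. */
static enum dpolicy pick_discard_policy(unsigned long discard_cmd_mem,
					unsigned long ram_thresh)
{
	return discard_cmd_mem > ram_thresh ? DPOLICY_FORCE : DPOLICY_BG;
}
```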
Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
On Thu, Mar 11, 2021 at 7:50 AM Dmitry Vyukov wrote: > > On Thu, Mar 11, 2021 at 7:40 AM Alex Ghiti wrote: > > > > Hi Ben, > > > > Le 3/10/21 à 5:24 PM, Ben Dooks a écrit : > > > On 10/03/2021 17:16, Dmitry Vyukov wrote: > > >> On Wed, Mar 10, 2021 at 5:46 PM syzbot > > >> wrote: > > >>> > > >>> Hello, > > >>> > > >>> syzbot found the following issue on: > > >>> > > >>> HEAD commit:0d7588ab riscv: process: Fix no prototype for > > >>> arch_dup_tas.. > > >>> git tree: > > >>> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes > > >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0 > > >>> kernel config: > > >>> https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136 > > >>> dashboard link: > > >>> https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69 > > >>> userspace arch: riscv64 > > >>> > > >>> Unfortunately, I don't have any reproducer for this issue yet. > > >>> > > >>> IMPORTANT: if you fix the issue, please add the following tag to the > > >>> commit: > > >>> Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com > > >> > > >> +riscv maintainers > > >> > > >> This is riscv64-specific. > > >> I've seen similar crashes in put_user in other places. It looks like > > >> put_user crashes in the user address is not mapped/protected (?). > > > > > > The unmapped case should have been handled. > > > > > > I think this issue is that the check for user-mode access added. From > > > what I read the code may be wrong in > > > > > > +if (!user_mode(regs) && addr < TASK_SIZE && > > > +unlikely(!(regs->status & SR_SUM))) > > > +die_kernel_fault("access to user memory without uaccess > > > routines", > > > +addr, regs); > > > > > > I think the SR_SUM check might be wrong, as I read the standard the > > > SR_SUM should be set to disable user-space access. So the check > > > should be unlikely(regs->status & SR_SUM) to say access without > > > having disabled the protection. 
> > > > The check that is done seems correct to me: "The SUM (permit Supervisor > > User Memory access) bit modifies the privilege with which S-mode loads > > and stores access virtual memory. *When SUM=0, S-mode memory accesses > > to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*. > > When SUM=1, these accesses are permitted.SUM has no effect when > > page-based virtual memory is not in effect". > > > > I will try to reproduce the problem locally. > > Weird. It crashes with this all the time: > https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69 > > Even on trivial programs that almost don't do anything. > Maybe it's qemu bug? Do registers look sane in the dump? That SR_SUM, etc. > > > 00:13:27 executing program 1: > openat$drirender128(0xff9c, > &(0x7f40)='/dev/dri/renderD128\x00', 0x0, 0x0) > > [ 812.318182][ T4833] Unable to handle kernel access to user memory > without uaccess routines at virtual address 250b60d0 > [ 812.322304][ T4833] Oops [#1] > [ 812.323196][ T4833] Modules linked in: > [ 812.324110][ T4833] CPU: 1 PID: 4833 Comm: syz-executor.1 Not > tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 > [ 812.325862][ T4833] Hardware name: riscv-virtio,qemu (DT) > [ 812.327561][ T4833] epc : schedule_tail+0x72/0xb2 > [ 812.328640][ T4833] ra : schedule_tail+0x70/0xb2 > [ 812.330088][ T4833] epc : ffe8c8b0 ra : ffe8c8ae sp > : ffe0238bbec0 > [ 812.331312][ T4833] gp : ffe005d25378 tp : ffe00a275b00 t0 > : > [ 812.333014][ T4833] t1 : 0001 t2 : 000f4240 s0 > : ffe0238bbee0 > [ 812.334137][ T4833] s1 : 250b60d0 a0 : 0036 a1 > : 0003 > [ 812.336063][ T4833] a2 : 1ffc0cfa8b00 a3 : ffec80cc a4 > : 7f467e72c6adf800 > [ 812.337398][ T4833] a5 : a6 : 00f0 a7 > : ffef8c84 > [ 812.339287][ T4833] s2 : 0004 s3 : ffe0077a96c0 s4 > : ffe020e67fe0 > [ 812.340658][ T4833] s5 : 4020 s6 : ffe0077a9b58 s7 > : ffe067d74850 > [ 812.342492][ T4833] s8 : ffe067d73e18 s9 : > s10: ffe00bd72280 > [ 812.343668][ T4833] s11: 00bd067bf638 t3 : 
7f467e72c6adf800 t4 > : ffc403ee7fb2 > [ 812.345510][ T4833] t5 : ffc403ee7fba t6 : 0004 > [ 812.347004][ T4833] status: 0120 badaddr: > 250b60d0 cause: 000f > [ 812.348091][ T4833] Call Trace: > [ 812.349291][ T4833] [] schedule_tail+0x72/0xb2 > [ 812.350796][ T4833] [] ret_from_exception+0x0/0x14 > [ 812.352799][ T4833] Dumping ftrace buffer: > [ 812.354328][ T4833](ftrace buffer empty) > [ 812.428145][ T4833] ---[ end trace 94b077e4d677ee73 ]--- > > > 00:10:42 executing program 1: > bpf$ENABLE_STATS(0x20, 0x0, 0x0) > bpf$ENABLE_STATS(0x20, 0x0, 0x0) > > [ 646.536862][ T5163] loop0: detected capacity change from 0 to 1 > [ 646.566730][
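The predicate being debated can be stated directly: a supervisor-mode fault on a user address with SUM clear means the kernel touched user memory without going through the uaccess helpers (which set SUM first). A sketch of that check — SR_SUM is bit 18 of sstatus per the RISC-V privileged spec; the TASK_SIZE value here is illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (1UL << 18)		/* sstatus.SUM */
#define TASK_SIZE 0x4000000000UL	/* illustrative user/kernel split */

/* Mirror of the check under discussion: with SUM=0, S-mode accesses to
 * U-mode pages fault, so such a fault from kernel mode implies a missing
 * uaccess enable rather than a legitimate user-copy. */
static bool is_bad_kernel_uaccess(bool user_mode, uint64_t addr,
				  uint64_t status)
{
	return !user_mode && addr < TASK_SIZE && !(status & SR_SUM);
}
```

The oops above shows status 0x...0120 (SUM clear) on user address 0x250b60d0, which is exactly the case this predicate flags.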
Re: [net-next PATCH v7 02/16] net: phy: Introduce fwnode_mdio_find_device()
On Wed, Mar 10, 2021 at 10:21 PM Calvin Johnson wrote: > > Define fwnode_mdio_find_device() to get a pointer to the > mdio_device from fwnode passed to the function. > > Refactor of_mdio_find_device() to use fwnode_mdio_find_device(). > > Signed-off-by: Calvin Johnson > --- > > Changes in v7: > - correct fwnode_mdio_find_device() description > > Changes in v6: > - fix warning for function parameter of fwnode_mdio_find_device() > > Changes in v5: None > Changes in v4: None > Changes in v3: None > Changes in v2: None > > drivers/net/mdio/of_mdio.c | 11 +-- > drivers/net/phy/phy_device.c | 23 +++ > include/linux/phy.h | 6 ++ > 3 files changed, 30 insertions(+), 10 deletions(-) > > diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c > index ea9d5855fb52..d5e0970b2561 100644 > --- a/drivers/net/mdio/of_mdio.c > +++ b/drivers/net/mdio/of_mdio.c > @@ -347,16 +347,7 @@ EXPORT_SYMBOL(of_mdiobus_register); > */ > struct mdio_device *of_mdio_find_device(struct device_node *np) > { > - struct device *d; > - > - if (!np) > - return NULL; > - > - d = bus_find_device_by_of_node(_bus_type, np); > - if (!d) > - return NULL; > - > - return to_mdio_device(d); > + return fwnode_mdio_find_device(of_fwnode_handle(np)); > } > EXPORT_SYMBOL(of_mdio_find_device); > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index cc38e326405a..daabb17bba00 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -2819,6 +2819,29 @@ static bool phy_drv_supports_irq(struct phy_driver > *phydrv) > return phydrv->config_intr && phydrv->handle_interrupt; > } > > +/** > + * fwnode_mdio_find_device - Given a fwnode, find the mdio_device > + * @fwnode: pointer to the mdio_device's fwnode > + * > + * If successful, returns a pointer to the mdio_device with the embedded > + * struct device refcount incremented by one, or NULL on failure. > + * The caller should call put_device() on the mdio_device after its use. 
> + */ > +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode) > +{ > + struct device *d; > + > + if (!fwnode) > + return NULL; > + > + d = bus_find_device_by_fwnode(_bus_type, fwnode); Sorry about the late review, but can you look into using get_dev_from_fwnode()? As long as you aren't registering two devices for the same fwnode, it's an O(1) operation instead of having to loop through a list of devices in a bus. You can check the returned device's bus type if you aren't sure about not registering two devices with the same fw_node and then fall back to this looping. -Saravana > + if (!d) > + return NULL; > + > + return to_mdio_device(d); > +} > +EXPORT_SYMBOL(fwnode_mdio_find_device); > + > /** > * phy_probe - probe and init a PHY device > * @dev: device to probe and init > diff --git a/include/linux/phy.h b/include/linux/phy.h > index 1a12e4436b5b..f5eb1e3981a1 100644 > --- a/include/linux/phy.h > +++ b/include/linux/phy.h > @@ -1366,11 +1366,17 @@ struct phy_device *phy_device_create(struct mii_bus > *bus, int addr, u32 phy_id, > bool is_c45, > struct phy_c45_device_ids *c45_ids); > #if IS_ENABLED(CONFIG_PHYLIB) > +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode); > struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool > is_c45); > int phy_device_register(struct phy_device *phy); > void phy_device_free(struct phy_device *phydev); > #else > static inline > +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode) > +{ > + return 0; > +} > +static inline > struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45) > { > return NULL; > -- > 2.17.1 >
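The trade-off Saravana points out can be sketched generically: bus_find_device_by_fwnode() walks every device on the bus (O(n)), while a fwnode that records its device directly gives O(1) lookup, at the cost of assuming at most one device per fwnode. Simplified structures below, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

struct device;
struct fwnode {
	struct device *dev;	/* back-pointer: the O(1) path */
};
struct device {
	struct fwnode *fwnode;
};

/* O(n): scan a bus's device list for a matching fwnode, like
 * bus_find_device_by_fwnode(). */
static struct device *find_by_scan(struct device **bus, size_t n,
				   const struct fwnode *fw)
{
	for (size_t i = 0; i < n; i++)
		if (bus[i] && bus[i]->fwnode == fw)
			return bus[i];
	return NULL;
}

/* O(1): follow the back-pointer, in the spirit of the suggested
 * get_dev_from_fwnode() helper. */
static struct device *find_by_backptr(struct fwnode *fw)
{
	return fw ? fw->dev : NULL;
}
```

As the review notes, the fast path needs a follow-up check (e.g. of the bus type) when two devices could share a fwnode, falling back to the scan in that case.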
Re: [PATCH v2 5/5] PCI: dwc: Move dw_pcie_msi_init() from each users to designware host
On Sun, 7 Mar 2021 23:10:12 +0100 Krzysztof Wilczyński wrote: > > > Hi, > > > Currently, dw_pcie_msi_init() allocates and maps page for msi, then > > program the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI. The Root Complex > > may lose power during suspend-to-RAM, so when we resume, we want to > > redo the latter but not the former. If designware based driver (for > > example, pcie-tegra194.c) calls dw_pcie_msi_init() in resume path, the > > previous msi page will be leaked. From another side, except > > pci-dra7xx.c we can move the dw_pcie_msi_init() from each users to > > designware host, I.E move the msi page allocation and mapping to > > dw_pcie_host_init() and move the PCIE_MSI_ADDR_* programming to > > dw_pcie_setup_rc(). After this moving, we solve the msi page leakage > > as well. > [...] > > A small nitpick. All the "designware" should be "DesignWare" both in > the commit message and the subject. Similarly, "msi" would be "MSI", > and "I.E" would become "i.e.,". If you ever sent another version of the > patch, that is. > Hi, This series was dropped and is no longer needed after Rob's excellent DWC PCIe code cleanup. Thanks
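Even though the series was dropped, the split it describes is a common suspend/resume pattern: allocate the MSI target page once at host init, and only reprogram the controller's address registers on every resume, so resume never reallocates (and therefore never leaks) the page. A sketch with illustrative names, not the DWC driver API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct host {
	void *msi_page;		/* allocated once at init */
	uint64_t msi_addr_reg;	/* lost across suspend-to-RAM */
	int alloc_count;
};

static void host_program_msi(struct host *h)
{
	/* stands in for writing PCIE_MSI_ADDR_LO/HI */
	h->msi_addr_reg = (uintptr_t)h->msi_page;
}

static int host_init(struct host *h)
{
	h->msi_page = malloc(4096);	/* stands in for the DMA page */
	if (!h->msi_page)
		return -12;		/* -ENOMEM */
	h->alloc_count++;
	host_program_msi(h);
	return 0;
}

static void host_resume(struct host *h)
{
	host_program_msi(h);	/* registers lost power: redo */
	/* no allocation here: the page survived suspend-to-RAM */
}
```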
Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
On Thu, Mar 11, 2021 at 3:53 AM Roman Gushchin wrote: > > On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote: > > Since Roman series "The new cgroup slab memory controller" applied. All > > slab objects are charged via the new APIs of obj_cgroup. The new APIs > > introduce a struct obj_cgroup to charge slab objects. It prevents > > long-living objects from pinning the original memory cgroup in the memory. > > But there are still some corner objects (e.g. allocations larger than > > order-1 page on SLUB) which are not charged via the new APIs. Those > > objects (include the pages which are allocated from buddy allocator > > directly) are charged as kmem pages which still hold a reference to > > the memory cgroup. > > > > This patch aims to charge the kmem pages by using the new APIs of > > obj_cgroup. Finally, the page->memcg_data of the kmem page points to > > an object cgroup. We can use the page_objcg() to get the object > > cgroup associated with a kmem page. Or we can use page_memcg_check() > > to get the memory cgroup associated with a kmem page, but caller must > > ensure that the returned memcg won't be released (e.g. acquire the > > rcu_read_lock or css_set_lock). > > > > Signed-off-by: Muchun Song > > --- > > include/linux/memcontrol.h | 63 ++-- > > mm/memcontrol.c| 119 > > ++--- > > 2 files changed, 128 insertions(+), 54 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index 83cbcdcfcc92..07c449af9c0f 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page > > *page) > > } > > > > /* > > + * After the initialization objcg->memcg is always pointing at > > + * a valid memcg, but can be atomically swapped to the parent memcg. > > + * > > + * The caller must ensure that the returned memcg won't be released: > > + * e.g. acquire the rcu_read_lock or css_set_lock. 
> > + */ > > +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg) > > +{ > > + return READ_ONCE(objcg->memcg); > > +} > > + > > +/* > > * page_memcg - get the memory cgroup associated with a non-kmem page > > * @page: a pointer to the page struct > > * > > @@ -422,15 +434,19 @@ static inline struct mem_cgroup > > *page_memcg_rcu(struct page *page) > > * @page: a pointer to the page struct > > * > > * Returns a pointer to the memory cgroup associated with the page, > > - * or NULL. This function unlike page_memcg() can take any page > > + * or NULL. This function unlike page_memcg() can take any page > > * as an argument. It has to be used in cases when it's not known if a page > > - * has an associated memory cgroup pointer or an object cgroups vector. > > + * has an associated memory cgroup pointer or an object cgroups vector or > > + * an object cgroup. > > * > > * Any of the following ensures page and memcg binding stability: > > * - the page lock > > * - LRU isolation > > * - lock_page_memcg() > > * - exclusive reference > > + * > > + * Should be called under rcu lock which can protect memcg associated with > > a > > + * kmem page from being released. > > How about this: > > For a non-kmem page any of the following ensures page and memcg binding > stability: > - the page lock > - LRU isolation > - lock_page_memcg() > - exclusive reference > > For a kmem page a caller should hold an rcu read lock to protect memcg > associated > with a kmem page from being released. OK. I will use this. Thanks Roman. 
> > > */ > > static inline struct mem_cgroup *page_memcg_check(struct page *page) > > { > > @@ -443,6 +459,13 @@ static inline struct mem_cgroup > > *page_memcg_check(struct page *page) > > if (memcg_data & MEMCG_DATA_OBJCGS) > > return NULL; > > > > + if (memcg_data & MEMCG_DATA_KMEM) { > > + struct obj_cgroup *objcg; > > + > > + objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); > > + return obj_cgroup_memcg(objcg); > > + } > > + > > return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); > > } > > > > @@ -501,6 +524,25 @@ static inline struct obj_cgroup > > **page_objcgs_check(struct page *page) > > return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); > > } > > > > +/* > > + * page_objcg - get the object cgroup associated with a kmem page > > + * @page: a pointer to the page struct > > + * > > + * Returns a pointer to the object cgroup associated with the kmem page, > > + * or NULL. This function assumes that the page is known to have an > > + * associated object cgroup. It's only safe to call this function > > + * against kmem pages (PageMemcgKmem() returns true). > > + */ > > +static inline struct obj_cgroup *page_objcg(struct page *page) > > +{ > > + unsigned long memcg_data = page->memcg_data; > > + > > +
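The tagged-pointer scheme the patch extends is worth seeing in isolation: the low bits of page->memcg_data say whether the rest of the word is a memcg pointer, an obj_cgroup vector (slab), or — after this patch — a single obj_cgroup (kmem). A sketch with flag values mirroring MEMCG_DATA_OBJCGS/MEMCG_DATA_KMEM (simplified, not the kernel headers):

```c
#include <assert.h>
#include <stdint.h>

#define DATA_OBJCGS	(1UL << 0)
#define DATA_KMEM	(1UL << 1)
#define FLAGS_MASK	(DATA_OBJCGS | DATA_KMEM)

enum owner { OWNER_NONE, OWNER_MEMCG, OWNER_OBJCG_VEC, OWNER_OBJCG };

/* Decode what kind of object memcg_data points at, in the spirit of
 * page_memcg_check() after the patch. */
static enum owner classify(uintptr_t memcg_data)
{
	if (!memcg_data)
		return OWNER_NONE;
	if (memcg_data & DATA_OBJCGS)
		return OWNER_OBJCG_VEC;	/* slab page */
	if (memcg_data & DATA_KMEM)
		return OWNER_OBJCG;	/* kmem page, per this patch */
	return OWNER_MEMCG;		/* LRU/user page */
}

/* Strip the flag bits to recover the pointer payload. */
static uintptr_t payload(uintptr_t memcg_data)
{
	return memcg_data & ~FLAGS_MASK;
}
```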
Re: [PATCH 5.10 00/47] 5.10.23-rc2 review
On Wed, 10 Mar 2021 at 23:59, wrote: > > From: Greg Kroah-Hartman > > This is the start of the stable review cycle for the 5.10.23 release. > There are 47 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri, 12 Mar 2021 18:28:23 +. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.23-rc2.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.10.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Tested-by: Linux Kernel Functional Testing Summary kernel: 5.10.23-rc2 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.10.y git commit: 93276f11b3afe08c3f213a3648483b1a8789673b git describe: v5.10.22-48-g93276f11b3af Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.22-48-g93276f11b3af No regressions (compared to build v5.10.22) No fixes (compared to build v5.10.22) Ran 56156 total tests in the following environments and test suites. 
Environments -- - arc - arm - arm64 - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - juno-r2-compat - juno-r2-kasan - mips - nxp-ls2088 - nxp-ls2088-64k_page_size - parisc - powerpc - qemu-arm-clang - qemu-arm64-clang - qemu-arm64-kasan - qemu-i386-clang - qemu-x86_64-clang - qemu-x86_64-kasan - qemu-x86_64-kcsan - qemu_arm - qemu_arm64 - qemu_arm64-compat - qemu_i386 - qemu_x86_64 - qemu_x86_64-compat - riscv - s390 - sh - sparc - x15 - x86 - x86-kasan - x86_64 Test Suites --- * build * linux-log-parser * install-android-platform-tools-r2600 * kselftest-bpf * kselftest-efivarfs * kselftest-filesystems * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-livepatch * kselftest-ptrace * libhugetlbfs * ltp-containers-tests * ltp-hugetlb-tests * ltp-ipc-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * v4l2-compliance * fwts * kselftest-kvm * kselftest-lib * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-zram * ltp-cap_bounds-tests * ltp-commands-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-io-tests * ltp-math-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * perf * kselftest- * kselftest-android * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-kexec * kselftest-lkdtm * kselftest-rseq * 
kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-vm * kselftest-x86 * ltp-controllers-tests * ltp-open-posix-tests * kvm-unit-tests * kunit * rcutorture * ssuite * kselftest-vsyscall-mode-native- * kselftest-vsyscall-mode-none- -- Linaro LKFT https://lkft.linaro.org
Re: linux-next: build failure after merge of the bpf-next tree
On 2021-03-11 01:47, Stephen Rothwell wrote: Hi all, After merging the bpf-next tree, today's linux-next build (perf) failed like this: make[3]: *** No rule to make target 'libbpf_util.h', needed by '/home/sfr/next/perf/staticobjs/xsk.o'. Stop. Hi Stephen, It's an incremental build issue, as pointed out here [1], that is resolved by cleaning the build. Cheers, Björn [1] https://lore.kernel.org/bpf/CAEf4BzYPDF87At4=_gsndxof84oiqyjxgahl7_jvpuntovu...@mail.gmail.com/ Caused by commit 7e8bbe24cb8b ("libbpf: xsk: Move barriers from libbpf_util.h to xsk.h") I have used the bpf tree from next-20210310 for today.
[PATCH] scsi: Fix a use after free in st_open
In st_open(), if STp->in_use is true, STp may be freed by scsi_tape_put(), yet STp is still dereferenced by DEBC_printk() afterwards. Move the DEBC_printk() call before scsi_tape_put(). Signed-off-by: Lv Yunlong --- drivers/scsi/st.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c index 841ad2fc369a..9ca536aae784 100644 --- a/drivers/scsi/st.c +++ b/drivers/scsi/st.c @@ -1269,8 +1269,8 @@ static int st_open(struct inode *inode, struct file *filp) spin_lock(_use_lock); if (STp->in_use) { spin_unlock(_use_lock); - scsi_tape_put(STp); DEBC_printk(STp, "Device already in use.\n"); + scsi_tape_put(STp); return (-EBUSY); } -- 2.25.1
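The bug class is put-before-use: once the last reference is dropped the object may be freed, so any use (here, the debug print) must happen before the put. A minimal refcounting model with illustrative names, not the SCSI tape API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tape {
	int refcount;
	char name[16];
};

static void tape_put(struct tape **tp)
{
	if (--(*tp)->refcount == 0) {
		free(*tp);
		*tp = NULL;	/* model "freed: no longer usable" */
	}
}

/* Correct ordering, as in the patch: use the object first, then drop
 * the reference. */
static int open_busy_path(struct tape **tp, char *msg, size_t len)
{
	strncpy(msg, (*tp)->name, len - 1);	/* DEBC_printk() stand-in */
	msg[len - 1] = '\0';
	tape_put(tp);				/* scsi_tape_put() stand-in */
	return -16;				/* -EBUSY */
}
```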
Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
On Thu, Mar 11, 2021 at 3:58 AM Roman Gushchin wrote: > > On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote: > > We want to reuse the obj_cgroup APIs to charge the kmem pages. > > If we do that, we should store an object cgroup pointer to > > page->memcg_data for the kmem pages. > > > > Finally, page->memcg_data can have 3 different meanings. > > > > 1) For the slab pages, page->memcg_data points to an object cgroups > > vector. > > > > 2) For the kmem pages (exclude the slab pages), page->memcg_data > > points to an object cgroup. > > > > 3) For the user pages (e.g. the LRU pages), page->memcg_data points > > to a memory cgroup. > > > > Currently we always get the memory cgroup associated with a page via > > page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it > > has to be used in cases when it's not known if a page has an > > associated memory cgroup pointer or an object cgroups vector. Because > > the page->memcg_data of the kmem page is not pointing to a memory > > cgroup in the later patch, the page_memcg() and page_memcg_rcu() > > cannot be applicable for the kmem pages. In this patch, make > > page_memcg() and page_memcg_rcu() no longer apply to the kmem pages. > > We do not change the behavior of the page_memcg_check(), it is also > > applicable for the kmem pages. > > > > In the end, there are 3 helpers to get the memcg associated with a page. > > Usage is as follows. > > > > 1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU > > pages). > > > > - page_memcg() > > - page_memcg_rcu() > > > > 2) Get the memory cgroup associated with a page. It has to be used in > > cases when it's not known if a page has an associated memory cgroup > > pointer or an object cgroups vector. Returns NULL for slab pages or > > uncharged pages. Otherwise, returns memory cgroup for charged pages > > (e.g. the kmem pages, the LRU pages). 
> > > > - page_memcg_check() > > > > In some place, we use page_memcg() to check whether the page is charged. > > Now introduce page_memcg_charged() helper to do that. > > > > This is a preparation for reparenting the kmem pages. > > > > Signed-off-by: Muchun Song > > --- > > include/linux/memcontrol.h | 33 +++-- > > mm/memcontrol.c| 23 +-- > > mm/page_alloc.c| 4 ++-- > > 3 files changed, 42 insertions(+), 18 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index e6dc793d587d..83cbcdcfcc92 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -358,14 +358,26 @@ enum page_memcg_data_flags { > > > > #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1) > > > > +/* Return true for charged page, otherwise false. */ > > +static inline bool page_memcg_charged(struct page *page) > > +{ > > + unsigned long memcg_data = page->memcg_data; > > + > > + VM_BUG_ON_PAGE(PageSlab(page), page); > > + VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page); > > + > > + return !!memcg_data; > > +} > > + > > /* > > - * page_memcg - get the memory cgroup associated with a page > > + * page_memcg - get the memory cgroup associated with a non-kmem page > > * @page: a pointer to the page struct > > * > > * Returns a pointer to the memory cgroup associated with the page, > > * or NULL. This function assumes that the page is known to have a > > * proper memory cgroup pointer. It's not safe to call this function > > - * against some type of pages, e.g. slab pages or ex-slab pages. > > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab > > + * pages. 
> > * > > * Any of the following ensures page and memcg binding stability: > > * - the page lock > > @@ -378,27 +390,31 @@ static inline struct mem_cgroup *page_memcg(struct > > page *page) > > unsigned long memcg_data = page->memcg_data; > > > > VM_BUG_ON_PAGE(PageSlab(page), page); > > + VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page); > > VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page); > > > > return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); > > } > > > > /* > > - * page_memcg_rcu - locklessly get the memory cgroup associated with a page > > + * page_memcg_rcu - locklessly get the memory cgroup associated with a > > non-kmem page > > * @page: a pointer to the page struct > > * > > * Returns a pointer to the memory cgroup associated with the page, > > * or NULL. This function assumes that the page is known to have a > > * proper memory cgroup pointer. It's not safe to call this function > > - * against some type of pages, e.g. slab pages or ex-slab pages. > > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab > > + * pages. > > */ > > static inline struct mem_cgroup *page_memcg_rcu(struct page *page) > > { > > + unsigned long memcg_data =
Re: [PATCH 08/11] PM / devfreq: check get_dev_status in devfreq_update_stats
On 3/10/21 12:00 PM, Dong Aisheng wrote: > On Wed, Mar 10, 2021 at 12:20 AM Chanwoo Choi wrote: >> >> On 21. 3. 9. 오후 9:58, Dong Aisheng wrote: >>> Check .get_dev_status() in devfreq_update_stats in case it's abused >>> when a device does not provide it. >>> >>> Signed-off-by: Dong Aisheng >>> --- >>> drivers/devfreq/governor.h | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h >>> index 31af6d072a10..67a6dbdd5d23 100644 >>> --- a/drivers/devfreq/governor.h >>> +++ b/drivers/devfreq/governor.h >>> @@ -89,6 +89,9 @@ int devfreq_update_target(struct devfreq *devfreq, >>> unsigned long freq); >>> >>> static inline int devfreq_update_stats(struct devfreq *df) >>> { >>> + if (!df->profile->get_dev_status) >>> + return -EINVAL; >>> + >> >> I'm considering the following method instead of returning the error >> when .get_dev_status is NULL. >> >> if (!df->profile->get_dev_status) { >> df->last_status.total_time = 0; >> df->last_status.busy_time = 0; >> df->last_status.current_frequency = 0; >> return 0; >> } > > I might suggest not, because it's meaningless for the ondemand governor and > only introduces confusion. Simply returning an error could make life a bit easier. > Does it make sense to you? Actually, I considered the following corner case: we can see the simple_ondemand governor through available_governors even if the devfreq driver doesn't implement .get_dev_status. My intention was to prevent the error in this case. But, actually, that is a different issue. I'll fix it in another patch: when get_dev_status is NULL, don't show the simple_ondemand governor name through available_governors. And I applied this patch. Thanks. > > Regards > Aisheng > >> >>> return df->profile->get_dev_status(df->dev.parent, &df->last_status); >>> } >>> #endif /* _GOVERNOR_H */ >>> >> >> -- >> Best Regards, >> Samsung Electronics >> Chanwoo Choi > > -- Best Regards, Chanwoo Choi Samsung Electronics
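The guard being debated reduces to a plain NULL-callback check. A userspace sketch of the chosen behavior, with hypothetical stand-in types rather than the real devfreq structures:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for struct devfreq_dev_profile and its
 * last_status; not the real devfreq types. */
struct dev_status { unsigned long total_time, busy_time, cur_freq; };

struct profile {
	int (*get_dev_status)(struct dev_status *st);
};

/* a stub driver callback used for illustration */
static int stub_status(struct dev_status *st)
{
	st->busy_time = 1;
	return 0;
}

/* Sketch of the guard discussed above: fail fast with -EINVAL when a
 * driver never provided .get_dev_status, instead of dereferencing a
 * NULL callback. */
static int update_stats(struct profile *p, struct dev_status *st)
{
	if (!p->get_dev_status)
		return -EINVAL;
	return p->get_dev_status(st);
}
```

Returning -EINVAL pushes the "this governor needs device status" error to the caller, which matches the later decision to hide simple_ondemand from available_governors for such drivers.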
Re: [PATCH 2/3] net: ethernet: actions: Add Actions Semi Owl Ethernet MAC driver
Hi Cristian, On Thu, Mar 11, 2021 at 03:20:13AM +0200, Cristian Ciocaltea wrote: > Add new driver for the Ethernet MAC used on the Actions Semi Owl > family of SoCs. > > Currently this has been tested only on the Actions Semi S500 SoC > variant. > > Signed-off-by: Cristian Ciocaltea > --- [...] > diff --git a/drivers/net/ethernet/actions/owl-emac.c > b/drivers/net/ethernet/actions/owl-emac.c > new file mode 100644 > index ..ebd8ea88bca4 > --- /dev/null > +++ b/drivers/net/ethernet/actions/owl-emac.c > @@ -0,0 +1,1660 @@ [...] > +static int owl_emac_probe(struct platform_device *pdev) > +{ [...] > + priv->reset = devm_reset_control_get(dev, NULL); Please use priv->reset = devm_reset_control_get_exclusive(dev, NULL); instead, to explicitly state that the driver requires exclusive control over the reset line. > + if (IS_ERR(priv->reset)) { > + ret = PTR_ERR(priv->reset); > + dev_err(dev, "failed to get reset control: %d\n", ret); > + return ret; You could use: return dev_err_probe(dev, PTR_ERR(priv->reset), "failed to get reset control"); regards Philipp
[PATCH v4] usb: gadget: configfs: Fix KASAN use-after-free
From: Jim Lin When the gadget is disconnected, the running sequence is like this. . composite_disconnect . Call trace: usb_string_copy+0xd0/0x128 gadget_config_name_configuration_store+0x4 gadget_config_name_attr_store+0x40/0x50 configfs_write_file+0x198/0x1f4 vfs_write+0x100/0x220 SyS_write+0x58/0xa8 . configfs_composite_unbind . configfs_composite_bind In configfs_composite_bind, it has "cn->strings.s = cn->configuration;" When usb_string_copy is invoked, it would allocate memory, copy the input string, release the previously pointed-to memory, and use the newly allocated memory. When the gadget is connected, the host sends down a request to get information. Call trace: usb_gadget_get_string+0xec/0x168 lookup_string+0x64/0x98 composite_setup+0xa34/0x1ee8 If the gadget is disconnected and connected quickly, in the failing case, cn->configuration memory has been released by usb_string_copy's kfree but configfs_composite_bind hasn't run in time to assign the newly allocated "cn->configuration" pointer to "cn->strings.s". When "strlen(s->s)" of usb_gadget_get_string is executed, the dangling memory is accessed and a "BUG: KASAN: use-after-free" error occurs. Signed-off-by: Jim Lin Signed-off-by: Macpaul Lin Cc: sta...@vger.kernel.org --- Changes in v2: Changes in v3: - Change commit description Changes in v4: - Fix build error and adapt patch to kernel-5.12-rc1. Replace definition "MAX_USB_STRING_WITH_NULL_LEN" with "USB_MAX_STRING_WITH_NULL_LEN". - Note: Patches v2 and v3 have been verified by Thadeu Lima de Souza Cascardo http://spinics.net/lists/kernel/msg3840792.html and by Macpaul Lin on Android kernels. http://lkml.org/lkml/2020/6/11/8 - The patch is suggested to be applied to LTS versions. 
drivers/usb/gadget/configfs.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c index 0d56f33..15a607c 100644 --- a/drivers/usb/gadget/configfs.c +++ b/drivers/usb/gadget/configfs.c @@ -97,6 +97,8 @@ struct gadget_config_name { struct list_head list; }; +#define USB_MAX_STRING_WITH_NULL_LEN (USB_MAX_STRING_LEN+1) + static int usb_string_copy(const char *s, char **s_copy) { int ret; @@ -106,12 +108,16 @@ static int usb_string_copy(const char *s, char **s_copy) if (ret > USB_MAX_STRING_LEN) return -EOVERFLOW; - str = kstrdup(s, GFP_KERNEL); - if (!str) - return -ENOMEM; + if (copy) { + str = copy; + } else { + str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL); + if (!str) + return -ENOMEM; + } + strcpy(str, s); if (str[ret - 1] == '\n') str[ret - 1] = '\0'; - kfree(copy); *s_copy = str; return 0; } -- 1.7.9.5
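The fix above closes the free/reassign window by reusing one fixed-size buffer instead of kfree-ing the old string after each copy. A userspace sketch of the patched usb_string_copy() logic, with malloc/free standing in for kmalloc/kfree (an illustration, not the kernel code; the zero-length guard is an addition):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for USB_MAX_STRING_LEN and the new with-NUL variant. */
#define MAX_STR_LEN           126
#define MAX_STR_WITH_NULL_LEN (MAX_STR_LEN + 1)

/* Sketch of the patched logic: if a buffer already exists, copy into
 * it in place; allocate a maximum-size buffer only on first use. The
 * published pointer is therefore never freed while readers may still
 * dereference it. */
static int string_copy(const char *s, char **s_copy)
{
	char *str, *copy = *s_copy;
	size_t ret = strlen(s);

	if (ret > MAX_STR_LEN)
		return -1;	/* -EOVERFLOW in the kernel */

	if (copy) {
		str = copy;	/* reuse the existing buffer */
	} else {
		str = malloc(MAX_STR_WITH_NULL_LEN);
		if (!str)
			return -1;	/* -ENOMEM */
	}
	strcpy(str, s);
	if (ret && str[ret - 1] == '\n')	/* strip trailing newline */
		str[ret - 1] = '\0';
	*s_copy = str;
	return 0;
}
```

Because the buffer address never changes after first allocation, a concurrent usb_gadget_get_string() reading the old pointer sees at worst a stale string, not freed memory.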
[PATCH] staging: rtl8723bs: add initial value
Add initial values for some uninitialized variables and an array. Signed-off-by: Hao Peng --- drivers/staging/rtl8723bs/core/rtw_ap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8723bs/core/rtw_ap.c b/drivers/staging/rtl8723bs/core/rtw_ap.c index b6f944b37b08..ceea160db38a 100644 --- a/drivers/staging/rtl8723bs/core/rtw_ap.c +++ b/drivers/staging/rtl8723bs/core/rtw_ap.c @@ -206,8 +206,8 @@ void expire_timeout_chk(struct adapter *padapter) struct sta_info *psta = NULL; struct sta_priv *pstapriv = &padapter->stapriv; u8 chk_alive_num = 0; - char chk_alive_list[NUM_STA]; - int i; + char chk_alive_list[NUM_STA] = {0}; + int i = 0; spin_lock_bh(&pstapriv->auth_list_lock); @@ -308,7 +308,7 @@ void expire_timeout_chk(struct adapter *padapter) } } if (pmlmeext->active_keep_alive_check) { - int stainfo_offset; + int stainfo_offset = 0; stainfo_offset = rtw_stainfo_offset(pstapriv, psta); if (stainfo_offset_valid(stainfo_offset)) -- 2.20.1
Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
Hi Ben, On 3/10/21 at 5:24 PM, Ben Dooks wrote: On 10/03/2021 17:16, Dmitry Vyukov wrote: On Wed, Mar 10, 2021 at 5:46 PM syzbot wrote: Hello, syzbot found the following issue on: HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0 kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136 dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69 userspace arch: riscv64 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com +riscv maintainers This is riscv64-specific. I've seen similar crashes in put_user in other places. It looks like put_user crashes when the user address is not mapped/protected (?). The unmapped case should have been handled. I think the issue is the check for user-mode access that was added. From what I read, the code may be wrong in + if (!user_mode(regs) && addr < TASK_SIZE && + unlikely(!(regs->status & SR_SUM))) + die_kernel_fault("access to user memory without uaccess routines", + addr, regs); I think the SR_SUM check might be wrong; as I read the standard, SR_SUM should be set to disable user-space access. So the check should be unlikely(regs->status & SR_SUM), i.e. access without having disabled the protection. The check that is done seems correct to me: "The SUM (permit Supervisor User Memory access) bit modifies the privilege with which S-mode loads and stores access virtual memory. *When SUM=0, S-mode memory accesses to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*. When SUM=1, these accesses are permitted. SUM has no effect when page-based virtual memory is not in effect". I will try to reproduce the problem locally. 
Thanks, Alex Without this, you can end up with an infinite loop in the fault handler. Unable to handle kernel access to user memory without uaccess routines at virtual address 2749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffe8c8b0 ra : ffe8c8ae sp : ffe025d17ec0 gp : ffe005d25378 tp : ffe00f0d t0 : t1 : 0001 t2 : 000f4240 s0 : ffe025d17ee0 s1 : 2749f0d0 a0 : 002a a1 : 0003 a2 : 1ffc0cfac500 a3 : ffec80cc a4 : 5ae9db91c19bbe00 a5 : a6 : 00f0 a7 : ffe82eba s2 : 0004 s3 : ffe00eef96c0 s4 : ffe022c77fe0 s5 : 4000 s6 : ffe067d74e00 s7 : ffe067d74850 s8 : ffe067d73e18 s9 : ffe067d74e00 s10: ffe00eef96e8 s11: 00ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffc4043cafb2 t5 : ffc4043cafba t6 : 0004 status: 0120 badaddr: 2749f0d0 cause: 000f Call Trace: [] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. -- You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/b74f1b05bd316729%40google.com. ___ linux-riscv mailing list linux-ri...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
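The SUM-bit question above can be pinned down with a small model of the check. This sketch uses the RISC-V sstatus bit positions (SUM is bit 18, SPP is bit 8) with an illustrative TASK_SIZE and simplified stand-in types:

```c
#include <assert.h>
#include <stdbool.h>

#define SR_SUM    (1UL << 18)   /* sstatus.SUM: permit S-mode access to user pages */
#define SR_SPP    (1UL << 8)    /* previous privilege: set => trapped from S-mode */
#define TASK_SIZE (1UL << 30)   /* illustrative user address space bound */

struct pt_regs { unsigned long status; };

static bool user_mode(struct pt_regs *r)
{
	return !(r->status & SR_SPP);
}

/* Model of the contested check: a fault on a user address is a kernel
 * bug only when the kernel (not user mode) touched it *without* having
 * set SUM. SUM=1 is what permits S-mode access to user pages, so
 * testing !(status & SR_SUM) is the correct polarity, matching Alex's
 * reading of the spec. */
static bool bad_uaccess(struct pt_regs *r, unsigned long addr)
{
	return !user_mode(r) && addr < TASK_SIZE && !(r->status & SR_SUM);
}
```

With SUM clear, a kernel-mode fault on a user address dies immediately instead of looping forever in the fault handler; with SUM set (a uaccess routine in progress), the normal fault path runs.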
Re: [PATCH] net: add net namespace inode for all net_dev events
On Wed, Mar 10, 2021 at 11:31:12AM -0500, Steven Rostedt wrote: > On Wed, 10 Mar 2021 17:03:40 +0800 > Tony Lu wrote: > > > I use pahole to read vmlinux.o directly with defconfig and > > CONFIG_DEBUG_INFO enabled, the result shows 22 structs prefixed with > > trace_event_raw_ that have at least one hole. > > I was thinking of pahole too ;-) > > But the information can also be captured from the format files with simple > scripts as well. And perhaps be more tuned to see if there's actually a fix > for them, and ignore reporting it if there is no fix, as all trace events > are 4 byte aligned, and if we are off by one, sometimes it doesn't matter. I am going to send a patch to fix this issue later. There are many events that have more than one hole; I am trying to pick out the events that really need to be fixed. Cheers, Tony Lu > > -- Steve
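The holes pahole reports come from alignment padding between misordered fields. A small illustration of the effect with generic structs (not actual trace_event_raw_ layouts):

```c
#include <assert.h>
#include <stddef.h>

/* A misordered struct: on LP64, 'flag' is followed by a 7-byte hole
 * so 'payload' can be 8-byte aligned, and 'tail' forces tail padding. */
struct holey {
	char flag;
	long payload;
	char tail;
};

/* Same fields, largest-first: the two chars share one tail pad, so the
 * struct is never larger than the misordered layout. This reordering
 * is the class of fix being weighed for the trace event structs. */
struct packed_order {
	long payload;
	char flag;
	char tail;
};
```

On a typical LP64 target, sizeof(struct holey) is 24 while sizeof(struct packed_order) is 16; pahole's per-field hole report makes the same waste visible in vmlinux.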
Re: [External] Re: [PATCH v18 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap
On Thu, Mar 11, 2021 at 1:16 AM Randy Dunlap wrote: > > On 3/10/21 7:37 AM, Michal Hocko wrote: > > On Mon 08-03-21 18:28:04, Muchun Song wrote: > >> Add a kernel parameter hugetlb_free_vmemmap to enable the feature of > >> freeing unused vmemmap pages associated with each hugetlb page on boot. > >> > >> We disables PMD mapping of vmemmap pages for x86-64 arch when this > >> feature is enabled. Because vmemmap_remap_free() depends on vmemmap > >> being base page mapped. > >> > >> Signed-off-by: Muchun Song > >> Reviewed-by: Oscar Salvador > >> Reviewed-by: Barry Song > >> Reviewed-by: Miaohe Lin > >> Tested-by: Chen Huang > >> Tested-by: Bodeddula Balasubramaniam > >> --- > >> Documentation/admin-guide/kernel-parameters.txt | 14 ++ > >> Documentation/admin-guide/mm/hugetlbpage.rst| 3 +++ > >> arch/x86/mm/init_64.c | 8 ++-- > >> include/linux/hugetlb.h | 19 +++ > >> mm/hugetlb_vmemmap.c| 24 > >> > >> 5 files changed, 66 insertions(+), 2 deletions(-) > >> > >> diff --git a/Documentation/admin-guide/kernel-parameters.txt > >> b/Documentation/admin-guide/kernel-parameters.txt > >> index 04545725f187..de91d54573c4 100644 > >> --- a/Documentation/admin-guide/kernel-parameters.txt > >> +++ b/Documentation/admin-guide/kernel-parameters.txt > >> @@ -1557,6 +1557,20 @@ > >> Documentation/admin-guide/mm/hugetlbpage.rst. > >> Format: size[KMG] > >> > >> +hugetlb_free_vmemmap= > >> +[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, > >> +this controls freeing unused vmemmap pages associated > >> +with each HugeTLB page. When this option is enabled, > >> +we disable PMD/huge page mapping of vmemmap pages > >> which > >> +increase page table pages. So if a user/sysadmin only > >> +uses a small number of HugeTLB pages (as a percentage > >> +of system memory), they could end up using more memory > >> +with hugetlb_free_vmemmap on as opposed to off. > >> +Format: { on | off (default) } > > > > Please note this is an admin guide and for those this seems overly low > > level. 
I would use something like the following > > [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP > > enabled. > > Allows heavy hugetlb users to free up some more > > memory (6 * PAGE_SIZE for each 2MB hugetlb > > page). > > This feature is not free though. Large page > > tables are not used to back vmemmap pages which > > can lead to a performance degradation for some > > workloads. Also there will be memory allocation > > required when hugetlb pages are freed from the > > pool which can lead to corner cases under heavy > > memory pressure. Thanks. > >> + > >> + on: enable the feature > >> + off: disable the feature > >> + > >> hung_task_panic= > >> [KNL] Should the hung task detector generate panics. > >> Format: 0 | 1 > > > -- > ~Randy >
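The "6 * PAGE_SIZE for each 2MB hugetlb page" figure quoted above follows from simple arithmetic, assuming 4K base pages and a 64-byte struct page (the configuration the series targets on x86-64):

```c
#include <assert.h>

#define BASE_PAGE   4096UL          /* base page size */
#define HUGE_PAGE   (2UL << 20)     /* 2MB hugetlb page */
#define STRUCT_PAGE 64UL            /* assumed sizeof(struct page) */

/* Vmemmap pages needed to describe one 2MB hugetlb page:
 * 512 struct pages * 64 bytes = 32KB = 8 base pages. */
static unsigned long vmemmap_pages_per_hpage(void)
{
	return (HUGE_PAGE / BASE_PAGE) * STRUCT_PAGE / BASE_PAGE;
}

/* The series keeps the first two vmemmap pages (the 128 struct pages
 * the cover letter mentions) and frees the rest. */
static unsigned long vmemmap_pages_freed(void)
{
	return vmemmap_pages_per_hpage() - 2;
}
```

So of the 8 vmemmap pages behind each 2MB hugetlb page, 6 can be handed back, which is where the documentation's 6 * PAGE_SIZE savings comes from.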
Re: [PATCH v4 2/3] dt-bindings: mtd: Document use of nvmem-cells compatible
On 10.03.2021 23:47, Ansuel Smith wrote: On Wed, Mar 10, 2021 at 11:41:24PM +0100, Rafał Miłecki wrote: See inline On 10.03.2021 22:08, Ansuel Smith wrote: Document nvmem-cells compatible used to treat mtd partitions as a nvmem provider. Signed-off-by: Ansuel Smith --- .../bindings/mtd/partitions/nvmem-cells.yaml | 96 +++ 1 file changed, 96 insertions(+) create mode 100644 Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml diff --git a/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml b/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml new file mode 100644 index ..f70d7597a6b0 --- /dev/null +++ b/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml @@ -0,0 +1,96 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/mtd/partitions/nvmem-cells.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Nvmem cells + +description: | + Any partition containing the compatible "nvmem-cells" will register as a + nvmem provider. + Each direct subnodes represents a nvmem cell following the nvmem binding. + Nvmem binding to declare nvmem-cells can be found in: + Documentation/devicetree/bindings/nvmem/nvmem.yaml + +maintainers: + - Ansuel Smith I think that when Rob wrote: On 10.03.2021 03:58, Rob Herring wrote: I think this should reference nvmem.yaml. he meant you using: allOf: - $ref: "nvmem.yaml#" (you'll need to adjust binding path). Please check how it's done in Documentation/devicetree/bindings/nvmem/*.yaml files Aside from that, should I readd the old properties or I can keep the compatible as the only one required? What old properties do you mean? You shouldn't need to add anything to the list of "required" I think. Some NVMEM providers add "#address-cells" and "#size-cells". That makes sense if NVMEM provider must provide at least 1 cell. I'm not sure if we need that for MTD. 
Even "compatible" is actually redundant but most YAML files list it for convenience. Source: On 10.12.2020 03:48, Rob Herring wrote: > And drop 'compatible' as required. It's redundant anyways because the > schema will only be applied if compatible matches. http://lists.infradead.org/pipermail/linux-mtd/2020-December/084574.html https://patchwork.ozlabs.org/comment/2597326/
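For reference, the allOf reference Rafał suggests might look like the sketch below. This is hypothetical: the exact $ref path, and whether any properties beyond "compatible" remain, depend on the tree and on the dt-schema conventions Rob pointed at.

```yaml
# Sketch only: $ref path must be adjusted to where nvmem.yaml lives.
$id: http://devicetree.org/schemas/mtd/partitions/nvmem-cells.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: NVMEM cells

allOf:
  - $ref: "/schemas/nvmem/nvmem.yaml#"

properties:
  compatible:
    const: nvmem-cells

unevaluatedProperties: false
```

Pulling in nvmem.yaml via allOf means the cell subnode rules are inherited rather than restated, and "compatible" need not be listed under "required" since the schema only applies on a compatible match anyway.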
Re: [External] Re: [PATCH v18 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap
On Wed, Mar 10, 2021 at 11:37 PM Michal Hocko wrote: > > On Mon 08-03-21 18:28:04, Muchun Song wrote: > > Add a kernel parameter hugetlb_free_vmemmap to enable the feature of > > freeing unused vmemmap pages associated with each hugetlb page on boot. > > > > We disables PMD mapping of vmemmap pages for x86-64 arch when this > > feature is enabled. Because vmemmap_remap_free() depends on vmemmap > > being base page mapped. > > > > Signed-off-by: Muchun Song > > Reviewed-by: Oscar Salvador > > Reviewed-by: Barry Song > > Reviewed-by: Miaohe Lin > > Tested-by: Chen Huang > > Tested-by: Bodeddula Balasubramaniam > > --- > > Documentation/admin-guide/kernel-parameters.txt | 14 ++ > > Documentation/admin-guide/mm/hugetlbpage.rst| 3 +++ > > arch/x86/mm/init_64.c | 8 ++-- > > include/linux/hugetlb.h | 19 +++ > > mm/hugetlb_vmemmap.c| 24 > > > > 5 files changed, 66 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > > b/Documentation/admin-guide/kernel-parameters.txt > > index 04545725f187..de91d54573c4 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -1557,6 +1557,20 @@ > > Documentation/admin-guide/mm/hugetlbpage.rst. > > Format: size[KMG] > > > > + hugetlb_free_vmemmap= > > + [KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, > > + this controls freeing unused vmemmap pages associated > > + with each HugeTLB page. When this option is enabled, > > + we disable PMD/huge page mapping of vmemmap pages > > which > > + increase page table pages. So if a user/sysadmin only > > + uses a small number of HugeTLB pages (as a percentage > > + of system memory), they could end up using more memory > > + with hugetlb_free_vmemmap on as opposed to off. > > + Format: { on | off (default) } > > Please note this is an admin guide and for those this seems overly low OK. > level. 
I would use something like the following > [KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP > enabled. > Allows heavy hugetlb users to free up some more > memory (6 * PAGE_SIZE for each 2MB hugetlb > page). > This feature is not free though. Large page > tables are not used to back vmemmap pages which > can lead to a performance degradation for some > workloads. Also there will be memory allocation > required when hugetlb pages are freed from the > pool which can lead to corner cases under heavy > memory pressure. Many thanks. I will update this. > > + > > + on: enable the feature > > + off: disable the feature > > + > > hung_task_panic= > > [KNL] Should the hung task detector generate panics. > > Format: 0 | 1 > -- > Michal Hocko > SUSE Labs
[unixbhas...@gmail.com: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S]
- Forwarded message from Bhaskar Chowdhury - Date: Thu, 11 Mar 2021 11:40:58 +0530 From: Bhaskar Chowdhury To: unixbhas...@gmail.com, linux-i...@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S X-Mailer: git-send-email 2.26.2 Mundane spelling fixes. Signed-off-by: Bhaskar Chowdhury --- Randy and Adrian felt it should go through your tree..so... arch/ia64/kernel/fsys.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S index 0750a716adc7..2094f3249019 100644 --- a/arch/ia64/kernel/fsys.S +++ b/arch/ia64/kernel/fsys.S @@ -172,7 +172,7 @@ ENTRY(fsys_gettimeofday) // r25 = itc_lastcycle value // r26 = address clocksource cycle_last // r27 = (not used) - // r28 = sequence number at the beginning of critcal section + // r28 = sequence number at the beginning of critical section // r29 = address of itc_jitter // r30 = time processing flags / memory address // r31 = pointer to result @@ -432,7 +432,7 @@ GLOBAL_ENTRY(fsys_bubble_down) * - r29: psr * * We used to clear some PSR bits here but that requires slow -* serialization. Fortuntely, that isn't really necessary. +* serialization. Fortunately, that isn't really necessary. * The rationale is as follows: we used to clear bits * ~PSR_PRESERVED_BITS in PSR.L. Since * PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we -- 2.26.2 - End forwarded message -
Re: [External] Re: [PATCH v18 5/9] mm: hugetlb: set the PageHWPoison to the raw error page
On Wed, Mar 10, 2021 at 11:28 PM Michal Hocko wrote: > > On Mon 08-03-21 18:28:03, Muchun Song wrote: > > Because we reuse the first tail vmemmap page frame and remap it > > with read-only, we cannot set the PageHWPoison on some tail pages. > > So we can use the head[4].private (There are at least 128 struct > > page structures associated with the optimized HugeTLB page, so > > using head[4].private is safe) to record the real error page index > > and set the raw error page PageHWPoison later. > > Can we have more poisoned tail pages? Also who does consume that index > and set the HWPoison on the proper tail page? Good point. I looked at the memory failure routine closely. If we do not clear the HWPoison of the head page, we cannot poison another tail page. So we should not set the destructor of the huge page from HUGETLB_PAGE_DTOR to NULL_COMPOUND_DTOR before calling alloc_huge_page_vmemmap(). In this case, the below check of PageHuge() always returns true. I need to fix this in the previous patch. memory_failure() if (PageHuge(page)) memory_failure_hugetlb() head = compound_head(page) if (TestSetPageHWPoison(head)) return Thanks. 
> > > Signed-off-by: Muchun Song > > Reviewed-by: Oscar Salvador > > Acked-by: David Rientjes > > Tested-by: Chen Huang > > Tested-by: Bodeddula Balasubramaniam > > --- > > mm/hugetlb.c | 80 > > ++-- > > 1 file changed, 72 insertions(+), 8 deletions(-) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index 377e0c1b283f..c0c1b7635ca9 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -1304,6 +1304,74 @@ static inline void > > destroy_compound_gigantic_page(struct page *page, > > unsigned int order) { } > > #endif > > > > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP > > +static inline void hwpoison_subpage_deliver(struct hstate *h, struct page > > *head) > > +{ > > + struct page *page; > > + > > + if (!PageHWPoison(head) || !free_vmemmap_pages_per_hpage(h)) > > + return; > > + > > + page = head + page_private(head + 4); > > + > > + /* > > + * Move PageHWPoison flag from head page to the raw error page, > > + * which makes any subpages rather than the error page reusable. > > + */ > > + if (page != head) { > > + SetPageHWPoison(page); > > + ClearPageHWPoison(head); > > + } > > +} > > + > > +static inline void hwpoison_subpage_set(struct hstate *h, struct page > > *head, > > + struct page *page) > > +{ > > + if (!PageHWPoison(head)) > > + return; > > + > > + if (free_vmemmap_pages_per_hpage(h)) { > > + set_page_private(head + 4, page - head); > > + } else if (page != head) { > > + /* > > + * Move PageHWPoison flag from head page to the raw error > > page, > > + * which makes any subpages rather than the error page > > reusable. 
> > + */ > > + SetPageHWPoison(page); > > + ClearPageHWPoison(head); > > + } > > +} > > + > > +static inline void hwpoison_subpage_clear(struct hstate *h, struct page > > *head) > > +{ > > + if (!PageHWPoison(head) || !free_vmemmap_pages_per_hpage(h)) > > + return; > > + > > + set_page_private(head + 4, 0); > > +} > > +#else > > +static inline void hwpoison_subpage_deliver(struct hstate *h, struct page > > *head) > > +{ > > +} > > + > > +static inline void hwpoison_subpage_set(struct hstate *h, struct page > > *head, > > + struct page *page) > > +{ > > + if (PageHWPoison(head) && page != head) { > > + /* > > + * Move PageHWPoison flag from head page to the raw error > > page, > > + * which makes any subpages rather than the error page > > reusable. > > + */ > > + SetPageHWPoison(page); > > + ClearPageHWPoison(head); > > + } > > +} > > + > > +static inline void hwpoison_subpage_clear(struct hstate *h, struct page > > *head) > > +{ > > +} > > +#endif > > + > > static int update_and_free_page(struct hstate *h, struct page *page) > > __releases(_lock) __acquires(_lock) > > { > > @@ -1357,6 +1425,8 @@ static int update_and_free_page(struct hstate *h, > > struct page *page) > > return -ENOMEM; > > } > > > > + hwpoison_subpage_deliver(h, page); > > + > > for (i = 0; i < pages_per_huge_page(h); > >i++, subpage = mem_map_next(subpage, page, i)) { > > subpage->flags &= ~(1 << PG_locked | 1 << PG_error | > > @@ -1801,14 +1871,7 @@ int dissolve_free_huge_page(struct page *page) > > goto retry; > > } > > > > - /* > > - * Move PageHWPoison flag from head page to the raw error > > page, > > -
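The head[4].private bookkeeping in the patch above can be sketched in userspace: while the real tail struct pages are remapped read-only, park the poisoned subpage index in a spare field of a still-writable head-area page and replay it when the page is dissolved. Simplified, hypothetical types; index 0 here means "nothing stashed", which glosses over poisoning the head page itself:

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in for struct page: just a flags word and the spare
 * 'private' field the patch borrows. */
struct page { unsigned long flags, private; };
#define HWPOISON 1UL

/* Record which subpage is really poisoned while only the head can be
 * flagged (mirrors hwpoison_subpage_set). */
static void subpage_set(struct page *head, unsigned long idx)
{
	if (head[0].flags & HWPOISON)
		head[4].private = idx;
}

/* Later, move the poison flag from the head to the raw error page so
 * the other subpages become reusable (mirrors hwpoison_subpage_deliver). */
static unsigned long subpage_deliver(struct page *head)
{
	unsigned long idx = head[4].private;

	if (idx) {
		head[idx].flags |= HWPOISON;
		head[0].flags &= ~HWPOISON;
	}
	return idx;
}
```

Michal's question stands in this model too: a single stashed index can record only one poisoned tail page, which is part of why the thread concludes the destructor ordering must keep PageHuge() true so a second poisoning attempt bails out early.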
'make O=' indigestion with module signing
So, I tried doing a 'make O=... allmodconfig', with a setup where the uid of the build process had write permission to the O= directory, but intentionally did *not* have write permission to the source tree (so they couldn't mess up the tree - I got tired of having to repeatedly do 'make mrproper' because of pilot error) allmodconfig gave me a .config that had: CONFIG_MODULE_SIG_FORMAT=y CONFIG_MODULE_SIG=y CONFIG_MODULE_SIG_FORCE=y CONFIG_MODULE_SIG_ALL=y CONFIG_MODULE_SIG_SHA1=y # CONFIG_MODULE_SIG_SHA224 is not set # CONFIG_MODULE_SIG_SHA256 is not set # CONFIG_MODULE_SIG_SHA384 is not set # CONFIG_MODULE_SIG_SHA512 is not set CONFIG_MODULE_SIG_HASH="sha1" CONFIG_IMA_APPRAISE_REQUIRE_MODULE_SIGS=y CONFIG_MODULE_SIG_KEY="certs/signing_key.pem" What I *expected* was that multiple builds with different O= would each generate themselves a unique signing key and put it in their own O= directory and stay out of each other's way. What actually happened: EXTRACT_CERTS /usr/src/linux-next/"certs/signing_key.pem" At main.c:142: - SSL error:0200100D:system library:fopen:Permission denied: ../crypto/bio/bss_file.c:69 - SSL error:2006D002:BIO routines:BIO_new_file:system lib: ../crypto/bio/bss_file.c:78 extract-cert: /usr/src/linux-next/certs/signing_key.pem: Permission denied make[2]: *** [/usr/src/linux-next/certs/Makefile:106: certs/signing_key.x509] Error 1 make[1]: *** [/usr/src/linux-next/Makefile:1847: certs] Error 2 make[1]: Leaving directory '/usr/src/linux-next/out/arm64' make: *** [Makefile:215: __sub-make] Error 2 It tried to put the key into the source tree rather than the build tree. Before I try to code up a fix for this, is this intentionally designed behavior, or have I just managed to trip over a rarely-tested corner case?
[PATCH] kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
This is a remnant of commit 78046fabe6e7 ("kbuild: determine the output format of DTC by the target suffix"). The parameter "yaml" is meaningless because cmd_dtc no longer takes $(2). Reported-by: Rob Herring Signed-off-by: Masahiro Yamada --- scripts/Makefile.lib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index 90a4e04cd8f5..8cd67b1b6d15 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -352,7 +352,7 @@ define rule_dtc endef $(obj)/%.dt.yaml: $(src)/%.dts $(DTC) $(DT_TMP_SCHEMA) FORCE - $(call if_changed_rule,dtc,yaml) + $(call if_changed_rule,dtc) dtc-tmp = $(subst $(comma),_,$(dot-target).dts.tmp) -- 2.27.0
[PATCH] scsi: Fix a double free in myrs_cleanup
In myrs_cleanup, cs->mmio_base will be freed twice by iounmap(). Fixes: 77266186397c6 ("scsi: myrs: Add Mylex RAID controller (SCSI interface)") Signed-off-by: Lv Yunlong --- drivers/scsi/myrs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/myrs.c b/drivers/scsi/myrs.c index 4adf9ded296a..329fd025c718 100644 --- a/drivers/scsi/myrs.c +++ b/drivers/scsi/myrs.c @@ -2273,12 +2273,12 @@ static void myrs_cleanup(struct myrs_hba *cs) if (cs->mmio_base) { cs->disable_intr(cs); iounmap(cs->mmio_base); + cs->mmio_base = NULL; } if (cs->irq) free_irq(cs->irq, cs); if (cs->io_addr) release_region(cs->io_addr, 0x80); - iounmap(cs->mmio_base); pci_set_drvdata(pdev, NULL); pci_disable_device(pdev); scsi_host_put(cs->host); -- 2.25.1
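The fix above is the standard free-and-NULL idiom: clear the pointer at the first teardown so no later cleanup path can release it again. A userspace sketch with malloc/free standing in for ioremap/iounmap:

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Stand-in for the driver context holding a mapped region. */
struct ctx { void *mmio_base; };

/* Mirrors the patched myrs_cleanup() shape: release the resource once,
 * then NULL the pointer so the (now removed) unconditional second
 * release has nothing to double-free, and any remaining guarded path
 * simply skips it. */
static void teardown_early(struct ctx *c)
{
	if (c->mmio_base) {
		free(c->mmio_base);
		c->mmio_base = NULL;	/* prevents the later double free */
	}
}
```

Running the teardown twice is now harmless, which is exactly the property the original code lacked when both the guarded and the unconditional iounmap() calls fired.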
Re: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S
On 3/10/21 10:24 PM, Bhaskar Chowdhury wrote: > On 22:15 Wed 10 Mar 2021, Randy Dunlap wrote: >> On 3/10/21 10:10 PM, Bhaskar Chowdhury wrote: >>> >>> Mundane spelling fixes. >>> >>> Signed-off-by: Bhaskar Chowdhury >> >> Acked-by: Randy Dunlap >> >> but no maintainer Cc:ed to pick it up... >> > I have seen "maintainer less" files and you have pointed out that too in the > past...what do I do Randy??? Should run "git blame" to find out ,who touch > the file and > include those mail??? I don't think that would be good idea. > > Bemused!! Throw some light. Well, we know that ia64 is orphaned, so you need to find someone to merge the patch. Andrew Morton is usually a good bet. :) or use 'git log' to see who has been merging patches to that file. >>> --- >>> arch/ia64/kernel/fsys.S | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) -- ~Randy
Re: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S
On 22:15 Wed 10 Mar 2021, Randy Dunlap wrote: On 3/10/21 10:10 PM, Bhaskar Chowdhury wrote: Mundane spelling fixes. Signed-off-by: Bhaskar Chowdhury Acked-by: Randy Dunlap but no maintainer Cc:ed to pick it up... I have seen "maintainer less" files and you have pointed out that too in the past...what do I do Randy??? Should run "git blame" to find out ,who touch the file and include those mail??? I don't think that would be good idea. Bemused!! Throw some light. --- arch/ia64/kernel/fsys.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S index 0750a716adc7..2094f3249019 100644 --- a/arch/ia64/kernel/fsys.S +++ b/arch/ia64/kernel/fsys.S @@ -172,7 +172,7 @@ ENTRY(fsys_gettimeofday) // r25 = itc_lastcycle value // r26 = address clocksource cycle_last // r27 = (not used) - // r28 = sequence number at the beginning of critcal section + // r28 = sequence number at the beginning of critical section // r29 = address of itc_jitter // r30 = time processing flags / memory address // r31 = pointer to result @@ -432,7 +432,7 @@ GLOBAL_ENTRY(fsys_bubble_down) * - r29: psr * * We used to clear some PSR bits here but that requires slow -* serialization. Fortuntely, that isn't really necessary. +* serialization. Fortunately, that isn't really necessary. * The rationale is as follows: we used to clear bits * ~PSR_PRESERVED_BITS in PSR.L. Since * PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we -- -- ~Randy signature.asc Description: PGP signature
[net-next PATCH v7 16/16] net: dpaa2-mac: Add ACPI support for DPAA2 MAC driver
Modify dpaa2_mac_get_node() to get the dpmac fwnode from either DT or ACPI. Modify dpaa2_mac_get_if_mode() to get interface mode from dpmac_node which is a fwnode. Modify dpaa2_pcs_create() to create pcs from dpmac_node fwnode. Modify dpaa2_mac_connect() to support ACPI along with DT. Signed-off-by: Calvin Johnson --- Changes in v7: - remove unnecessary checks Changes in v6: - use dev_fwnode() - remove useless else - replace of_device_is_available() to fwnode_device_is_available() Changes in v5: - replace fwnode_get_id() with OF and ACPI function calls Changes in v4: None Changes in v3: None Changes in v2: - Refactor OF functions to use fwnode functions .../net/ethernet/freescale/dpaa2/dpaa2-mac.c | 84 +++ 1 file changed, 50 insertions(+), 34 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c index ccaf7e35abeb..0cb6760dce88 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c @@ -1,6 +1,9 @@ // SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause) /* Copyright 2019 NXP */ +#include +#include + #include "dpaa2-eth.h" #include "dpaa2-mac.h" @@ -34,39 +37,51 @@ static int phy_mode(enum dpmac_eth_if eth_if, phy_interface_t *if_mode) return 0; } -/* Caller must call of_node_put on the returned value */ -static struct device_node *dpaa2_mac_get_node(u16 dpmac_id) +static struct fwnode_handle *dpaa2_mac_get_node(struct device *dev, + u16 dpmac_id) { - struct device_node *dpmacs, *dpmac = NULL; - u32 id; + struct fwnode_handle *fwnode, *parent, *child = NULL; + struct device_node *dpmacs = NULL; int err; + u32 id; - dpmacs = of_find_node_by_name(NULL, "dpmacs"); - if (!dpmacs) - return NULL; + fwnode = dev_fwnode(dev->parent); + if (is_of_node(fwnode)) { + dpmacs = of_find_node_by_name(NULL, "dpmacs"); + if (!dpmacs) + return NULL; + parent = of_fwnode_handle(dpmacs); + } else if (is_acpi_node(fwnode)) { + parent = fwnode; + } - while ((dpmac = of_get_next_child(dpmacs, dpmac)) != NULL) { - err = of_property_read_u32(dpmac, "reg", &id); + fwnode_for_each_child_node(parent, child) { + err = -EINVAL; + if (is_acpi_device_node(child)) + err = acpi_get_local_address(ACPI_HANDLE_FWNODE(child), &id); + else if (is_of_node(child)) + err = of_property_read_u32(to_of_node(child), "reg", &id); if (err) continue; - if (id == dpmac_id) - break; - } + if (id == dpmac_id) { + of_node_put(dpmacs); + return child; + } + } of_node_put(dpmacs); - - return dpmac; + return NULL; } -static int dpaa2_mac_get_if_mode(struct device_node *node, +static int dpaa2_mac_get_if_mode(struct fwnode_handle *dpmac_node, struct dpmac_attr attr) { phy_interface_t if_mode; int err; - err = of_get_phy_mode(node, &if_mode); - if (!err) - return if_mode; + err = fwnode_get_phy_mode(dpmac_node); + if (err > 0) + return err; err = phy_mode(attr.eth_if, &if_mode); if (!err) @@ -235,26 +250,27 @@ static const struct phylink_mac_ops dpaa2_mac_phylink_ops = { }; static int dpaa2_pcs_create(struct dpaa2_mac *mac, - struct device_node *dpmac_node, int id) + struct fwnode_handle *dpmac_node, + int id) { struct mdio_device *mdiodev; - struct device_node *node; + struct fwnode_handle *node; - node = of_parse_phandle(dpmac_node, "pcs-handle", 0); - if (!node) { + node = fwnode_find_reference(dpmac_node, "pcs-handle", 0); + if (IS_ERR(node)) { /* do not error out on old DTS files */ netdev_warn(mac->net_dev, "pcs-handle node not found\n"); return 0; } - if (!of_device_is_available(node)) { + if (!fwnode_device_is_available(node)) { netdev_err(mac->net_dev, "pcs-handle node not available\n"); - of_node_put(node); + fwnode_handle_put(node); return -ENODEV; } - mdiodev = of_mdio_find_device(node); - of_node_put(node); + mdiodev = fwnode_mdio_find_device(node); + fwnode_handle_put(node); if (!mdiodev) return -EPROBE_DEFER; @@ -283,13 +299,13 @@ static void dpaa2_pcs_destroy(struct dpaa2_mac *mac) int dpaa2_mac_connect(struct dpaa2_mac *mac) {
[net-next PATCH v7 14/16] net: phylink: introduce phylink_fwnode_phy_connect()
Define phylink_fwnode_phy_connect() to connect the phy specified by a fwnode to a phylink instance. Signed-off-by: Calvin Johnson --- Changes in v7: None Changes in v6: - remove OF check for fixed-link Changes in v5: None Changes in v4: - call phy_device_free() before returning Changes in v3: None Changes in v2: None drivers/net/phy/phylink.c | 54 +++ include/linux/phylink.h | 3 +++ 2 files changed, 57 insertions(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 053c92e02cd8..23753f92e0a6 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -5,6 +5,7 @@ * * Copyright (C) 2015 Russell King */ +#include #include #include #include @@ -1124,6 +1125,59 @@ int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn, } EXPORT_SYMBOL_GPL(phylink_of_phy_connect); +/** + * phylink_fwnode_phy_connect() - connect the PHY specified in the fwnode. + * @pl: a pointer to a phylink returned from phylink_create() + * @fwnode: a pointer to a fwnode_handle. + * @flags: PHY-specific flags to communicate to the PHY device driver + * + * Connect the phy specified in @fwnode to the phylink instance specified + * by @pl. + * + * Returns 0 on success or a negative errno.
+ */ +int phylink_fwnode_phy_connect(struct phylink *pl, + struct fwnode_handle *fwnode, + u32 flags) +{ + struct fwnode_handle *phy_fwnode; + struct phy_device *phy_dev; + int ret; + + /* Fixed links and 802.3z are handled without needing a PHY */ + if (pl->cfg_link_an_mode == MLO_AN_FIXED || + (pl->cfg_link_an_mode == MLO_AN_INBAND && +phy_interface_mode_is_8023z(pl->link_interface))) + return 0; + + phy_fwnode = fwnode_get_phy_node(fwnode); + if (IS_ERR(phy_fwnode)) { + if (pl->cfg_link_an_mode == MLO_AN_PHY) + return -ENODEV; + return 0; + } + + phy_dev = fwnode_phy_find_device(phy_fwnode); + /* We're done with the phy_node handle */ + fwnode_handle_put(phy_fwnode); + if (!phy_dev) + return -ENODEV; + + ret = phy_attach_direct(pl->netdev, phy_dev, flags, + pl->link_interface); + if (ret) { + phy_device_free(phy_dev); + return ret; + } + + ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface); + if (ret) + phy_detach(phy_dev); + + return ret; +} +EXPORT_SYMBOL_GPL(phylink_fwnode_phy_connect); + /** * phylink_disconnect_phy() - disconnect any PHY attached to the phylink * instance. diff --git a/include/linux/phylink.h b/include/linux/phylink.h index d81a714cfbbd..75d4f99090fd 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -439,6 +439,9 @@ void phylink_destroy(struct phylink *); int phylink_connect_phy(struct phylink *, struct phy_device *); int phylink_of_phy_connect(struct phylink *, struct device_node *, u32 flags); +int phylink_fwnode_phy_connect(struct phylink *pl, + struct fwnode_handle *fwnode, + u32 flags); void phylink_disconnect_phy(struct phylink *); void phylink_mac_change(struct phylink *, bool up); -- 2.17.1
[net-next PATCH v7 15/16] net: phylink: Refactor phylink_of_phy_connect()
Refactor phylink_of_phy_connect() to use phylink_fwnode_phy_connect(). Signed-off-by: Calvin Johnson --- Changes in v7: None Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: None drivers/net/phy/phylink.c | 39 +-- 1 file changed, 1 insertion(+), 38 deletions(-) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 23753f92e0a6..ce7e918430c8 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -1084,44 +1084,7 @@ EXPORT_SYMBOL_GPL(phylink_connect_phy); int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn, u32 flags) { - struct device_node *phy_node; - struct phy_device *phy_dev; - int ret; - - /* Fixed links and 802.3z are handled without needing a PHY */ - if (pl->cfg_link_an_mode == MLO_AN_FIXED || - (pl->cfg_link_an_mode == MLO_AN_INBAND && -phy_interface_mode_is_8023z(pl->link_interface))) - return 0; - - phy_node = of_parse_phandle(dn, "phy-handle", 0); - if (!phy_node) - phy_node = of_parse_phandle(dn, "phy", 0); - if (!phy_node) - phy_node = of_parse_phandle(dn, "phy-device", 0); - - if (!phy_node) { - if (pl->cfg_link_an_mode == MLO_AN_PHY) - return -ENODEV; - return 0; - } - - phy_dev = of_phy_find_device(phy_node); - /* We're done with the phy_node handle */ - of_node_put(phy_node); - if (!phy_dev) - return -ENODEV; - - ret = phy_attach_direct(pl->netdev, phy_dev, flags, - pl->link_interface); - if (ret) - return ret; - - ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface); - if (ret) - phy_detach(phy_dev); - - return ret; + return phylink_fwnode_phy_connect(pl, of_fwnode_handle(dn), flags); } EXPORT_SYMBOL_GPL(phylink_of_phy_connect); -- 2.17.1
[net-next PATCH v7 13/16] net/fsl: Use fwnode_mdiobus_register()
fwnode_mdiobus_register() internally takes care of both DT and ACPI cases to register mdiobus. Replace existing of_mdiobus_register() with fwnode_mdiobus_register(). Note: For both ACPI and DT cases, endianness of the MDIO controller needs to be specified using the "little-endian" property. Signed-off-by: Calvin Johnson --- Changes in v7: - Include fwnode_mdio.h - Alphabetically sort header inclusions Changes in v6: None Changes in v5: None Changes in v4: - Cleanup xgmac_mdio_probe() Changes in v3: - Avoid unnecessary line removal - Remove unused inclusion of acpi.h Changes in v2: None drivers/net/ethernet/freescale/xgmac_mdio.c | 22 - 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/freescale/xgmac_mdio.c b/drivers/net/ethernet/freescale/xgmac_mdio.c index bfa2826c5545..6daf1fb2e9ea 100644 --- a/drivers/net/ethernet/freescale/xgmac_mdio.c +++ b/drivers/net/ethernet/freescale/xgmac_mdio.c @@ -2,6 +2,7 @@ * QorIQ 10G MDIO Controller * * Copyright 2012 Freescale Semiconductor, Inc. + * Copyright 2021 NXP * * Authors: Andy Fleming * Timur Tabi @@ -11,15 +12,16 @@ * kind, whether express or implied. */ -#include -#include +#include #include -#include -#include #include +#include #include -#include #include +#include +#include +#include /* Number of microseconds to wait for a register to respond */ #define TIMEOUT 1000 @@ -243,10 +245,9 @@ static int xgmac_mdio_read(struct mii_bus *bus, int phy_id, int regnum) static int xgmac_mdio_probe(struct platform_device *pdev) { - struct device_node *np = pdev->dev.of_node; - struct mii_bus *bus; - struct resource *res; struct mdio_fsl_priv *priv; + struct resource *res; + struct mii_bus *bus; int ret; /* In DPAA-1, MDIO is one of the many FMan sub-devices. The FMan @@ -279,13 +280,16 @@ static int xgmac_mdio_probe(struct platform_device *pdev) goto err_ioremap; } + /* For both ACPI and DT cases, endianness of MDIO controller +* needs to be specified using "little-endian" property. +*/ priv->is_little_endian = device_property_read_bool(&pdev->dev, "little-endian"); priv->has_a011043 = device_property_read_bool(&pdev->dev, "fsl,erratum-a011043"); - ret = of_mdiobus_register(bus, np); + ret = fwnode_mdiobus_register(bus, pdev->dev.fwnode); if (ret) { dev_err(&pdev->dev, "cannot register MDIO bus\n"); goto err_registration; -- 2.17.1
[net-next PATCH v7 12/16] net: mdiobus: Introduce fwnode_mdiobus_register()
Introduce fwnode_mdiobus_register() to register PHYs on the mdiobus. If the fwnode is a DT node, then call of_mdiobus_register(). If it is an ACPI node, then call acpi_mdiobus_register(). Signed-off-by: Calvin Johnson --- Changes in v7: - Move fwnode_mdiobus_register() to fwnode_mdio.c Changes in v6: None Changes in v5: None Changes in v4: - Remove redundant else from fwnode_mdiobus_register() Changes in v3: - Use acpi_mdiobus_register() Changes in v2: None drivers/net/mdio/fwnode_mdio.c | 21 + include/linux/fwnode_mdio.h| 5 + 2 files changed, 26 insertions(+) diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c index 0982e816a6fb..523c2778b287 100644 --- a/drivers/net/mdio/fwnode_mdio.c +++ b/drivers/net/mdio/fwnode_mdio.c @@ -7,6 +7,7 @@ */ #include +#include #include #include #include @@ -75,3 +76,23 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus, return 0; } EXPORT_SYMBOL(fwnode_mdiobus_register_phy); + +/** + * fwnode_mdiobus_register - Register mii_bus and create PHYs from fwnode + * @mdio: pointer to mii_bus structure + * @fwnode: pointer to fwnode of MDIO bus. + * + * This function calls of_mdiobus_register() for a DT fwnode and + * acpi_mdiobus_register() for an ACPI fwnode, returning its result.
+ */ +int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode) +{ + if (is_of_node(fwnode)) + return of_mdiobus_register(mdio, to_of_node(fwnode)); + + if (is_acpi_node(fwnode)) + return acpi_mdiobus_register(mdio, fwnode); + + return -EINVAL; +} +EXPORT_SYMBOL(fwnode_mdiobus_register); diff --git a/include/linux/fwnode_mdio.h b/include/linux/fwnode_mdio.h index 8c0392845916..20f22211260b 100644 --- a/include/linux/fwnode_mdio.h +++ b/include/linux/fwnode_mdio.h @@ -12,6 +12,7 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus, struct fwnode_handle *child, u32 addr); +int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode); #else /* CONFIG_FWNODE_MDIO */ static inline int fwnode_mdiobus_register_phy(struct mii_bus *bus, struct fwnode_handle *child, @@ -19,6 +20,10 @@ static inline int fwnode_mdiobus_register_phy(struct mii_bus *bus, { return -EINVAL; } +static inline int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode) +{ + return -EINVAL; +} #endif #endif /* __LINUX_FWNODE_MDIO_H */ -- 2.17.1
[net-next PATCH v7 11/16] net: mdio: Add ACPI support code for mdio
Define acpi_mdiobus_register() to register mii_bus and create PHYs for each ACPI child node. Signed-off-by: Calvin Johnson --- Changes in v7: - Include headers directly used in acpi_mdio.c Changes in v6: - use GENMASK() and ACPI_COMPANION_SET() - some cleanup - remove unwanted header inclusion Changes in v5: - add missing MODULE_LICENSE() - replace fwnode_get_id() with OF and ACPI function calls Changes in v4: None Changes in v3: None Changes in v2: None MAINTAINERS | 1 + drivers/net/mdio/Kconfig | 7 + drivers/net/mdio/Makefile| 1 + drivers/net/mdio/acpi_mdio.c | 56 include/linux/acpi_mdio.h| 25 5 files changed, 90 insertions(+) create mode 100644 drivers/net/mdio/acpi_mdio.c create mode 100644 include/linux/acpi_mdio.h diff --git a/MAINTAINERS b/MAINTAINERS index 146de41d2656..051377b7fa94 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6680,6 +6680,7 @@ F:Documentation/devicetree/bindings/net/mdio* F: Documentation/devicetree/bindings/net/qca,ar803x.yaml F: Documentation/networking/phy.rst F: drivers/net/mdio/ +F: drivers/net/mdio/acpi_mdio.c F: drivers/net/mdio/fwnode_mdio.c F: drivers/net/mdio/of_mdio.c F: drivers/net/pcs/ diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig index 2d5bf5ccffb5..fc8c787b448f 100644 --- a/drivers/net/mdio/Kconfig +++ b/drivers/net/mdio/Kconfig @@ -36,6 +36,13 @@ config OF_MDIO help OpenFirmware MDIO bus (Ethernet PHY) accessors +config ACPI_MDIO + def_tristate PHYLIB + depends on ACPI + depends on PHYLIB + help + ACPI MDIO bus (Ethernet PHY) accessors + if MDIO_BUS config MDIO_DEVRES diff --git a/drivers/net/mdio/Makefile b/drivers/net/mdio/Makefile index ea5390e2ef84..e8b739a3df1c 100644 --- a/drivers/net/mdio/Makefile +++ b/drivers/net/mdio/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for Linux MDIO bus drivers +obj-$(CONFIG_ACPI_MDIO)+= acpi_mdio.o obj-$(CONFIG_FWNODE_MDIO) += fwnode_mdio.o obj-$(CONFIG_OF_MDIO) += of_mdio.o diff --git a/drivers/net/mdio/acpi_mdio.c
b/drivers/net/mdio/acpi_mdio.c new file mode 100644 index ..60a86e3fc246 --- /dev/null +++ b/drivers/net/mdio/acpi_mdio.c @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * ACPI helpers for the MDIO (Ethernet PHY) API + * + * This file provides helper functions for extracting PHY device information + * out of the ACPI ASL and using it to populate an mii_bus. + */ + +#include +#include +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Calvin Johnson "); +MODULE_LICENSE("GPL"); + +/** + * acpi_mdiobus_register - Register mii_bus and create PHYs from the ACPI ASL. + * @mdio: pointer to mii_bus structure + * @fwnode: pointer to fwnode of MDIO bus. + * + * This function registers the mii_bus structure and registers a phy_device + * for each child node of @fwnode. + */ +int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode) +{ + struct fwnode_handle *child; + u32 addr; + int ret; + + /* Mask out all PHYs from auto probing. */ + mdio->phy_mask = GENMASK(31, 0); + ret = mdiobus_register(mdio); + if (ret) + return ret; + + ACPI_COMPANION_SET(&mdio->dev, to_acpi_device_node(fwnode)); + + /* Loop over the child nodes and register a phy_device for each PHY */ + fwnode_for_each_child_node(fwnode, child) { + ret = acpi_get_local_address(ACPI_HANDLE_FWNODE(child), &addr); + if (ret || addr >= PHY_MAX_ADDR) + continue; + + ret = fwnode_mdiobus_register_phy(mdio, child, addr); + if (ret == -ENODEV) + dev_err(&mdio->dev, + "MDIO device at address %d is missing.\n", + addr); + } + return 0; +} +EXPORT_SYMBOL(acpi_mdiobus_register); diff --git a/include/linux/acpi_mdio.h b/include/linux/acpi_mdio.h new file mode 100644 index ..748d261fe2f9 --- /dev/null +++ b/include/linux/acpi_mdio.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * ACPI helper for the MDIO (Ethernet PHY) API + */ + +#ifndef __LINUX_ACPI_MDIO_H +#define __LINUX_ACPI_MDIO_H + +#include + +#if IS_ENABLED(CONFIG_ACPI_MDIO) +int acpi_mdiobus_register(struct
mii_bus *mdio, struct fwnode_handle *fwnode); +#else /* CONFIG_ACPI_MDIO */ +static inline int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode) +{ + /* +* Fall back to mdiobus_register() function to register a bus. +* This way, we don't have to keep compat bits around in drivers. +*/ + + return mdiobus_register(mdio); +} +#endif + +#endif /* __LINUX_ACPI_MDIO_H */ -- 2.17.1
[net-next PATCH v7 09/16] of: mdio: Refactor of_mdiobus_register_phy()
Refactor of_mdiobus_register_phy() to use fwnode_mdiobus_register_phy(). Signed-off-by: Calvin Johnson --- Changes in v7: - include fwnode_mdio.h Changes in v6: None Changes in v5: None Changes in v4: None Changes in v3: None Changes in v2: None drivers/net/mdio/of_mdio.c | 39 ++ 1 file changed, 2 insertions(+), 37 deletions(-) diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c index db293e0b8249..eebd4d9e1656 100644 --- a/drivers/net/mdio/of_mdio.c +++ b/drivers/net/mdio/of_mdio.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -98,43 +99,7 @@ EXPORT_SYMBOL(of_mdiobus_phy_device_register); static int of_mdiobus_register_phy(struct mii_bus *mdio, struct device_node *child, u32 addr) { - struct mii_timestamper *mii_ts; - struct phy_device *phy; - bool is_c45; - int rc; - u32 phy_id; - - mii_ts = of_find_mii_timestamper(child); - if (IS_ERR(mii_ts)) - return PTR_ERR(mii_ts); - - is_c45 = of_device_is_compatible(child, -"ethernet-phy-ieee802.3-c45"); - - if (!is_c45 && !of_get_phy_id(child, &phy_id)) - phy = phy_device_create(mdio, addr, phy_id, 0, NULL); - else - phy = get_phy_device(mdio, addr, is_c45); - if (IS_ERR(phy)) { - unregister_mii_timestamper(mii_ts); - return PTR_ERR(phy); - } - - rc = of_mdiobus_phy_device_register(mdio, phy, child, addr); - if (rc) { - unregister_mii_timestamper(mii_ts); - phy_device_free(phy); - return rc; - } - - /* phy->mii_ts may already be defined by the PHY driver. A -* mii_timestamper probed via the device tree will still have -* precedence. -*/ - if (mii_ts) - phy->mii_ts = mii_ts; - - return 0; + return fwnode_mdiobus_register_phy(mdio, of_fwnode_handle(child), addr); } static int of_mdiobus_register_device(struct mii_bus *mdio, -- 2.17.1