Re: [PATCH 8/9] vfio/pci: export nvlink2 support into vendor vfio_pci drivers

2021-03-10 Thread Alexey Kardashevskiy




On 11/03/2021 13:00, Jason Gunthorpe wrote:

On Thu, Mar 11, 2021 at 12:42:56PM +1100, Alexey Kardashevskiy wrote:

btw can the id list have only vendor ids and not have device ids?


The PCI matcher is quite flexible, see the other patch from Max for
the igd
  
Ah cool, please do this for NVIDIA GPUs then. I just discovered another P9
system sold with NVIDIA T4s which is not in your list.


I think it will make things easier down the road if you maintain an
exact list 



Then why do you not do the exact list for Intel IGD? The commit log does
not explain this detail.




But best practice is to be as narrow as possible as I hope this will
eventually impact module autoloading and other details.


The amount of device-specific knowledge is too little to tie it to device
ids; it is a generic PCI driver with quirks. We do not have separate
drivers for the hardware which requires quirks.


It provides its own capability structure exposed to userspace, that is
absolutely not a "quirk"


And how do you hope this should impact autoloading?


I would like to autoload the most specific vfio driver for the target
hardware.



Is there an idea of how this is going to work? For example, the Intel IGD
driver and vfio-pci-igd - how should the system pick one? If there is no
MODULE_DEVICE_TABLE in vfio-pci-xxx, is the user supposed to try binding
all vfio-pci-xxx drivers until one binds?




If you someday need to support new GPU HW that needs a different VFIO
driver then you are really stuck, because things become indeterminate
if there are two drivers claiming the ID. We don't have the concept of
"best match"; the driver core works on exact match.




--
Alexey


Re: [PATCH v5 04/10] scsi: ufshpb: Make eviction depends on region's reads

2021-03-10 Thread Can Guo

Hi Avri,

On 2021-03-02 21:24, Avri Altman wrote:

In host mode, eviction is considered an extreme measure.
Verify that the entering region has enough reads, and that the exiting
region has far fewer reads.

Signed-off-by: Avri Altman 
---
 drivers/scsi/ufs/ufshpb.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index a8f8d13af21a..6f4fd22eaf2f 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -17,6 +17,7 @@
 #include "../sd.h"

 #define ACTIVATION_THRESHOLD 4 /* 4 IOs */
+#define EVICTION_THRESHOLD (ACTIVATION_THRESHOLD << 6) /* 256 IOs */


Same here, can this be added as a sysfs entry?

Thanks,
Can Guo.
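
For illustration only, one way such a threshold could be exposed as a
writable sysfs attribute instead of a compile-time constant; the attribute
name and its placement are assumptions, not part of the posted patch:

	static unsigned int eviction_thld = ACTIVATION_THRESHOLD << 6;

	static ssize_t eviction_thld_show(struct device *dev,
					  struct device_attribute *attr, char *buf)
	{
		return sysfs_emit(buf, "%u\n", eviction_thld);
	}

	static ssize_t eviction_thld_store(struct device *dev,
					   struct device_attribute *attr,
					   const char *buf, size_t count)
	{
		unsigned int val;

		if (kstrtouint(buf, 0, &val) || !val)
			return -EINVAL;

		eviction_thld = val;
		return count;
	}
	static DEVICE_ATTR_RW(eviction_thld);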



 /* memory management */
 static struct kmem_cache *ufshpb_mctx_cache;
@@ -1050,6 +1051,13 @@ static struct ufshpb_region
*ufshpb_victim_lru_info(struct ufshpb_lu *hpb)
if (ufshpb_check_srgns_issue_state(hpb, rgn))
continue;

+   /*
+* in host control mode, verify that the exiting region
+* has less reads
+*/
+   if (hpb->is_hcm && rgn->reads > (EVICTION_THRESHOLD >> 1))
+   continue;
+
victim_rgn = rgn;
break;
}
@@ -1235,7 +1243,7 @@ static int ufshpb_issue_map_req(struct ufshpb_lu 
*hpb,


 static int ufshpb_add_region(struct ufshpb_lu *hpb, struct 
ufshpb_region *rgn)

 {
-   struct ufshpb_region *victim_rgn;
+   struct ufshpb_region *victim_rgn = NULL;
struct victim_select_info *lru_info = &hpb->lru_info;
unsigned long flags;
int ret = 0;
@@ -1263,6 +1271,16 @@ static int ufshpb_add_region(struct ufshpb_lu
*hpb, struct ufshpb_region *rgn)
 * because the device could detect this region
 * by not issuing HPB_READ
 */
+
+   /*
+* in host control mode, verify that the entering
+* region has enough reads
+*/
+   if (hpb->is_hcm && rgn->reads < EVICTION_THRESHOLD) {
+   ret = -EACCES;
+   goto out;
+   }
+
victim_rgn = ufshpb_victim_lru_info(hpb);
if (!victim_rgn) {
dev_warn(&hpb->sdev_ufs_lu->sdev_dev,


Re: [PATCH v4 1/3] bus: mhi: core: Introduce internal register poll helper function

2021-03-10 Thread Loic Poulain
Hi Bhaumik,

On Thu, 11 Mar 2021 at 00:31, Bhaumik Bhatt  wrote:
>
> Introduce helper function to allow MHI core driver to poll for
> a value in a register field. This helps reach a common path to
> read and poll register values along with a retry time interval.
>
> Signed-off-by: Bhaumik Bhatt 
> ---
>  drivers/bus/mhi/core/internal.h |  3 +++
>  drivers/bus/mhi/core/main.c | 23 +++
>  2 files changed, 26 insertions(+)
>
> diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
> index 6f80ec3..005286b 100644
> --- a/drivers/bus/mhi/core/internal.h
> +++ b/drivers/bus/mhi/core/internal.h
> @@ -643,6 +643,9 @@ int __must_check mhi_read_reg(struct mhi_controller 
> *mhi_cntrl,
>  int __must_check mhi_read_reg_field(struct mhi_controller *mhi_cntrl,
> void __iomem *base, u32 offset, u32 mask,
> u32 shift, u32 *out);
> +int __must_check mhi_poll_reg_field(struct mhi_controller *mhi_cntrl,
> +   void __iomem *base, u32 offset, u32 mask,
> +   u32 shift, u32 val, u32 delayus);
>  void mhi_write_reg(struct mhi_controller *mhi_cntrl, void __iomem *base,
>u32 offset, u32 val);
>  void mhi_write_reg_field(struct mhi_controller *mhi_cntrl, void __iomem 
> *base,
> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
> index 4e0131b..7c7f41a 100644
> --- a/drivers/bus/mhi/core/main.c
> +++ b/drivers/bus/mhi/core/main.c
> @@ -4,6 +4,7 @@
>   *
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -37,6 +38,28 @@ int __must_check mhi_read_reg_field(struct mhi_controller 
> *mhi_cntrl,
> return 0;
>  }
>
> +int __must_check mhi_poll_reg_field(struct mhi_controller *mhi_cntrl,
> +   void __iomem *base, u32 offset,
> +   u32 mask, u32 shift, u32 val, u32 delayus)
> +{
> +   int ret;
> +   u32 out, retry = (mhi_cntrl->timeout_ms * 1000) / delayus;
> +
> +   while (retry--) {
> +   ret = mhi_read_reg_field(mhi_cntrl, base, offset, mask, shift,
> + &out);
> +   if (ret)
> +   return ret;
> +
> +   if (out == val)
> +   return 0;
> +
> +   udelay(delayus);

Have you read my previous comment?
Do you really want to risk hogging the CPU for several seconds? We
know that some devices take several seconds to start/boot.
Why not use an msleep variant here?
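
A rough sketch of the kind of sleeping poll loop being suggested, using a
sleeping delay instead of udelay() so the CPU is not busy-waited for the
whole timeout; parameter names follow the quoted patch, but this is only an
illustration, not the final code:

	int __must_check mhi_poll_reg_field(struct mhi_controller *mhi_cntrl,
					    void __iomem *base, u32 offset, u32 mask,
					    u32 shift, u32 val, u32 delayus)
	{
		u32 out, retry = (mhi_cntrl->timeout_ms * 1000) / delayus;
		int ret;

		while (retry--) {
			ret = mhi_read_reg_field(mhi_cntrl, base, offset, mask,
						 shift, &out);
			if (ret)
				return ret;

			if (out == val)
				return 0;

			/* sleep instead of busy-waiting between reads */
			fsleep(delayus);
		}

		return -ETIMEDOUT;
	}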

Regards,
Loic


Re: [PATCH v5 03/10] scsi: ufshpb: Add region's reads counter

2021-03-10 Thread Can Guo

Hi Avri,

On 2021-03-02 21:24, Avri Altman wrote:

In host control mode, reads are the major source of activation trials.
Keep track of those read counters, for both active as well as inactive
regions.

We reset the read counter upon write - we are only interested in "clean"
reads.  Less intuitive, however, is that we also reset it upon region
deactivation.  Region deactivation is often due to the fact that
eviction took place: a region becomes active at the expense of another.
This happens when the max-active-regions limit has been crossed. If we
don't reset the counter, we will trigger a lot of thrashing of the HPB
database, since a few reads (or even one) to the region that was
deactivated will trigger a re-activation trial.

Keep those counters normalized, as we are using those reads as a
comparative score, to make various decisions.
If during consecutive normalizations an active region has exhausted its
reads - deactivate it.

Signed-off-by: Avri Altman 
---
 drivers/scsi/ufs/ufshpb.c | 102 --
 drivers/scsi/ufs/ufshpb.h |   5 ++
 2 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index 044fec9854a0..a8f8d13af21a 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -16,6 +16,8 @@
 #include "ufshpb.h"
 #include "../sd.h"

+#define ACTIVATION_THRESHOLD 4 /* 4 IOs */


Can this param be added as a sysfs entry?

Thanks,
Can Guo


+
 /* memory management */
 static struct kmem_cache *ufshpb_mctx_cache;
 static mempool_t *ufshpb_mctx_pool;
@@ -554,6 +556,21 @@ static int ufshpb_issue_pre_req(struct ufshpb_lu
*hpb, struct scsi_cmnd *cmd,
return ret;
 }

+static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int 
rgn_idx,

+ int srgn_idx)
+{
+   struct ufshpb_region *rgn;
+   struct ufshpb_subregion *srgn;
+
+   rgn = hpb->rgn_tbl + rgn_idx;
+   srgn = rgn->srgn_tbl + srgn_idx;
+
+   list_del_init(&rgn->list_inact_rgn);
+
+   if (list_empty(&srgn->list_act_srgn))
+   list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
+}
+
 /*
  * This function will set up HPB read command using host-side L2P map 
data.

  */
@@ -600,12 +617,44 @@ int ufshpb_prep(struct ufs_hba *hba, struct
ufshcd_lrb *lrbp)
ufshpb_set_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
 transfer_len);
spin_unlock_irqrestore(&hpb->rgn_state_lock, flags);
+
+   if (hpb->is_hcm) {
+   spin_lock_irqsave(&rgn->rgn_lock, flags);
+   rgn->reads = 0;
+   spin_unlock_irqrestore(&rgn->rgn_lock, flags);
+   }
+
return 0;
}

if (!ufshpb_is_support_chunk(hpb, transfer_len))
return 0;

+   if (hpb->is_hcm) {
+   bool activate = false;
+   /*
+* in host control mode, reads are the main source for
+* activation trials.
+*/
+   spin_lock_irqsave(&rgn->rgn_lock, flags);
+   rgn->reads++;
+   if (rgn->reads == ACTIVATION_THRESHOLD)
+   activate = true;
+   spin_unlock_irqrestore(&rgn->rgn_lock, flags);
+   if (activate) {
+   spin_lock_irqsave(&hpb->rsp_list_lock, flags);
+   ufshpb_update_active_info(hpb, rgn_idx, srgn_idx);
+   hpb->stats.rb_active_cnt++;
+   spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
+   dev_dbg(&hpb->sdev_ufs_lu->sdev_dev,
+   "activate region %d-%d\n", rgn_idx, srgn_idx);
+   }
+
+   /* keep those counters normalized */
+   if (rgn->reads > hpb->entries_per_srgn)
+   schedule_work(&hpb->ufshpb_normalization_work);
+   }
+
spin_lock_irqsave(&hpb->rgn_state_lock, flags);
if (ufshpb_test_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
   transfer_len)) {
@@ -745,21 +794,6 @@ static int ufshpb_clear_dirty_bitmap(struct 
ufshpb_lu *hpb,

return 0;
 }

-static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int 
rgn_idx,

- int srgn_idx)
-{
-   struct ufshpb_region *rgn;
-   struct ufshpb_subregion *srgn;
-
-   rgn = hpb->rgn_tbl + rgn_idx;
-   srgn = rgn->srgn_tbl + srgn_idx;
-
-   list_del_init(&rgn->list_inact_rgn);
-
-   if (list_empty(&srgn->list_act_srgn))
-   list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
-}
-
 static void ufshpb_update_inactive_info(struct ufshpb_lu *hpb, int 
rgn_idx)

 {
struct ufshpb_region *rgn;
@@ -1079,6 +1113,14 @@ static void __ufshpb_evict_region(struct 
ufshpb_lu *hpb,


ufshpb_cleanup_lru_info(lru_info, rgn);

+   if (hpb->is_hcm) {
+   unsigned long flags;
+
+   

Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached

2021-03-10 Thread Thomas Zimmermann

Hi

Am 10.03.21 um 20:02 schrieb Paul Cercueil:

Hi Thomas,

On Mon, 8 Mar 2021 at 9:41, Thomas Zimmermann  wrote:

Hi Paul,

having individual functions for each mode only makes sense if the 
decision is at compile time. But in patch 5, you're working around 
your earlier design by introducing in-driver helpers that select the 
correct CMA function.


In SHMEM helpers we have the flag map_wc in the GEM structure that 
selects the pages caching mode (wc vs uncached). I think CMA should 


Re-reading this, it should rather be WC and cached.

use this design as well. Have a map_noncoherent flag in the CMA GEM 
object and set it from the driver's implementation of gem_create_object.
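
A rough sketch of that suggestion, assuming a (then hypothetical)
map_noncoherent flag in struct drm_gem_cma_object; the function and field
names below are illustrative, not the final API:

	static struct drm_gem_object *
	ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
	{
		struct drm_gem_cma_object *obj;

		obj = kzalloc(sizeof(*obj), GFP_KERNEL);
		if (!obj)
			return ERR_PTR(-ENOMEM);

		/* ask the CMA helpers for non-coherent (cached) mappings */
		obj->map_noncoherent = true;

		return &obj->base;
	}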


Is that a new addition? That severely reduces the patchset size, which 
is perfect.


It was added a few months ago, around the time you sent the first 
version of the patches at hand.


Originally, SHMEM used write combining by default, and several drivers 
used default mapping flags instead (so usually cached). IIRC I 
streamlined everything and flipped the default. Most drivers can use 
cached mappings and only some require WC.


Recently there was also a patchset that added support for cached video 
RAM (or something like that). So at some point we could store these 
flags in drm_gem_object. For now, I'd just put a flag into 
drm_gem_cma_object.




I'll send a V3 then.


Great!

Best regards
Thomas



Cheers,
-Paul

And in the long run, we could try to consolidate all drivers/helpers 
mapping flags in struct drm_gem_object.


Best regards
Thomas

Am 07.03.21 um 21:28 schrieb Paul Cercueil:

Rework of my previous patchset which added support for GEM buffers
backed by non-coherent memory to the ingenic-drm driver.

Having GEM buffers backed by non-coherent memory is interesting in
the particular case where it is faster to render to a non-coherent
buffer then sync the data cache, than to render to a write-combine
buffer, and (by extension) much faster than using a shadow buffer.
This is true for instance on some Ingenic SoCs, where even simple
blits (e.g. memcpy) are about three times faster using this method.

For the record, the previous patchset was accepted for 5.10 then had
to be reverted, as it conflicted with some changes made to the DMA API.

This new patchset is pretty different as it adds the functionality to
the DRM core. The first three patches add variants to existing functions
but with the "non-coherent memory" twist, exported as GPL symbols. The
fourth patch adds a function to be used with the damage helpers.
Finally, the last patch adds support for non-coherent GEM buffers to the
ingenic-drm driver. The functionality is enabled through a module
parameter, and is disabled by default.

Cheers,
-Paul

Paul Cercueil (5):
   drm: Add and export function drm_gem_cma_create_noncoherent
   drm: Add and export function drm_gem_cma_dumb_create_noncoherent
   drm: Add and export function drm_gem_cma_mmap_noncoherent
   drm: Add and export function drm_gem_cma_sync_data
   drm/ingenic: Add option to alloc cached GEM buffers

  drivers/gpu/drm/drm_gem_cma_helper.c  | 223 +++---
  drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 -
  drivers/gpu/drm/ingenic/ingenic-drm.h |   4 +
  drivers/gpu/drm/ingenic/ingenic-ipu.c |  14 +-
  include/drm/drm_gem_cma_helper.h  |  13 ++
  5 files changed, 273 insertions(+), 30 deletions(-)



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer






--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes

2021-03-10 Thread Anshuman Khandual


On 3/8/21 2:25 PM, Mike Rapoport wrote:
> Hi Anshuman,
> 
> On Mon, Mar 08, 2021 at 08:57:53AM +0530, Anshuman Khandual wrote:
>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>> memory sections might have contained memmap holes caused by memblock areas
>> tagged with MEMBLOCK_NOMAP, which should be skipped while validating a pfn
>> for struct page backing. This scenario could be captured with a new option
>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES and then generic pfn_valid() can be
>> improved to accommodate such platforms. This reduces overall code footprint
>> and also improves maintainability.
> 
> I wonder whether arm64 would still need to free parts of its memmap after

free_unused_memmap() is applicable when CONFIG_SPARSEMEM_VMEMMAP is not enabled.
I am not sure whether there still might be some platforms or boards which would
benefit from this. Hence let's just keep this unchanged for now.

> the section size was reduced. Maybe the pain of arm64::pfn_valid() is not
> worth the memory savings anymore?

arm64 pfn_valid() special case was primarily because of MEMBLOCK_NOMAP tagged
memory areas, which are reserved by the firmware.

> 
>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>> had used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in
>> turn had expanded its scope to new platforms like arc and m68k. Rather lets
>> restrict back the scope for free_unused_memmap() to arm and arm64 platforms
>> using this new config option i.e CONFIG_HAVE_EARLY_SECTION_MEMMAP.
> 
> The whole point of 4f5b0c178996 was to let arc and m68k to free unused
> memory map with FLATMEM so they won't need DISCONTIGMEM or SPARSEMEM. So
> whatever implementation there will be for arm/arm64, please keep arc and
> m68k functionally intact.

Okay. Will protect free_unused_memmap() on HAVE_EARLY_SECTION_MEMMAP_HOLES
config as well.

diff --git a/mm/memblock.c b/mm/memblock.c
index d9fa2e62ab7a..11b624e94127 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1927,8 +1927,11 @@ static void __init free_unused_memmap(void)
unsigned long start, end, prev_end = 0;
int i;
 
-   if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
-   IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+   return;
+
+   if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
+   !IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID))
return;


[question] Panic in dax_writeback_one

2021-03-10 Thread chenjun (AM)
Hi

I wrote a driver to simulate memory as a block device (like a ramdisk),
and I hope the memory used would not be observed by the kernel, because
the struct page entries would take a lot of memory.

When I mount ext2 with dax on my ramdisk, a panic happens on fsync.
Call trace:
  dax_writeback_one+0x330/0x3e4
  dax_writeback_mapping_range+0x15c/0x318
  ext2_dax_writepages+0x38/0x44


static int dax_writeback_one(struct xa_state *xas, struct dax_device 
*dax_dev, struct address_space *mapping, void *entry)
dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
The pfn is returned by the driver. In my case, the pfn does not have a
struct page, so pfn_to_page(pfn) returns a wrong address.

I noticed the following changes may be related to my problem:
1. Before 3fe0791c295cfd3cd735de7a32cc0780949c009f (dax: store pfns in 
the radix), the radix tree stored sectors; the address to be flushed was 
obtained from the driver by passing the sector.
That commit assumes that all pfns have a struct page.

2. CONFIG_FS_DAX_LIMITED (selected by DCSSBLK [=n] && BLK_DEV [=y] && 
S390 && BLOCK [=y]) was added to avoid accessing the struct page of a pfn.

Does anyone have any ideas about my problem?

-- 
Regards
Chen Jun


Re: [PATCH] cxl: don't manipulate the mm.mm_users field directly

2021-03-10 Thread Frederic Barrat




On 10/03/2021 18:44, Laurent Dufour wrote:

It is better to rely on the API provided by the MM layer instead of
directly manipulating the mm_users field.

Signed-off-by: Laurent Dufour 
---



Thanks!
Acked-by: Frederic Barrat 




  drivers/misc/cxl/fault.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 01153b74334a..60c829113299 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -200,7 +200,7 @@ static struct mm_struct *get_mem_context(struct cxl_context 
*ctx)
if (ctx->mm == NULL)
return NULL;
  
-	if (!atomic_inc_not_zero(&ctx->mm->mm_users))

+   if (!mmget_not_zero(ctx->mm))
return NULL;
  
  	return ctx->mm;




Re: [PATCH 4.19 00/39] 4.19.180-rc1 review

2021-03-10 Thread Naresh Kamboju
On Wed, 10 Mar 2021 at 18:56,  wrote:
>
> From: Greg Kroah-Hartman 
>
> This is the start of the stable review cycle for the 4.19.180 release.
> There are 39 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 12 Mar 2021 13:23:09 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.180-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

Summary


kernel: 4.19.180-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: fffeea4063954b09866c112dad21a991ba52a914
git describe: v4.19.179-40-gfffeea406395
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19.179-40-gfffeea406395

No regressions (compared to build v4.19.179)

No fixes (compared to build v4.19.179)


Ran 50816 total tests in the following environments and test suites.

Environments
--
- arm
- arm64
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- juno-r2-compat
- juno-r2-kasan
- mips
- nxp-ls2088
- nxp-ls2088-64k_page_size
- qemu-arm64-clang
- qemu-arm64-kasan
- qemu-x86_64-clang
- qemu-x86_64-kasan
- qemu_arm
- qemu_arm64
- qemu_arm64-compat
- qemu_i386
- qemu_x86_64
- qemu_x86_64-compat
- s390
- sparc
- x15 - arm
- x86_64
- x86-kasan
- x86_64

Test Suites
---
* build
* linux-log-parser
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-kvm
* kselftest-lkdtm
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* v4l2-compliance
* fwts
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-lib
* kselftest-livepatch
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-zram
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-math-tests
* network-basic-tests
* perf
* kselftest-kexec
* kselftest-vm
* kselftest-x86
* ltp-open-posix-tests
* kvm-unit-tests
* rcutorture
* ssuite
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-

-- 
Linaro LKFT
https://lkft.linaro.org


RE: [PATCH 2/2] iio:magnetometer: Support for ST magnetic sensors

2021-03-10 Thread LI Qingwu
Hi Andy,

Thanks a lot for your input, I just extended the existing driver in patch V2 for 
the sensor.



Best Regards

**
Li Qingwu (Terry)
Senior Embedded Software Engineer 
Leica Geosystems(Shanghai)Co.,Limited
(Tel): +86 21 61061036
(FAX): +86 21 61061008
(Mobile): +86 187 0185 9600
E-mail: qing-wu...@leica-geosystems.com.cn
Http: www.leica-geosystems.com.cn
**

-Original Message-
From: Andy Shevchenko  
Sent: Tuesday, March 9, 2021 7:01 PM
To: Jonathan Cameron 
Cc: LI Qingwu ; Lars-Peter Clausen 
; Peter Meerwald ; Rob Herring 
; Denis Ciocca ; linux-iio 
; devicetree ; Linux 
Kernel Mailing List ; TERTYCHNYI Grygorii 
; ZHIZHIKIN Andrey 
; Lorenzo Bianconi 

Subject: Re: [PATCH 2/2] iio:magnetometer: Support for ST magnetic sensors



On Sat, Mar 6, 2021 at 6:44 PM Jonathan Cameron  wrote:
> On Fri,  5 Mar 2021 07:05:36 +
> LI Qingwu  wrote:
>
> > Add support for STMicroelectronics digital magnetic sensors, 
> > LSM303AH,LSM303AGR,LIS2MDL,ISM303DAC,IIS2MDC.
> >
> > The patch tested with IIS2MDC instrument.
> >
> > Signed-off-by: LI Qingwu 
>
> Hi,
>
> Given that at least two parts in here are supported by the existing 
> driver in iio/magnetometers/st_magn_*.c (lsm303agr) can you confirm 
> that it doesn't make sense to simply extend that driver to support the 
> other parts?  This is particularly true when the WHO AM I register 
> reads 0x40 for all these parts.
>
> I've done a fairly superficial review whilst here, but please check 
> you can't just add the relevant entries to the existing driver.

I hadn't even looked into the code, because this one needs a very good 
justification for why it's a new driver rather than an extension of the existing one.

--
With Best Regards,
Andy Shevchenko


Re: Errant readings on LM81 with T2080 SoC

2021-03-10 Thread Guenter Roeck
On 3/10/21 1:48 PM, Chris Packham wrote:
> 
> On 10/03/21 6:06 pm, Guenter Roeck wrote:
>> On 3/9/21 6:19 PM, Chris Packham wrote:
>>> On 9/03/21 9:27 am, Chris Packham wrote:
 On 8/03/21 5:59 pm, Guenter Roeck wrote:
> Other than that, the only other real idea I have would be to monitor
> the i2c bus.
 I am in the fortunate position of being able to go into the office and
 even happen to have the expensive scope at the moment. Now I just need
 to find a tame HW engineer so I don't burn myself trying to attach the
 probes.
>>> One thing I see on the scope is that when there is a CPU load there
>>> appears to be some clock stretching going on (SCL is held low some
>>> times). I don't see it without the CPU load. It's hard to correlate a
>>> clock stretching event with a bad read or error but it is one area where
>>> the SMBUS spec has a maximum that might cause the device to give up waiting.
>>>
>> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
>> that it is possible that the hot loops at the beginning and end of
>> each operation mess up the driver and cause it to sleep longer
>> than intended. Did you try usleep_range() ?
> 
> I've been running with and without CONFIG_PREEMPT. The failures happen 
> with both.
> 
> I did try usleep_range() and still saw failures.
> 

Bummer. What is really weird is that you see clock stretching under
CPU load. Normally clock stretching is triggered by the device, not
by the host. I wonder if there are some timing differences before
the clock stretching happens.

Anyway, I just sent a set of three patches to the list; maybe you
can give it a try. The patches convert the driver to the with_info
API and drop local caching.

The code is module tested with the register dumps I have available
for adm9240 and lm81, but it would be great to get test coverage
on real hardware. I don't really expect it to solve your problem,
but it does reduce and modify the load on the chip (because
registers are no longer read in bursts), so it may have some
positive impact.

>> On a side note, can you send me a register dump for the lm81 ?
>> It would be useful for my module test code.
> 
> Here you go this is from a largely unconfigured LM81
> 

Thanks, that helped a lot!

Guenter


Re: [PATCH RESEND][next] rtl8xxxu: Fix fall-through warnings for Clang

2021-03-10 Thread Gustavo A. R. Silva



On 3/11/21 01:00, Kalle Valo wrote:
> Kees Cook  writes:
> 
>> On Fri, Mar 05, 2021 at 03:40:33PM +0200, Kalle Valo wrote:
>>> "Gustavo A. R. Silva"  writes:
>>>
 In preparation to enable -Wimplicit-fallthrough for Clang, fix
 multiple warnings by replacing /* fall through */ comments with
 the new pseudo-keyword macro fallthrough; instead of letting the
 code fall through to the next case.

 Notice that Clang doesn't recognize /* fall through */ comments as
 implicit fall-through markings.

 Link: https://github.com/KSPP/linux/issues/115
 Signed-off-by: Gustavo A. R. Silva 
>>>
>>> It's not cool that you ignore the comments you got in [1], then after a
>>> while mark the patch as "RESEND" and not even include a changelog why it
>>> was resent.
>>>
>>> [1] 
>>> https://patchwork.kernel.org/project/linux-wireless/patch/d522f387b2d0dde774785c7169c1f25aa529989d.1605896060.git.gustavo...@kernel.org/
>>
>> Hm, this conversation looks like a miscommunication, mainly? I see
>> Gustavo, as requested by many others[1], replacing the fallthrough
>> comments with the "fallthrough" statement. (This is more than just a
>> "Clang doesn't parse comments" issue.)
> 
> v1 was clearly rejected by Jes, so sending a new version without any
> changelog or comments is not the way to go. The changelog should at least
> have had "v1 was rejected but I'm resending this again because <insert reason here>" or something like that to make it clear what's happening.

Why is the fact that I replied to that original thread with the message
below being ignored?

"Just notice that the idea behind this and the following patch is exactly
the same:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=3f95e92c8a8516b745594049dfccc8c5f8895eea

I could resend this same patch with a different changelog text, but I
don't think such a thing is necessary. However, if people prefer that
approach, just let me know and I can do it.

Thanks
--
Gustavo"

Why did no one reply to what I was proposing at the time?

It seems to me that the person that was ignored was actually me, and not the
other way around. :/

--
Gustavo

> 
>> This could be a tree-wide patch and not bother you, but Greg KH has
>> generally advised us to send these changes broken out. Anyway, this
>> change still needs to land, so what would be the preferred path? I think
>> Gustavo could just carry it for Linus to merge without bothering you if
>> that'd be preferred?
> 
> I agree with Greg. Please don't do cleanups like this via another tree
> as that just creates more work due to conflicts between the trees, which
> is a lot more annoying to deal with than applying a few patches. But when
> submitting patches please follow the rules, don't cut corners.
> 
> Jes, I don't like 'fallthrough' either and prefer the original comment,
> but the ship has sailed on this one. Maybe we should just take it?
> 


module refcount issues in the liquidio driver

2021-03-10 Thread Christoph Hellwig
Hi all,

I just stumbled over the odd handling of module refcounts in the liquidio
driver.  The big red flag is the call to module_refcount in
liquidio_watchdog, which will do the wrong thing for any external module
refcount, like a userspace open.

But more importantly the whole concept of acquiring module refcounts from
inside the driver is pretty bogus.  What problem does this try to solve?




RE: [PATCH v3 1/3] scsi: ufshcd: use a function to calculate versions

2021-03-10 Thread Avri Altman
> Hi Avri,
> 
> On 10/03/2021 4:34 pm, Avri Altman wrote:
> >> @@ -9298,10 +9291,7 @@ int ufshcd_init(struct ufs_hba *hba, void
> __iomem
> >> *mmio_base, unsigned int irq)
> >>  /* Get UFS version supported by the controller */
> >>  hba->ufs_version = ufshcd_get_ufs_version(hba);
> >>
> >> -   if ((hba->ufs_version != UFSHCI_VERSION_10) &&
> >> -   (hba->ufs_version != UFSHCI_VERSION_11) &&
> >> -   (hba->ufs_version != UFSHCI_VERSION_20) &&
> >> -   (hba->ufs_version != UFSHCI_VERSION_21))
> >> +   if (hba->ufs_version < ufshci_version(1, 0))
> >>  dev_err(hba->dev, "invalid UFS version 0x%x\n",
> >>  hba->ufs_version);
> > Here you replace the specific allowable values with an expression
> > that doesn't really reflect those values.
> 
> I took this approach based on feedback from previous patches:
> 
> https://lore.kernel.org/linux-
> scsi/d1b23943b6b3ae6c1f6af041cc592...@codeaurora.org/
> 
> https://lkml.org/lkml/2020/4/25/159
> 
> 
> Patch 3 of this series removes this check entirely, as it is neither
> accurate nor useful.
I noticed that.

> 
> The driver does not fail when printing this error, nor is the list of
> "valid" UFS versions here kept up to date, I struggle to see a situation
> in which that error message would actually be helpful. Responses to
> previous patches (above) that added UFS 3.0 to the list have all
> suggested that removing this check is a more sensible approach.
OK.

Thanks,
Avri


Re: [PATCH 5.4 00/24] 5.4.105-rc1 review

2021-03-10 Thread Naresh Kamboju
On Wed, 10 Mar 2021 at 18:55,  wrote:
>
> From: Greg Kroah-Hartman 
>
> This is the start of the stable review cycle for the 5.4.105 release.
> There are 24 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 12 Mar 2021 13:23:09 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.105-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h



Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

Summary


kernel: 5.4.105-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.4.y
git commit: 62f8f08c9d2fbc6c5692d90f64dd70e3a8edd986
git describe: v5.4.104-25-g62f8f08c9d2f
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.104-25-g62f8f08c9d2f

No regressions (compared to build v5.4.104)

No fixes (compared to build v5.4.104)

Ran 53992 total tests in the following environments and test suites.

Environments
--
- arc
- arm
- arm64
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- mips
- nxp-ls2088
- nxp-ls2088-64k_page_size
- parisc
- powerpc
- qemu-arm-clang
- qemu-arm64-clang
- qemu-arm64-kasan
- qemu-x86_64-clang
- qemu-x86_64-kasan
- qemu-x86_64-kcsan
- qemu_arm
- qemu_arm64
- qemu_arm64-compat
- qemu_i386
- qemu_x86_64
- qemu_x86_64-compat
- riscv
- s390
- sh
- sparc
- x15
- x86
- x86-kasan
- x86_64

Test Suites
---
* build
* linux-log-parser
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kvm
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fs-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* perf
* fwts
* kselftest-lib
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-zram
* libhugetlbfs
* ltp-commands-tests
* ltp-controllers-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-ipc-tests
* ltp-math-tests
* network-basic-tests
* v4l2-compliance
* kselftest-kexec
* kselftest-vm
* kselftest-x86
* ltp-open-posix-tests
* kvm-unit-tests
* rcutorture
* kselftest-vsyscall-mode-native-
* ssuite

-- 
Linaro LKFT
https://lkft.linaro.org


Re: [PATCH V3 1/6] vDPA/ifcvf: get_vendor_id returns a device specific vendor id

2021-03-10 Thread Zhu, Lingshan




On 3/11/2021 2:13 PM, Jason Wang wrote:


On 2021/3/11 12:21 下午, Zhu Lingshan wrote:



On 3/11/2021 11:23 AM, Jason Wang wrote:


On 2021/3/10 5:00 下午, Zhu Lingshan wrote:

In this commit, ifcvf_get_vendor_id() will return
the device-specific vendor ID of the probed PCI device
rather than a hardcoded value.

Signed-off-by: Zhu Lingshan 
---
  drivers/vdpa/ifcvf/ifcvf_main.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
b/drivers/vdpa/ifcvf/ifcvf_main.c

index fa1af301cf55..e501ee07de17 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -324,7 +324,10 @@ static u32 ifcvf_vdpa_get_device_id(struct 
vdpa_device *vdpa_dev)

    static u32 ifcvf_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)
  {
-    return IFCVF_SUBSYS_VENDOR_ID;
+    struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
+    struct pci_dev *pdev = adapter->pdev;
+
+    return pdev->subsystem_vendor;
  }



While at this, I wonder if we can do something similar in 
get_device_id(), if it could be simply deduced with some simple math 
from the PCI device ID?


Thanks

Hi Jason,

IMHO, this implementation is just some memory read ops; I think other 
implementations may not save many CPU cycles, and an 'if' costs at least 
three CPU cycles.


Thanks!



Well, I meant whether you can deduce the virtio device id from 
pdev->subsystem_device.


Thanks

Oh, sure, I get you







    static u32 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)










Re: [External] Re: [PATCH v18 9/9] mm: hugetlb: optimize the code with the help of the compiler

2021-03-10 Thread Muchun Song
On Wed, Mar 10, 2021 at 11:41 PM Michal Hocko  wrote:
>
> On Mon 08-03-21 18:28:07, Muchun Song wrote:
> > When the "struct page size" crosses page boundaries we cannot
> > make use of this feature. Let free_vmemmap_pages_per_hpage()
> > return zero if that is the case, most of the functions can be
> > optimized away.
>
> I am confused. Don't you check for this in early_hugetlb_free_vmemmap_param 
> already?

Right.

> Why do we need any runtime checks?

If the size of the struct page is not a power of 2, the compiler can see
that is_hugetlb_free_vmemmap_enabled() always returns false, so
the code snippet of this user can be optimized away.

E.g.

if (is_hugetlb_free_vmemmap_enabled())
/* do something */

The compiler can drop "/* do something */" directly, because
it knows is_hugetlb_free_vmemmap_enabled() always returns
false.

Thanks.

>
> > Signed-off-by: Muchun Song 
> > Reviewed-by: Miaohe Lin 
> > Reviewed-by: Oscar Salvador 
> > Tested-by: Chen Huang 
> > Tested-by: Bodeddula Balasubramaniam 
> > ---
> >  include/linux/hugetlb.h | 3 ++-
> >  mm/hugetlb_vmemmap.c| 7 +++
> >  mm/hugetlb_vmemmap.h| 6 ++
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index c70421e26189..333dd0479fc2 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -880,7 +880,8 @@ extern bool hugetlb_free_vmemmap_enabled;
> >
> >  static inline bool is_hugetlb_free_vmemmap_enabled(void)
> >  {
> > - return hugetlb_free_vmemmap_enabled;
> > + return hugetlb_free_vmemmap_enabled &&
> > +is_power_of_2(sizeof(struct page));
> >  }
> >  #else
> >  static inline bool is_hugetlb_free_vmemmap_enabled(void)
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index 33e42678abe3..1ba1ef45c48c 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -265,6 +265,13 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
> >   BUILD_BUG_ON(__NR_USED_SUBPAGE >=
> >RESERVE_VMEMMAP_SIZE / sizeof(struct page));
> >
> > + /*
> > +  * The compiler can help us to optimize this function to null
> > +  * when the size of the struct page is not power of 2.
> > +  */
> > + if (!is_power_of_2(sizeof(struct page)))
> > + return;
> > +
> >   if (!hugetlb_free_vmemmap_enabled)
> >   return;
> >
> > diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> > index cb2bef8f9e73..29aaaf7b741e 100644
> > --- a/mm/hugetlb_vmemmap.h
> > +++ b/mm/hugetlb_vmemmap.h
> > @@ -21,6 +21,12 @@ void hugetlb_vmemmap_init(struct hstate *h);
> >   */
> >  static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
> >  {
> > + /*
> > +  * This check aims to let the compiler help us optimize the code as
> > +  * much as possible.
> > +  */
> > + if (!is_power_of_2(sizeof(struct page)))
> > + return 0;
> >   return h->nr_free_vmemmap_pages;
> >  }
> >  #else
> > --
> > 2.11.0
> >
>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH v1 1/3] binder: BINDER_FREEZE ioctl

2021-03-10 Thread Greg KH
On Wed, Mar 10, 2021 at 02:52:49PM -0800, Li Li wrote:
>   if (target_proc) {
>   binder_inner_proc_lock(target_proc);
> + target_proc->outstanding_txns--;
> + WARN_ON(target_proc->outstanding_txns < 0);

WARN_* is a huge crutch, please just handle stuff like this properly and
if you really need to, warn userspace (but what can they do about it?)

You also just rebooted all systems that have panic-on-warn set, so if
this can be triggered by userspace, you caused a DoS of things :(

So please remove all of the WARN_ON() you add in this patch series to
properly handle the error conditions and deal with them correctly.

And if these were here just for debugging, hopefully the code works
properly now and you do not need debugging anymore so they can all just
be dropped.
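
For illustration only, the kind of explicit handling being asked for here
(instead of a WARN_ON on the underflow); this is a sketch against the fields
added by the patch, not the actual fix:

	binder_inner_proc_lock(target_proc);
	if (target_proc->outstanding_txns > 0)
		target_proc->outstanding_txns--;
	else
		pr_debug("binder: outstanding_txns underflow for proc %d\n",
			 target_proc->pid);
	binder_inner_proc_unlock(target_proc);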

thanks,

greg k-h


[PATCH 0/3] Add RTC support for PMIC PMK8350

2021-03-10 Thread satya priya
Convert pm8xxx bindings to yaml and add pmk8350 rtc binding.

satya priya (3):
  rtc: pm8xxx: Add RTC support for PMIC PMK8350
  dt-bindings: mfd: Convert pm8xxx bindings to yaml
  dt-bindings: mfd: Add compatible for pmk8350 rtc

 .../devicetree/bindings/mfd/qcom-pm8xxx.txt|  99 ---
 .../devicetree/bindings/mfd/qcom-pm8xxx.yaml   | 109 +
 drivers/rtc/rtc-pm8xxx.c   |  11 +++
 3 files changed, 120 insertions(+), 99 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 1/3] rtc: pm8xxx: Add RTC support for PMIC PMK8350

2021-03-10 Thread satya priya
Add the compatible string for PMIC PMK8350.

Signed-off-by: satya priya 
---
 drivers/rtc/rtc-pm8xxx.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/rtc/rtc-pm8xxx.c b/drivers/rtc/rtc-pm8xxx.c
index eb20659..29a1c65 100644
--- a/drivers/rtc/rtc-pm8xxx.c
+++ b/drivers/rtc/rtc-pm8xxx.c
@@ -445,6 +445,16 @@ static const struct pm8xxx_rtc_regs pm8941_regs = {
.alarm_en   = BIT(7),
 };
 
+static const struct pm8xxx_rtc_regs pmk8350_regs = {
+   .ctrl   = 0x6146,
+   .write  = 0x6140,
+   .read   = 0x6148,
+   .alarm_rw   = 0x6240,
+   .alarm_ctrl = 0x6246,
+   .alarm_ctrl2= 0x6248,
+   .alarm_en   = BIT(7),
+};
+
 /*
  * Hardcoded RTC bases until IORESOURCE_REG mapping is figured out
  */
@@ -453,6 +463,7 @@ static const struct of_device_id pm8xxx_id_table[] = {
{ .compatible = "qcom,pm8018-rtc", .data = _regs },
{ .compatible = "qcom,pm8058-rtc", .data = _regs },
{ .compatible = "qcom,pm8941-rtc", .data = _regs },
+   { .compatible = "qcom,pmk8350-rtc", .data = _regs },
{ },
 };
 MODULE_DEVICE_TABLE(of, pm8xxx_id_table);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 2/3] dt-bindings: mfd: Convert pm8xxx bindings to yaml

2021-03-10 Thread satya priya
Convert pm8xxx rtc bindings from .txt to .yaml format.

Signed-off-by: satya priya 
---
 .../devicetree/bindings/mfd/qcom-pm8xxx.txt|  99 ---
 .../devicetree/bindings/mfd/qcom-pm8xxx.yaml   | 108 +
 2 files changed, 108 insertions(+), 99 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml

diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt 
b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt
deleted file mode 100644
index 9e5eba4..000
--- a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-Qualcomm PM8xxx PMIC multi-function devices
-
-The PM8xxx family of Power Management ICs are used to provide regulated
-voltages and other various functionality to Qualcomm SoCs.
-
-= PROPERTIES
-
-- compatible:
-   Usage: required
-   Value type: 
-   Definition: must be one of:
-   "qcom,pm8058"
-   "qcom,pm8821"
-   "qcom,pm8921"
-
-- #address-cells:
-   Usage: required
-   Value type: 
-   Definition: must be 1
-
-- #size-cells:
-   Usage: required
-   Value type: 
-   Definition: must be 0
-
-- interrupts:
-   Usage: required
-   Value type: 
-   Definition: specifies the interrupt that indicates a subdevice
-   has generated an interrupt (summary interrupt). The
-   format of the specifier is defined by the binding document
-   describing the node's interrupt parent.
-
-- #interrupt-cells:
-   Usage: required
-   Value type : 
-   Definition: must be 2. Specifies the number of cells needed to encode
-   an interrupt source. The 1st cell contains the interrupt
-   number. The 2nd cell is the trigger type and level flags
-   encoded as follows:
-
-   1 = low-to-high edge triggered
-   2 = high-to-low edge triggered
-   4 = active high level-sensitive
-   8 = active low level-sensitive
-
-- interrupt-controller:
-   Usage: required
-   Value type: 
-   Definition: identifies this node as an interrupt controller
-
-= SUBCOMPONENTS
-
-The PMIC contains multiple independent functions, each described in a subnode.
-The below bindings specify the set of valid subnodes.
-
-== Real-Time Clock
-
-- compatible:
-   Usage: required
-   Value type: 
-   Definition: must be one of:
-   "qcom,pm8058-rtc"
-   "qcom,pm8921-rtc"
-   "qcom,pm8941-rtc"
-   "qcom,pm8018-rtc"
-
-- reg:
-   Usage: required
-   Value type: 
-   Definition: single entry specifying the base address of the RTC 
registers
-
-- interrupts:
-   Usage: required
-   Value type: 
-   Definition: single entry specifying the RTC's alarm interrupt
-
-- allow-set-time:
-   Usage: optional
-   Value type: 
-   Definition: indicates that the setting of RTC time is allowed by
-   the host CPU
-
-= EXAMPLE
-
-   pmicintc: pmic@0 {
-   compatible = "qcom,pm8921";
-   interrupts = <104 8>;
-   #interrupt-cells = <2>;
-   interrupt-controller;
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   rtc@11d {
-   compatible = "qcom,pm8921-rtc";
-   reg = <0x11d>;
-   interrupts = <0x27 0>;
-   };
-   };
diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml 
b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml
new file mode 100644
index 000..b4892f1
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml
@@ -0,0 +1,108 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/qcom-pm8xxx.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm PM8xxx PMIC multi-function devices
+
+maintainers:
+  - Lee Jones 
+
+description: |
+  The PM8xxx family of Power Management ICs are used to provide regulated
+  voltages and other various functionality to Qualcomm SoCs.
+
+properties:
+  compatible:
+enum:
+  - qcom,pm8058
+  - qcom,pm8821
+  - qcom,pm8921
+
+  reg:
+maxItems: 1
+
+  '#address-cells':
+ const: 1
+
+  '#size-cells':
+ const: 0
+
+  interrupts:
+description: |
+  Specifies the interrupt that indicates a subdevice has generated an
+  interrupt (summary interrupt).
+
+  '#interrupt-cells':
+description: |
+  Specifies the number of cells needed to encode an interrupt source.
+  The 1st cell contains the interrupt number. The 2nd cell is the
+  trigger type.
+const: 2
+
+  interrupt-controller: true
+

[PATCH 3/3] dt-bindings: mfd: Add compatible for pmk8350 rtc

2021-03-10 Thread satya priya
Add compatible string for pmk8350 rtc support.

Signed-off-by: satya priya 
---
 Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml 
b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml
index b4892f1..676decc 100644
--- a/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml
+++ b/Documentation/devicetree/bindings/mfd/qcom-pm8xxx.yaml
@@ -53,6 +53,7 @@ patternProperties:
   - qcom,pm8921-rtc
   - qcom,pm8941-rtc
   - qcom,pm8018-rtc
+  - qcom,pmk8350-rtc
 
   reg:
 description: Specifies the base address of the RTC registers
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



Re: [RFC v2 5/5] clk: socfpga: allow compile testing of Stratix 10 / Agilex clocks

2021-03-10 Thread Krzysztof Kozlowski
On 10/03/2021 17:48, Arnd Bergmann wrote:
> On Wed, Mar 10, 2021 at 9:38 AM Krzysztof Kozlowski
>  wrote:
>> --- a/drivers/clk/socfpga/Kconfig
>> +++ b/drivers/clk/socfpga/Kconfig
>> @@ -1,6 +1,17 @@
>>  # SPDX-License-Identifier: GPL-2.0
>> +config COMMON_CLK_SOCFPGA
>> +   bool "Intel SoCFPGA family clock support" if COMPILE_TEST && 
>> !ARCH_SOCFPGA && !ARCH_SOCFPGA64
>> +   depends on ARCH_SOCFPGA || ARCH_SOCFPGA64 || COMPILE_TEST
>> +   default y if ARCH_SOCFPGA || ARCH_SOCFPGA64
> 
> I think the 'depends on' line here is redundant if you also have the
> 'if' line and the default.

Yes, you're right.
Best regards,
Krzysztof


Re: [PATCH v9 6/8] media: imx-jpeg: Add V4L2 driver for i.MX8 JPEG Encoder/Decoder

2021-03-10 Thread Hans Verkuil
Hi Mirela,

On 11/03/2021 01:28, Mirela Rabulea (OSS) wrote:



> +static const struct of_device_id mxc_jpeg_match[] = {
> + {
> + .compatible = "nxp,imx8qxp-jpgdec",
> + .data   = (void *)MXC_JPEG_DECODE,

Don't do this, just say:

static const int mxc_decode_mode = MXC_JPEG_DECODE;
static const int mxc_encode_mode = MXC_JPEG_ENCODE;

and point to that:

.data = &mxc_decode_mode;

> + },
> + {
> + .compatible = "nxp,imx8qxp-jpgenc",
> + .data   = (void *)MXC_JPEG_ENCODE,

.data = &mxc_encode_mode;

> + },
> + { },
> +};



> +static int mxc_jpeg_probe(struct platform_device *pdev)
> +{
> + struct mxc_jpeg_dev *jpeg;
> + struct device *dev = >dev;
> + struct resource *res;
> + int dec_irq;
> + int ret;
> + int mode;
> + const struct of_device_id *of_id;
> + unsigned int slot;
> +
> + of_id = of_match_node(mxc_jpeg_match, dev->of_node);
> + mode = (int)(u64)of_id->data;

and this becomes:

mode = *(const int *)of_id->data;

This will solve the kernel test robot warning, and for that matter
the same gcc warnings I get when I compile.

Just post a v9.1 for this patch, everything else looks good.

Regards,

Hans


[PATCH net] net: lapbether: Prevent racing when checking whether the netif is running

2021-03-10 Thread Xie He
There are two "netif_running" checks in this driver. One is in
"lapbeth_xmit" and the other is in "lapbeth_rcv". They serve to make
sure that the LAPB APIs called in these functions are called before
"lapb_unregister" is called by the "ndo_stop" function.

However, these "netif_running" checks are unreliable, because it's
possible that immediately after "netif_running" returns true, "ndo_stop"
is called (which causes "lapb_unregister" to be called).

This patch adds locking to make sure "lapbeth_xmit" and "lapbeth_rcv" can
reliably check and ensure the netif is running while doing their work.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xie He 
---
 drivers/net/wan/lapbether.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index c3372498f4f1..8fda0446ff71 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -51,6 +51,8 @@ struct lapbethdev {
struct list_head    node;
struct net_device   *ethdev;/* link to ethernet device */
struct net_device   *axdev; /* lapbeth device (lapb#) */
+   bool    up;
+   spinlock_t  up_lock;/* Protects "up" */
 };
 
 static LIST_HEAD(lapbeth_devices);
@@ -101,8 +103,9 @@ static int lapbeth_rcv(struct sk_buff *skb, struct 
net_device *dev, struct packe
rcu_read_lock();
lapbeth = lapbeth_get_x25_dev(dev);
if (!lapbeth)
-   goto drop_unlock;
-   if (!netif_running(lapbeth->axdev))
+   goto drop_unlock_rcu;
+   spin_lock_bh(&lapbeth->up_lock);
+   if (!lapbeth->up)
goto drop_unlock;
 
len = skb->data[0] + skb->data[1] * 256;
@@ -117,11 +120,14 @@ static int lapbeth_rcv(struct sk_buff *skb, struct 
net_device *dev, struct packe
goto drop_unlock;
}
 out:
+   spin_unlock_bh(&lapbeth->up_lock);
rcu_read_unlock();
return 0;
 drop_unlock:
kfree_skb(skb);
goto out;
+drop_unlock_rcu:
+   rcu_read_unlock();
 drop:
kfree_skb(skb);
return 0;
@@ -151,13 +157,11 @@ static int lapbeth_data_indication(struct net_device 
*dev, struct sk_buff *skb)
 static netdev_tx_t lapbeth_xmit(struct sk_buff *skb,
  struct net_device *dev)
 {
+   struct lapbethdev *lapbeth = netdev_priv(dev);
int err;
 
-   /*
-* Just to be *really* sure not to send anything if the interface
-* is down, the ethernet device may have gone.
-*/
-   if (!netif_running(dev))
+   spin_lock_bh(&lapbeth->up_lock);
+   if (!lapbeth->up)
goto drop;
 
/* There should be a pseudo header of 1 byte added by upper layers.
@@ -194,6 +198,7 @@ static netdev_tx_t lapbeth_xmit(struct sk_buff *skb,
goto drop;
}
 out:
+   spin_unlock_bh(&lapbeth->up_lock);
return NETDEV_TX_OK;
 drop:
kfree_skb(skb);
@@ -285,6 +290,7 @@ static const struct lapb_register_struct lapbeth_callbacks 
= {
  */
 static int lapbeth_open(struct net_device *dev)
 {
+   struct lapbethdev *lapbeth = netdev_priv(dev);
int err;
 
if ((err = lapb_register(dev, &lapbeth_callbacks)) != LAPB_OK) {
@@ -292,13 +298,22 @@ static int lapbeth_open(struct net_device *dev)
return -ENODEV;
}
 
+   spin_lock_bh(&lapbeth->up_lock);
+   lapbeth->up = true;
+   spin_unlock_bh(&lapbeth->up_lock);
+
return 0;
 }
 
 static int lapbeth_close(struct net_device *dev)
 {
+   struct lapbethdev *lapbeth = netdev_priv(dev);
int err;
 
+   spin_lock_bh(&lapbeth->up_lock);
+   lapbeth->up = false;
+   spin_unlock_bh(&lapbeth->up_lock);
+
if ((err = lapb_unregister(dev)) != LAPB_OK)
pr_err("lapb_unregister error: %d\n", err);
 
@@ -356,6 +371,9 @@ static int lapbeth_new_device(struct net_device *dev)
dev_hold(dev);
lapbeth->ethdev = dev;
 
+   lapbeth->up = false;
+   spin_lock_init(&lapbeth->up_lock);
+
rc = -EIO;
if (register_netdevice(ndev))
goto fail;
-- 
2.27.0



Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp

2021-03-10 Thread Hugh Dickins
On Tue, 9 Feb 2021, Naoya Horiguchi wrote:

> From: Naoya Horiguchi 
> 
> Currently hwpoison code checks PageAnon() for thp and refuses to handle
> errors on non-anonymous thps (just for historical reasons).  We now
> support non-anonymous thp like the shmem one, so this patch enables
> handling of shmem thps. Fortunately, we already have can_split_huge_page()
> to check if a given thp is splittable, so this patch relies on it.

Fortunately? I don't understand. Why call can_split_huge_page()
at all, instead of simply trying split_huge_page() directly?
And could it do better than -EBUSY when split_huge_page() fails?

> 
> Signed-off-by: Naoya Horiguchi 

Thanks for trying to add shmem+file THP support, but I think this
does not work as intended - Andrew, if Naoya agrees, please drop from
mmotm for now, the fixup needed will be more than a line or two.

I'm not much into memory-failure myself, but Jue discovered that the
SIGBUS never arrives: because split_huge_page() on a shmem or file
THP unmaps all its pmds and ptes, and (unlike with anon) leaves them
unmapped - in normal circumstances, to be faulted back on demand.
So the page_mapped() check in hwpoison_user_mappings() fails,
and the intended SIGBUS is not delivered.

(Or, is it acceptable that the SIGBUS is not delivered to those who
have the huge page mapped: should it get delivered later, to anyone
who faults back in the bad 4k?)

We believe the tokill list has to be set up earlier, before
split_huge_page() is called, then passed in to hwpoison_user_mappings().

Sorry, we don't have a proper patch for that right now, but I expect
you can see what needs to be done.  But something we found on the way,
we do have a patch for: add_to_kill() uses page_address_in_vma(), but
that has not been used on file THP tails before - fix appended at the
end below, so as not to waste your time on that bit.

> ---
>  mm/memory-failure.c | 34 +-
>  1 file changed, 9 insertions(+), 25 deletions(-)
> 
> diff --git v5.11-rc6/mm/memory-failure.c v5.11-rc6_patched/mm/memory-failure.c
> index e9481632fcd1..8c1575de0507 100644
> --- v5.11-rc6/mm/memory-failure.c
> +++ v5.11-rc6_patched/mm/memory-failure.c
> @@ -956,20 +956,6 @@ static int __get_hwpoison_page(struct page *page)
>  {
>   struct page *head = compound_head(page);
>  
> - if (!PageHuge(head) && PageTransHuge(head)) {
> - /*
> -  * Non anonymous thp exists only in allocation/free time. We
> -  * can't handle such a case correctly, so let's give it up.
> -  * This should be better than triggering BUG_ON when kernel
> -  * tries to touch the "partially handled" page.
> -  */
> - if (!PageAnon(head)) {
> - pr_err("Memory failure: %#lx: non anonymous thp\n",
> - page_to_pfn(page));
> - return 0;
> - }
> - }
> -
>   if (get_page_unless_zero(head)) {
>   if (head == compound_head(page))
>   return 1;
> @@ -1197,21 +1183,19 @@ static int identify_page_state(unsigned long pfn, 
> struct page *p,
>  
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
> - lock_page(page);
> - if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> - unsigned long pfn = page_to_pfn(page);
> + struct page *head;
>  
> + lock_page(page);
> + head = compound_head(page);
> + if (PageTransHuge(head) && can_split_huge_page(head, NULL) &&
> + !split_huge_page(page)) {
>   unlock_page(page);
> - if (!PageAnon(page))
> - pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> - else
> - pr_info("%s: %#lx: thp split failed\n", msg, pfn);
> - put_page(page);
> - return -EBUSY;
> + return 0;
>   }
> + pr_info("%s: %#lx: thp split failed\n", msg, page_to_pfn(page));
>   unlock_page(page);
> -
> - return 0;
> + put_page(page);
> + return -EBUSY;
>  }
>  
>  static int memory_failure_hugetlb(unsigned long pfn, int flags)
> -- 
> 2.25.1

[PATCH] mm: fix page_address_in_vma() on file THP tails
From: Jue Wang 

Anon THP tails were already supported, but memory-failure now needs to use
page_address_in_vma() on file THP tails, which its page->mapping check did
not permit: fix it.

Signed-off-by: Jue Wang 
Signed-off-by: Hugh Dickins 
---

 mm/rmap.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- 5.12-rc2/mm/rmap.c  2021-02-28 16:58:57.950450151 -0800
+++ linux/mm/rmap.c 2021-03-10 20:29:21.591475177 -0800
@@ -717,11 +717,11 @@ unsigned long page_address_in_vma(struct
if (!vma->anon_vma || !page__anon_vma ||
vma->anon_vma->root != page__anon_vma->root)
return -EFAULT;
-   } else if (page->mapping) {
-   if 

Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start()

2021-03-10 Thread Sean Christopherson
On Wed, Mar 10, 2021, Jason Gunthorpe wrote:
> On Wed, Mar 10, 2021 at 05:20:01PM -0800, Sean Christopherson wrote:
> 
> > > Which I believe is fatal to kvm? These notifiers certainly do not only
> > > happen at process exit.
> > 
> > My point about the process dying is that the existing bug that causes
> > mmu_notifier_count to become imbalanced is benign only because the process 
> > is
> > being killed, and thus KVM will stop running its vCPUs.
> 
> Are you saying we only call non-blocking invalidate during a process
> exit event?? 

Yes?  __oom_reap_task_mm() is the only user of _nonblock(), if that's what 
you're
asking.

> > > So, both of the remaining _end users become corrupted with this patch!
> > 
> > I don't follow.  mn_hlist_invalidate_range_start() iterates over all
> > notifiers, even if a notifier earlier in the chain failed.  How will
> > KVM become imbalanced?
> 
> Er, ok, that got left in a weird way. There is another "bug" where end
> is not supposed to be called if the start failed.

Ha, the best kind of feature.  :-)

> > The existing _end users never fail their _start.  If KVM started failing its
> > start, then yes, it could get corrupted.  
> 
> Well, maybe that is the way out of this now. If we don't permit a
> start to fail if there is an end then we have no problem to unwind it
> as we can continue to call everything. This can't be backported too
> far though, the itree notifier conversions are what made the WARN_ON
> safe today.
> 
> Something very approximately like this is closer to my preference:

Makes sense.  I don't have a strong preference, I'll give this a spin tomorrow.

> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 61ee40ed804ee5..6d5cd20f81dadc 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -501,10 +501,25 @@ static int mn_hlist_invalidate_range_start(
>   "");
>   WARN_ON(mmu_notifier_range_blockable(range) ||
>   _ret != -EAGAIN);
> + /*
> +  * We call all the notifiers on any EAGAIN,
> +  * there is no way for a notifier to know if
> +  * its start method failed, thus a start that
> +  * does EAGAIN can't also do end.
> +  */
> + WARN_ON(ops->invalidate_range_end);
>   ret = _ret;
>   }
>   }
>   }
> +
> + if (ret) {
> + /* Must be non-blocking to get here*/
> + hlist_for_each_entry_rcu (subscription, &subscriptions->list,
> +   hlist, srcu_read_lock_held())
> + subscription->ops->invalidate_range_end(subscription,
> + range);
> + }
>   srcu_read_unlock(&srcu, id);
>  
>   return ret;


[PATCH] Rectify spelling and grammar

2021-03-10 Thread Xiaofeng Cao
Change 'inaccesable' to 'inaccessible'
Change 'detrmine' to 'determine'
Delete a grammatically redundant 'in'

Signed-off-by: Xiaofeng Cao 
---
 drivers/media/radio/radio-si476x.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/media/radio/radio-si476x.c 
b/drivers/media/radio/radio-si476x.c
index 23997425bdb5..b39a68f83c5f 100644
--- a/drivers/media/radio/radio-si476x.c
+++ b/drivers/media/radio/radio-si476x.c
@@ -152,7 +152,7 @@ static struct v4l2_ctrl_config si476x_ctrls[] = {
 
/*
 * SI476X during its station seeking(or tuning) process uses several
-* parameters to detrmine if "the station" is valid:
+* parameters to determine if "the station" is valid:
 *
 *  - Signal's SNR(in dBuV) must be lower than
 *  #V4L2_CID_SI476X_SNR_THRESHOLD
@@ -255,7 +255,7 @@ struct si476x_radio;
  *
  * This table holds pointers to functions implementing particular
  * operations depending on the mode in which the tuner chip was
- * configured to start in. If the function is not supported
+ * configured to start. If the function is not supported
  * corresponding element is set to #NULL.
  *
  * @tune_freq: Tune chip to a specific frequency
@@ -917,7 +917,7 @@ static int si476x_radio_s_ctrl(struct v4l2_ctrl *ctrl)
case V4L2_CID_RDS_RECEPTION:
/*
 * It looks like RDS related properties are
-* inaccesable when tuner is in AM mode, so cache the
+* inaccessible when tuner is in AM mode, so cache the
 * changes
 */
if (si476x_core_is_in_am_receiver_mode(radio->core))
-- 
2.25.1



Re: [PATCH V3 6/6] vDPA/ifcvf: verify mandatory feature bits for vDPA

2021-03-10 Thread Zhu, Lingshan




On 3/11/2021 2:20 PM, Jason Wang wrote:


On 2021/3/11 12:16 下午, Zhu Lingshan wrote:



On 3/11/2021 11:20 AM, Jason Wang wrote:


On 2021/3/10 5:00 下午, Zhu Lingshan wrote:

vDPA requires VIRTIO_F_ACCESS_PLATFORM, so this commit checks for it
when features are set.

Signed-off-by: Zhu Lingshan 
---
  drivers/vdpa/ifcvf/ifcvf_base.c | 8 
  drivers/vdpa/ifcvf/ifcvf_base.h | 1 +
  drivers/vdpa/ifcvf/ifcvf_main.c | 5 +
  3 files changed, 14 insertions(+)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c 
b/drivers/vdpa/ifcvf/ifcvf_base.c

index ea6a78791c9b..58f47fdce385 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -224,6 +224,14 @@ u64 ifcvf_get_features(struct ifcvf_hw *hw)
  return hw->hw_features;
  }
  +int ifcvf_verify_min_features(struct ifcvf_hw *hw)
+{
+    if (!(hw->hw_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
+    return -EINVAL;
+
+    return 0;
+}
+
  void ifcvf_read_net_config(struct ifcvf_hw *hw, u64 offset,
 void *dst, int length)
  {
diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h 
b/drivers/vdpa/ifcvf/ifcvf_base.h

index dbb8c10aa3b1..91c5735d4dc9 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -123,6 +123,7 @@ void io_write64_twopart(u64 val, u32 *lo, u32 
*hi);

  void ifcvf_reset(struct ifcvf_hw *hw);
  u64 ifcvf_get_features(struct ifcvf_hw *hw);
  u64 ifcvf_get_hw_features(struct ifcvf_hw *hw);
+int ifcvf_verify_min_features(struct ifcvf_hw *hw);
  u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid);
  int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u16 num);
  struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw);
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c 
b/drivers/vdpa/ifcvf/ifcvf_main.c

index 25fb9dfe23f0..f624f202447d 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -179,6 +179,11 @@ static u64 ifcvf_vdpa_get_features(struct 
vdpa_device *vdpa_dev)
  static int ifcvf_vdpa_set_features(struct vdpa_device *vdpa_dev, 
u64 features)

  {
  struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+    int ret;
+
+    ret = ifcvf_verify_min_features(vf);



So this validate device features instead of driver which is the one 
we really want to check?


Thanks


Hi Jason,

Here we check the device feature bits to make sure the device supports
ACCESS_PLATFORM.



If you want to check device features, you need to do that during
probe() and fail the probe if the feature is absent. But I think you
won't ship cards without ACCESS_PLATFORM.

Yes, there is no reason to ship a card without ACCESS_PLATFORM




In get_features(),
it returns the intersection of the device feature bits and the
driver-supported feature bits (which include ACCESS_PLATFORM).
Other components like QEMU should not set feature bits beyond this
intersection, so we can be sure that if ifcvf_verify_min_features()
passes, both the device and the driver support ACCESS_PLATFORM.


Are you suggesting we check the driver feature bits in
ifcvf_verify_min_features() as well?



So it really depends on your hardware. If your hardware can always
offer ACCESS_PLATFORM, you just need to check the driver features. This is
how vdpa_sim and mlx5_vdpa work.
Yes, we always support ACCESS_PLATFORM, so it is hard coded in the macro
IFCVF_SUPPORTED_FEATURES.
Now we check whether the device supports this feature bit as a double
confirmation. Are you suggesting we should also check ACCESS_PLATFORM &
IFCVF_SUPPORTED_FEATURES in set_features()? I prefer to check both the
device features and IFCVF_SUPPORTED_FEATURES; it is more reliable.
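
As a minimal sketch (illustrative only, not the actual ifcvf code), checking
the driver-negotiated bits when set_features() is called could look like:

    static int ifcvf_verify_driver_features(u64 features)
    {
            if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
                    return -EINVAL;

            return 0;
    }

with 'features' being the argument passed to ifcvf_vdpa_set_features(),
in addition to (or instead of) the hw->hw_features check above.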


Thanks!


Thanks




Thanks!




+    if (ret)
+    return ret;
    vf->req_features = features;










[PATCH v2] netdevsim: fib: Remove redundant code

2021-03-10 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
"0" on line 889.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
Changes in v2:
  - Modify the function type to void.

 drivers/net/netdevsim/fib.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
index 46fb414..3ca0f54 100644
--- a/drivers/net/netdevsim/fib.c
+++ b/drivers/net/netdevsim/fib.c
@@ -869,10 +869,8 @@ static int nsim_fib6_event(struct nsim_fib_data *data,
return err;
 }
 
-static int nsim_fib_event(struct nsim_fib_event *fib_event)
+static void nsim_fib_event(struct nsim_fib_event *fib_event)
 {
-   int err = 0;
-
switch (fib_event->family) {
case AF_INET:
nsim_fib4_event(fib_event->data, &fib_event->fen_info,
@@ -885,8 +883,6 @@ static int nsim_fib_event(struct nsim_fib_event *fib_event)
nsim_fib6_event_fini(&fib_event->fib6_event);
break;
}
-
-   return err;
 }
 
 static int nsim_fib4_prepare_event(struct fib_notifier_info *info,
-- 
1.8.3.1



[PATCH v2 25/27] perf tests: Support 'Convert perf time to TSC' test for hybrid

2021-03-10 Thread Jin Yao
Since for "cycles:u' on hybrid platform, it creates two "cycles".
So the second evsel in evlist also needs initialization.

With this patch,

root@otcpl-adl-s-2:~# ./perf test 71
71: Convert perf time to TSC: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/perf-time-to-tsc.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tools/perf/tests/perf-time-to-tsc.c 
b/tools/perf/tests/perf-time-to-tsc.c
index 680c3cffb128..b472205ec8e3 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -66,6 +66,11 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
u64 test_tsc, comm1_tsc, comm2_tsc;
u64 test_time, comm1_time = 0, comm2_time = 0;
struct mmap *md;
+   bool hybrid = false;
+
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
CHECK_NOT_NULL__(threads);
@@ -88,6 +93,17 @@ int test__perf_time_to_tsc(struct test *test __maybe_unused, 
int subtest __maybe
evsel->core.attr.disabled = 1;
evsel->core.attr.enable_on_exec = 0;
 
+   /*
+* For hybrid "cycles:u", it creates two events.
+* Init the second evsel here.
+*/
+   if (hybrid) {
+   evsel = evsel__next(evsel);
+   evsel->core.attr.comm = 1;
+   evsel->core.attr.disabled = 1;
+   evsel->core.attr.enable_on_exec = 0;
+   }
+
CHECK__(evlist__open(evlist));
 
CHECK__(evlist__mmap(evlist, UINT_MAX));
-- 
2.17.1



[PATCH v2 26/27] perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid

2021-03-10 Thread Jin Yao
Currently we don't support shadow stat for hybrid.

  root@ssp-pwrt-002:~# ./perf stat -e cycles,instructions -a -- sleep 1

   Performance counter stats for 'system wide':

  12,883,109,591  cpu_core/cycles/
   6,405,163,221  cpu_atom/cycles/
 555,553,778  cpu_core/instructions/
 841,158,734  cpu_atom/instructions/

 1.002644773 seconds time elapsed

Now no 'insn per cycle' shadow stat is reported. We will support it
later; for now, just skip the 'perf stat metrics (shadow stat) test'.
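
(For reference, the shadow stat that would have been shown can still be
worked out by hand from the numbers above: 555,553,778 cpu_core
instructions / 12,883,109,591 cpu_core cycles is roughly 0.04 insn per cycle.)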

Signed-off-by: Jin Yao 
---
 tools/perf/tests/shell/stat+shadow_stat.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh 
b/tools/perf/tests/shell/stat+shadow_stat.sh
index ebebd3596cf9..e6e35fc6c882 100755
--- a/tools/perf/tests/shell/stat+shadow_stat.sh
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -7,6 +7,9 @@ set -e
 # skip if system-wide mode is forbidden
 perf stat -a true > /dev/null 2>&1 || exit 2
 
+# skip if on hybrid platform
+perf stat -a -e cycles sleep 1 2>&1 | grep -e cpu_core && exit 2
+
 test_global_aggr()
 {
perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
-- 
2.17.1



[PATCH v2 27/27] perf Documentation: Document intel-hybrid support

2021-03-10 Thread Jin Yao
Add some words and examples to help understanding of
Intel hybrid perf support.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/intel-hybrid.txt | 228 ++
 tools/perf/Documentation/perf-record.txt  |   1 +
 tools/perf/Documentation/perf-stat.txt|   2 +
 3 files changed, 231 insertions(+)
 create mode 100644 tools/perf/Documentation/intel-hybrid.txt

diff --git a/tools/perf/Documentation/intel-hybrid.txt 
b/tools/perf/Documentation/intel-hybrid.txt
new file mode 100644
index ..ff641d9ac81b
--- /dev/null
+++ b/tools/perf/Documentation/intel-hybrid.txt
@@ -0,0 +1,228 @@
+Intel hybrid support
+
+Support for Intel hybrid events within perf tools.
+
+Some Intel platforms, such as AlderLake, are hybrid platforms consisting
+of atom cpus and core cpus. Each cpu has a dedicated event list. Some
+events are available only on the core cpus, some only on the atom cpus,
+and some are available on both.
+
+Kernel exports two new cpu pmus via sysfs:
+/sys/devices/cpu_core
+/sys/devices/cpu_atom
+
+The 'cpus' files are created under the directories. For example,
+
+cat /sys/devices/cpu_core/cpus
+0-15
+
+cat /sys/devices/cpu_atom/cpus
+16-23
+
+It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
+
+Quickstart
+
+List hybrid event
+-----------------
+
+As before, use perf-list to list the symbolic event.
+
+perf list
+
+inst_retired.any
+   [Fixed Counter: Counts the number of instructions retired. Unit: 
cpu_atom]
+inst_retired.any
+   [Number of instructions retired. Fixed Counter - architectural event. 
Unit: cpu_core]
+
+The 'Unit: xxx' is added to the brief description to indicate which pmu
+the event belongs to. The same event name can be supported on
+different pmus.
+
+Enable hybrid event with a specific pmu
+---------------------------------------
+
+To enable a core-only event or an atom-only event, the following syntax is supported:
+
+   cpu_core/<event>/
+or
+   cpu_atom/<event>/
+
+For example, count the 'cycles' event on core cpus.
+
+   perf stat -e cpu_core/cycles/
+
+Create two events for one hardware event automatically
+------------------------------------------------------
+
+When creating one event that is available on both atom and core, two
+events are created automatically. One is for atom, the other is for
+core. Most hardware events and cache events are available on both
+cpu_core and cpu_atom.
+
+Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
+hybrid platform, the kernel needs to know where the event comes from
+(from atom or from core). The original perf event type PERF_TYPE_HARDWARE
+can't carry pmu information. So a new type PERF_TYPE_HARDWARE_PMU is
+introduced.
+
+The new attr.config layout for PERF_TYPE_HARDWARE_PMU:
+
+0xDD000000AA
+AA: original hardware event ID
+DD: PMU type ID
+
+Cache event is similar. A new type PERF_TYPE_HW_CACHE_PMU is introduced.
+
+The new attr.config layout for PERF_TYPE_HW_CACHE_PMU:
+
+0xDD00CCBBAA
+AA: original hardware cache ID
+BB: original hardware cache op ID
+CC: original hardware cache op result ID
+DD: PMU type ID
+
+PMU type ID is retrieved from sysfs
+
+cat /sys/devices/cpu_atom/type
+10
+
+cat /sys/devices/cpu_core/type
+4
+
+When enabling a hardware event without a specified pmu, such as
+perf stat -e cycles -a (system-wide in this example), two events
+are created automatically.
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0x400000000
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+and
+
+
+perf_event_attr:
+  type 6
+  size 120
+  config   0xa00000000
+  sample_type  IDENTIFIER
+  read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
+  disabled 1
+  inherit  1
+  exclude_guest1
+
+
+type 6 is PERF_TYPE_HARDWARE_PMU.
+0x4 in 0x400000000 indicates it's the cpu_core pmu.
+0xa in 0xa00000000 indicates it's the cpu_atom pmu (atom pmu type id is random).
+
+The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
+and creates 'cycles' (0xa00000000) on cpu16-cpu23 (atom cpus).
+
+For perf-stat result, it displays two events:
+
+ Performance counter stats for 'system wide':
+
+12,869,720,529  cpu_core/cycles/
+

[PATCH v2 19/27] perf tests: Add hybrid cases for 'Parse event definition strings' test

2021-03-10 Thread Jin Yao
Add basic hybrid test cases for 'Parse event definition strings' test.

root@otcpl-adl-s-2:~# ./perf test 6
 6: Parse event definition strings  : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-events.c | 171 
 1 file changed, 171 insertions(+)

diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index a7f6661e6112..aec929867020 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -1512,6 +1512,123 @@ static int test__all_tracepoints(struct evlist *evlist)
return test__checkevent_tracepoint_multi(evlist);
 }
 
+static int test__hybrid_hw_event_with_pmu(struct evlist *evlist)
+{
+   struct evsel *evsel = evlist__first(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 1 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_event(struct evlist *evlist)
+{
+   struct evsel *evsel1 = evlist__first(evlist);
+   struct evsel *evsel2 = evlist__last(evlist);
+
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel1->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == 
evsel1->core.attr.config);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel2->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0xa == 
evsel2->core.attr.config);
+   return 0;
+}
+
+static int test__hybrid_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x40001 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_sw_hw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_hw_sw_group_event(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_SOFTWARE == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   return 0;
+}
+
+static int test__hybrid_group_modifier1(struct evlist *evlist)
+{
+   struct evsel *evsel, *leader;
+
+   evsel = leader = evlist__first(evlist);
+   TEST_ASSERT_VAL("wrong number of entries", 2 == 
evlist->core.nr_entries);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x4 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
!evsel->core.attr.exclude_kernel);
+
+   evsel = evsel__next(evsel);
+   TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE_PMU == 
evsel->core.attr.type);
+   TEST_ASSERT_VAL("wrong config", 0x40001 == evsel->core.attr.config);
+   TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
+   TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
+   TEST_ASSERT_VAL("wrong exclude_kernel", 
evsel->core.attr.exclude_kernel);
+   return 0;
+}
+
+static int test__hybrid_raw1(struct evlist *evlist)
+{
+   struct evsel 

[PATCH v2 21/27] perf tests: Skip 'Setup struct perf_event_attr' test for hybrid

2021-03-10 Thread Jin Yao
For hybrid, the kernel introduces the new perf type PERF_TYPE_HARDWARE_PMU (6),
which is assigned to hybrid hardware events.

root@otcpl-adl-s-2:~# ./perf test 17 -vvv
  ...
  compare
matching [event:base-stat]
  to [event-6-17179869184-4]
  [cpu] * 0
  [flags] 0|8 8
  [type] 0 6
->FAIL
match: [event:base-stat] matches []
event:base-stat does not match, but is optional
  matched
  compare
matching [event-6-17179869184-4]
  to [event:base-stat]
  [cpu] 0 *
  [flags] 8 0|8
  [type] 6 0
->FAIL
match: [event-6-17179869184-4] matches []
expected type=6, got 0
expected config=17179869184, got 0
FAILED './tests/attr/test-stat-C0' - match failure

The type matching fails because the expected type is 0 but the type of
the hybrid hardware event is 6. We temporarily skip this test case;
fixing it properly is a TODO for the future.

Signed-off-by: Jin Yao 
---
 tools/perf/tests/attr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index dd39ce9b0277..fc7f74159764 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -34,6 +34,7 @@
 #include "event.h"
 #include "util.h"
 #include "tests.h"
+#include "pmu-hybrid.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -184,6 +185,10 @@ int test__attr(struct test *test __maybe_unused, int 
subtest __maybe_unused)
char path_dir[PATH_MAX];
char *exec_path;
 
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   return 0;
+
/* First try development tree tests. */
if (!lstat("./tests", &st))
return run_dir("./tests", "./perf");
-- 
2.17.1



[PATCH v2 24/27] perf tests: Support 'Session topology' test for hybrid

2021-03-10 Thread Jin Yao
Force creating one event "cpu_core/cycles/" by default; otherwise the
'if (evlist->core.nr_entries == 1)' check in evlist__valid_sample_type
would fail.

root@otcpl-adl-s-2:~# ./perf test 41
41: Session topology: Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/topology.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index 74748ed75b2c..0f6e73baab2d 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -40,7 +40,15 @@ static int session_write_header(char *path)
session = perf_session__new(&data, false, NULL);
TEST_ASSERT_VAL("can't get session", !IS_ERR(session));
 
-   session->evlist = evlist__new_default();
+   perf_pmu__scan(NULL);
+   if (!perf_pmu__hybrid_exist()) {
+   session->evlist = evlist__new_default();
+   } else {
+   struct parse_events_error err;
+
+   session->evlist = evlist__new();
+   parse_events(session->evlist, "cpu_core/cycles/", &err);
+   }
TEST_ASSERT_VAL("can't get evlist", session->evlist);
 
perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY);
-- 
2.17.1



[PATCH v2 23/27] perf tests: Support 'Parse and process metrics' test for hybrid

2021-03-10 Thread Jin Yao
Some events are not supported on hybrid platforms. Only pick up the supported cases for hybrid.

root@otcpl-adl-s-2:~# ./perf test 67
67: Parse and process metrics   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/parse-metric.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 55bf52e588be..149b18f1f96a 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -370,12 +370,17 @@ static int test_metric_group(void)
 
 int test__parse_metric(struct test *test __maybe_unused, int subtest 
__maybe_unused)
 {
+   perf_pmu__scan(NULL);
+
TEST_ASSERT_VAL("IPC failed", test_ipc() == 0);
TEST_ASSERT_VAL("frontend failed", test_frontend() == 0);
-   TEST_ASSERT_VAL("cache_miss_cycles failed", test_cache_miss_cycles() == 
0);
TEST_ASSERT_VAL("DCache_L2 failed", test_dcache_l2() == 0);
TEST_ASSERT_VAL("recursion fail failed", test_recursion_fail() == 0);
-   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
TEST_ASSERT_VAL("Memory bandwidth", test_memory_bandwidth() == 0);
-   return 0;
+
+   if (!perf_pmu__hybrid_exist()) {
+   TEST_ASSERT_VAL("cache_miss_cycles failed", 
test_cache_miss_cycles() == 0);
+   TEST_ASSERT_VAL("test metric group", test_metric_group() == 0);
+   }
+return 0;
 }
-- 
2.17.1



[PATCH v2 20/27] perf tests: Add hybrid cases for 'Roundtrip evsel->name' test

2021-03-10 Thread Jin Yao
Since for one hw event, two hybrid events are created.

For example,

evsel->idx  evsel__name(evsel)
0   cycles
1   cycles
2   instructions
3   instructions
...

So for comparing the evsel name on hybrid, the evsel->idx
needs to be divided by 2.

root@otcpl-adl-s-2:~# ./perf test 14
14: Roundtrip evsel->name   : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/evsel-roundtrip-name.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/evsel-roundtrip-name.c 
b/tools/perf/tests/evsel-roundtrip-name.c
index f7f3e5b4c180..2b938a15901e 100644
--- a/tools/perf/tests/evsel-roundtrip-name.c
+++ b/tools/perf/tests/evsel-roundtrip-name.c
@@ -62,7 +62,8 @@ static int perf_evsel__roundtrip_cache_name_test(void)
return ret;
 }
 
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
+static int __perf_evsel__name_array_test(const char *names[], int nr_names,
+int distance)
 {
int i, err;
struct evsel *evsel;
@@ -82,9 +83,9 @@ static int __perf_evsel__name_array_test(const char *names[], 
int nr_names)
 
err = 0;
evlist__for_each_entry(evlist, evsel) {
-   if (strcmp(evsel__name(evsel), names[evsel->idx])) {
+   if (strcmp(evsel__name(evsel), names[evsel->idx / distance])) {
--err;
-   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx]);
+   pr_debug("%s != %s\n", evsel__name(evsel), 
names[evsel->idx / distance]);
}
}
 
@@ -93,18 +94,22 @@ static int __perf_evsel__name_array_test(const char 
*names[], int nr_names)
return err;
 }
 
-#define perf_evsel__name_array_test(names) \
-   __perf_evsel__name_array_test(names, ARRAY_SIZE(names))
+#define perf_evsel__name_array_test(names, distance) \
+   __perf_evsel__name_array_test(names, ARRAY_SIZE(names), distance)
 
 int test__perf_evsel__roundtrip_name_test(struct test *test __maybe_unused, 
int subtest __maybe_unused)
 {
int err = 0, ret = 0;
 
-   err = perf_evsel__name_array_test(evsel__hw_names);
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   return perf_evsel__name_array_test(evsel__hw_names, 2);
+
+   err = perf_evsel__name_array_test(evsel__hw_names, 1);
if (err)
ret = err;
 
-   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1);
+   err = __perf_evsel__name_array_test(evsel__sw_names, 
PERF_COUNT_SW_DUMMY + 1, 1);
if (err)
ret = err;
 
-- 
2.17.1



[PATCH v2 22/27] perf tests: Support 'Track with sched_switch' test for hybrid

2021-03-10 Thread Jin Yao
Since for "cycles:u' on hybrid platform, it creates two "cycles".
So the number of events in evlist is not expected in next test
steps. Now we just use one event "cpu_core/cycles:u/" for hybrid.

root@otcpl-adl-s-2:~# ./perf test 35
35: Track with sched_switch : Ok

Signed-off-by: Jin Yao 
---
 tools/perf/tests/switch-tracking.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/switch-tracking.c 
b/tools/perf/tests/switch-tracking.c
index 3ebaa758df77..13a11ce51a1a 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -340,6 +340,11 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
struct evsel *switch_evsel, *tracking_evsel;
const char *comm;
int err = -1;
+   bool hybrid = false;
+
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist())
+   hybrid = true;
 
threads = thread_map__new(-1, getpid(), UINT_MAX);
if (!threads) {
@@ -371,7 +376,10 @@ int test__switch_tracking(struct test *test 
__maybe_unused, int subtest __maybe_
cpu_clocks_evsel = evlist__last(evlist);
 
/* Second event */
-   err = parse_events(evlist, "cycles:u", NULL);
+   if (!hybrid)
+   err = parse_events(evlist, "cycles:u", NULL);
+   else
+   err = parse_events(evlist, "cpu_core/cycles:u/", NULL);
if (err) {
pr_debug("Failed to parse event cycles:u\n");
goto out_err;
-- 
2.17.1



[PATCH v2 16/27] perf evlist: Warn as events from different hybrid PMUs in a group

2021-03-10 Thread Jin Yao
If a group has events from different hybrid PMUs, show a warning.

This is to remind the user not to put a core event and an atom
event into one group.
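
(A group whose hybrid events all come from the same PMU, e.g.
'{cpu_core/cycles/,cpu_core/instructions/}', does not trigger the warning.)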

  root@ssp-pwrt-002:~# ./perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -- 
sleep 1
  WARNING: Group has events from different hybrid PMUs

   Performance counter stats for 'sleep 1':

 cpu_core/cycles/
   cpu_atom/cycles/

 1.002585908 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-record.c |  3 +++
 tools/perf/builtin-stat.c   |  7 ++
 tools/perf/util/evlist.c| 44 +
 tools/perf/util/evlist.h|  2 ++
 4 files changed, 56 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 363ea1047148..188a1198cd4b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -929,6 +929,9 @@ static int record__open(struct record *rec)
pos = evlist__reset_weak_group(evlist, pos, 
true);
goto try_again;
}
+
+   if (errno == EINVAL && perf_pmu__hybrid_exist())
+   evlist__warn_hybrid_group(evlist);
rc = -errno;
evsel__open_strerror(pos, &opts->target, errno, msg, 
sizeof(msg));
ui__error("%s\n", msg);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7a732508b2b4..6f780a039db0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -239,6 +239,9 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
struct evsel *evsel, *pos, *leader;
char buf[1024];
 
+   if (evlist__hybrid_exist(evlist))
+   return;
+
evlist__for_each_entry(evlist, evsel) {
leader = evsel->leader;
 
@@ -726,6 +729,10 @@ enum counter_recovery {
 static enum counter_recovery stat_handle_error(struct evsel *counter)
 {
char msg[BUFSIZ];
+
+   if (perf_pmu__hybrid_exist() && errno == EINVAL)
+   evlist__warn_hybrid_group(evsel_list);
+
/*
 * PPC returns ENXIO for HW counters until 2.6.37
 * (behavior changed with commit b0a873e).
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f139151b9433..5ec891418cdd 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2224,3 +2224,47 @@ void evlist__invalidate_all_cpus(struct evlist *evlist)
perf_cpu_map__put(evlist->core.all_cpus);
evlist->core.all_cpus = perf_cpu_map__empty_new(1);
 }
+
+static bool group_hybrid_conflict(struct evsel *leader)
+{
+   struct evsel *pos, *prev = NULL;
+
+   for_each_group_evsel(pos, leader) {
+   if (!pos->pmu_name || !perf_pmu__is_hybrid(pos->pmu_name))
+   continue;
+
+   if (prev && strcmp(prev->pmu_name, pos->pmu_name))
+   return true;
+
+   prev = pos;
+   }
+
+   return false;
+}
+
+void evlist__warn_hybrid_group(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_group_leader(evsel) &&
+   evsel->core.nr_members > 1 &&
+   group_hybrid_conflict(evsel)) {
+   WARN_ONCE(1, "WARNING: Group has events from "
+"different hybrid PMUs\n");
+   return;
+   }
+   }
+}
+
+bool evlist__hybrid_exist(struct evlist *evlist)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel__is_hybrid_event(evsel))
+   return true;
+   }
+
+   return false;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 0da683511d98..33dec3bb5739 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -369,4 +369,6 @@ struct evsel *evlist__find_evsel(struct evlist *evlist, int 
idx);
 void evlist__invalidate_all_cpus(struct evlist *evlist);
 
 bool evlist__has_hybrid_events(struct evlist *evlist);
+void evlist__warn_hybrid_group(struct evlist *evlist);
+bool evlist__hybrid_exist(struct evlist *evlist);
 #endif /* __PERF_EVLIST_H */
-- 
2.17.1



[PATCH v2 17/27] perf evsel: Adjust hybrid event and global event mixed group

2021-03-10 Thread Jin Yao
A group mixing a hybrid event and a global event is allowed. For example,
the group leader is 'cpu-clock' and the group member is 'cpu_atom/cycles/'.

e.g.
perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a

The challenge is that their available cpus do not fully match.
For example, 'cpu-clock' is available on CPU0-CPU23, but 'cpu_atom/cycles/'
is available on CPU16-CPU23.

When getting the group id for a group member, we must be very careful
because the cpu for 'cpu-clock' is not equal to the cpu for 'cpu_atom/cycles/'.
Actually the cpu here is the index into evsel->core.cpus, not the real CPU ID.
e.g. cpu0 for 'cpu-clock' is CPU0, but cpu0 for 'cpu_atom/cycles/' is CPU16.

Another challenge is group read. The events in the group may not be
available on all cpus. For example, the leader is a software event and
it's available on CPU0-CPU1, but the group member is a hybrid event and
it's only available on CPU1. For CPU0 we have only one event, but for CPU1
we have two events. So we need to change the read size according to
the real number of events on that cpu.
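
A tiny illustration of the index-versus-CPU point (sketch only, using
libperf's perf_cpu_map__cpu()):

    /* 'idx' indexes each evsel's own cpu map, so the same index can
     * name different real CPUs for different events in the group */
    int leader_cpu = perf_cpu_map__cpu(leader->core.cpus, idx); /* e.g. CPU0  */
    int member_cpu = perf_cpu_map__cpu(evsel->core.cpus, idx);  /* e.g. CPU16 */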

Let's see examples,

  root@ssp-pwrt-002:~# ./perf stat -e '{cpu-clock,cpu_atom/cycles/}' -a -vvv -- 
sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 1
size 120
sample_type  IDENTIFIER
read_format  
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  
  perf_event_attr:
type 6
size 120
config   0xa
sample_type  IDENTIFIER
read_format  
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd 20  flags 0x8 = 28
  sys_perf_event_open: pid -1  cpu 17  group_fd 21  flags 0x8 = 29
  sys_perf_event_open: pid -1  cpu 18  group_fd 22  flags 0x8 = 30
  sys_perf_event_open: pid -1  cpu 19  group_fd 23  flags 0x8 = 31
  sys_perf_event_open: pid -1  cpu 20  group_fd 24  flags 0x8 = 32
  sys_perf_event_open: pid -1  cpu 21  group_fd 25  flags 0x8 = 33
  sys_perf_event_open: pid -1  cpu 22  group_fd 26  flags 0x8 = 34
  sys_perf_event_open: pid -1  cpu 23  group_fd 27  flags 0x8 = 35
  cpu-clock: 0: 1002495141 1002496165 1002496165
  cpu-clock: 1: 1002494114 1002494741 1002494741
  cpu-clock: 2: 1002489755 1002491096 1002491096
  cpu-clock: 3: 1002486160 1002487660 1002487660
  cpu-clock: 4: 1002482342 1002483461 1002483461
  cpu-clock: 5: 1002480405 1002480756 1002480756
  cpu-clock: 6: 1002478135 1002478698 1002478698
  cpu-clock: 7: 1002476940 1002477251 1002477251
  cpu-clock: 8: 1002474281 1002475616 1002475616
  cpu-clock: 9: 1002471033 1002471983 1002471983
  cpu-clock: 10: 1002468437 1002469553 1002469553
  cpu-clock: 11: 1002467503 1002467892 1002467892
  cpu-clock: 12: 1002465167 1002466190 1002466190
  cpu-clock: 13: 1002463794 1002464109 1002464109
  

[PATCH v2 18/27] perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU

2021-03-10 Thread Jin Yao
For a hybrid system, the perf subsystem doesn't know which PMU the
events belong to. So the PMU-aware versions PERF_TYPE_HARDWARE_PMU and
PERF_TYPE_HW_CACHE_PMU are introduced.

Now define the new output[] entries for these two types.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-script.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5915f19cee55..d0e889e636d5 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -275,6 +275,30 @@ static struct {
.invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
},
 
+   [PERF_TYPE_HARDWARE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
+   [PERF_TYPE_HW_CACHE_PMU] = {
+   .user_set = false,
+
+   .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
+ PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
+ PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
+ PERF_OUTPUT_SYM | PERF_OUTPUT_SYMOFFSET |
+ PERF_OUTPUT_DSO | PERF_OUTPUT_PERIOD,
+
+   .invalid_fields = PERF_OUTPUT_TRACE | PERF_OUTPUT_BPF_OUTPUT,
+   },
+
[OUTPUT_TYPE_SYNTH] = {
.user_set = false,
 
-- 
2.17.1



Re: [PATCH] ASoC: soc-core: fix DMI handling

2021-03-10 Thread Takashi Iwai
On Wed, 10 Mar 2021 20:39:27 +0100,
Pierre-Louis Bossart wrote:
> 
> When DMI information is not present, trying to assign the card long
> name results in the following warning.
> 
> WARNING KERN tegra-audio-graph-card sound: ASoC: no DMI vendor name!
> 
> The initial solution suggested was to test if the card device is an
> ACPI one. This causes a regression visible to userspace on all Intel
> platforms, with UCM unable to load card profiles based on DMI
> information: the card devices are not necessarily ACPI ones, e.g. when
> the parent creates platform devices on Intel devices.
> 
> To fix this problem, this patch exports the existing dmi_available
> variable and tests it in the ASoC core.
> 
> Fixes: c014170408bc ("ASoC: soc-core: Prevent warning if no DMI table is 
> present")
> Signed-off-by: Pierre-Louis Bossart 

Reviewed-by: Takashi Iwai 


Takashi


> ---
>  drivers/firmware/dmi_scan.c | 1 +
>  sound/soc/soc-core.c| 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
> index d51ca0428bb8..f191a1f901ac 100644
> --- a/drivers/firmware/dmi_scan.c
> +++ b/drivers/firmware/dmi_scan.c
> @@ -166,6 +166,7 @@ static int __init dmi_checksum(const u8 *buf, u8 len)
>  static const char *dmi_ident[DMI_STRING_MAX];
>  static LIST_HEAD(dmi_devices);
>  int dmi_available;
> +EXPORT_SYMBOL_GPL(dmi_available);
>  
>  /*
>   *   Save a DMI string
> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
> index 16ba54eb8164..c7e4600b2dd4 100644
> --- a/sound/soc/soc-core.c
> +++ b/sound/soc/soc-core.c
> @@ -1574,7 +1574,7 @@ int snd_soc_set_dmi_name(struct snd_soc_card *card, 
> const char *flavour)
>   if (card->long_name)
>   return 0; /* long name already set by driver or from DMI */
>  
> - if (!is_acpi_device_node(card->dev->fwnode))
> + if (!dmi_available)
>   return 0;
>  
>   /* make up dmi long name as: vendor-product-version-board */
> -- 
> 2.25.1
> 


[PATCH v2 15/27] perf stat: Filter out unmatched aggregation for hybrid event

2021-03-10 Thread Jin Yao
perf-stat supports several aggregation modes, such as --per-core and
--per-socket. A hybrid event, however, may only be available on part of
the cpus. So for --per-core we need to filter out the unavailable cores,
for --per-socket filter out the unavailable sockets, and so on.
Before:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2  1,604,426,524  cpu_core/cycles/
  S0-D0-C4   2  1,604,408,224  cpu_core/cycles/
  S0-D0-C8   2  1,605,995,644  cpu_core/cycles/
  S0-D0-C12  2  1,628,056,554  cpu_core/cycles/
  S0-D0-C16  2  1,611,488,734  cpu_core/cycles/
  S0-D0-C20  2  1,616,314,761  cpu_core/cycles/
  S0-D0-C24  2  1,603,558,295  cpu_core/cycles/
  S0-D0-C28  2  1,603,541,128  cpu_core/cycles/
  S0-D0-C32  0cpu_core/cycles/
  S0-D0-C33  0cpu_core/cycles/
  S0-D0-C34  0cpu_core/cycles/
  S0-D0-C35  0cpu_core/cycles/
  S0-D0-C36  0cpu_core/cycles/
  S0-D0-C37  0cpu_core/cycles/
  S0-D0-C38  0cpu_core/cycles/
  S0-D0-C39  0cpu_core/cycles/

After:

  root@ssp-pwrt-002:~# ./perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  S0-D0-C0   2  1,621,781,943  cpu_core/cycles/
  S0-D0-C4   2  1,621,755,088  cpu_core/cycles/
  S0-D0-C8   2  1,604,276,920  cpu_core/cycles/
  S0-D0-C12  2  1,603,446,963  cpu_core/cycles/
  S0-D0-C16  2  1,604,231,725  cpu_core/cycles/
  S0-D0-C20  2  1,603,435,286  cpu_core/cycles/
  S0-D0-C24  2  1,603,387,250  cpu_core/cycles/
  S0-D0-C28  2  1,604,173,183  cpu_core/cycles/

Signed-off-by: Jin Yao 
---
 tools/perf/util/stat-display.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index ed37d8e7ea1a..2db7c36a03ad 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -634,6 +634,20 @@ static void aggr_cb(struct perf_stat_config *config,
}
 }
 
+static bool aggr_id_hybrid_matched(struct perf_stat_config *config,
+  struct evsel *counter, struct aggr_cpu_id id)
+{
+   struct aggr_cpu_id s;
+
+   for (int i = 0; i < evsel__nr_cpus(counter); i++) {
+   s = config->aggr_get_id(config, evsel__cpus(counter), i);
+   if (cpu_map__compare_aggr_cpu_id(s, id))
+   return true;
+   }
+
+   return false;
+}
+
 static void print_counter_aggrdata(struct perf_stat_config *config,
   struct evsel *counter, int s,
   char *prefix, bool metric_only,
@@ -647,6 +661,12 @@ static void print_counter_aggrdata(struct perf_stat_config 
*config,
double uval;
 
ad.id = id = config->aggr_map->map[s];
+
+   if (perf_pmu__hybrid_exist() &&
+   !aggr_id_hybrid_matched(config, counter, id)) {
+   return;
+   }
+
ad.val = ad.ena = ad.run = 0;
ad.nr = 0;
if (!collect_data(config, counter, aggr_cb, &ad))
-- 
2.17.1



[PATCH v2 14/27] perf stat: Add default hybrid events

2021-03-10 Thread Jin Yao
Previously if '-e' is not specified in perf stat, some software events
and hardware events are added to evlist by default.

  root@otcpl-adl-s-2:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

  109.43 msec task-clock#0.993 CPUs utilized
   1  context-switches  #0.009 K/sec
   0  cpu-migrations#0.000 K/sec
 105  page-faults   #0.960 K/sec
 401,161,982  cycles#3.666 GHz
   1,601,216,357  instructions  #3.99  insn per cycle
 200,217,751  branches  # 1829.686 M/sec
  14,555  branch-misses #0.01% of all branches

 0.110176860 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

On a hybrid platform, two events are created for one hardware event.

core cycles,
atom cycles,
core instructions,
atom instructions,
core branches,
atom branches,
core branch-misses,
atom branch-misses

These events will be added to evlist in order on hybrid platform
if '-e' is not set.

Since parse_events() now supports creating two hardware events for
one event on a hybrid platform, we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to the evlist.

After:

  root@ssp-pwrt-002:~# ./perf stat  -- ./triad_loop

   Performance counter stats for './triad_loop':

  290.77 msec task-clock#0.996 CPUs utilized
  25  context-switches  #0.086 K/sec
  13  cpu-migrations#0.045 K/sec
 107  page-faults   #0.368 K/sec
 449,620,957  cpu_core/cycles/  # 1546.334 M/sec
 cpu_atom/cycles/  
(0.00%)
   1,601,499,820  cpu_core/instructions/# 5507.870 M/sec
 cpu_atom/instructions/
(0.00%)
 200,272,310  cpu_core/branches/#  688.776 M/sec
 cpu_atom/branches/
(0.00%)
  15,255  cpu_core/branch-misses/   #0.052 M/sec
 cpu_atom/branch-misses/   
(0.00%)

 0.291897676 seconds time elapsed

We can see two events are created for one hardware event.
The first one is the core event, the second one is the atom event.

One thing to note: the shadow stats look a bit different now; only
'M/sec' is printed.

perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats()
need to be improved in the future if we want to get the original shadow
stats back.
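
(For reference, the missing 'insn per cycle' can still be derived by hand
from the output above: 1,601,499,820 cpu_core instructions /
449,620,957 cpu_core cycles is roughly 3.56 insn per cycle.)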

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6c0a21323814..7a732508b2b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1162,6 +1162,13 @@ static int parse_stat_cgroups(const struct option *opt,
return parse_cgroups(opt, str, unset);
 }
 
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+   struct parse_events_error err;
+
+   return parse_events(evlist, 
"cycles,instructions,branches,branch-misses", );
+}
+
 static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", _run,
"hardware transaction statistics"),
@@ -1637,6 +1644,12 @@ static int add_default_attributes(void)
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS
},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES  
},
 
+};
+   struct perf_event_attr default_sw_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES   
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS 
},
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS
},
 };
 
 /*
@@ -1874,6 +1887,15 @@ static int add_default_attributes(void)
}
 
if (!evsel_list->core.nr_entries) {
+   perf_pmu__scan(NULL);
+   if (perf_pmu__hybrid_exist()) {
+   if (evlist__add_default_attrs(evsel_list,
+ default_sw_attrs) < 0) {
+   return -1;
+   }
+   return add_default_hybrid_events(evsel_list);
+   }
+
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
 
-- 
2.17.1



[PATCH v2 13/27] perf evlist: Create two hybrid 'cycles' events by default

2021-03-10 Thread Jin Yao
When the evlist is empty, for example when no '-e' is specified in
perf record, one default 'cycles' event is added to the evlist.

On a hybrid platform, two default 'cycles' events need to be created:
one for core and the other for atom.

This patch calls evsel__new_cycles() twice to create the two 'cycles'
events.

  root@ssp-pwrt-002:~# ./perf record -vv -- sleep 1
  ...
  
  perf_event_attr:
type 6
size 120
config   0x4
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|PERIOD
read_format  ID
disabled 1
inherit  1
mmap 1
comm 1
freq 1
enable_on_exec   1
task 1
precise_ip   3
sample_id_all1
exclude_guest1
mmap21
comm_exec1
ksymbol  1
bpf_event1
  
  sys_perf_event_open: pid 22300  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 22300  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 22300  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 22300  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 22300  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 22300  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 22300  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 22300  cpu 7  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid 22300  cpu 8  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 22300  cpu 9  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 22300  cpu 10  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 22300  cpu 11  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 22300  cpu 12  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 22300  cpu 13  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 22300  cpu 14  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 22300  cpu 15  group_fd -1  flags 0x8 = 21
  
  perf_event_attr:
type 6
size 120
config   0xa
{ sample_period, sample_freq }   4000
sample_type  IP|TID|TIME|ID|PERIOD
read_format  ID
disabled 1
inherit  1
freq 1
enable_on_exec   1
precise_ip   3
sample_id_all1
exclude_guest1
  
  sys_perf_event_open: pid 22300  cpu 16  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid 22300  cpu 17  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid 22300  cpu 18  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid 22300  cpu 19  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid 22300  cpu 20  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid 22300  cpu 21  group_fd -1  flags 0x8 = 27
  sys_perf_event_open: pid 22300  cpu 22  group_fd -1  flags 0x8 = 28
  sys_perf_event_open: pid 22300  cpu 23  group_fd -1  flags 0x8 = 29
  ...

We can see one core 'cycles' (0x4) is enabled on cpu0-cpu15
and atom 'cycles' (0xa) is enabled on cpu16-cpu23.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-record.c | 10 ++
 tools/perf/util/evlist.c| 32 +++-
 tools/perf/util/evsel.c |  6 +++---
 tools/perf/util/evsel.h |  2 +-
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 35465d1db6dd..363ea1047148 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2786,10 +2786,12 @@ int cmd_record(int argc, const char **argv)
if (record.opts.overwrite)
record.opts.tail_synthesize = true;
 
-   if (rec->evlist->core.nr_entries == 0 &&
-   __evlist__add_default(rec->evlist, !record.opts.no_samples) < 0) {
-   pr_err("Not enough memory for event selector list\n");
-   goto out;
+   if (rec->evlist->core.nr_entries == 0) {
+   perf_pmu__scan(NULL);
+   if (__evlist__add_default(rec->evlist, !record.opts.no_samples) 
< 0) {
+   pr_err("Not enough memory for event selector list\n");
+   goto out;
+   }
}
 
if 

[PATCH v2 11/27] perf parse-events: Support hardware events inside PMU

2021-03-10 Thread Jin Yao
On a hybrid platform, some hardware events are only available on a
specific pmu. For example, 'L1-dcache-load-misses' is only available on
the 'cpu_core' pmu. And even for an event that is available on both
pmus, the user may want to enable it on just one. So the following
syntax is now supported:

cpu_core/<hardware event>/
cpu_core/<hardware cache event>/
cpu_core/<pmu event>/

cpu_atom/<hardware event>/
cpu_atom/<hardware cache event>/
cpu_atom/<pmu event>/

It limits the event to be enabled only on a specified pmu.

The patch uses this idea. For example, if we use "cpu_core/LLC-loads/",
then in parse_events_add_pmu(), term->config is "LLC-loads".

We create a new "parse_events_state" with the pmu_name and use
parse_events__scanner to scan the term->config (the string "LLC-loads"
in this example). parse_events_add_cache() will be called during
parsing, and parse_state->pmu_name is used to identify the pmu
on which the event is enabled.

Let's see examples:

  root@ssp-pwrt-002:~# ./perf stat -e cpu_core/cycles/,cpu_core/LLC-loads/ -vv 
-- ./triad_loop
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 3
  
  perf_event_attr:
type 7
size 120
config   0x40002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7267  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 449252097 29724 29724
  LLC-loads: 0: 1857 29724 29724
  cycles: 449252097 29724 29724
  LLC-loads: 1857 29724 29724

   Performance counter stats for './triad_loop':

 449,252,097  cpu_core/cycles/
   1,857  cpu_core/LLC-loads/

 0.298898415 seconds time elapsed

  root@ssp-pwrt-002:~# ./perf stat -e cpu_atom/cycles/,cpu_atom/LLC-loads/ -vv 
-- taskset -c 16 ./triad_loop
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0xa
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 3
  
  perf_event_attr:
type 7
size 120
config   0xa0002
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 7339  cpu -1  group_fd -1  flags 0x8 = 4
  cycles: 0: 602020010 343657939 342553275
  LLC-loads: 0: 3537 343657939 342553275
  cycles: 603961400 343657939 342553275
  LLC-loads: 3548 343657939 342553275

   Performance counter stats for 'taskset -c 16 ./triad_loop':

 603,961,400  cpu_atom/cycles/  
(99.68%)
   3,548  cpu_atom/LLC-loads/   
(99.68%)

 0.344904585 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/util/parse-events.c | 100 +++--
 tools/perf/util/parse-events.h |   6 +-
 tools/perf/util/parse-events.y |  21 ++-
 3 files changed, 105 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09e42245f71a..30435adc7a7b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -489,7 +489,8 @@ static int create_hybrid_cache_event(struct list_head *list, int *idx,
 static int 

Re: [RFC v2 3/5] arm64: socfpga: rename ARCH_STRATIX10 to ARCH_SOCFPGA64

2021-03-10 Thread Krzysztof Kozlowski
On 10/03/2021 17:42, Arnd Bergmann wrote:
> On Wed, Mar 10, 2021 at 4:54 PM Krzysztof Kozlowski
>  wrote:
>> On 10/03/2021 16:47, Krzysztof Kozlowski wrote:
>>> This edac Altera driver is very weird... it uses the same compatible
>>> differently depending whether this is 32-bit or 64-bit (e.g. Stratix
>>> 10)! On ARMv7 the compatible means for example one IRQ... On ARMv8, we
>>> have two. It's quite a new code (2019 from Intel), not some ancient
>>> legacy, so it should never have been accepted...
>>
>> Oh, it's not that horrible as it sounds. They actually have different
>> compatibles for edac driver with these differences (e.g. in interrupts).
>> They just do not use them and instead check for the basic (common?)
>> compatible and architecture... Anyway without testing I am not the
>> person to fix the edac driver.
> 
> Ok, This should be fixed properly as you describe, but as a quick hack
> it wouldn't be hard to just change the #ifdef to check for CONFIG_64BIT
> instead of CONFIG_ARCH_STRATIX10 during the rename of the config
> symbol.

This would work. The trouble with renaming ARCH_SOCFPGA to
ARCH_INTEL_SOCFPGA is that SOCFPGA will still appear in many other
Kconfig symbols and even directory paths.

Let me use ARCH_INTEL_SOCFPGA for 64-bit here and rename the 32-bit
one a little later.

Best regards,
Krzysztof


[PATCH v2 12/27] perf parse-events: Support hybrid raw events

2021-03-10 Thread Jin Yao
On a hybrid platform, the same raw event may be available on both
the cpu_core pmu and the cpu_atom pmu, so two raw events are created
for one event encoding.

  root@ssp-pwrt-002:~# ./perf stat -e r3c -a -vv -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 4
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 10
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  r3c: 0: 807321251 1002093589 1002093589
  r3c: 1: 807321699 1002088203 1002088203
  r3c: 2: 802163010 1002086701 1002086701
  r3c: 3: 802162967 1002080660 1002080660
  r3c: 4: 801769096 1002077047 1002077047
  r3c: 5: 801766174 1002071197 1002071197
  r3c: 6: 804147338 1002065696 1002065696
  r3c: 7: 804141152 1002055345 1002055345
  r3c: 8: 801743651 1002043364 1002043364
  r3c: 9: 801742285 1002036921 1002036921
  r3c: 10: 804083297 1002032502 1002032502
  r3c: 11: 804084735 1002027992 1002027992
  r3c: 12: 804504507 1002026371 1002026371
  r3c: 13: 804504679 1002022466 1002022466
  r3c: 14: 811424953 1002021767 1002021767
  r3c: 15: 811423320 1002021594 1002021594
  r3c: 0: 810883154 1002021654 1002021654
  r3c: 1: 810881069 1002017334 1002017334
  r3c: 2: 810878689 1002014010 1002014010
  r3c: 3: 810876654 1002011516 1002011516
  r3c: 4: 800488244 1002007858 1002007858
  r3c: 5: 800486260 1002003635 1002003635
  r3c: 6: 800483374 1002000384 1002000384
  r3c: 7: 800481011 1001997122 1001997122
  r3c: 12874304114 16032851415 16032851415
  r3c: 6445458455 8016073513 8016073513

   Performance counter stats for 'system wide':

  12,874,304,114  cpu_core/r3c/
   6,445,458,455  cpu_atom/r3c/

 1.002310991 seconds time elapsed

Raw events inside a pmu are also supported. The syntax is similar:

cpu_core//
cpu_atom//

  root@ssp-pwrt-002:~# ./perf stat -e cpu_core/r3c/ -vv  -- ./triad_loop
  Control descriptor is not initialized
  
  perf_event_attr:
type 4
size 120
config   0x3c
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
enable_on_exec   1
exclude_guest1
  
  sys_perf_event_open: pid 12340  cpu -1  group_fd -1  flags 0x8 = 3
  cpu_core/r3c/: 0: 449000613 

[PATCH v2 10/27] perf parse-events: Create two hybrid cache events

2021-03-10 Thread Jin Yao
Cache events have pre-defined configs. The kernel needs to know
which pmu the cache event comes from (e.g. the cpu_core pmu or the
cpu_atom pmu), but the perf type 'PERF_TYPE_HW_CACHE' can't carry
pmu information.

So the kernel introduces a new type 'PERF_TYPE_HW_CACHE_PMU'.

The new attr.config layout for PERF_TYPE_HW_CACHE_PMU is

0xDD00CCBBAA
AA: hardware cache ID
BB: hardware cache op ID
CC: hardware cache op result ID
DD: PMU type ID

As with hardware events, the PMU type ID is retrieved from sysfs.
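
For illustration only (this is not code from the patch), the config value
can be assembled from those fields as below, assuming the shift values from
the proposed perf_event.h update in this series and a cpu_core PMU type ID
of 4 as in the sysfs example later in the series:

#include <stdio.h>
#include <stdint.h>

/* Shifts as proposed in the perf_event.h update of this series. */
#define PERF_HW_CACHE_ID_SHIFT			 0
#define PERF_HW_CACHE_OP_ID_SHIFT		 8
#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT	16
#define PERF_PMU_TYPE_SHIFT			32

static uint64_t hw_cache_pmu_config(uint64_t cache_id, uint64_t op_id,
				    uint64_t result_id, uint64_t pmu_type)
{
	return (cache_id << PERF_HW_CACHE_ID_SHIFT) |
	       (op_id << PERF_HW_CACHE_OP_ID_SHIFT) |
	       (result_id << PERF_HW_CACHE_OP_RESULT_ID_SHIFT) |
	       (pmu_type << PERF_PMU_TYPE_SHIFT);
}

int main(void)
{
	/* L1-dcache-loads: cache id 0 (L1D), op 0 (read), result 0 (access),
	 * on a cpu_core pmu whose sysfs type is assumed to be 4 here; the
	 * type value can differ between systems.
	 */
	printf("config = 0x%llx\n",
	       (unsigned long long)hw_cache_pmu_config(0, 0, 0, 4));
	return 0;
}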

When enabling a hybrid cache event without a specified pmu, such as
'perf stat -e L1-dcache-loads -a', two events are created
automatically: one for atom, the other for core.

  root@ssp-pwrt-002:~# ./perf stat -e L1-dcache-loads -vv -a -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 7
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 7
size 120
config   0xa
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  L1-dcache-loads: 0: 13103284 1002535421 1002535421
  L1-dcache-loads: 1: 12995797 1002532807 1002532807
  L1-dcache-loads: 2: 13428186 1002528572 1002528572
  L1-dcache-loads: 3: 12913469 1002517437 1002517437
  L1-dcache-loads: 4: 12857843 1002507079 1002507079
  L1-dcache-loads: 5: 12812079 1002498279 1002498279
  L1-dcache-loads: 6: 12829938 1002490010 1002490010
  L1-dcache-loads: 7: 12807085 1002481860 1002481860
  L1-dcache-loads: 8: 12907189 1002473181 1002473181
  L1-dcache-loads: 9: 12823095 1002465895 1002465895
  L1-dcache-loads: 10: 12892770 1002459322 1002459322
  L1-dcache-loads: 11: 12789718 1002451607 1002451607
  L1-dcache-loads: 12: 12838931 1002442632 1002442632
  L1-dcache-loads: 13: 12803756 1002434133 1002434133
  L1-dcache-loads: 14: 12840574 1002426060 1002426060
  L1-dcache-loads: 15: 12799075 1002415964 1002415964
  L1-dcache-loads: 0: 39394457 1002406287 1002406287
  L1-dcache-loads: 1: 39372632 1002400502 1002400502
  L1-dcache-loads: 2: 39405247 1002394865 1002394865
  L1-dcache-loads: 3: 39400547 1002389099 1002389099
  L1-dcache-loads: 4: 39410752 1002383106 1002383106
  L1-dcache-loads: 5: 39402983 1002375365 1002375365
  L1-dcache-loads: 6: 39388775 1002369374 1002369374
  L1-dcache-loads: 7: 39408527 1002363344 1002363344
  L1-dcache-loads: 206442789 16039660259 16039660259
  L1-dcache-loads: 315183920 8019081942 8019081942

   Performance counter stats for 'system wide':

 206,442,789  cpu_core/L1-dcache-loads/
 315,183,920  cpu_atom/L1-dcache-loads/

 1.002751663 seconds 

[PATCH v2 09/27] perf parse-events: Create two hybrid hardware events

2021-03-10 Thread Jin Yao
Hardware events have pre-defined configs. The kernel needs to know
which pmu the event comes from (e.g. the cpu_core pmu or the
cpu_atom pmu), but the perf type 'PERF_TYPE_HARDWARE' can't carry
pmu information.

So the kernel introduces a new type 'PERF_TYPE_HARDWARE_PMU'.
The new attr.config layout for PERF_TYPE_HARDWARE_PMU is:

0xDD000000AA
AA: original hardware event ID
DD: PMU type ID

PMU type ID is retrieved from sysfs. For example,

  cat /sys/devices/cpu_atom/type
  10

  cat /sys/devices/cpu_core/type
  4
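
For illustration (again assuming PERF_PMU_TYPE_SHIFT is 32 as in the
proposed perf_event.h update, and the PMU type IDs shown in the sysfs
example above), the hardware config encoding can be sketched as:

#include <stdio.h>
#include <stdint.h>

#define PERF_PMU_TYPE_SHIFT 32

/* PERF_COUNT_HW_CPU_CYCLES is 0 in the perf ABI. */
static uint64_t hardware_pmu_config(uint64_t hw_event_id, uint64_t pmu_type)
{
	return hw_event_id | (pmu_type << PERF_PMU_TYPE_SHIFT);
}

int main(void)
{
	/* cycles on cpu_core (type 4) and on cpu_atom (type 10). */
	printf("cpu_core cycles config = 0x%llx\n",
	       (unsigned long long)hardware_pmu_config(0, 4));
	printf("cpu_atom cycles config = 0x%llx\n",
	       (unsigned long long)hardware_pmu_config(0, 10));
	return 0;
}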

When enabling a hybrid hardware event without a specified pmu, such as
'perf stat -e cycles -a', two events are created automatically. One
is for atom, the other is for core.

  root@ssp-pwrt-002:~# ./perf stat -e cycles -vv -a -- sleep 1
  Control descriptor is not initialized
  
  perf_event_attr:
type 6
size 120
config   0x4
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 3
  sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8 = 4
  sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8 = 8
  sys_perf_event_open: pid -1  cpu 5  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid -1  cpu 6  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid -1  cpu 7  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid -1  cpu 8  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid -1  cpu 9  group_fd -1  flags 0x8 = 13
  sys_perf_event_open: pid -1  cpu 10  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid -1  cpu 11  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid -1  cpu 12  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid -1  cpu 13  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid -1  cpu 14  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid -1  cpu 15  group_fd -1  flags 0x8 = 19
  
  perf_event_attr:
type 6
size 120
config   0xa
sample_type  IDENTIFIER
read_format  TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit  1
exclude_guest1
  
  sys_perf_event_open: pid -1  cpu 16  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid -1  cpu 17  group_fd -1  flags 0x8 = 21
  sys_perf_event_open: pid -1  cpu 18  group_fd -1  flags 0x8 = 22
  sys_perf_event_open: pid -1  cpu 19  group_fd -1  flags 0x8 = 23
  sys_perf_event_open: pid -1  cpu 20  group_fd -1  flags 0x8 = 24
  sys_perf_event_open: pid -1  cpu 21  group_fd -1  flags 0x8 = 25
  sys_perf_event_open: pid -1  cpu 22  group_fd -1  flags 0x8 = 26
  sys_perf_event_open: pid -1  cpu 23  group_fd -1  flags 0x8 = 27
  cycles: 0: 810754998 1002563650 1002563650
  cycles: 1: 810749852 1002559947 1002559947
  cycles: 2: 808096005 1002555036 1002555036
  cycles: 3: 808090246 1002543496 1002543496
  cycles: 4: 800933425 1002536659 1002536659
  cycles: 5: 800928573 1002528386 1002528386
  cycles: 6: 800924347 1002520527 1002520527
  cycles: 7: 800922009 1002513176 1002513176
  cycles: 8: 800919624 1002507326 1002507326
  cycles: 9: 800917204 1002500663 1002500663
  cycles: 10: 802096579 1002494280 1002494280
  cycles: 11: 802093770 1002486404 1002486404
  cycles: 12: 803284338 1002479491 1002479491
  cycles: 13: 803277609 1002469777 1002469777
  cycles: 14: 800875902 1002458861 1002458861
  cycles: 15: 800873241 1002451350 1002451350
  cycles: 0: 800837379 1002444645 1002444645
  cycles: 1: 800833400 1002438505 1002438505
  cycles: 2: 800829291 1002433698 1002433698
  cycles: 3: 800824390 1002427584 1002427584
  cycles: 4: 800819360 1002422099 1002422099
  cycles: 5: 800814787 1002415845 1002415845
  cycles: 6: 800810125 1002410301 1002410301
  cycles: 7: 800791893 1002386845 1002386845
  cycles: 12855737722 16040169029 16040169029
  cycles: 6406560625 8019379522 8019379522

   Performance counter stats for 'system wide':

  12,855,737,722  cpu_core/cycles/
   6,406,560,625  cpu_atom/cycles/

 1.002774658 seconds time elapsed

Type 6 is PERF_TYPE_HARDWARE_PMU.
The 0x4 PMU type ID in the cpu_core config indicates the cpu_core pmu.
The 0xa PMU type ID in the cpu_atom config indicates the cpu_atom pmu.

Signed-off-by: Jin Yao 
---
 tools/perf/util/parse-events.c | 73 

[PATCH v2 08/27] perf stat: Uniquify hybrid event name

2021-03-10 Thread Jin Yao
It would be useful to tell the user which pmu the event belongs to.
perf-stat already supports the '--no-merge' option, which can print the
pmu name after the event name, such as:

"cycles [cpu_core]"

Now this option is enabled by default on a hybrid platform, but the
format is changed to:

"cpu_core/cycles/"

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c  |  3 +++
 tools/perf/util/stat-display.c | 12 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 68ecf68699a9..6c0a21323814 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2390,6 +2390,9 @@ int cmd_stat(int argc, const char **argv)
 
evlist__check_cpu_maps(evsel_list);
 
+   if (perf_pmu__hybrid_exist())
+   stat_config.no_merge = true;
+
/*
 * Initialize thread_map with comm names,
 * so we could print it out on output.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7f09cdaf5b60..ed37d8e7ea1a 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -526,6 +526,7 @@ static void uniquify_event_name(struct evsel *counter)
 {
char *new_name;
char *config;
+   int ret;
 
if (counter->uniquified_name ||
!counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
@@ -540,8 +541,15 @@ static void uniquify_event_name(struct evsel *counter)
counter->name = new_name;
}
} else {
-   if (asprintf(&new_name,
-"%s [%s]", counter->name, counter->pmu_name) > 0) {
+   if (perf_pmu__hybrid_exist()) {
+   ret = asprintf(&new_name, "%s/%s/",
+  counter->pmu_name, counter->name);
+   } else {
+   ret = asprintf(&new_name, "%s [%s]",
+  counter->name, counter->pmu_name);
+   }
+
+   if (ret) {
free(counter->name);
counter->name = new_name;
}
-- 
2.17.1



[PATCH v2 06/27] perf pmu: Add hybrid helper functions

2021-03-10 Thread Jin Yao
The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu
can be used to identify the hybrid platform and return the found
hybrid cpu pmu. All the detected hybrid pmus have been saved in
'perf_pmu__hybrid_pmus' list. So we just need to search this list.

perf_pmu__hybrid_type_to_pmu converts the user-specified string
to a hybrid pmu name. This is used to support the '--cputype' option
in later patches.

perf_pmu__hybrid_exist checks whether any hybrid pmu exists.
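
As a stand-alone illustration of the '--cputype' mapping (the
hybrid_type_to_pmu name below is made up; the real helper additionally
consults the detected hybrid pmu list and sysfs):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Illustration of the naming convention behind
 * perf_pmu__hybrid_type_to_pmu(): a user-supplied type such as "core"
 * or "atom" becomes a candidate pmu name by prefixing "cpu_".
 */
static char *hybrid_type_to_pmu(const char *type)
{
	char *pmu_name = NULL;

	if (asprintf(&pmu_name, "cpu_%s", type) < 0)
		return NULL;
	return pmu_name;
}

int main(void)
{
	char *name = hybrid_type_to_pmu("core");

	printf("%s\n", name ? name : "(none)");	/* prints "cpu_core" */
	free(name);
	return 0;
}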

Signed-off-by: Jin Yao 
---
 tools/perf/util/pmu-hybrid.c | 40 
 tools/perf/util/pmu-hybrid.h | 11 ++
 2 files changed, 51 insertions(+)

diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
index 7316bf46e54b..86ba84d9469c 100644
--- a/tools/perf/util/pmu-hybrid.c
+++ b/tools/perf/util/pmu-hybrid.c
@@ -33,3 +33,43 @@ bool perf_pmu__hybrid_mounted(const char *name)
snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
return file_available(path);
 }
+
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name)
+{
+   struct perf_pmu *pmu;
+
+   if (!name)
+   return NULL;
+
+   perf_pmu__for_each_hybrid_pmu(pmu) {
+   if (!strcmp(name, pmu->name))
+   return pmu;
+   }
+
+   return NULL;
+}
+
+bool perf_pmu__is_hybrid(const char *name)
+{
+   return perf_pmu__find_hybrid_pmu(name) != NULL;
+}
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type)
+{
+   char *pmu_name = NULL;
+
+   if (asprintf(_name, "cpu_%s", type) < 0)
+   return NULL;
+
+   if (perf_pmu__is_hybrid(pmu_name))
+   return pmu_name;
+
+   /*
+* pmu may be not scanned, check the sysfs.
+*/
+   if (perf_pmu__hybrid_mounted(pmu_name))
+   return pmu_name;
+
+   free(pmu_name);
+   return NULL;
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
index 35bed3714438..7fb2246e939a 100644
--- a/tools/perf/util/pmu-hybrid.h
+++ b/tools/perf/util/pmu-hybrid.h
@@ -15,4 +15,15 @@ extern struct list_head perf_pmu__hybrid_pmus;
 
 bool perf_pmu__hybrid_mounted(const char *name);
 
+struct perf_pmu *perf_pmu__find_hybrid_pmu(const char *name);
+
+bool perf_pmu__is_hybrid(const char *name);
+
+char *perf_pmu__hybrid_type_to_pmu(const char *type);
+
+static inline bool perf_pmu__hybrid_exist(void)
+{
+   return !list_empty(&perf_pmu__hybrid_pmus);
+}
+
 #endif /* __PMU_HYBRID_H */
-- 
2.17.1



[PATCH v2 04/27] perf pmu: Save pmu name

2021-03-10 Thread Jin Yao
On a hybrid platform, an event is available on one pmu
(for example, on cpu_core or on cpu_atom).

This patch saves the pmu name to the pmu field of struct perf_pmu_alias.
Then we can later know which pmu the event is available on.

Signed-off-by: Jin Yao 
---
 tools/perf/util/pmu.c | 10 +-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 54e586bf19a5..45d8db1af8d2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -283,6 +283,7 @@ void perf_pmu_free_alias(struct perf_pmu_alias *newalias)
zfree(&newalias->str);
zfree(&newalias->metric_expr);
zfree(&newalias->metric_name);
+   zfree(&newalias->pmu);
parse_events_terms__purge(&newalias->terms);
free(newalias);
 }
@@ -297,6 +298,10 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 
list_for_each_entry(a, alist, list) {
if (!strcasecmp(newalias->name, a->name)) {
+   if (newalias->pmu && a->pmu &&
+   !strcasecmp(newalias->pmu, a->pmu)) {
+   continue;
+   }
perf_pmu_update_alias(a, newalias);
perf_pmu_free_alias(newalias);
return true;
@@ -314,7 +319,8 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
int num;
char newval[256];
char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
-*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL,
+*pmu = NULL;
 
if (pe) {
long_desc = (char *)pe->long_desc;
@@ -324,6 +330,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
metric_expr = (char *)pe->metric_expr;
metric_name = (char *)pe->metric_name;
deprecated = (char *)pe->deprecated;
+   pmu = (char *)pe->pmu;
}
 
alias = malloc(sizeof(*alias));
@@ -389,6 +396,7 @@ static int __perf_pmu__new_alias(struct list_head *list, 
char *dir, char *name,
}
alias->per_pkg = perpkg && sscanf(perpkg, "%d", &num) == 1 && num == 1;
alias->str = strdup(newval);
+   alias->pmu = pmu ? strdup(pmu) : NULL;
 
if (deprecated)
alias->deprecated = true;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8164388478c6..0e724d5b84c6 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -72,6 +72,7 @@ struct perf_pmu_alias {
bool deprecated;
char *metric_expr;
char *metric_name;
+   char *pmu;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.17.1



[PATCH v2 02/27] perf jevents: Support unit value "cpu_core" and "cpu_atom"

2021-03-10 Thread Jin Yao
Some Intel platforms, such as Alderlake, are hybrid platforms
consisting of atom cpus and core cpus. Each cpu has a dedicated event
list. Some events are available on the core cpu, some events are
available on the atom cpu.

The kernel exports new cpu pmus: cpu_core and cpu_atom. The event in
json is added with a new field "Unit" to indicate which pmu the event
is available on.

For example, one event in cache.json,

{
"BriefDescription": "Counts the number of load ops retired that",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xd2",
"EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "103",
"UMask": "0x80",
"Unit": "cpu_atom"
},

The unit "cpu_atom" indicates this event is only availabe on "cpu_atom".

In generated pmu-events.c, we can see:

{
.name = "mem_load_uops_retired_misc.mmio",
.event = "period=103,umask=0x80,event=0xd2",
.desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
.topic = "cache",
.pmu = "cpu_atom",
},

But without this patch, the "uncore_" prefix would be added before
"cpu_atom", such as:
.pmu = "uncore_cpu_atom"

That would be a wrong pmu.
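
A simplified stand-alone sketch of that mapping (not the actual jevents
code; the table contents are abbreviated) shows why the two new entries
matter: a "Unit" value that is not in the table is treated as an uncore
unit and gets the "uncore_" prefix.

#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <ctype.h>

struct unit_map { const char *unit; const char *pmu; };

static const struct unit_map table[] = {
	{ "imx8_ddr", "imx8_ddr" },
	{ "L3PMC",    "amd_l3"   },
	{ "DFPMC",    "amd_df"   },
	{ "cpu_core", "cpu_core" },	/* added by this patch */
	{ "cpu_atom", "cpu_atom" },	/* added by this patch */
	{ NULL, NULL }
};

static void unit_to_pmu(const char *unit, char *out, size_t len)
{
	for (const struct unit_map *m = table; m->unit; m++) {
		if (!strcasecmp(unit, m->unit)) {
			snprintf(out, len, "%s", m->pmu);
			return;
		}
	}
	/* Fallback for uncore units: lowercase and prefix "uncore_". */
	snprintf(out, len, "uncore_%s", unit);
	for (char *p = out; *p; p++)
		*p = tolower((unsigned char)*p);
}

int main(void)
{
	char pmu[64];

	unit_to_pmu("cpu_atom", pmu, sizeof(pmu));
	printf("%s\n", pmu);	/* "cpu_atom", not "uncore_cpu_atom" */
	return 0;
}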

Signed-off-by: Jin Yao 
---
 tools/perf/pmu-events/jevents.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
index e1f3f5c8c550..b1a15f57c9ad 100644
--- a/tools/perf/pmu-events/jevents.c
+++ b/tools/perf/pmu-events/jevents.c
@@ -285,6 +285,8 @@ static struct map {
{ "imx8_ddr", "imx8_ddr" },
{ "L3PMC", "amd_l3" },
{ "DFPMC", "amd_df" },
+   { "cpu_core", "cpu_core" },
+   { "cpu_atom", "cpu_atom" },
{}
 };
 
-- 
2.17.1



[PATCH v2 03/27] perf pmu: Simplify arguments of __perf_pmu__new_alias

2021-03-10 Thread Jin Yao
Simplify the arguments of __perf_pmu__new_alias() by passing
the whole 'struct pmu_event' pointer.

Signed-off-by: Jin Yao 
---
 tools/perf/util/pmu.c | 36 
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 44ef28302fc7..54e586bf19a5 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -306,18 +306,25 @@ static bool perf_pmu_merge_alias(struct perf_pmu_alias 
*newalias,
 }
 
 static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
-char *desc, char *val,
-char *long_desc, char *topic,
-char *unit, char *perpkg,
-char *metric_expr,
-char *metric_name,
-char *deprecated)
+char *desc, char *val, struct pmu_event *pe)
 {
struct parse_events_term *term;
struct perf_pmu_alias *alias;
int ret;
int num;
char newval[256];
+   char *long_desc = NULL, *topic = NULL, *unit = NULL, *perpkg = NULL,
+*metric_expr = NULL, *metric_name = NULL, *deprecated = NULL;
+
+   if (pe) {
+   long_desc = (char *)pe->long_desc;
+   topic = (char *)pe->topic;
+   unit = (char *)pe->unit;
+   perpkg = (char *)pe->perpkg;
+   metric_expr = (char *)pe->metric_expr;
+   metric_name = (char *)pe->metric_name;
+   deprecated = (char *)pe->deprecated;
+   }
 
alias = malloc(sizeof(*alias));
if (!alias)
@@ -406,8 +413,7 @@ static int perf_pmu__new_alias(struct list_head *list, char 
*dir, char *name, FI
/* Remove trailing newline from sysfs file */
strim(buf);
 
-   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL);
+   return __perf_pmu__new_alias(list, dir, name, NULL, buf, NULL);
 }
 
 static inline bool pmu_alias_info_file(char *name)
@@ -793,11 +799,7 @@ void pmu_add_cpu_aliases_map(struct list_head *head, 
struct perf_pmu *pmu,
/* need type casts to override 'const' */
__perf_pmu__new_alias(head, NULL, (char *)pe->name,
(char *)pe->desc, (char *)pe->event,
-   (char *)pe->long_desc, (char *)pe->topic,
-   (char *)pe->unit, (char *)pe->perpkg,
-   (char *)pe->metric_expr,
-   (char *)pe->metric_name,
-   (char *)pe->deprecated);
+   pe);
}
 }
 
@@ -864,13 +866,7 @@ static int pmu_add_sys_aliases_iter_fn(struct pmu_event 
*pe, void *data)
  (char *)pe->name,
  (char *)pe->desc,
  (char *)pe->event,
- (char *)pe->long_desc,
- (char *)pe->topic,
- (char *)pe->unit,
- (char *)pe->perpkg,
- (char *)pe->metric_expr,
- (char *)pe->metric_name,
- (char *)pe->deprecated);
+ pe);
}
 
return 0;
-- 
2.17.1



[PATCH v2 07/27] perf evlist: Hybrid event uses its own cpus

2021-03-10 Thread Jin Yao
On a hybrid platform, atom events can only be enabled on atom CPUs and
core events can only be enabled on core CPUs. So a hybrid event can
only be enabled on its own CPUs.

But the problem for current perf is that the cpus for an evsel (via PMU
sysfs) have been merged into evsel_list->core.all_cpus, which might be
all CPUs.

So we need to figure out a way to let the hybrid event use only its
own CPUs.

The idea is to create a new evlist__invalidate_all_cpus to invalidate
evsel_list->core.all_cpus, so that evlist__for_each_cpu returns cpu -1
for a hybrid evsel. If cpu is -1, the hybrid evsel will use its own cpus.

We will see the following code piece in the patch.

if (cpu == -1 && !evlist->thread_mode)
evsel__enable_cpus(pos);

It lets the event be enabled on the event's own cpus.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c  | 37 ++-
 tools/perf/util/evlist.c   | 72 --
 tools/perf/util/evlist.h   |  4 ++
 tools/perf/util/evsel.h|  8 
 tools/perf/util/python-ext-sources |  2 +
 5 files changed, 117 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2e2e4a8345ea..68ecf68699a9 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -393,6 +393,18 @@ static int read_counter_cpu(struct evsel *counter, struct 
timespec *rs, int cpu)
return 0;
 }
 
+static int read_counter_cpus(struct evsel *counter, struct timespec *rs)
+{
+   int cpu, nr_cpus, err = 0;
+   struct perf_cpu_map *cpus = evsel__cpus(counter);
+
+   nr_cpus = cpus ? cpus->nr : 1;
+   for (cpu = 0; cpu < nr_cpus; cpu++)
+   err = read_counter_cpu(counter, rs, cpu);
+
+   return err;
+}
+
 static int read_affinity_counters(struct timespec *rs)
 {
struct evsel *counter;
@@ -414,8 +426,14 @@ static int read_affinity_counters(struct timespec *rs)
if (evsel__cpu_iter_skip(counter, cpu))
continue;
if (!counter->err) {
-   counter->err = read_counter_cpu(counter, rs,
-   counter->cpu_iter - 1);
+   if (cpu == -1 && !evsel_list->thread_mode) {
+   counter->err = read_counter_cpus(counter, rs);
+   } else if (evsel_list->thread_mode) {
+   counter->err = read_counter_cpu(counter, rs, 0);
+   } else {
+   counter->err = read_counter_cpu(counter, rs,
+   counter->cpu_iter - 1);
+   }
}
}
}
@@ -781,6 +799,21 @@ static int __run_perf_stat(int argc, const char **argv, 
int run_idx)
if (group)
evlist__set_leader(evsel_list);
 
+   /*
+* On hybrid platform, the cpus for evsel (via PMU sysfs) have been
+* merged to evsel_list->core.all_cpus. We use evlist__invalidate_all_cpus
+* to invalidate the evsel_list->core.all_cpus, then evlist__for_each_cpu
+* returns cpu -1 for hybrid evsel. If cpu is -1, hybrid evsel will
+* use its own cpus.
+*/
+   if (evlist__has_hybrid_events(evsel_list)) {
+   evlist__invalidate_all_cpus(evsel_list);
+   if (!target__has_cpu(&target) ||
+   target__has_per_thread(&target)) {
+   evsel_list->thread_mode = true;
+   }
+   }
+
if (affinity__setup(&affinity) < 0)
return -1;
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 882cd1f721d9..3ee12fcd0c9f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -381,7 +381,8 @@ bool evsel__cpu_iter_skip_no_inc(struct evsel *ev, int cpu)
 bool evsel__cpu_iter_skip(struct evsel *ev, int cpu)
 {
if (!evsel__cpu_iter_skip_no_inc(ev, cpu)) {
-   ev->cpu_iter++;
+   if (cpu != -1)
+   ev->cpu_iter++;
return false;
}
return true;
@@ -410,6 +411,16 @@ static int evlist__is_enabled(struct evlist *evlist)
return false;
 }
 
+static void evsel__disable_cpus(struct evsel *evsel)
+{
+   int cpu, nr_cpus;
+   struct perf_cpu_map *cpus = evsel__cpus(evsel);
+
+   nr_cpus = cpus ? cpus->nr : 1;
+   for (cpu = 0; cpu < nr_cpus; cpu++)
+   evsel__disable_cpu(evsel, cpu);
+}
+
 static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 {
struct evsel *pos;
@@ -436,7 +447,12 @@ static void __evlist__disable(struct evlist *evlist, char 
*evsel_name)
has_imm = true;
if (pos->immediate != imm)
  

[PATCH v2 05/27] perf pmu: Save detected hybrid pmus to a global pmu list

2021-03-10 Thread Jin Yao
We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking the following files:

For cpu_core, checks:
"/sys/bus/event_source/devices/cpu_core/cpus"

For cpu_atom, checks:
"/sys/bus/event_source/devices/cpu_atom/cpus"

If the 'cpus' file exists, the pmu exists.

But in order not to hardcode "cpu_core" and "cpu_atom" and to keep
the code generic, if the path
"/sys/bus/event_source/devices/cpu_xxx/cpus" exists, a hybrid
pmu exists. All the detected hybrid pmus are linked to a
global list 'perf_pmu__hybrid_pmus', and then we just need
to iterate the list to get all hybrid pmus by using
perf_pmu__for_each_hybrid_pmu.
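
A stand-alone sketch of that sysfs check (the real code uses perf's
sysfs__mountpoint() and file_available() helpers; here the mount point is
simply assumed to be /sys):

#include <stdio.h>
#include <unistd.h>
#include <limits.h>

/* A pmu named "cpu_xxx" is considered hybrid if its "cpus" file exists
 * under sysfs.  The sysfs mount point is hard-coded for simplicity.
 */
static int hybrid_pmu_mounted(const char *name)
{
	char path[PATH_MAX];

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/cpus", name);
	return access(path, F_OK) == 0;
}

int main(void)
{
	const char *candidates[] = { "cpu_core", "cpu_atom" };

	for (unsigned int i = 0; i < 2; i++)
		printf("%s: %s\n", candidates[i],
		       hybrid_pmu_mounted(candidates[i]) ?
		       "hybrid pmu" : "not present");
	return 0;
}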

Signed-off-by: Jin Yao 
---
 tools/perf/util/Build|  1 +
 tools/perf/util/pmu-hybrid.c | 35 +++
 tools/perf/util/pmu-hybrid.h | 18 ++
 tools/perf/util/pmu.c|  9 -
 tools/perf/util/pmu.h|  4 
 5 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/pmu-hybrid.c
 create mode 100644 tools/perf/util/pmu-hybrid.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e3e12f9d4733..37a8a63c7195 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,6 +69,7 @@ perf-y += parse-events-bison.o
 perf-y += pmu.o
 perf-y += pmu-flex.o
 perf-y += pmu-bison.o
+perf-y += pmu-hybrid.o
 perf-y += trace-event-read.o
 perf-y += trace-event-info.o
 perf-y += trace-event-scripting.o
diff --git a/tools/perf/util/pmu-hybrid.c b/tools/perf/util/pmu-hybrid.c
new file mode 100644
index ..7316bf46e54b
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "fncache.h"
+#include "pmu-hybrid.h"
+
+LIST_HEAD(perf_pmu__hybrid_pmus);
+
+bool perf_pmu__hybrid_mounted(const char *name)
+{
+   char path[PATH_MAX];
+   const char *sysfs;
+
+   if (strncmp(name, "cpu_", 4))
+   return false;
+
+   sysfs = sysfs__mountpoint();
+   if (!sysfs)
+   return false;
+
+   snprintf(path, PATH_MAX, CPUS_TEMPLATE_CPU, sysfs, name);
+   return file_available(path);
+}
diff --git a/tools/perf/util/pmu-hybrid.h b/tools/perf/util/pmu-hybrid.h
new file mode 100644
index ..35bed3714438
--- /dev/null
+++ b/tools/perf/util/pmu-hybrid.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PMU_HYBRID_H
+#define __PMU_HYBRID_H
+
+#include 
+#include 
+#include 
+#include 
+#include "pmu.h"
+
+extern struct list_head perf_pmu__hybrid_pmus;
+
+#define perf_pmu__for_each_hybrid_pmu(pmu) \
+   list_for_each_entry(pmu, &perf_pmu__hybrid_pmus, hybrid_list)
+
+bool perf_pmu__hybrid_mounted(const char *name);
+
+#endif /* __PMU_HYBRID_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 45d8db1af8d2..08280a3e45a8 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -25,6 +25,7 @@
 #include "string2.h"
 #include "strbuf.h"
 #include "fncache.h"
+#include "pmu-hybrid.h"
 
 struct perf_pmu perf_pmu__fake;
 
@@ -613,7 +614,6 @@ static struct perf_cpu_map *__pmu_cpumask(const char *path)
  */
 #define SYS_TEMPLATE_ID"./bus/event_source/devices/%s/identifier"
 #define CPUS_TEMPLATE_UNCORE   "%s/bus/event_source/devices/%s/cpumask"
-#define CPUS_TEMPLATE_CPU  "%s/bus/event_source/devices/%s/cpus"
 
 static struct perf_cpu_map *pmu_cpumask(const char *name)
 {
@@ -645,6 +645,9 @@ static bool pmu_is_uncore(const char *name)
char path[PATH_MAX];
const char *sysfs;
 
+   if (perf_pmu__hybrid_mounted(name))
+   return false;
+
sysfs = sysfs__mountpoint();
snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);
return file_available(path);
@@ -946,6 +949,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
pmu->is_uncore = pmu_is_uncore(name);
if (pmu->is_uncore)
pmu->id = pmu_id(name);
+   pmu->is_hybrid = perf_pmu__hybrid_mounted(name);
pmu->max_precise = pmu_max_precise(name);
pmu_add_cpu_aliases(&aliases, pmu);
pmu_add_sys_aliases(&aliases, pmu);
@@ -957,6 +961,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
list_splice(&aliases, &pmu->aliases);
list_add_tail(&pmu->list, &pmus);
 
+   if (pmu->is_hybrid)
+   list_add_tail(&pmu->hybrid_list, &perf_pmu__hybrid_pmus);
+
pmu->default_config = perf_pmu__get_default_config(pmu);
 
return pmu;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 0e724d5b84c6..3b9b4def6032 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "parse-events.h"
 #include "pmu-events/pmu-events.h"
@@ -19,6 +20,7 @@ enum {
 
 #define PERF_PMU_FORMAT_BITS 64
 #define EVENT_SOURCE_DEVICE_PATH 

[PATCH v2 01/27] tools headers uapi: Update tools's copy of linux/perf_event.h

2021-03-10 Thread Jin Yao
To get the changes in:

Liang Kan's patch
("perf: Introduce PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU")

Kan's patch is in review at the moment, but the following perf tool
patches need this interface for hybrid support.

This patch can be removed after Kan's patch is upstreamed.

Signed-off-by: Jin Yao 
---
 tools/include/uapi/linux/perf_event.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index ad15e40d7f5d..c0a511eea498 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -33,6 +33,8 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE  = 3,
PERF_TYPE_RAW   = 4,
PERF_TYPE_BREAKPOINT= 5,
+   PERF_TYPE_HARDWARE_PMU  = 6,
+   PERF_TYPE_HW_CACHE_PMU  = 7,
 
PERF_TYPE_MAX,  /* non-ABI */
 };
@@ -94,6 +96,30 @@ enum perf_hw_cache_op_result_id {
PERF_COUNT_HW_CACHE_RESULT_MAX, /* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE* and PERF_TYPE_HW_CACHE*
+ * PERF_TYPE_HARDWARE: 0xAA
+ * AA: hardware event ID
+ * PERF_TYPE_HW_CACHE: 0xCCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * PERF_TYPE_HARDWARE_PMU: 0xDD000000AA
+ * AA: hardware event ID
+ * DD: PMU type ID
+ * PERF_TYPE_HW_CACHE_PMU: 0xDD00CCBBAA
+ * AA: hardware cache ID
+ * BB: hardware cache op ID
+ * CC: hardware cache op result ID
+ * DD: PMU type ID
+ */
+#define PERF_HW_CACHE_ID_SHIFT 0
+#define PERF_HW_CACHE_OP_ID_SHIFT  8
+#define PERF_HW_CACHE_OP_RESULT_ID_SHIFT   16
+#define PERF_HW_CACHE_EVENT_MASK   0xff
+
+#define PERF_PMU_TYPE_SHIFT32
+
 /*
  * Special "software" events provided by the kernel, even if the hardware
  * does not support performance events. These events measure various
-- 
2.17.1



[PATCH v2 00/27] perf tool: AlderLake hybrid support series 1

2021-03-10 Thread Jin Yao
AlderLake uses a hybrid architecture utilizing Golden Cove cores
(core cpu) and Gracemont cores (atom cpu). Each cpu has a dedicated
event list. Some events are available on the core cpu, some events
are available on the atom cpu, and some events are available on both.

Kernel exports new pmus "cpu_core" and "cpu_atom" through sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

In this example, core cpus are 0-15 and atom cpus are 16-23.

To enable a core only event or atom only event:

cpu_core//
or
cpu_atom//

Count the 'cycles' event on core cpus.

  # perf stat -e cpu_core/cycles/ -a -- sleep 1

   Performance counter stats for 'system wide':

  12,853,951,349  cpu_core/cycles/

 1.002581249 seconds time elapsed

If one event is available on both atom cpu and core cpu, two events
are created automatically.

  # perf stat -e cycles -a -- sleep 1

   Performance counter stats for 'system wide':

  12,856,467,438  cpu_core/cycles/
   6,404,634,785  cpu_atom/cycles/

 1.002453013 seconds time elapsed

Grouping is supported if the events are from the same pmu, otherwise a
warning is displayed.

  # perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -a -- sleep 1

   Performance counter stats for 'system wide':

  12,863,866,968  cpu_core/cycles/
 554,795,017  cpu_core/instructions/

 1.002616117 seconds time elapsed

  # perf stat -e '{cpu_core/cycles/,cpu_atom/instructions/}' -a -- sleep 1
  WARNING: Group has events from different hybrid PMUs

   Performance counter stats for 'system wide':

  12,863,344,830  cpu_core/cycles/
   cpu_atom/instructions/

   1.002568830 seconds time elapsed

Note that, since the whole patchset for AlderLake hybrid support is very
large (40+ patches), for simplicity it's split into several patch
series.

Patch series 1 only supports the basic functionality. The other
support for perf-c2c/perf-mem/topdown/metrics/topology header etc.
will be added in follow-up series.

v2:
---
- Drop kernel patches (Kan posted the series "Add Alder Lake support for perf 
(kernel)" separately).
- Drop the patches for perf-c2c/perf-mem/topdown/metrics/topology header 
supports,
  which will be added in series 2 or series 3.
- Simplify the arguments of __perf_pmu__new_alias() by passing
  the 'struct pmu_event' pointer.
- Check sysfs validity before access.
- Use pmu style event name, such as "cpu_core/cycles/".
- Move command output two chars to the right.
- Move pmu hybrid functions to new created pmu-hybrid.c/pmu-hybrid.h.
  This is to pass the perf test python case.

Jin Yao (27):
  tools headers uapi: Update tools's copy of linux/perf_event.h
  perf jevents: Support unit value "cpu_core" and "cpu_atom"
  perf pmu: Simplify arguments of __perf_pmu__new_alias
  perf pmu: Save pmu name
  perf pmu: Save detected hybrid pmus to a global pmu list
  perf pmu: Add hybrid helper functions
  perf evlist: Hybrid event uses its own cpus
  perf stat: Uniquify hybrid event name
  perf parse-events: Create two hybrid hardware events
  perf parse-events: Create two hybrid cache events
  perf parse-events: Support hardware events inside PMU
  perf parse-events: Support hybrid raw events
  perf evlist: Create two hybrid 'cycles' events by default
  perf stat: Add default hybrid events
  perf stat: Filter out unmatched aggregation for hybrid event
  perf evlist: Warn as events from different hybrid PMUs in a group
  perf evsel: Adjust hybrid event and global event mixed group
  perf script: Support PERF_TYPE_HARDWARE_PMU and PERF_TYPE_HW_CACHE_PMU
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf Documentation: Document intel-hybrid support

 tools/include/uapi/linux/perf_event.h  |  26 ++
 tools/perf/Documentation/intel-hybrid.txt  | 228 +
 tools/perf/Documentation/perf-record.txt   |   1 +
 tools/perf/Documentation/perf-stat.txt |   2 +
 tools/perf/builtin-record.c|  13 +-
 tools/perf/builtin-script.c|  24 ++
 tools/perf/builtin-stat.c  |  69 -
 tools/perf/pmu-events/jevents.c|   2 +
 tools/perf/tests/attr.c|   5 +
 tools/perf/tests/evsel-roundtrip-name.c|  19 +-
 tools/perf/tests/parse-events.c| 171 +
 tools/perf/tests/parse-metric.c|  11 +-
 tools/perf/tests/perf-time-to-tsc.c|  16 ++
 

[RFC PATCH v2] Bluetooth: hci_qca: Add device_may_wakeup support

2021-03-10 Thread Venkata Lakshmi Narayana Gubba
Based on the device may wakeup status, the BlueZ stack will enable/disable
passive scanning with a whitelist in the BT controller while suspending.
As the interrupt from the BT SoC is handled by the UART driver, we need to
use the device handle of the UART driver to get the device may wakeup status.

Signed-off-by: Venkata Lakshmi Narayana Gubba 
---
 drivers/bluetooth/hci_qca.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index de36af6..73af901 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1571,6 +1571,20 @@ static void qca_cmd_timeout(struct hci_dev *hdev)
mutex_unlock(&qca->hci_memdump_lock);
 }
 
+static bool qca_prevent_wake(struct hci_dev *hdev)
+{
+   struct hci_uart *hu = hci_get_drvdata(hdev);
+   bool wakeup;
+
+   /* UART driver handles the interrupt from BT SoC. So we need to use
+* device handle of UART driver to get the status of device may wakeup.
+*/
+   wakeup = device_may_wakeup(hu->serdev->ctrl->dev.parent);
+   bt_dev_dbg(hu->hdev, "wakeup status : %d", wakeup);
+
+   return !wakeup;
+}
+
 static int qca_wcn3990_init(struct hci_uart *hu)
 {
struct qca_serdev *qcadev;
@@ -1721,6 +1735,7 @@ static int qca_setup(struct hci_uart *hu)
qca_debugfs_init(hdev);
hu->hdev->hw_error = qca_hw_error;
hu->hdev->cmd_timeout = qca_cmd_timeout;
+   hu->hdev->prevent_wake = qca_prevent_wake;
} else if (ret == -ENOENT) {
/* No patch/nvm-config found, run with original fw/config */
set_bit(QCA_ROM_FW, >flags);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v3 0/4] KVM: arm64: Add VLPI migration support on GICv4.1

2021-03-10 Thread Shenming Lu
Hi,

Sorry to bother you again, I am really hoping for a response to this series. :-)

Thanks,
Shenming

On 2021/2/26 16:58, Shenming Lu wrote:
> Hi Marc,
> 
> Gentle ping. Does this series need any further modification? Wish you can 
> pick it up. :-)
> 
> Thanks,
> Shenming
> 
> On 2021/1/27 20:13, Shenming Lu wrote:
>> Hi Marc, sorry for the late commit.
>>
>> In GICv4.1, migration has been supported except for (directly-injected)
>> VLPI. And GICv4.1 Spec explicitly gives a way to get the VLPI's pending
>> state (which was crucially missing in GICv4.0). So we make VLPI migration
>> capable on GICv4.1 in this patch set.
>>
>> In order to support VLPI migration, we need to save and restore all
>> required configuration information and pending states of VLPIs. But
>> in fact, the configuration information of VLPIs has already been saved
>> (or will be reallocated on the dst host...) in vgic(kvm) migration.
>> So we only have to migrate the pending states of VLPIs specially.
>>
>> Below is the related workflow in migration.
>>
>> On the save path:
>>  In migration completion:
>>  pause all vCPUs
>>  |
>>  call each VM state change handler:
>>  pause other devices (just keep from sending interrupts, 
>> and
>>  such as VFIO migration protocol has already realized it 
>> [1])
>>  |
>>  flush ITS tables into guest RAM
>>  |
>>  flush RDIST pending tables (also flush VLPI state here)
>>  |
>>  ...
>> On the resume path:
>>  load each device's state:
>>  restore ITS tables (include pending tables) from guest RAM
>>  |
>>  for other (PCI) devices (paused), if configured to have VLPIs,
>>  establish the forwarding paths of their VLPIs (and transfer
>>  the pending states from kvm's vgic to VPT here)
>>
>> We have tested this series in VFIO migration, and found some related
>> issues in QEMU [2].
>>
>> Links:
>> [1] vfio: UAPI for migration interface for device state:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a8a24f3f6e38103b77cf399c38eb54e1219d00d6
>> [2] vfio: Some fixes and optimizations for VFIO migration:
>> https://patchwork.ozlabs.org/cover/1413263/
>>
>> History:
>>
>> v2 -> v3
>>  - Add the vgic initialized check to ensure that the allocation and enabling
>>of the doorbells have already been done before unmapping the vPEs.
>>  - Check all get_vlpi_state related conditions in save_pending_tables in one 
>> place.
>>  - Nit fixes.
>>
>> v1 -> v2:
>>  - Get the VLPI state from the KVM side.
>>  - Nit fixes.
>>
>> Thanks,
>> Shenming
>>
>>
>> Shenming Lu (3):
>>   KVM: arm64: GICv4.1: Add function to get VLPI state
>>   KVM: arm64: GICv4.1: Try to save hw pending state in
>> save_pending_tables
>>   KVM: arm64: GICv4.1: Give a chance to save VLPI's pending state
>>
>> Zenghui Yu (1):
>>   KVM: arm64: GICv4.1: Restore VLPI's pending state to physical side
>>
>>  .../virt/kvm/devices/arm-vgic-its.rst |  2 +-
>>  arch/arm64/kvm/vgic/vgic-its.c|  6 +-
>>  arch/arm64/kvm/vgic/vgic-v3.c | 61 +--
>>  arch/arm64/kvm/vgic/vgic-v4.c | 33 ++
>>  arch/arm64/kvm/vgic/vgic.h|  1 +
>>  5 files changed, 93 insertions(+), 10 deletions(-)
>>


Re: [PATCH v4 4/5] perf stat: Enable iostat mode for x86 platforms

2021-03-10 Thread liuqi (BA)




On 2021/3/11 0:19, Alexander Antonov wrote:


On 3/9/2021 10:51 AM, liuqi (BA) wrote:

Hi Alexander,

On 2021/2/3 21:58, Alexander Antonov wrote:

This functionality is based on recently introduced sysfs attributes
for Intel® Xeon® Scalable processor family (code name Skylake-SP):
Commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to
IIO PMON mapping")

Mode is intended to provide four I/O performance metrics in MB per each
PCIe root port:
  - Inbound Read: I/O devices below root port read from the host memory
  - Inbound Write: I/O devices below root port write to the host memory
  - Outbound Read: CPU reads from I/O devices below root port
  - Outbound Write: CPU writes to I/O devices below root port

Each metric requires only one uncore event which increments at every 4B
transfer in corresponding direction. The formulas to compute metrics
are generic:
 #EventCount * 4B / (1024 * 1024)

Signed-off-by: Alexander Antonov
---
  tools/perf/Documentation/perf-iostat.txt |  88 ++
  tools/perf/Makefile.perf |   5 +-
  tools/perf/arch/x86/util/Build   |   1 +
  tools/perf/arch/x86/util/iostat.c    | 345 +++
  tools/perf/command-list.txt  |   1 +
  tools/perf/perf-iostat.sh    |  12 +
  6 files changed, 451 insertions(+), 1 deletion(-)
  create mode 100644 tools/perf/Documentation/perf-iostat.txt
  create mode 100644 tools/perf/perf-iostat.sh

diff --git a/tools/perf/Documentation/perf-iostat.txt 
b/tools/perf/Documentation/perf-iostat.txt

new file mode 100644
index ..165176944031
--- /dev/null
+++ b/tools/perf/Documentation/perf-iostat.txt
@@ -0,0 +1,88 @@
+perf-iostat(1)
+===
+
+NAME
+
+perf-iostat - Show I/O performance metrics
+
+SYNOPSIS
+
+[verse]
+'perf iostat' list
+'perf iostat'  --  []
+
+DESCRIPTION
+---
+Mode is intended to provide four I/O performance metrics per each 
PCIe root port:

+
+- Inbound Read   - I/O devices below root port read from the host 
memory, in MB

+
+- Inbound Write  - I/O devices below root port write to the host 
memory, in MB

+
+- Outbound Read  - CPU reads from I/O devices below root port, in MB
+
+- Outbound Write - CPU writes to I/O devices below root port, in MB
+
+OPTIONS
+---
+...::
+    Any command you can specify in a shell.
+
+list::
+    List all PCIe root ports.


I noticed that the "iostat" command and the cmd_iostat() callback function are 
not registered in cmd_struct in perf.c. So I think "perf iostat list" 
perhaps cannot work properly.


I also test this patchset on x86 platform, and here is the log:

root@ubuntu:/home/lq# ./perf iostat list
perf: 'iostat' is not a perf-command. See 'perf --help'.
root@ubuntu:/home/lq# ./perf stat --iostat
^C
 Performance counter stats for 'system wide':

   port Inbound Read(MB)    Inbound Write(MB) Outbound 
Read(MB)   Outbound Write(MB)

:00    0 0    0  0
:80    0 0    0  0
:17    0 0    0  0
:85    0 0    0  0
:3a    0 0    0  0
:ae    0 0    0  0
:5d    0 0    0  0
:d7    0 0    0  0

   0.611303832 seconds time elapsed


root@ubuntu:/home/lq# ./perf stat --iostat=:17
^C
 Performance counter stats for 'system wide':

   port Inbound Read(MB)    Inbound Write(MB) Outbound 
Read(MB)   Outbound Write(MB)

:17    0 0    0  0

   0.521317572 seconds time elapsed

So how does the following 'perf iostat list' work, did I miss something?

Thanks,
Qi


Hello,

The 'iostat' mode uses aliases mechanism in perf same as 'perf archive' and
in this case you don't need to add function callback into cmd_struct.
For example, the command 'perf iostat list' will be converted to
'perf stat --iostat=list'.
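
For anyone wondering how that works without a cmd_struct entry: like
'perf archive', an unknown sub-command falls back to an external
"perf-<name>" helper found on perf's exec path. A rough stand-alone
illustration of that fallback (not the actual perf.c code; the real perf
also searches its own exec path such as ~/libexec/perf-core rather than
only PATH):

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char cmd[128];

	if (argc < 2)
		return 1;

	/* "perf iostat list" -> exec "perf-iostat list" */
	snprintf(cmd, sizeof(cmd), "perf-%s", argv[1]);
	argv[1] = cmd;
	execvp(cmd, &argv[1]);
	perror(cmd);		/* reached only if the helper is missing */
	return 1;
}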

After building the perf tool you should have two shell scripts in 
tools/perf

directory and one of them is executable, for example:
# make -C tools/perf
# ls -l tools/perf/perf-iostat*
-rwxr-xr-x 1 root root 290 Mar 10 18:17 perf-iostat
-rw-r--r-- 1 root root 290 Feb  3 15:14 perf-iostat.sh

It should be possible to run 'perf iostat' from build directory:
# cd tools/perf
# ./perf iostat list
S0-uncore_iio_0<:00>
S1-uncore_iio_0<:80>
S0-uncore_iio_1<:17>
S1-uncore_iio_1<:85>
S0-uncore_iio_2<:3a>
S1-uncore_iio_2<:ae>
S0-uncore_iio_3<:5d>
S1-uncore_iio_3<:d7>

Also you can copy 'perf-iostat' to ~/libexec/perf-core/ or just run 
'make install'


# make install
# cp perf /usr/bin/
# ls -lh ~/libexec/perf-core/
total 24K
-rwxr-xr-x 1 root root 1.4K Mar 10 18:17 perf-archive

Re: [PATCH RESEND][next] rtl8xxxu: Fix fall-through warnings for Clang

2021-03-10 Thread Kalle Valo
Kees Cook  writes:

> On Fri, Mar 05, 2021 at 03:40:33PM +0200, Kalle Valo wrote:
>> "Gustavo A. R. Silva"  writes:
>> 
>> > In preparation to enable -Wimplicit-fallthrough for Clang, fix
>> > multiple warnings by replacing /* fall through */ comments with
>> > the new pseudo-keyword macro fallthrough; instead of letting the
>> > code fall through to the next case.
>> >
>> > Notice that Clang doesn't recognize /* fall through */ comments as
>> > implicit fall-through markings.
>> >
>> > Link: https://github.com/KSPP/linux/issues/115
>> > Signed-off-by: Gustavo A. R. Silva 
>> 
>> It's not cool that you ignore the comments you got in [1], then after a
>> while mark the patch as "RESEND" and not even include a changelog why it
>> was resent.
>> 
>> [1] 
>> https://patchwork.kernel.org/project/linux-wireless/patch/d522f387b2d0dde774785c7169c1f25aa529989d.1605896060.git.gustavo...@kernel.org/
>
> Hm, this conversation looks like a miscommunication, mainly? I see
> Gustavo, as requested by many others[1], replacing the fallthrough
> comments with the "fallthrough" statement. (This is more than just a
> "Clang doesn't parse comments" issue.)

v1 was clearly rejected by Jes, so sending a new version without any
changelog or comments is not the way to go. The changelog should at least
have had "v1 was rejected but I'm resending this again because " or something like that to make it clear what's happening.

> This could be a tree-wide patch and not bother you, but Greg KH has
> generally advised us to send these changes broken out. Anyway, this
> change still needs to land, so what would be the preferred path? I think
> Gustavo could just carry it for Linus to merge without bothering you if
> that'd be preferred?

I agree with Greg. Please don't do cleanups like this via another tree
as that just creates more work due to conflicts between the trees, which
is a lot more annoying to deal with than applying few patches. But when
submitting patches please follow the rules, don't cut corners.

Jes, I don't like 'fallthrough' either and prefer the original comment,
but the ship has sailed on this one. Maybe we should just take it?

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


[PATCH v2] perf machine: Assign boolean values to a bool variable

2021-03-10 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in
function 'symbol__match_regex' with return type bool.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
Changes in v2:
  - Make code simpler.

 tools/perf/util/machine.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b5c2d8b..ba6517b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2037,9 +2037,7 @@ int machine__process_event(struct machine *machine, union 
perf_event *event,
 
 static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
 {
-   if (!regexec(regex, sym->name, 0, NULL, 0))
-   return 1;
-   return 0;
+   return !regexec(regex, sym->name, 0, NULL, 0);
 }
 
 static void ip__resolve_ams(struct thread *thread,
-- 
1.8.3.1



Re: [PATCH v4] usb: gadget: configfs: Fix KASAN use-after-free

2021-03-10 Thread Macpaul Lin
On Thu, 2021-03-11 at 14:42 +0800, Macpaul Lin wrote:
> From: Jim Lin 
> 
> When the gadget is disconnected, the running sequence is like this.
> . composite_disconnect
> . Call trace:
>   usb_string_copy+0xd0/0x128
>   gadget_config_name_configuration_store+0x4
>   gadget_config_name_attr_store+0x40/0x50
>   configfs_write_file+0x198/0x1f4
>   vfs_write+0x100/0x220
>   SyS_write+0x58/0xa8
> . configfs_composite_unbind
> . configfs_composite_bind
> 
> In configfs_composite_bind, it has
> "cn->strings.s = cn->configuration;"
> 
> When usb_string_copy is invoked. it would
> allocate memory, copy input string, release previous pointed memory space,
> and use new allocated memory.
> 
> When gadget is connected, host sends down request to get information.
> Call trace:
>   usb_gadget_get_string+0xec/0x168
>   lookup_string+0x64/0x98
>   composite_setup+0xa34/0x1ee8
> 
> If the gadget is disconnected and connected quickly, in the failing case
> the cn->configuration memory has been released by usb_string_copy's kfree, but
> configfs_composite_bind hasn't run in time to assign the newly allocated
> "cn->configuration" pointer to "cn->strings.s".
> 
> When strlen(s->s) of usb_gadget_get_string is executed, the dangling
> memory is accessed and a "BUG: KASAN: use-after-free" error occurs.
> 
> Signed-off-by: Jim Lin 
> Signed-off-by: Macpaul Lin 
> Cc: sta...@vger.kernel.org
> ---
> Changes in v2:
> Changes in v3:
>  - Change commit description
> Changes in v4:
>  - Fix build error and adapt patch to kernel-5.12-rc1.
>Replace definition "MAX_USB_STRING_WITH_NULL_LEN" with
>"USB_MAX_STRING_WITH_NULL_LEN".
>  - Note: The patch v2 and v3 has been verified by
>Thadeu Lima de Souza Cascardo 
>http://spinics.net/lists/kernel/msg3840792.html

Dear Cascardo,

Would you please help to confirm if you've tested it on Linux PC,
Chrome OS, or an Android OS?

Thanks!
Macpaul Lin

>and
>Macpaul Lin  on Android kernels.
>http://lkml.org/lkml/2020/6/11/8
>  - The patch is suggested to be applied to LTS versions.
> 
>  drivers/usb/gadget/configfs.c |   14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
> index 0d56f33..15a607c 100644
> --- a/drivers/usb/gadget/configfs.c
> +++ b/drivers/usb/gadget/configfs.c
> @@ -97,6 +97,8 @@ struct gadget_config_name {
>   struct list_head list;
>  };
>  
> +#define USB_MAX_STRING_WITH_NULL_LEN (USB_MAX_STRING_LEN+1)
> +
>  static int usb_string_copy(const char *s, char **s_copy)
>  {
>   int ret;
> @@ -106,12 +108,16 @@ static int usb_string_copy(const char *s, char **s_copy)
>   if (ret > USB_MAX_STRING_LEN)
>   return -EOVERFLOW;
>  
> - str = kstrdup(s, GFP_KERNEL);
> - if (!str)
> - return -ENOMEM;
> + if (copy) {
> + str = copy;
> + } else {
> + str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL);
> + if (!str)
> + return -ENOMEM;
> + }
> + strcpy(str, s);
>   if (str[ret - 1] == '\n')
>   str[ret - 1] = '\0';
> - kfree(copy);
>   *s_copy = str;
>   return 0;
>  }



Re: [PATCH 07/11] PM / devfreq: check get_dev_status before start monitor

2021-03-10 Thread Chanwoo Choi
On 3/10/21 1:56 PM, Dong Aisheng wrote:
> On Wed, Mar 10, 2021 at 11:08 AM Chanwoo Choi  wrote:
>>
>> On 3/10/21 11:56 AM, Dong Aisheng wrote:
>>> On Wed, Mar 10, 2021 at 12:12 AM Chanwoo Choi  wrote:

 On 21. 3. 10. 오전 12:58, Chanwoo Choi wrote:
> On 21. 3. 9. 오후 9:58, Dong Aisheng wrote:
>> The devfreq monitor depends on the device to provide load information
>> by .get_dev_status() to calculate the next target freq.
>>
>> And this will cause changing governor to simple ondemand fail
>> if device can't support.
>>
>> Signed-off-by: Dong Aisheng 
>> ---
>>   drivers/devfreq/devfreq.c | 10 +++---
>>   drivers/devfreq/governor.h|  2 +-
>>   drivers/devfreq/governor_simpleondemand.c |  3 +--
>>   3 files changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index 7231fe6862a2..d1787b6c7d7c 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -482,10 +482,13 @@ static void devfreq_monitor(struct work_struct
>> *work)
>>* to be called from governor in response to DEVFREQ_GOV_START
>>* event when device is added to devfreq framework.
>>*/
>> -void devfreq_monitor_start(struct devfreq *devfreq)
>> +int devfreq_monitor_start(struct devfreq *devfreq)
>>   {
>>   if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
>> -return;
>> +return 0;
>> +
>> +if (!devfreq->profile->get_dev_status)
>> +return -EINVAL;

 Again, I think that get_dev_status is not used for all governors.
 So that it cause the governor start fail. Don't check whether
 .get_dev_status is NULL or not.

>>>
>>> I'm not quite understand your point.
>>> it is used by governor_simpleondemand.c and tegra_devfreq_governor.
>>> get_target_freq -> devfreq_update_stats -> get_dev_status
>>
>> The devfreq can add the new governor by anyone.
>> So these functions like devfreq_monitor_* have to support
>> the governors and also must support the governor to be added
>> in the future.
> 
> Yes, but devfreq_monitor_* is only used by polling mode, right?
> The governor using it has to implement get_dev_status unless
> there's an exception in the future.
> 
> Currently this patch wants to address the issue that user can switch
> to ondemand governor (polling mode) by sysfs even devices does
> not support it (no get_dev_status implemented).

As I commented, I'll fix this issue: if a devfreq driver doesn't implement
.get_dev_status, don't show the governor via available_governors. I think
that is the fundamental solution to this issue. So in this version, please
don't add this conditional statement to this function.
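
For illustration only, a rough sketch of that idea (the helper name and exact
placement are assumptions, not an actual patch):

	static bool devfreq_governor_usable(struct devfreq *devfreq,
					    const struct devfreq_governor *gov)
	{
		/*
		 * Polling-based governors (e.g. simple_ondemand) compute the
		 * next frequency from load statistics, so they need
		 * .get_dev_status(). IRQ-driven governors never poll.
		 */
		if (!IS_SUPPORTED_FLAG(gov->flags, IRQ_DRIVEN) &&
		    !devfreq->profile->get_dev_status)
			return false;

		return true;
	}

available_governors_show() could then skip any governor for which this
returns false.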

And in the next version, please capitalize the first character of the
patch title, as follows:

- PM / devfreq: Check get_dev_status before start monitor

> 
> Regards
> Aisheng
> 
>>
>>>
>>> Without checking, device can switch to ondemand governor if it does not 
>>> support.
>>>
>>> Am i missed something?
>>>
>>> Regards
>>> Aisheng
>>>
>>   switch (devfreq->profile->timer) {
>>   case DEVFREQ_TIMER_DEFERRABLE:
>> @@ -495,12 +498,13 @@ void devfreq_monitor_start(struct devfreq *devfreq)
>>   INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
>>   break;
>>   default:
>> -return;
>> +return -EINVAL;
>>   }
>>   if (devfreq->profile->polling_ms)
>>   queue_delayed_work(devfreq_wq, &devfreq->work,
>>   msecs_to_jiffies(devfreq->profile->polling_ms));
>> +return 0;
>>   }
>>   EXPORT_SYMBOL(devfreq_monitor_start);
>> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
>> index 5cee3f64fe2b..31af6d072a10 100644
>> --- a/drivers/devfreq/governor.h
>> +++ b/drivers/devfreq/governor.h
>> @@ -75,7 +75,7 @@ struct devfreq_governor {
>>   unsigned int event, void *data);
>>   };
>> -void devfreq_monitor_start(struct devfreq *devfreq);
>> +int devfreq_monitor_start(struct devfreq *devfreq);
>>   void devfreq_monitor_stop(struct devfreq *devfreq);
>>   void devfreq_monitor_suspend(struct devfreq *devfreq);
>>   void devfreq_monitor_resume(struct devfreq *devfreq);
>> diff --git a/drivers/devfreq/governor_simpleondemand.c
>> b/drivers/devfreq/governor_simpleondemand.c
>> index d57b82a2b570..ea287b57cbf3 100644
>> --- a/drivers/devfreq/governor_simpleondemand.c
>> +++ b/drivers/devfreq/governor_simpleondemand.c
>> @@ -89,8 +89,7 @@ static int devfreq_simple_ondemand_handler(struct
>> devfreq *devfreq,
>>   {
>>   switch (event) {
>>   case DEVFREQ_GOV_START:
>> -devfreq_monitor_start(devfreq);
>> -break;
>> +return devfreq_monitor_start(devfreq);
>>   

Re: [PATCH] f2fs: allow to change discard policy based on cached discard cmds

2021-03-10 Thread Chao Yu

On 2021/3/11 9:47, Sahitya Tummala wrote:

With the default DPOLICY_BG discard thread is ioaware, which prevents
the discard thread from issuing the discard commands. On low RAM setups,
it is observed that these discard commands in the cache are consuming
high memory. This patch aims to relax the memory pressure on the system
due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE
based on the nm_i->ram_thresh configured.


Agreed.
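
For reference, a rough sketch of the direction only (the helper name and the
threshold check are assumptions, and the usual __init_discard_policy(sbi,
dpolicy, type, granularity) signature is assumed; this is not the actual patch):

	if (f2fs_discard_cache_over_ram_thresh(sbi))	/* hypothetical check */
		__init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1);
	else
		__init_discard_policy(sbi, &dpolicy, DPOLICY_BG,
				      dcc->discard_granularity);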



Signed-off-by: Sahitya Tummala 


Reviewed-by: Chao Yu 

Thanks,


Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

2021-03-10 Thread Dmitry Vyukov
On Thu, Mar 11, 2021 at 7:50 AM Dmitry Vyukov  wrote:
>
> On Thu, Mar 11, 2021 at 7:40 AM Alex Ghiti  wrote:
> >
> > Hi Ben,
> >
> > Le 3/10/21 à 5:24 PM, Ben Dooks a écrit :
> > > On 10/03/2021 17:16, Dmitry Vyukov wrote:
> > >> On Wed, Mar 10, 2021 at 5:46 PM syzbot
> > >>  wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> syzbot found the following issue on:
> > >>>
> > >>> HEAD commit:0d7588ab riscv: process: Fix no prototype for
> > >>> arch_dup_tas..
> > >>> git tree:
> > >>> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> > >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0
> > >>> kernel config:
> > >>> https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> > >>> dashboard link:
> > >>> https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> > >>> userspace arch: riscv64
> > >>>
> > >>> Unfortunately, I don't have any reproducer for this issue yet.
> > >>>
> > >>> IMPORTANT: if you fix the issue, please add the following tag to the
> > >>> commit:
> > >>> Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com
> > >>
> > >> +riscv maintainers
> > >>
> > >> This is riscv64-specific.
> > >> I've seen similar crashes in put_user in other places. It looks like
> > >> put_user crashes in the user address is not mapped/protected (?).
> > >
> > > The unmapped case should have been handled.
> > >
> > > I think this issue is that the check for user-mode access added. From
> > > what I read the code may be wrong in
> > >
> > > +if (!user_mode(regs) && addr < TASK_SIZE &&
> > > +unlikely(!(regs->status & SR_SUM)))
> > > +die_kernel_fault("access to user memory without uaccess 
> > > routines",
> > > +addr, regs);
> > >
> > > I think the SR_SUM check might be wrong, as I read the standard the
> > > SR_SUM should be set to disable user-space access. So the check
> > > should be unlikely(regs->status & SR_SUM) to say access without
> > > having disabled the protection.
> >
> > The check that is done seems correct to me: "The SUM (permit Supervisor
> > User Memory access) bit modifies the privilege with which S-mode loads
> > and stores access virtual memory.  *When SUM=0, S-mode memory accesses
> > to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*.
> >   When SUM=1, these accesses are permitted.SUM  has  no  effect  when
> > page-based  virtual  memory  is  not  in  effect".
> >
> > I will try to reproduce the problem locally.
>
> Weird. It crashes with this all the time:
> https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>
> Even on trivial programs that almost don't do anything.
> Maybe it's qemu bug? Do registers look sane in the dump? That SR_SUM, etc.
>
>
> 00:13:27 executing program 1:
> openat$drirender128(0xff9c,
> &(0x7f40)='/dev/dri/renderD128\x00', 0x0, 0x0)
>
> [  812.318182][ T4833] Unable to handle kernel access to user memory
> without uaccess routines at virtual address 250b60d0
> [  812.322304][ T4833] Oops [#1]
> [  812.323196][ T4833] Modules linked in:
> [  812.324110][ T4833] CPU: 1 PID: 4833 Comm: syz-executor.1 Not
> tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> [  812.325862][ T4833] Hardware name: riscv-virtio,qemu (DT)
> [  812.327561][ T4833] epc : schedule_tail+0x72/0xb2
> [  812.328640][ T4833]  ra : schedule_tail+0x70/0xb2
> [  812.330088][ T4833] epc : ffe8c8b0 ra : ffe8c8ae sp
> : ffe0238bbec0
> [  812.331312][ T4833]  gp : ffe005d25378 tp : ffe00a275b00 t0
> : 
> [  812.333014][ T4833]  t1 : 0001 t2 : 000f4240 s0
> : ffe0238bbee0
> [  812.334137][ T4833]  s1 : 250b60d0 a0 : 0036 a1
> : 0003
> [  812.336063][ T4833]  a2 : 1ffc0cfa8b00 a3 : ffec80cc a4
> : 7f467e72c6adf800
> [  812.337398][ T4833]  a5 :  a6 : 00f0 a7
> : ffef8c84
> [  812.339287][ T4833]  s2 : 0004 s3 : ffe0077a96c0 s4
> : ffe020e67fe0
> [  812.340658][ T4833]  s5 : 4020 s6 : ffe0077a9b58 s7
> : ffe067d74850
> [  812.342492][ T4833]  s8 : ffe067d73e18 s9 : 
> s10: ffe00bd72280
> [  812.343668][ T4833]  s11: 00bd067bf638 t3 : 7f467e72c6adf800 t4
> : ffc403ee7fb2
> [  812.345510][ T4833]  t5 : ffc403ee7fba t6 : 0004
> [  812.347004][ T4833] status: 0120 badaddr:
> 250b60d0 cause: 000f
> [  812.348091][ T4833] Call Trace:
> [  812.349291][ T4833] [] schedule_tail+0x72/0xb2
> [  812.350796][ T4833] [] ret_from_exception+0x0/0x14
> [  812.352799][ T4833] Dumping ftrace buffer:
> [  812.354328][ T4833](ftrace buffer empty)
> [  812.428145][ T4833] ---[ end trace 94b077e4d677ee73 ]---
>
>
> 00:10:42 executing program 1:
> bpf$ENABLE_STATS(0x20, 0x0, 0x0)
> bpf$ENABLE_STATS(0x20, 0x0, 0x0)
>
> [  646.536862][ T5163] loop0: detected capacity change from 0 to 1
> [  646.566730][ 

Re: [net-next PATCH v7 02/16] net: phy: Introduce fwnode_mdio_find_device()

2021-03-10 Thread Saravana Kannan
On Wed, Mar 10, 2021 at 10:21 PM Calvin Johnson
 wrote:
>
> Define fwnode_mdio_find_device() to get a pointer to the
> mdio_device from fwnode passed to the function.
>
> Refactor of_mdio_find_device() to use fwnode_mdio_find_device().
>
> Signed-off-by: Calvin Johnson 
> ---
>
> Changes in v7:
> - correct fwnode_mdio_find_device() description
>
> Changes in v6:
> - fix warning for function parameter of fwnode_mdio_find_device()
>
> Changes in v5: None
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
>
>  drivers/net/mdio/of_mdio.c   | 11 +--
>  drivers/net/phy/phy_device.c | 23 +++
>  include/linux/phy.h  |  6 ++
>  3 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c
> index ea9d5855fb52..d5e0970b2561 100644
> --- a/drivers/net/mdio/of_mdio.c
> +++ b/drivers/net/mdio/of_mdio.c
> @@ -347,16 +347,7 @@ EXPORT_SYMBOL(of_mdiobus_register);
>   */
>  struct mdio_device *of_mdio_find_device(struct device_node *np)
>  {
> -   struct device *d;
> -
> -   if (!np)
> -   return NULL;
> -
> -   d = bus_find_device_by_of_node(&mdio_bus_type, np);
> -   if (!d)
> -   return NULL;
> -
> -   return to_mdio_device(d);
> +   return fwnode_mdio_find_device(of_fwnode_handle(np));
>  }
>  EXPORT_SYMBOL(of_mdio_find_device);
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index cc38e326405a..daabb17bba00 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2819,6 +2819,29 @@ static bool phy_drv_supports_irq(struct phy_driver 
> *phydrv)
> return phydrv->config_intr && phydrv->handle_interrupt;
>  }
>
> +/**
> + * fwnode_mdio_find_device - Given a fwnode, find the mdio_device
> + * @fwnode: pointer to the mdio_device's fwnode
> + *
> + * If successful, returns a pointer to the mdio_device with the embedded
> + * struct device refcount incremented by one, or NULL on failure.
> + * The caller should call put_device() on the mdio_device after its use.
> + */
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
> +{
> +   struct device *d;
> +
> +   if (!fwnode)
> +   return NULL;
> +
> +   d = bus_find_device_by_fwnode(&mdio_bus_type, fwnode);

Sorry about the late review, but can you look into using
get_dev_from_fwnode()? As long as you aren't registering two devices
for the same fwnode, it's an O(1) operation instead of having to loop
through a list of devices in a bus. You can check the returned
device's bus type if you aren't sure about not registering two devices
with the same fw_node and then fall back to this looping.

-Saravana
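
(For illustration, a rough sketch of the suggested lookup order; whether
get_dev_from_fwnode() is exported/usable from phy_device.c is an assumption,
and this is not Calvin's patch:)

struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
{
	struct device *d;

	if (!fwnode)
		return NULL;

	/* O(1): the device registered against this fwnode, if any. */
	d = get_dev_from_fwnode(fwnode);
	if (d && d->bus != &mdio_bus_type) {
		/* Not an MDIO device, drop the reference. */
		put_device(d);
		d = NULL;
	}
	if (!d)
		/* Fall back to walking the MDIO bus. */
		d = bus_find_device_by_fwnode(&mdio_bus_type, fwnode);
	if (!d)
		return NULL;

	return to_mdio_device(d);
}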

> +   if (!d)
> +   return NULL;
> +
> +   return to_mdio_device(d);
> +}
> +EXPORT_SYMBOL(fwnode_mdio_find_device);
> +
>  /**
>   * phy_probe - probe and init a PHY device
>   * @dev: device to probe and init
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 1a12e4436b5b..f5eb1e3981a1 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1366,11 +1366,17 @@ struct phy_device *phy_device_create(struct mii_bus 
> *bus, int addr, u32 phy_id,
>  bool is_c45,
>  struct phy_c45_device_ids *c45_ids);
>  #if IS_ENABLED(CONFIG_PHYLIB)
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode);
>  struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool 
> is_c45);
>  int phy_device_register(struct phy_device *phy);
>  void phy_device_free(struct phy_device *phydev);
>  #else
>  static inline
> +struct mdio_device *fwnode_mdio_find_device(struct fwnode_handle *fwnode)
> +{
> +   return 0;
> +}
> +static inline
>  struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
>  {
> return NULL;
> --
> 2.17.1
>


Re: [PATCH v2 5/5] PCI: dwc: Move dw_pcie_msi_init() from each users to designware host

2021-03-10 Thread Jisheng Zhang
On Sun, 7 Mar 2021 23:10:12 +0100 Krzysztof Wilczyński  wrote:

> 
> 
> Hi,
> 
> > Currently, dw_pcie_msi_init() allocates and maps page for msi, then
> > program the PCIE_MSI_ADDR_LO and PCIE_MSI_ADDR_HI. The Root Complex
> > may lose power during suspend-to-RAM, so when we resume, we want to
> > redo the latter but not the former. If designware based driver (for
> > example, pcie-tegra194.c) calls dw_pcie_msi_init() in resume path, the
> > previous msi page will be leaked. From another side, except
> > pci-dra7xx.c we can move the dw_pcie_msi_init() from each users to
> > designware host, I.E move the msi page allocation and mapping to
> > dw_pcie_host_init() and move the PCIE_MSI_ADDR_* programming to
> > dw_pcie_setup_rc(). After this moving, we solve the msi page leakage
> > as well.  
> [...]
> 
> A small nitpick.  All the "designware" should be "DesignWare" both in
> the commit message and the subject.  Similarly, "msi" would be "MSI",
> and "I.E" would become "i.e.,".  If you ever sent another version of the
> patch, that is.
> 

Hi

This series was dropped and is no longer needed after Rob's excellent
DesignWare PCIe code cleanup.

Thanks


Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages

2021-03-10 Thread Muchun Song
On Thu, Mar 11, 2021 at 3:53 AM Roman Gushchin  wrote:
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > Since Roman series "The new cgroup slab memory controller" applied. All
> > slab objects are charged via the new APIs of obj_cgroup. The new APIs
> > introduce a struct obj_cgroup to charge slab objects. It prevents
> > long-living objects from pinning the original memory cgroup in the memory.
> > But there are still some corner objects (e.g. allocations larger than
> > order-1 page on SLUB) which are not charged via the new APIs. Those
> > objects (include the pages which are allocated from buddy allocator
> > directly) are charged as kmem pages which still hold a reference to
> > the memory cgroup.
> >
> > This patch aims to charge the kmem pages by using the new APIs of
> > obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> > an object cgroup. We can use the page_objcg() to get the object
> > cgroup associated with a kmem page. Or we can use page_memcg_check()
> > to get the memory cgroup associated with a kmem page, but caller must
> > ensure that the returned memcg won't be released (e.g. acquire the
> > rcu_read_lock or css_set_lock).
> >
> > Signed-off-by: Muchun Song 
> > ---
> >  include/linux/memcontrol.h |  63 ++--
> >  mm/memcontrol.c| 119 
> > ++---
> >  2 files changed, 128 insertions(+), 54 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 83cbcdcfcc92..07c449af9c0f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page 
> > *page)
> >  }
> >
> >  /*
> > + * After the initialization objcg->memcg is always pointing at
> > + * a valid memcg, but can be atomically swapped to the parent memcg.
> > + *
> > + * The caller must ensure that the returned memcg won't be released:
> > + * e.g. acquire the rcu_read_lock or css_set_lock.
> > + */
> > +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > +{
> > + return READ_ONCE(objcg->memcg);
> > +}
> > +
> > +/*
> >   * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> > @@ -422,15 +434,19 @@ static inline struct mem_cgroup 
> > *page_memcg_rcu(struct page *page)
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> > - * or NULL. This function unlike page_memcg() can take any  page
> > + * or NULL. This function unlike page_memcg() can take any page
> >   * as an argument. It has to be used in cases when it's not known if a page
> > - * has an associated memory cgroup pointer or an object cgroups vector.
> > + * has an associated memory cgroup pointer or an object cgroups vector or
> > + * an object cgroup.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> >   * - LRU isolation
> >   * - lock_page_memcg()
> >   * - exclusive reference
> > + *
> > + * Should be called under rcu lock which can protect memcg associated with 
> > a
> > + * kmem page from being released.
>
> How about this:
>
> For a non-kmem page any of the following ensures page and memcg binding 
> stability:
> - the page lock
> - LRU isolation
> - lock_page_memcg()
> - exclusive reference
>
> For a kmem page a caller should hold an rcu read lock to protect memcg 
> associated
> with a kmem page from being released.

OK. I will use this. Thanks Roman.
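
(Purely as an illustration of that rule, a reader of a kmem page's memcg would
look roughly like this; the consumer is hypothetical:)

	struct mem_cgroup *memcg;

	rcu_read_lock();
	memcg = page_memcg_check(page);	/* may resolve via the objcg for kmem pages */
	if (memcg)
		account_something(memcg);	/* hypothetical use, only valid under RCU */
	rcu_read_unlock();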

>
> >   */
> >  static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >  {
> > @@ -443,6 +459,13 @@ static inline struct mem_cgroup 
> > *page_memcg_check(struct page *page)
> >   if (memcg_data & MEMCG_DATA_OBJCGS)
> >   return NULL;
> >
> > + if (memcg_data & MEMCG_DATA_KMEM) {
> > + struct obj_cgroup *objcg;
> > +
> > + objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > + return obj_cgroup_memcg(objcg);
> > + }
> > +
> >   return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > @@ -501,6 +524,25 @@ static inline struct obj_cgroup 
> > **page_objcgs_check(struct page *page)
> >   return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > +/*
> > + * page_objcg - get the object cgroup associated with a kmem page
> > + * @page: a pointer to the page struct
> > + *
> > + * Returns a pointer to the object cgroup associated with the kmem page,
> > + * or NULL. This function assumes that the page is known to have an
> > + * associated object cgroup. It's only safe to call this function
> > + * against kmem pages (PageMemcgKmem() returns true).
> > + */
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > + unsigned long memcg_data = page->memcg_data;
> > +
> > + 

Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

2021-03-10 Thread Dmitry Vyukov
On Thu, Mar 11, 2021 at 7:40 AM Alex Ghiti  wrote:
>
> Hi Ben,
>
> Le 3/10/21 à 5:24 PM, Ben Dooks a écrit :
> > On 10/03/2021 17:16, Dmitry Vyukov wrote:
> >> On Wed, Mar 10, 2021 at 5:46 PM syzbot
> >>  wrote:
> >>>
> >>> Hello,
> >>>
> >>> syzbot found the following issue on:
> >>>
> >>> HEAD commit:0d7588ab riscv: process: Fix no prototype for
> >>> arch_dup_tas..
> >>> git tree:
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0
> >>> kernel config:
> >>> https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> >>> dashboard link:
> >>> https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> >>> userspace arch: riscv64
> >>>
> >>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>
> >>> IMPORTANT: if you fix the issue, please add the following tag to the
> >>> commit:
> >>> Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com
> >>
> >> +riscv maintainers
> >>
> >> This is riscv64-specific.
> >> I've seen similar crashes in put_user in other places. It looks like
> >> put_user crashes in the user address is not mapped/protected (?).
> >
> > The unmapped case should have been handled.
> >
> > I think this issue is that the check for user-mode access added. From
> > what I read the code may be wrong in
> >
> > +if (!user_mode(regs) && addr < TASK_SIZE &&
> > +unlikely(!(regs->status & SR_SUM)))
> > +die_kernel_fault("access to user memory without uaccess routines",
> > +addr, regs);
> >
> > I think the SR_SUM check might be wrong, as I read the standard the
> > SR_SUM should be set to disable user-space access. So the check
> > should be unlikely(regs->status & SR_SUM) to say access without
> > having disabled the protection.
>
> The check that is done seems correct to me: "The SUM (permit Supervisor
> User Memory access) bit modifies the privilege with which S-mode loads
> and stores access virtual memory.  *When SUM=0, S-mode memory accesses
> to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*.
>   When SUM=1, these accesses are permitted.SUM  has  no  effect  when
> page-based  virtual  memory  is  not  in  effect".
>
> I will try to reproduce the problem locally.

Weird. It crashes with this all the time:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69

Even on trivial programs that almost don't do anything.
Maybe it's a qemu bug? Do the registers look sane in the dump? That SR_SUM value, etc.
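
For reference, the only user access in schedule_tail() is the
CLONE_CHILD_SETTID write done right after the child returns from fork/clone
(kernel/sched/core.c):

	if (current->set_child_tid)
		put_user(task_pid_vnr(current), current->set_child_tid);

i.e. the faulting address is the set_child_tid pointer supplied by userspace.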


00:13:27 executing program 1:
openat$drirender128(0xff9c,
&(0x7f40)='/dev/dri/renderD128\x00', 0x0, 0x0)

[  812.318182][ T4833] Unable to handle kernel access to user memory
without uaccess routines at virtual address 250b60d0
[  812.322304][ T4833] Oops [#1]
[  812.323196][ T4833] Modules linked in:
[  812.324110][ T4833] CPU: 1 PID: 4833 Comm: syz-executor.1 Not
tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
[  812.325862][ T4833] Hardware name: riscv-virtio,qemu (DT)
[  812.327561][ T4833] epc : schedule_tail+0x72/0xb2
[  812.328640][ T4833]  ra : schedule_tail+0x70/0xb2
[  812.330088][ T4833] epc : ffe8c8b0 ra : ffe8c8ae sp
: ffe0238bbec0
[  812.331312][ T4833]  gp : ffe005d25378 tp : ffe00a275b00 t0
: 
[  812.333014][ T4833]  t1 : 0001 t2 : 000f4240 s0
: ffe0238bbee0
[  812.334137][ T4833]  s1 : 250b60d0 a0 : 0036 a1
: 0003
[  812.336063][ T4833]  a2 : 1ffc0cfa8b00 a3 : ffec80cc a4
: 7f467e72c6adf800
[  812.337398][ T4833]  a5 :  a6 : 00f0 a7
: ffef8c84
[  812.339287][ T4833]  s2 : 0004 s3 : ffe0077a96c0 s4
: ffe020e67fe0
[  812.340658][ T4833]  s5 : 4020 s6 : ffe0077a9b58 s7
: ffe067d74850
[  812.342492][ T4833]  s8 : ffe067d73e18 s9 : 
s10: ffe00bd72280
[  812.343668][ T4833]  s11: 00bd067bf638 t3 : 7f467e72c6adf800 t4
: ffc403ee7fb2
[  812.345510][ T4833]  t5 : ffc403ee7fba t6 : 0004
[  812.347004][ T4833] status: 0120 badaddr:
250b60d0 cause: 000f
[  812.348091][ T4833] Call Trace:
[  812.349291][ T4833] [] schedule_tail+0x72/0xb2
[  812.350796][ T4833] [] ret_from_exception+0x0/0x14
[  812.352799][ T4833] Dumping ftrace buffer:
[  812.354328][ T4833](ftrace buffer empty)
[  812.428145][ T4833] ---[ end trace 94b077e4d677ee73 ]---


00:10:42 executing program 1:
bpf$ENABLE_STATS(0x20, 0x0, 0x0)
bpf$ENABLE_STATS(0x20, 0x0, 0x0)

[  646.536862][ T5163] loop0: detected capacity change from 0 to 1
[  646.566730][ T5165] Unable to handle kernel access to user memory
without uaccess routines at virtual address 032f80d0
[  646.586024][ T5165] Oops [#1]
[  646.586640][ T5165] Modules linked in:
[  646.587350][ T5165] CPU: 1 PID: 5165 Comm: syz-executor.1 Not
tainted 

Re: [PATCH 5.10 00/47] 5.10.23-rc2 review

2021-03-10 Thread Naresh Kamboju
On Wed, 10 Mar 2021 at 23:59,  wrote:
>
> From: Greg Kroah-Hartman 
>
> This is the start of the stable review cycle for the 5.10.23 release.
> There are 47 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 12 Mar 2021 18:28:23 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.23-rc2.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

Summary


kernel: 5.10.23-rc2
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.10.y
git commit: 93276f11b3afe08c3f213a3648483b1a8789673b
git describe: v5.10.22-48-g93276f11b3af
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.22-48-g93276f11b3af

No regressions (compared to build v5.10.22)


No fixes (compared to build v5.10.22)


Ran 56156 total tests in the following environments and test suites.

Environments
--
- arc
- arm
- arm64
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- mips
- nxp-ls2088
- nxp-ls2088-64k_page_size
- parisc
- powerpc
- qemu-arm-clang
- qemu-arm64-clang
- qemu-arm64-kasan
- qemu-i386-clang
- qemu-x86_64-clang
- qemu-x86_64-kasan
- qemu-x86_64-kcsan
- qemu_arm
- qemu_arm64
- qemu_arm64-compat
- qemu_i386
- qemu_x86_64
- qemu_x86_64-compat
- riscv
- s390
- sh
- sparc
- x15
- x86
- x86-kasan
- x86_64

Test Suites
---
* build
* linux-log-parser
* install-android-platform-tools-r2600
* kselftest-bpf
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-livepatch
* kselftest-ptrace
* libhugetlbfs
* ltp-containers-tests
* ltp-hugetlb-tests
* ltp-ipc-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* v4l2-compliance
* fwts
* kselftest-kvm
* kselftest-lib
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-zram
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-io-tests
* ltp-math-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* kselftest-
* kselftest-android
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-kexec
* kselftest-lkdtm
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-vm
* kselftest-x86
* ltp-controllers-tests
* ltp-open-posix-tests
* kvm-unit-tests
* kunit
* rcutorture
* ssuite
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-

-- 
Linaro LKFT
https://lkft.linaro.org


Re: linux-next: build failure after merge of the bpf-next tree

2021-03-10 Thread Björn Töpel

On 2021-03-11 01:47, Stephen Rothwell wrote:

Hi all,

After merging the bpf-next tree, today's linux-next build (perf) failed
like this:

make[3]: *** No rule to make target 'libbpf_util.h', needed by 
'/home/sfr/next/perf/staticobjs/xsk.o'.  Stop.



Hi Stephen,

It's an incremental build issue, as pointed out here [1], that is
resolved by cleaning the build.


Cheers,
Björn

[1] 
https://lore.kernel.org/bpf/CAEf4BzYPDF87At4=_gsndxof84oiqyjxgahl7_jvpuntovu...@mail.gmail.com/




Caused by commit

   7e8bbe24cb8b ("libbpf: xsk: Move barriers from libbpf_util.h to xsk.h")

I have used the bpf tree from next-20210310 for today.



[PATCH] scsi: Fix a use after free in st_open

2021-03-10 Thread Lv Yunlong
In st_open(), if STp->in_use is true, STp will be freed by
scsi_tape_put(). However, STp is still used by DEBC_printk()
afterwards. It is better to call DEBC_printk() before scsi_tape_put().

Signed-off-by: Lv Yunlong 
---
 drivers/scsi/st.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 841ad2fc369a..9ca536aae784 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -1269,8 +1269,8 @@ static int st_open(struct inode *inode, struct file *filp)
	spin_lock(&st_use_lock);
	if (STp->in_use) {
		spin_unlock(&st_use_lock);
-   scsi_tape_put(STp);
DEBC_printk(STp, "Device already in use.\n");
+   scsi_tape_put(STp);
return (-EBUSY);
}
 
-- 
2.25.1




Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page

2021-03-10 Thread Muchun Song
On Thu, Mar 11, 2021 at 3:58 AM Roman Gushchin  wrote:
>
> On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote:
> > We want to reuse the obj_cgroup APIs to charge the kmem pages.
> > If we do that, we should store an object cgroup pointer to
> > page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
> >
> >   1) For the slab pages, page->memcg_data points to an object cgroups
> >  vector.
> >
> >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> >  points to an object cgroup.
> >
> >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> >  to a memory cgroup.
> >
> > Currently we always get the memory cgroup associated with a page via
> > page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
> > has to be used in cases when it's not known if a page has an
> > associated memory cgroup pointer or an object cgroups vector. Because
> > the page->memcg_data of the kmem page is not pointing to a memory
> > cgroup in the later patch, the page_memcg() and page_memcg_rcu()
> > cannot be applicable for the kmem pages. In this patch, make
> > page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
> > We do not change the behavior of the page_memcg_check(), it is also
> > applicable for the kmem pages.
> >
> > In the end, there are 3 helpers to get the memcg associated with a page.
> > Usage is as follows.
> >
> >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> >  pages).
> >
> >  - page_memcg()
> >  - page_memcg_rcu()
> >
> >   2) Get the memory cgroup associated with a page. It has to be used in
> >  cases when it's not known if a page has an associated memory cgroup
> >  pointer or an object cgroups vector. Returns NULL for slab pages or
> >  uncharged pages. Otherwise, returns memory cgroup for charged pages
> >  (e.g. the kmem pages, the LRU pages).
> >
> >  - page_memcg_check()
> >
> > In some place, we use page_memcg() to check whether the page is charged.
> > Now introduce page_memcg_charged() helper to do that.
> >
> > This is a preparation for reparenting the kmem pages.
> >
> > Signed-off-by: Muchun Song 
> > ---
> >  include/linux/memcontrol.h | 33 +++--
> >  mm/memcontrol.c| 23 +--
> >  mm/page_alloc.c|  4 ++--
> >  3 files changed, 42 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..83cbcdcfcc92 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
> >
> >  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
> >
> > +/* Return true for charged page, otherwise false. */
> > +static inline bool page_memcg_charged(struct page *page)
> > +{
> > + unsigned long memcg_data = page->memcg_data;
> > +
> > + VM_BUG_ON_PAGE(PageSlab(page), page);
> > + VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +
> > + return !!memcg_data;
> > +}
> > +
> >  /*
> > - * page_memcg - get the memory cgroup associated with a page
> > + * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> >   * or NULL. This function assumes that the page is known to have a
> >   * proper memory cgroup pointer. It's not safe to call this function
> > - * against some type of pages, e.g. slab pages or ex-slab pages.
> > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> > + * pages.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> > @@ -378,27 +390,31 @@ static inline struct mem_cgroup *page_memcg(struct 
> > page *page)
> >   unsigned long memcg_data = page->memcg_data;
> >
> >   VM_BUG_ON_PAGE(PageSlab(page), page);
> > + VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> >   VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> >
> >   return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> >  /*
> > - * page_memcg_rcu - locklessly get the memory cgroup associated with a page
> > + * page_memcg_rcu - locklessly get the memory cgroup associated with a 
> > non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> >   * or NULL. This function assumes that the page is known to have a
> >   * proper memory cgroup pointer. It's not safe to call this function
> > - * against some type of pages, e.g. slab pages or ex-slab pages.
> > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> > + * pages.
> >   */
> >  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> >  {
> > + unsigned long memcg_data = 

Re: [PATCH 08/11] PM / devfreq: check get_dev_status in devfreq_update_stats

2021-03-10 Thread Chanwoo Choi
On 3/10/21 12:00 PM, Dong Aisheng wrote:
> On Wed, Mar 10, 2021 at 12:20 AM Chanwoo Choi  wrote:
>>
>> On 21. 3. 9. 오후 9:58, Dong Aisheng wrote:
>>> Check .get_dev_status() in devfreq_update_stats in case it's abused
>>> when a device does not provide it.
>>>
>>> Signed-off-by: Dong Aisheng 
>>> ---
>>>   drivers/devfreq/governor.h | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
>>> index 31af6d072a10..67a6dbdd5d23 100644
>>> --- a/drivers/devfreq/governor.h
>>> +++ b/drivers/devfreq/governor.h
>>> @@ -89,6 +89,9 @@ int devfreq_update_target(struct devfreq *devfreq, 
>>> unsigned long freq);
>>>
>>>   static inline int devfreq_update_stats(struct devfreq *df)
>>>   {
>>> + if (!df->profile->get_dev_status)
>>> + return -EINVAL;
>>> +
>>
>> I'm considering the following method instead of returning the error
>> when .get_dev_status is NULL.
>>
>> if (!df->profile->get_dev_status) {
>> df->last_status.total_time = 0;
>> df->last_status.busy_time = 0;
>> df->last_status.current_frequency = 0;
>> return 0;
>> }
> 
> I might  suggest not cause it's meaningless for ondemand governor but
> introducing confusing. Simply return error could make the life a bit easier.
> does it make sense to you?

Actually, I was considering the following corner case:
the simple_ondemand governor is visible through available_governors
even if the devfreq driver doesn't implement .get_dev_status.
My intention was to prevent the error in that case. But, actually,
it is a different issue. I'll fix it in another patch: when
get_dev_status is NULL, don't show the simple_ondemand governor
name through available_governors.

And I applied it. Thanks.

> 
> Regards
> Aisheng
> 
>>
>>>   return df->profile->get_dev_status(df->dev.parent, &df->last_status);
>>>   }
>>>   #endif /* _GOVERNOR_H */
>>>
>>
>>
>> --
>> Best Regards,
>> Samsung Electronics
>> Chanwoo Choi
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


Re: [PATCH 2/3] net: ethernet: actions: Add Actions Semi Owl Ethernet MAC driver

2021-03-10 Thread Philipp Zabel
Hi Cristian,

On Thu, Mar 11, 2021 at 03:20:13AM +0200, Cristian Ciocaltea wrote:
> Add new driver for the Ethernet MAC used on the Actions Semi Owl
> family of SoCs.
> 
> Currently this has been tested only on the Actions Semi S500 SoC
> variant.
> 
> Signed-off-by: Cristian Ciocaltea 
> ---
[...]
> diff --git a/drivers/net/ethernet/actions/owl-emac.c 
> b/drivers/net/ethernet/actions/owl-emac.c
> new file mode 100644
> index ..ebd8ea88bca4
> --- /dev/null
> +++ b/drivers/net/ethernet/actions/owl-emac.c
> @@ -0,0 +1,1660 @@
[...]
> +static int owl_emac_probe(struct platform_device *pdev)
> +{
[...]
> + priv->reset = devm_reset_control_get(dev, NULL);

Please use

priv->reset = devm_reset_control_get_exclusive(dev, NULL);

instead, to explicitly state that the driver requires exclusive
control over the reset line.

> + if (IS_ERR(priv->reset)) {
> + ret = PTR_ERR(priv->reset);
> + dev_err(dev, "failed to get reset control: %d\n", ret);
> + return ret;

You could use:

return dev_err_probe(dev, PTR_ERR(priv->reset),
 "failed to get reset control");

regards
Philipp


[PATCH v4] usb: gadget: configfs: Fix KASAN use-after-free

2021-03-10 Thread Macpaul Lin
From: Jim Lin 

When gadget is disconnected, running sequence is like this.
. composite_disconnect
. Call trace:
  usb_string_copy+0xd0/0x128
  gadget_config_name_configuration_store+0x4
  gadget_config_name_attr_store+0x40/0x50
  configfs_write_file+0x198/0x1f4
  vfs_write+0x100/0x220
  SyS_write+0x58/0xa8
. configfs_composite_unbind
. configfs_composite_bind

In configfs_composite_bind, it has
"cn->strings.s = cn->configuration;"

When usb_string_copy() is invoked, it allocates
memory, copies the input string, releases the previously pointed-to memory,
and uses the newly allocated memory.

When gadget is connected, host sends down request to get information.
Call trace:
  usb_gadget_get_string+0xec/0x168
  lookup_string+0x64/0x98
  composite_setup+0xa34/0x1ee8

If the gadget is disconnected and reconnected quickly, in the failing case
the cn->configuration memory has been released by usb_string_copy()'s kfree, but
configfs_composite_bind() hasn't run in time to assign the newly allocated
"cn->configuration" pointer to "cn->strings.s".

When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling
memory is accessed, "BUG: KASAN: use-after-free" error occurs.

Signed-off-by: Jim Lin 
Signed-off-by: Macpaul Lin 
Cc: sta...@vger.kernel.org
---
Changes in v2:
Changes in v3:
 - Change commit description
Changes in v4:
 - Fix build error and adapt patch to kernel-5.12-rc1.
   Replace definition "MAX_USB_STRING_WITH_NULL_LEN" with
   "USB_MAX_STRING_WITH_NULL_LEN".
 - Note: The patch v2 and v3 has been verified by
   Thadeu Lima de Souza Cascardo 
   http://spinics.net/lists/kernel/msg3840792.html
   and
   Macpaul Lin  on Android kernels.
   http://lkml.org/lkml/2020/6/11/8
 - The patch is suggested to be applied to LTS versions.

 drivers/usb/gadget/configfs.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
index 0d56f33..15a607c 100644
--- a/drivers/usb/gadget/configfs.c
+++ b/drivers/usb/gadget/configfs.c
@@ -97,6 +97,8 @@ struct gadget_config_name {
struct list_head list;
 };
 
+#define USB_MAX_STRING_WITH_NULL_LEN   (USB_MAX_STRING_LEN+1)
+
 static int usb_string_copy(const char *s, char **s_copy)
 {
int ret;
@@ -106,12 +108,16 @@ static int usb_string_copy(const char *s, char **s_copy)
if (ret > USB_MAX_STRING_LEN)
return -EOVERFLOW;
 
-   str = kstrdup(s, GFP_KERNEL);
-   if (!str)
-   return -ENOMEM;
+   if (copy) {
+   str = copy;
+   } else {
+   str = kmalloc(USB_MAX_STRING_WITH_NULL_LEN, GFP_KERNEL);
+   if (!str)
+   return -ENOMEM;
+   }
+   strcpy(str, s);
if (str[ret - 1] == '\n')
str[ret - 1] = '\0';
-   kfree(copy);
*s_copy = str;
return 0;
 }
-- 
1.7.9.5



[PATCH] staging: rtl8723bs: add initial value

2021-03-10 Thread Hao Peng
Add initial values for some uninitialized variables and arrays.

Signed-off-by: Hao Peng 
---
 drivers/staging/rtl8723bs/core/rtw_ap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8723bs/core/rtw_ap.c 
b/drivers/staging/rtl8723bs/core/rtw_ap.c
index b6f944b37b08..ceea160db38a 100644
--- a/drivers/staging/rtl8723bs/core/rtw_ap.c
+++ b/drivers/staging/rtl8723bs/core/rtw_ap.c
@@ -206,8 +206,8 @@ void expire_timeout_chk(struct adapter *padapter)
struct sta_info *psta = NULL;
	struct sta_priv *pstapriv = &padapter->stapriv;
u8 chk_alive_num = 0;
-   char chk_alive_list[NUM_STA];
-   int i;
+   char chk_alive_list[NUM_STA] = {0};
+   int i = 0;
 
	spin_lock_bh(&pstapriv->auth_list_lock);
 
@@ -308,7 +308,7 @@ void expire_timeout_chk(struct adapter *padapter)
}
}
if (pmlmeext->active_keep_alive_check) {
-   int stainfo_offset;
+   int stainfo_offset = 0;
 
stainfo_offset = rtw_stainfo_offset(pstapriv, 
psta);
if (stainfo_offset_valid(stainfo_offset))
-- 
2.20.1





Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

2021-03-10 Thread Alex Ghiti

Hi Ben,

Le 3/10/21 à 5:24 PM, Ben Dooks a écrit :

On 10/03/2021 17:16, Dmitry Vyukov wrote:

On Wed, Mar 10, 2021 at 5:46 PM syzbot
 wrote:


Hello,

syzbot found the following issue on:

HEAD commit:    0d7588ab riscv: process: Fix no prototype for 
arch_dup_tas..
git tree:   
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes

console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d0
kernel config:  
https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link: 
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69

userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the 
commit:

Reported-by: syzbot+e74b94fe601ab9552...@syzkaller.appspotmail.com


+riscv maintainers

This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes in the user address is not mapped/protected (?).


The unmapped case should have been handled.

I think this issue is that the check for user-mode access added. From
what I read the code may be wrong in

+    if (!user_mode(regs) && addr < TASK_SIZE &&
+    unlikely(!(regs->status & SR_SUM)))
+    die_kernel_fault("access to user memory without uaccess routines",
+    addr, regs);

I think the SR_SUM check might be wrong, as I read the standard the
SR_SUM should be set to disable user-space access. So the check
should be unlikely(regs->status & SR_SUM) to say access without
having disabled the protection.


The check that is done seems correct to me: "The SUM (permit Supervisor
User Memory access) bit modifies the privilege with which S-mode loads
and stores access virtual memory. *When SUM=0, S-mode memory accesses
to pages that are accessible by U-mode (U=1 in Figure 4.15) will fault*.
When SUM=1, these accesses are permitted. SUM has no effect when
page-based virtual memory is not in effect".
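
(For context, a simplified sketch of the software side: the riscv uaccess
helpers bracket every user access by toggling SUM, roughly:)

	__enable_user_access();		/* csrs sstatus, SR_SUM: permit S-mode access to U pages */
	/* ... the actual load/store to the user pointer ... */
	__disable_user_access();	/* csrc sstatus, SR_SUM */

so the new check flags S-mode accesses to user addresses made while SR_SUM is
clear, i.e. outside such a bracket.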


I will try to reproduce the problem locally.

Thanks,

Alex



Without this, you can end up with an infinite loop in the fault handler.



Unable to handle kernel access to user memory without uaccess 
routines at virtual address 2749f0d0

Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0

Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
  ra : task_pid_vnr include/linux/sched.h:1421 [inline]
  ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffe8c8b0 ra : ffe8c8ae sp : ffe025d17ec0
  gp : ffe005d25378 tp : ffe00f0d t0 : 
  t1 : 0001 t2 : 000f4240 s0 : ffe025d17ee0
  s1 : 2749f0d0 a0 : 002a a1 : 0003
  a2 : 1ffc0cfac500 a3 : ffec80cc a4 : 5ae9db91c19bbe00
  a5 :  a6 : 00f0 a7 : ffe82eba
  s2 : 0004 s3 : ffe00eef96c0 s4 : ffe022c77fe0
  s5 : 4000 s6 : ffe067d74e00 s7 : ffe067d74850
  s8 : ffe067d73e18 s9 : ffe067d74e00 s10: ffe00eef96e8
  s11: 00ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffc4043cafb2
  t5 : ffc4043cafba t6 : 0004
status: 0120 badaddr: 2749f0d0 cause: 
000f

Call Trace:
[] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
    (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

--
You received this message because you are subscribed to the Google 
Groups "syzkaller-bugs" group.
To unsubscribe from this group and stop receiving emails from it, 
send an email to syzkaller-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/syzkaller-bugs/b74f1b05bd316729%40google.com. 



___
linux-riscv mailing list
linux-ri...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv






Re: [PATCH] net: add net namespace inode for all net_dev events

2021-03-10 Thread Tony Lu
On Wed, Mar 10, 2021 at 11:31:12AM -0500, Steven Rostedt wrote:
> On Wed, 10 Mar 2021 17:03:40 +0800
> Tony Lu  wrote:
> 
> > I use pahole to read vmlinux.o directly with defconfig and
> > CONFIG_DEBUG_INFO enabled, the result shows 22 structs prefixed with
> > trace_event_raw_ that have at least one hole.
> 
> I was thinking of pahole too ;-)
> 
> But the information can also be captured from the format files with simple
> scripts as well. And perhaps be more tuned to see if there's actually a fix
> for them, and ignore reporting it if there is no fix, as all trace events
> are 4 byte aligned, and if we are off by one, sometimes it doesn't matter.

I am going to send a patch to fix this issue later. There are many
events that have more than one hole; I am trying to pick out the events
that really need to be fixed.
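
(For illustration, the kind of hole in question; this is a hypothetical event
layout, not one of the 22 structs reported by pahole:)

	struct trace_event_raw_net_dev_example {
		struct trace_entry	ent;		/* 8 bytes			*/
		int			ifindex;	/* 4 bytes			*/
							/* 4-byte hole on 64-bit	*/
		u64			net_cookie;	/* 8 bytes, 8-byte aligned	*/
		char			__data[0];
	};

Reordering the fields (or pairing two 4-byte fields together) removes the hole.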


Cheers,
Tony Lu

> 
> -- Steve


Re: [External] Re: [PATCH v18 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap

2021-03-10 Thread Muchun Song
On Thu, Mar 11, 2021 at 1:16 AM Randy Dunlap  wrote:
>
> On 3/10/21 7:37 AM, Michal Hocko wrote:
> > On Mon 08-03-21 18:28:04, Muchun Song wrote:
> >> Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
> >> freeing unused vmemmap pages associated with each hugetlb page on boot.
> >>
> >> We disables PMD mapping of vmemmap pages for x86-64 arch when this
> >> feature is enabled. Because vmemmap_remap_free() depends on vmemmap
> >> being base page mapped.
> >>
> >> Signed-off-by: Muchun Song 
> >> Reviewed-by: Oscar Salvador 
> >> Reviewed-by: Barry Song 
> >> Reviewed-by: Miaohe Lin 
> >> Tested-by: Chen Huang 
> >> Tested-by: Bodeddula Balasubramaniam 
> >> ---
> >>  Documentation/admin-guide/kernel-parameters.txt | 14 ++
> >>  Documentation/admin-guide/mm/hugetlbpage.rst|  3 +++
> >>  arch/x86/mm/init_64.c   |  8 ++--
> >>  include/linux/hugetlb.h | 19 +++
> >>  mm/hugetlb_vmemmap.c| 24 
> >> 
> >>  5 files changed, 66 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> >> b/Documentation/admin-guide/kernel-parameters.txt
> >> index 04545725f187..de91d54573c4 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -1557,6 +1557,20 @@
> >>  Documentation/admin-guide/mm/hugetlbpage.rst.
> >>  Format: size[KMG]
> >>
> >> +hugetlb_free_vmemmap=
> >> +[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
> >> +this controls freeing unused vmemmap pages associated
> >> +with each HugeTLB page. When this option is enabled,
> >> +we disable PMD/huge page mapping of vmemmap pages 
> >> which
> >> +increase page table pages. So if a user/sysadmin only
> >> +uses a small number of HugeTLB pages (as a percentage
> >> +of system memory), they could end up using more memory
> >> +with hugetlb_free_vmemmap on as opposed to off.
> >> +Format: { on | off (default) }
> >
> > Please note this is an admin guide and for those this seems overly low
> > level. I would use something like the following
> >   [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> >   enabled.
> >   Allows heavy hugetlb users to free up some more
> >   memory (6 * PAGE_SIZE for each 2MB hugetlb
> >   page).
> >   This feauture is not free though. Large page
> >   tables are not use to back vmemmap pages which
>
>are not used

Thanks.

>
> >   can lead to a performance degradation for some
> >   workloads. Also there will be memory allocation
> >   required when hugetlb pages are freed from the
> >   pool which can lead to corner cases under heavy
> >   memory pressure.
> >> +
> >> +on:  enable the feature
> >> +off: disable the feature
> >> +
> >>  hung_task_panic=
> >>  [KNL] Should the hung task detector generate panics.
> >>  Format: 0 | 1
>
>
> --
> ~Randy
>


Re: [PATCH v4 2/3] dt-bindings: mtd: Document use of nvmem-cells compatible

2021-03-10 Thread Rafał Miłecki

On 10.03.2021 23:47, Ansuel Smith wrote:

On Wed, Mar 10, 2021 at 11:41:24PM +0100, Rafał Miłecki wrote:

See inline

On 10.03.2021 22:08, Ansuel Smith wrote:

Document nvmem-cells compatible used to treat mtd partitions as a
nvmem provider.

Signed-off-by: Ansuel Smith 
---
   .../bindings/mtd/partitions/nvmem-cells.yaml  | 96 +++
   1 file changed, 96 insertions(+)
   create mode 100644 
Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml

diff --git a/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml 
b/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml
new file mode 100644
index ..f70d7597a6b0
--- /dev/null
+++ b/Documentation/devicetree/bindings/mtd/partitions/nvmem-cells.yaml
@@ -0,0 +1,96 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mtd/partitions/nvmem-cells.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Nvmem cells
+
+description: |
+  Any partition containing the compatible "nvmem-cells" will register as a
+  nvmem provider.
+  Each direct subnodes represents a nvmem cell following the nvmem binding.
+  Nvmem binding to declare nvmem-cells can be found in:
+  Documentation/devicetree/bindings/nvmem/nvmem.yaml
+
+maintainers:
+  - Ansuel Smith 


I think that when Rob wrote:

On 10.03.2021 03:58, Rob Herring wrote:

I think this should reference nvmem.yaml.


he meant you using:

allOf:
   - $ref: "nvmem.yaml#"

(you'll need to adjust binding path).

Please check how it's done in Documentation/devicetree/bindings/nvmem/*.yaml 
files




Aside from that, should I readd the old properties or I can keep the
compatible as the only one required?


What old properties do you mean?

You shouldn't need to add anything to the list of "required" I think.

Some NVMEM providers add "#address-cells" and "#size-cells". That makes
sense if NVMEM provider must provide at least 1 cell. I'm not sure if we
need that for MTD.

Even "compatible" is actually redundant but most YAML files list it for
convenience. Source:

On 10.12.2020 03:48, Rob Herring wrote:
> And drop 'compatible' as required. It's redundant anyways because the
> schema will only be applied if compatible matches.

http://lists.infradead.org/pipermail/linux-mtd/2020-December/084574.html
https://patchwork.ozlabs.org/comment/2597326/


Re: [External] Re: [PATCH v18 6/9] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap

2021-03-10 Thread Muchun Song
On Wed, Mar 10, 2021 at 11:37 PM Michal Hocko  wrote:
>
> On Mon 08-03-21 18:28:04, Muchun Song wrote:
> > Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
> > freeing unused vmemmap pages associated with each hugetlb page on boot.
> >
> > We disables PMD mapping of vmemmap pages for x86-64 arch when this
> > feature is enabled. Because vmemmap_remap_free() depends on vmemmap
> > being base page mapped.
> >
> > Signed-off-by: Muchun Song 
> > Reviewed-by: Oscar Salvador 
> > Reviewed-by: Barry Song 
> > Reviewed-by: Miaohe Lin 
> > Tested-by: Chen Huang 
> > Tested-by: Bodeddula Balasubramaniam 
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 14 ++
> >  Documentation/admin-guide/mm/hugetlbpage.rst|  3 +++
> >  arch/x86/mm/init_64.c   |  8 ++--
> >  include/linux/hugetlb.h | 19 +++
> >  mm/hugetlb_vmemmap.c| 24 
> > 
> >  5 files changed, 66 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index 04545725f187..de91d54573c4 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1557,6 +1557,20 @@
> >   Documentation/admin-guide/mm/hugetlbpage.rst.
> >   Format: size[KMG]
> >
> > + hugetlb_free_vmemmap=
> > + [KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
> > + this controls freeing unused vmemmap pages associated
> > + with each HugeTLB page. When this option is enabled,
> > + we disable PMD/huge page mapping of vmemmap pages 
> > which
> > + increase page table pages. So if a user/sysadmin only
> > + uses a small number of HugeTLB pages (as a percentage
> > + of system memory), they could end up using more memory
> > + with hugetlb_free_vmemmap on as opposed to off.
> > + Format: { on | off (default) }
>
> Please note this is an admin guide and for those this seems overly low

OK.

> level. I would use something like the following
> [KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> enabled.
> Allows heavy hugetlb users to free up some more
> memory (6 * PAGE_SIZE for each 2MB hugetlb
> page).
> This feauture is not free though. Large page
> tables are not use to back vmemmap pages which
> can lead to a performance degradation for some
> workloads. Also there will be memory allocation
> required when hugetlb pages are freed from the
> pool which can lead to corner cases under heavy
> memory pressure.

Very thanks. I will update this.

> > +
> > + on:  enable the feature
> > + off: disable the feature
> > +
> >   hung_task_panic=
> >   [KNL] Should the hung task detector generate panics.
> >   Format: 0 | 1
> --
> Michal Hocko
> SUSE Labs


[unixbhas...@gmail.com: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S]

2021-03-10 Thread Bhaskar Chowdhury

- Forwarded message from Bhaskar Chowdhury  -

Date: Thu, 11 Mar 2021 11:40:58 +0530
From: Bhaskar Chowdhury 
To: unixbhas...@gmail.com, linux-i...@vger.kernel.org, 
linux-kernel@vger.kernel.org
Subject: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S
X-Mailer: git-send-email 2.26.2


Mundane spelling fixes.

Signed-off-by: Bhaskar Chowdhury 
---
  Randy and Adrian felt it should go through your tree..so...

 arch/ia64/kernel/fsys.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S
index 0750a716adc7..2094f3249019 100644
--- a/arch/ia64/kernel/fsys.S
+++ b/arch/ia64/kernel/fsys.S
@@ -172,7 +172,7 @@ ENTRY(fsys_gettimeofday)
// r25 = itc_lastcycle value
// r26 = address clocksource cycle_last
// r27 = (not used)
-   // r28 = sequence number at the beginning of critcal section
+   // r28 = sequence number at the beginning of critical section
// r29 = address of itc_jitter
// r30 = time processing flags / memory address
// r31 = pointer to result
@@ -432,7 +432,7 @@ GLOBAL_ENTRY(fsys_bubble_down)
 *  - r29: psr
 *
 * We used to clear some PSR bits here but that requires slow
-* serialization.  Fortuntely, that isn't really necessary.
+* serialization.  Fortunately, that isn't really necessary.
 * The rationale is as follows: we used to clear bits
 * ~PSR_PRESERVED_BITS in PSR.L.  Since
 * PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we
--
2.26.2


- End forwarded message -


Re: [External] Re: [PATCH v18 5/9] mm: hugetlb: set the PageHWPoison to the raw error page

2021-03-10 Thread Muchun Song
On Wed, Mar 10, 2021 at 11:28 PM Michal Hocko  wrote:
>
> On Mon 08-03-21 18:28:03, Muchun Song wrote:
> > Because we reuse the first tail vmemmap page frame and remap it
> > with read-only, we cannot set the PageHWPoison on some tail pages.
> > So we can use the head[4].private (There are at least 128 struct
> > page structures associated with the optimized HugeTLB page, so
> > using head[4].private is safe) to record the real error page index
> > and set the raw error page PageHWPoison later.
>
> Can we have more poisoned tail pages? Also who does consume that index
> and set the HWPoison on the proper tail page?

Good point. I have looked at the memory failure routine more closely.
If we do not clear the HWPoison of the head page, we cannot
poison another tail page.

So we should not set the destructor of the huge page from
HUGETLB_PAGE_DTOR to NULL_COMPOUND_DTOR
before calling alloc_huge_page_vmemmap(). In this case,
the PageHuge() check below always returns true.

I need to fix this in the previous patch.

memory_failure()
if (PageHuge(page))
memory_failure_hugetlb()
head = compound_head(page)
if (TestSetPageHWPoison(head))
return

Thanks.
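
To make that concrete, a minimal sketch (not the actual mm/memory-failure.c
code) of the flow traced above; it only illustrates why a HugeTLB page whose
head already carries HWPoison swallows any later poison report for one of its
tail pages, which is why the real error-page index has to be recorded and
applied when the page is freed:

#include <linux/mm.h>
#include <linux/page-flags.h>

/* Simplified hugetlb branch of memory_failure(); the real code also
 * isolates the page, updates counters, notifies tasks, etc.
 */
static int memory_failure_hugetlb_sketch(struct page *page)
{
	struct page *head = compound_head(page);

	/* Head already poisoned: the new report is dropped here. */
	if (TestSetPageHWPoison(head))
		return 0;

	/* ... dissolve or isolate the huge page ... */
	return 0;
}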

>
> > Signed-off-by: Muchun Song 
> > Reviewed-by: Oscar Salvador 
> > Acked-by: David Rientjes 
> > Tested-by: Chen Huang 
> > Tested-by: Bodeddula Balasubramaniam 
> > ---
> >  mm/hugetlb.c | 80 
> > ++--
> >  1 file changed, 72 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 377e0c1b283f..c0c1b7635ca9 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1304,6 +1304,74 @@ static inline void 
> > destroy_compound_gigantic_page(struct page *page,
> >   unsigned int order) { }
> >  #endif
> >
> > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > +static inline void hwpoison_subpage_deliver(struct hstate *h, struct page 
> > *head)
> > +{
> > + struct page *page;
> > +
> > + if (!PageHWPoison(head) || !free_vmemmap_pages_per_hpage(h))
> > + return;
> > +
> > + page = head + page_private(head + 4);
> > +
> > + /*
> > +  * Move PageHWPoison flag from head page to the raw error page,
> > +  * which makes any subpages rather than the error page reusable.
> > +  */
> > + if (page != head) {
> > + SetPageHWPoison(page);
> > + ClearPageHWPoison(head);
> > + }
> > +}
> > +
> > +static inline void hwpoison_subpage_set(struct hstate *h, struct page 
> > *head,
> > + struct page *page)
> > +{
> > + if (!PageHWPoison(head))
> > + return;
> > +
> > + if (free_vmemmap_pages_per_hpage(h)) {
> > + set_page_private(head + 4, page - head);
> > + } else if (page != head) {
> > + /*
> > +  * Move PageHWPoison flag from head page to the raw error 
> > page,
> > +  * which makes any subpages rather than the error page 
> > reusable.
> > +  */
> > + SetPageHWPoison(page);
> > + ClearPageHWPoison(head);
> > + }
> > +}
> > +
> > +static inline void hwpoison_subpage_clear(struct hstate *h, struct page 
> > *head)
> > +{
> > + if (!PageHWPoison(head) || !free_vmemmap_pages_per_hpage(h))
> > + return;
> > +
> > + set_page_private(head + 4, 0);
> > +}
> > +#else
> > +static inline void hwpoison_subpage_deliver(struct hstate *h, struct page 
> > *head)
> > +{
> > +}
> > +
> > +static inline void hwpoison_subpage_set(struct hstate *h, struct page 
> > *head,
> > + struct page *page)
> > +{
> > + if (PageHWPoison(head) && page != head) {
> > + /*
> > +  * Move PageHWPoison flag from head page to the raw error 
> > page,
> > +  * which makes any subpages rather than the error page 
> > reusable.
> > +  */
> > + SetPageHWPoison(page);
> > + ClearPageHWPoison(head);
> > + }
> > +}
> > +
> > +static inline void hwpoison_subpage_clear(struct hstate *h, struct page 
> > *head)
> > +{
> > +}
> > +#endif
> > +
> >  static int update_and_free_page(struct hstate *h, struct page *page)
> >   __releases(&hugetlb_lock) __acquires(&hugetlb_lock)
> >  {
> > @@ -1357,6 +1425,8 @@ static int update_and_free_page(struct hstate *h, 
> > struct page *page)
> >   return -ENOMEM;
> >   }
> >
> > + hwpoison_subpage_deliver(h, page);
> > +
> >   for (i = 0; i < pages_per_huge_page(h);
> >i++, subpage = mem_map_next(subpage, page, i)) {
> >   subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
> > @@ -1801,14 +1871,7 @@ int dissolve_free_huge_page(struct page *page)
> >   goto retry;
> >   }
> >
> > - /*
> > -  * Move PageHWPoison flag from head page to the raw error 
> > page,
> > -

'make O=' indigestion with module signing

2021-03-10 Thread Valdis Klētnieks
So, I tried doing a 'make O=... allmodconfig', with a setup where the uid of
the build process had write permission to the O= directory, but intentionally
did *not* have write permission to the source tree (so they couldn't mess up
the tree - I got tired of having to repeatedly do 'make mrproper' because of
pilot error)

allmodconfig gave me a .config that had:

CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA1=y
# CONFIG_MODULE_SIG_SHA224 is not set
# CONFIG_MODULE_SIG_SHA256 is not set
# CONFIG_MODULE_SIG_SHA384 is not set
# CONFIG_MODULE_SIG_SHA512 is not set
CONFIG_MODULE_SIG_HASH="sha1"
CONFIG_IMA_APPRAISE_REQUIRE_MODULE_SIGS=y
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"

What I *expected* was that multiple builds with different O= would each
generate themselves a unique signing key and put it in their own O= directory
and stay out of each other's way.

What actually happened:

  EXTRACT_CERTS   /usr/src/linux-next/"certs/signing_key.pem"
At main.c:142:
- SSL error:0200100D:system library:fopen:Permission denied: 
../crypto/bio/bss_file.c:69
- SSL error:2006D002:BIO routines:BIO_new_file:system lib: 
../crypto/bio/bss_file.c:78
extract-cert: /usr/src/linux-next/certs/signing_key.pem: Permission denied
make[2]: *** [/usr/src/linux-next/certs/Makefile:106: certs/signing_key.x509] 
Error 1
make[1]: *** [/usr/src/linux-next/Makefile:1847: certs] Error 2
make[1]: Leaving directory '/usr/src/linux-next/out/arm64'
make: *** [Makefile:215: __sub-make] Error 2

It tried to put the key into the source tree rather than the build tree.

Before I try to code up a fix for this, is this intentionally designed
behavior, or have I just managed to trip over a rarely-tested corner case?


pgpL1B7YjVYls.pgp
Description: PGP signature


[PATCH] kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)

2021-03-10 Thread Masahiro Yamada
This is a remnant of commit 78046fabe6e7 ("kbuild: determine the output
format of DTC by the target suffix").

The parameter "yaml" is meaningless because cmd_dtc no longer takes $(2).

Reported-by: Rob Herring 
Signed-off-by: Masahiro Yamada 
---

 scripts/Makefile.lib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 90a4e04cd8f5..8cd67b1b6d15 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -352,7 +352,7 @@ define rule_dtc
 endef
 
 $(obj)/%.dt.yaml: $(src)/%.dts $(DTC) $(DT_TMP_SCHEMA) FORCE
-   $(call if_changed_rule,dtc,yaml)
+   $(call if_changed_rule,dtc)
 
 dtc-tmp = $(subst $(comma),_,$(dot-target).dts.tmp)
 
-- 
2.27.0



[PATCH] scsi: Fix a double free in myrs_cleanup

2021-03-10 Thread Lv Yunlong
In myrs_cleanup, cs->mmio_base will be freed twice by
iounmap().

Fixes: 77266186397c6 ("scsi: myrs: Add Mylex RAID controller (SCSI interface)")
Signed-off-by: Lv Yunlong 
---
 drivers/scsi/myrs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/myrs.c b/drivers/scsi/myrs.c
index 4adf9ded296a..329fd025c718 100644
--- a/drivers/scsi/myrs.c
+++ b/drivers/scsi/myrs.c
@@ -2273,12 +2273,12 @@ static void myrs_cleanup(struct myrs_hba *cs)
if (cs->mmio_base) {
cs->disable_intr(cs);
iounmap(cs->mmio_base);
+   cs->mmio_base = NULL;
}
if (cs->irq)
free_irq(cs->irq, cs);
if (cs->io_addr)
release_region(cs->io_addr, 0x80);
-   iounmap(cs->mmio_base);
pci_set_drvdata(pdev, NULL);
pci_disable_device(pdev);
scsi_host_put(cs->host);
-- 
2.25.1
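
For context, a minimal sketch (not the myrs driver itself) of the idiom the
fix relies on: unmap the region once and clear the pointer immediately, so a
later cleanup path that tests the same pointer becomes a no-op rather than a
second iounmap() of a stale address.

#include <linux/io.h>

/* Hypothetical teardown helper illustrating the pattern. */
static void example_unmap_once(void __iomem **base)
{
	if (*base) {
		iounmap(*base);
		*base = NULL;
	}
}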




Re: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S

2021-03-10 Thread Randy Dunlap
On 3/10/21 10:24 PM, Bhaskar Chowdhury wrote:
> On 22:15 Wed 10 Mar 2021, Randy Dunlap wrote:
>> On 3/10/21 10:10 PM, Bhaskar Chowdhury wrote:
>>>
>>> Mundane spelling fixes.
>>>
>>> Signed-off-by: Bhaskar Chowdhury 
>>
>> Acked-by: Randy Dunlap 
>>
>> but no maintainer Cc:ed to pick it up...
>>
> I have seen "maintainer-less" files and you have pointed that out too in the
> past... what do I do, Randy??? Should I run "git blame" to find out who
> touched the file and
> include those emails??? I don't think that would be a good idea.
> 
> Bemused!! Throw some light.

Well, we know that ia64 is orphaned, so you need to find someone
to merge the patch. Andrew Morton is usually a good bet. :)

or use 'git log'  to see who has been merging patches
to that file.

>>> ---
>>>  arch/ia64/kernel/fsys.S | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)

-- 
~Randy



Re: [PATCH] ia64: kernel: Few typos fixed in the file fsys.S

2021-03-10 Thread Bhaskar Chowdhury

On 22:15 Wed 10 Mar 2021, Randy Dunlap wrote:

On 3/10/21 10:10 PM, Bhaskar Chowdhury wrote:


Mundane spelling fixes.

Signed-off-by: Bhaskar Chowdhury 


Acked-by: Randy Dunlap 

but no maintainer Cc:ed to pick it up...


I have seen "maintainer-less" files and you have pointed that out too in the
past... what do I do, Randy??? Should I run "git blame" to find out who touched
the file and
include those emails??? I don't think that would be a good idea.

Bemused!! Throw some light.

---
 arch/ia64/kernel/fsys.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S
index 0750a716adc7..2094f3249019 100644
--- a/arch/ia64/kernel/fsys.S
+++ b/arch/ia64/kernel/fsys.S
@@ -172,7 +172,7 @@ ENTRY(fsys_gettimeofday)
// r25 = itc_lastcycle value
// r26 = address clocksource cycle_last
// r27 = (not used)
-   // r28 = sequence number at the beginning of critcal section
+   // r28 = sequence number at the beginning of critical section
// r29 = address of itc_jitter
// r30 = time processing flags / memory address
// r31 = pointer to result
@@ -432,7 +432,7 @@ GLOBAL_ENTRY(fsys_bubble_down)
 *  - r29: psr
 *
 * We used to clear some PSR bits here but that requires slow
-* serialization.  Fortuntely, that isn't really necessary.
+* serialization.  Fortunately, that isn't really necessary.
 * The rationale is as follows: we used to clear bits
 * ~PSR_PRESERVED_BITS in PSR.L.  Since
 * PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we
--



--
~Randy



signature.asc
Description: PGP signature


[net-next PATCH v7 16/16] net: dpaa2-mac: Add ACPI support for DPAA2 MAC driver

2021-03-10 Thread Calvin Johnson
Modify dpaa2_mac_get_node() to get the dpmac fwnode from either
DT or ACPI.

Modify dpaa2_mac_get_if_mode() to get the interface mode from dpmac_node,
which is a fwnode.

Modify dpaa2_pcs_create() to create pcs from dpmac_node fwnode.

Modify dpaa2_mac_connect() to support ACPI along with DT.

Signed-off-by: Calvin Johnson 
---

Changes in v7:
- remove unnecessary checks

Changes in v6:
- use dev_fwnode()
- remove useless else
- replace of_device_is_available() to fwnode_device_is_available()

Changes in v5:
- replace fwnode_get_id() with OF and ACPI function calls

Changes in v4: None
Changes in v3: None
Changes in v2:
- Refactor OF functions to use fwnode functions

 .../net/ethernet/freescale/dpaa2/dpaa2-mac.c  | 84 +++
 1 file changed, 50 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
index ccaf7e35abeb..0cb6760dce88 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
 /* Copyright 2019 NXP */
 
+#include 
+#include 
+
 #include "dpaa2-eth.h"
 #include "dpaa2-mac.h"
 
@@ -34,39 +37,51 @@ static int phy_mode(enum dpmac_eth_if eth_if, 
phy_interface_t *if_mode)
return 0;
 }
 
-/* Caller must call of_node_put on the returned value */
-static struct device_node *dpaa2_mac_get_node(u16 dpmac_id)
+static struct fwnode_handle *dpaa2_mac_get_node(struct device *dev,
+   u16 dpmac_id)
 {
-   struct device_node *dpmacs, *dpmac = NULL;
-   u32 id;
+   struct fwnode_handle *fwnode, *parent, *child  = NULL;
+   struct device_node *dpmacs = NULL;
int err;
+   u32 id;
 
-   dpmacs = of_find_node_by_name(NULL, "dpmacs");
-   if (!dpmacs)
-   return NULL;
+   fwnode = dev_fwnode(dev->parent);
+   if (is_of_node(fwnode)) {
+   dpmacs = of_find_node_by_name(NULL, "dpmacs");
+   if (!dpmacs)
+   return NULL;
+   parent = of_fwnode_handle(dpmacs);
+   } else if (is_acpi_node(fwnode)) {
+   parent = fwnode;
+   }
 
-   while ((dpmac = of_get_next_child(dpmacs, dpmac)) != NULL) {
-   err = of_property_read_u32(dpmac, "reg", &id);
+   fwnode_for_each_child_node(parent, child) {
+   err = -EINVAL;
+   if (is_acpi_device_node(child))
+   err = acpi_get_local_address(ACPI_HANDLE_FWNODE(child), &id);
+   else if (is_of_node(child))
+   err = of_property_read_u32(to_of_node(child), "reg", &id);
if (err)
continue;
-   if (id == dpmac_id)
-   break;
-   }
 
+   if (id == dpmac_id) {
+   of_node_put(dpmacs);
+   return child;
+   }
+   }
of_node_put(dpmacs);
-
-   return dpmac;
+   return NULL;
 }
 
-static int dpaa2_mac_get_if_mode(struct device_node *node,
+static int dpaa2_mac_get_if_mode(struct fwnode_handle *dpmac_node,
 struct dpmac_attr attr)
 {
phy_interface_t if_mode;
int err;
 
-   err = of_get_phy_mode(node, &if_mode);
-   if (!err)
-   return if_mode;
+   err = fwnode_get_phy_mode(dpmac_node);
+   if (err > 0)
+   return err;
 
err = phy_mode(attr.eth_if, &if_mode);
if (!err)
@@ -235,26 +250,27 @@ static const struct phylink_mac_ops dpaa2_mac_phylink_ops 
= {
 };
 
 static int dpaa2_pcs_create(struct dpaa2_mac *mac,
-   struct device_node *dpmac_node, int id)
+   struct fwnode_handle *dpmac_node,
+   int id)
 {
struct mdio_device *mdiodev;
-   struct device_node *node;
+   struct fwnode_handle *node;
 
-   node = of_parse_phandle(dpmac_node, "pcs-handle", 0);
-   if (!node) {
+   node = fwnode_find_reference(dpmac_node, "pcs-handle", 0);
+   if (IS_ERR(node)) {
/* do not error out on old DTS files */
netdev_warn(mac->net_dev, "pcs-handle node not found\n");
return 0;
}
 
-   if (!of_device_is_available(node)) {
+   if (!fwnode_device_is_available(node)) {
netdev_err(mac->net_dev, "pcs-handle node not available\n");
-   of_node_put(node);
+   fwnode_handle_put(node);
return -ENODEV;
}
 
-   mdiodev = of_mdio_find_device(node);
-   of_node_put(node);
+   mdiodev = fwnode_mdio_find_device(node);
+   fwnode_handle_put(node);
if (!mdiodev)
return -EPROBE_DEFER;
 
@@ -283,13 +299,13 @@ static void dpaa2_pcs_destroy(struct dpaa2_mac *mac)
 int dpaa2_mac_connect(struct dpaa2_mac *mac)
 {

[net-next PATCH v7 14/16] net: phylink: introduce phylink_fwnode_phy_connect()

2021-03-10 Thread Calvin Johnson
Define phylink_fwnode_phy_connect() to connect the PHY specified by
a fwnode to a phylink instance.

Signed-off-by: Calvin Johnson 
---

Changes in v7: None
Changes in v6:
- remove OF check for fixed-link

Changes in v5: None
Changes in v4:
- call phy_device_free() before returning

Changes in v3: None
Changes in v2: None

 drivers/net/phy/phylink.c | 54 +++
 include/linux/phylink.h   |  3 +++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 053c92e02cd8..23753f92e0a6 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -5,6 +5,7 @@
  *
  * Copyright (C) 2015 Russell King
  */
+#include 
 #include 
 #include 
 #include 
@@ -1124,6 +1125,59 @@ int phylink_of_phy_connect(struct phylink *pl, struct 
device_node *dn,
 }
 EXPORT_SYMBOL_GPL(phylink_of_phy_connect);
 
+/**
+ * phylink_fwnode_phy_connect() - connect the PHY specified in the fwnode.
+ * @pl: a pointer to a  phylink returned from phylink_create()
+ * @fwnode: a pointer to a  fwnode_handle.
+ * @flags: PHY-specific flags to communicate to the PHY device driver
+ *
+ * Connect the phy specified @fwnode to the phylink instance specified
+ * by @pl.
+ *
+ * Returns 0 on success or a negative errno.
+ */
+int phylink_fwnode_phy_connect(struct phylink *pl,
+  struct fwnode_handle *fwnode,
+  u32 flags)
+{
+   struct fwnode_handle *phy_fwnode;
+   struct phy_device *phy_dev;
+   int ret;
+
+   /* Fixed links and 802.3z are handled without needing a PHY */
+   if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
+   (pl->cfg_link_an_mode == MLO_AN_INBAND &&
+phy_interface_mode_is_8023z(pl->link_interface)))
+   return 0;
+
+   phy_fwnode = fwnode_get_phy_node(fwnode);
+   if (IS_ERR(phy_fwnode)) {
+   if (pl->cfg_link_an_mode == MLO_AN_PHY)
+   return -ENODEV;
+   return 0;
+   }
+
+   phy_dev = fwnode_phy_find_device(phy_fwnode);
+   /* We're done with the phy_node handle */
+   fwnode_handle_put(phy_fwnode);
+   if (!phy_dev)
+   return -ENODEV;
+
+   ret = phy_attach_direct(pl->netdev, phy_dev, flags,
+   pl->link_interface);
+   if (ret) {
+   phy_device_free(phy_dev);
+   return ret;
+   }
+
+   ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface);
+   if (ret)
+   phy_detach(phy_dev);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(phylink_fwnode_phy_connect);
+
 /**
  * phylink_disconnect_phy() - disconnect any PHY attached to the phylink
  *   instance.
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index d81a714cfbbd..75d4f99090fd 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -439,6 +439,9 @@ void phylink_destroy(struct phylink *);
 
 int phylink_connect_phy(struct phylink *, struct phy_device *);
 int phylink_of_phy_connect(struct phylink *, struct device_node *, u32 flags);
+int phylink_fwnode_phy_connect(struct phylink *pl,
+  struct fwnode_handle *fwnode,
+  u32 flags);
 void phylink_disconnect_phy(struct phylink *);
 
 void phylink_mac_change(struct phylink *, bool up);
-- 
2.17.1
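
As a usage illustration only (not part of the patch), a hypothetical MAC
driver's open path could connect its PHY through the new helper roughly as
below; the example_priv layout and the choice of dev_fwnode(ndev->dev.parent)
as the fwnode source are assumptions for the sketch:

#include <linux/netdevice.h>
#include <linux/phylink.h>
#include <linux/property.h>

struct example_priv {
	struct phylink *phylink;	/* created earlier with phylink_create() */
};

static int example_ndo_open(struct net_device *ndev)
{
	struct example_priv *priv = netdev_priv(ndev);
	int err;

	/* Works the same way for DT- and ACPI-described devices. */
	err = phylink_fwnode_phy_connect(priv->phylink,
					 dev_fwnode(ndev->dev.parent), 0);
	if (err)
		return err;

	phylink_start(priv->phylink);
	return 0;
}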



[net-next PATCH v7 15/16] net: phylink: Refactor phylink_of_phy_connect()

2021-03-10 Thread Calvin Johnson
Refactor phylink_of_phy_connect() to use phylink_fwnode_phy_connect().

Signed-off-by: Calvin Johnson 
---

Changes in v7: None
Changes in v6: None
Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/net/phy/phylink.c | 39 +--
 1 file changed, 1 insertion(+), 38 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 23753f92e0a6..ce7e918430c8 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1084,44 +1084,7 @@ EXPORT_SYMBOL_GPL(phylink_connect_phy);
 int phylink_of_phy_connect(struct phylink *pl, struct device_node *dn,
   u32 flags)
 {
-   struct device_node *phy_node;
-   struct phy_device *phy_dev;
-   int ret;
-
-   /* Fixed links and 802.3z are handled without needing a PHY */
-   if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
-   (pl->cfg_link_an_mode == MLO_AN_INBAND &&
-phy_interface_mode_is_8023z(pl->link_interface)))
-   return 0;
-
-   phy_node = of_parse_phandle(dn, "phy-handle", 0);
-   if (!phy_node)
-   phy_node = of_parse_phandle(dn, "phy", 0);
-   if (!phy_node)
-   phy_node = of_parse_phandle(dn, "phy-device", 0);
-
-   if (!phy_node) {
-   if (pl->cfg_link_an_mode == MLO_AN_PHY)
-   return -ENODEV;
-   return 0;
-   }
-
-   phy_dev = of_phy_find_device(phy_node);
-   /* We're done with the phy_node handle */
-   of_node_put(phy_node);
-   if (!phy_dev)
-   return -ENODEV;
-
-   ret = phy_attach_direct(pl->netdev, phy_dev, flags,
-   pl->link_interface);
-   if (ret)
-   return ret;
-
-   ret = phylink_bringup_phy(pl, phy_dev, pl->link_config.interface);
-   if (ret)
-   phy_detach(phy_dev);
-
-   return ret;
+   return phylink_fwnode_phy_connect(pl, of_fwnode_handle(dn), flags);
 }
 EXPORT_SYMBOL_GPL(phylink_of_phy_connect);
 
-- 
2.17.1



[net-next PATCH v7 13/16] net/fsl: Use fwnode_mdiobus_register()

2021-03-10 Thread Calvin Johnson
fwnode_mdiobus_register() internally takes care of both DT
and ACPI cases to register mdiobus. Replace existing
of_mdiobus_register() with fwnode_mdiobus_register().

Note: For both ACPI and DT cases, the endianness of the MDIO controller
needs to be specified using the "little-endian" property.

Signed-off-by: Calvin Johnson 
---

Changes in v7:
- Include fwnode_mdio.h
- Alphabetically sort header inclusions

Changes in v6: None
Changes in v5: None
Changes in v4:
- Cleanup xgmac_mdio_probe()

Changes in v3:
- Avoid unnecessary line removal
- Remove unused inclusion of acpi.h

Changes in v2: None

 drivers/net/ethernet/freescale/xgmac_mdio.c | 22 -
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/xgmac_mdio.c 
b/drivers/net/ethernet/freescale/xgmac_mdio.c
index bfa2826c5545..6daf1fb2e9ea 100644
--- a/drivers/net/ethernet/freescale/xgmac_mdio.c
+++ b/drivers/net/ethernet/freescale/xgmac_mdio.c
@@ -2,6 +2,7 @@
  * QorIQ 10G MDIO Controller
  *
  * Copyright 2012 Freescale Semiconductor, Inc.
+ * Copyright 2021 NXP
  *
  * Authors: Andy Fleming 
  *  Timur Tabi 
@@ -11,15 +12,16 @@
  * kind, whether express or implied.
  */
 
-#include 
-#include 
+#include 
 #include 
-#include 
-#include 
+#include 
 #include 
+#include 
 #include 
-#include 
 #include 
+#include 
+#include 
+#include 
 
 /* Number of microseconds to wait for a register to respond */
 #define TIMEOUT 1000
@@ -243,10 +245,9 @@ static int xgmac_mdio_read(struct mii_bus *bus, int 
phy_id, int regnum)
 
 static int xgmac_mdio_probe(struct platform_device *pdev)
 {
-   struct device_node *np = pdev->dev.of_node;
-   struct mii_bus *bus;
-   struct resource *res;
struct mdio_fsl_priv *priv;
+   struct resource *res;
+   struct mii_bus *bus;
int ret;
 
/* In DPAA-1, MDIO is one of the many FMan sub-devices. The FMan
@@ -279,13 +280,16 @@ static int xgmac_mdio_probe(struct platform_device *pdev)
goto err_ioremap;
}
 
+   /* For both ACPI and DT cases, endianness of MDIO controller
+* needs to be specified using "little-endian" property.
+*/
priv->is_little_endian = device_property_read_bool(&pdev->dev,
   "little-endian");
 
priv->has_a011043 = device_property_read_bool(&pdev->dev,
  "fsl,erratum-a011043");
 
-   ret = of_mdiobus_register(bus, np);
+   ret = fwnode_mdiobus_register(bus, pdev->dev.fwnode);
if (ret) {
dev_err(&pdev->dev, "cannot register MDIO bus\n");
goto err_registration;
-- 
2.17.1



[net-next PATCH v7 12/16] net: mdiobus: Introduce fwnode_mdiobus_register()

2021-03-10 Thread Calvin Johnson
Introduce fwnode_mdiobus_register() to register PHYs on the mdiobus.
If the fwnode is DT node, then call of_mdiobus_register().
If it is an ACPI node, then call acpi_mdiobus_register().

Signed-off-by: Calvin Johnson 
---

Changes in v7:
- Move fwnode_mdiobus_register() to fwnode_mdio.c

Changes in v6: None
Changes in v5: None
Changes in v4:
- Remove redundant else from fwnode_mdiobus_register()

Changes in v3:
- Use acpi_mdiobus_register()

Changes in v2: None

 drivers/net/mdio/fwnode_mdio.c | 21 +
 include/linux/fwnode_mdio.h|  5 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 0982e816a6fb..523c2778b287 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,23 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus,
return 0;
 }
 EXPORT_SYMBOL(fwnode_mdiobus_register_phy);
+
+/**
+ * fwnode_mdiobus_register - Register mii_bus and create PHYs from fwnode
+ * @mdio: pointer to mii_bus structure
+ * @fwnode: pointer to fwnode of MDIO bus.
+ *
+ * This function returns of_mdiobus_register() for DT and
+ * acpi_mdiobus_register() for ACPI.
+ */
+int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode)
+{
+   if (is_of_node(fwnode))
+   return of_mdiobus_register(mdio, to_of_node(fwnode));
+
+   if (is_acpi_node(fwnode))
+   return acpi_mdiobus_register(mdio, fwnode);
+
+   return -EINVAL;
+}
+EXPORT_SYMBOL(fwnode_mdiobus_register);
diff --git a/include/linux/fwnode_mdio.h b/include/linux/fwnode_mdio.h
index 8c0392845916..20f22211260b 100644
--- a/include/linux/fwnode_mdio.h
+++ b/include/linux/fwnode_mdio.h
@@ -12,6 +12,7 @@
 int fwnode_mdiobus_register_phy(struct mii_bus *bus,
struct fwnode_handle *child, u32 addr);
 
+int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle 
*fwnode);
 #else /* CONFIG_FWNODE_MDIO */
 static inline int fwnode_mdiobus_register_phy(struct mii_bus *bus,
  struct fwnode_handle *child,
@@ -19,6 +20,10 @@ static inline int fwnode_mdiobus_register_phy(struct mii_bus 
*bus,
 {
return -EINVAL;
 }
+static int fwnode_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle 
*fwnode)
+{
+   return -EINVAL;
+}
 #endif
 
 #endif /* __LINUX_FWNODE_MDIO_H */
-- 
2.17.1



[net-next PATCH v7 11/16] net: mdio: Add ACPI support code for mdio

2021-03-10 Thread Calvin Johnson
Define acpi_mdiobus_register() to register the mii_bus and create PHYs for
each ACPI child node.

Signed-off-by: Calvin Johnson 
---

Changes in v7:
- Include headers directly used in acpi_mdio.c

Changes in v6:
- use GENMASK() and ACPI_COMPANION_SET()
- some cleanup
- remove unwanted header inclusion

Changes in v5:
- add missing MODULE_LICENSE()
- replace fwnode_get_id() with OF and ACPI function calls

Changes in v4: None
Changes in v3: None
Changes in v2: None

 MAINTAINERS  |  1 +
 drivers/net/mdio/Kconfig |  7 +
 drivers/net/mdio/Makefile|  1 +
 drivers/net/mdio/acpi_mdio.c | 56 
 include/linux/acpi_mdio.h| 25 
 5 files changed, 90 insertions(+)
 create mode 100644 drivers/net/mdio/acpi_mdio.c
 create mode 100644 include/linux/acpi_mdio.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 146de41d2656..051377b7fa94 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6680,6 +6680,7 @@ F:Documentation/devicetree/bindings/net/mdio*
 F: Documentation/devicetree/bindings/net/qca,ar803x.yaml
 F: Documentation/networking/phy.rst
 F: drivers/net/mdio/
+F: drivers/net/mdio/acpi_mdio.c
 F: drivers/net/mdio/fwnode_mdio.c
 F: drivers/net/mdio/of_mdio.c
 F: drivers/net/pcs/
diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig
index 2d5bf5ccffb5..fc8c787b448f 100644
--- a/drivers/net/mdio/Kconfig
+++ b/drivers/net/mdio/Kconfig
@@ -36,6 +36,13 @@ config OF_MDIO
help
  OpenFirmware MDIO bus (Ethernet PHY) accessors
 
+config ACPI_MDIO
+   def_tristate PHYLIB
+   depends on ACPI
+   depends on PHYLIB
+   help
+ ACPI MDIO bus (Ethernet PHY) accessors
+
 if MDIO_BUS
 
 config MDIO_DEVRES
diff --git a/drivers/net/mdio/Makefile b/drivers/net/mdio/Makefile
index ea5390e2ef84..e8b739a3df1c 100644
--- a/drivers/net/mdio/Makefile
+++ b/drivers/net/mdio/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 # Makefile for Linux MDIO bus drivers
 
+obj-$(CONFIG_ACPI_MDIO)+= acpi_mdio.o
 obj-$(CONFIG_FWNODE_MDIO)  += fwnode_mdio.o
 obj-$(CONFIG_OF_MDIO)  += of_mdio.o
 
diff --git a/drivers/net/mdio/acpi_mdio.c b/drivers/net/mdio/acpi_mdio.c
new file mode 100644
index ..60a86e3fc246
--- /dev/null
+++ b/drivers/net/mdio/acpi_mdio.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * ACPI helpers for the MDIO (Ethernet PHY) API
+ *
+ * This file provides helper functions for extracting PHY device information
+ * out of the ACPI ASL and using it to populate an mii_bus.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Calvin Johnson ");
+MODULE_LICENSE("GPL");
+
+/**
+ * acpi_mdiobus_register - Register mii_bus and create PHYs from the ACPI ASL.
+ * @mdio: pointer to mii_bus structure
+ * @fwnode: pointer to fwnode of MDIO bus.
+ *
+ * This function registers the mii_bus structure and registers a phy_device
+ * for each child node of @fwnode.
+ */
+int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode)
+{
+   struct fwnode_handle *child;
+   u32 addr;
+   int ret;
+
+   /* Mask out all PHYs from auto probing. */
+   mdio->phy_mask = GENMASK(31, 0);
+   ret = mdiobus_register(mdio);
+   if (ret)
+   return ret;
+
+   ACPI_COMPANION_SET(&mdio->dev, to_acpi_device_node(fwnode));
+
+   /* Loop over the child nodes and register a phy_device for each PHY */
+   fwnode_for_each_child_node(fwnode, child) {
+   ret = acpi_get_local_address(ACPI_HANDLE_FWNODE(child), &addr);
+   if (ret || addr >= PHY_MAX_ADDR)
+   continue;
+
+   ret = fwnode_mdiobus_register_phy(mdio, child, addr);
+   if (ret == -ENODEV)
+   dev_err(&mdio->dev,
+   "MDIO device at address %d is missing.\n",
+   addr);
+   }
+   return 0;
+}
+EXPORT_SYMBOL(acpi_mdiobus_register);
diff --git a/include/linux/acpi_mdio.h b/include/linux/acpi_mdio.h
new file mode 100644
index ..748d261fe2f9
--- /dev/null
+++ b/include/linux/acpi_mdio.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * ACPI helper for the MDIO (Ethernet PHY) API
+ */
+
+#ifndef __LINUX_ACPI_MDIO_H
+#define __LINUX_ACPI_MDIO_H
+
+#include 
+
+#if IS_ENABLED(CONFIG_ACPI_MDIO)
+int acpi_mdiobus_register(struct mii_bus *mdio, struct fwnode_handle *fwnode);
+#else /* CONFIG_ACPI_MDIO */
+static inline int acpi_mdiobus_register(struct mii_bus *mdio, struct 
fwnode_handle *fwnode)
+{
+   /*
+* Fall back to mdiobus_register() function to register a bus.
+* This way, we don't have to keep compat bits around in drivers.
+*/
+
+   return mdiobus_register(mdio);
+}
+#endif
+
+#endif /* __LINUX_ACPI_MDIO_H */
-- 
2.17.1



[net-next PATCH v7 09/16] of: mdio: Refactor of_mdiobus_register_phy()

2021-03-10 Thread Calvin Johnson
Refactor of_mdiobus_register_phy() to use fwnode_mdiobus_register_phy().

Signed-off-by: Calvin Johnson 
---

Changes in v7:
- include fwnode_mdio.h

Changes in v6: None
Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/net/mdio/of_mdio.c | 39 ++
 1 file changed, 2 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c
index db293e0b8249..eebd4d9e1656 100644
--- a/drivers/net/mdio/of_mdio.c
+++ b/drivers/net/mdio/of_mdio.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -98,43 +99,7 @@ EXPORT_SYMBOL(of_mdiobus_phy_device_register);
 static int of_mdiobus_register_phy(struct mii_bus *mdio,
struct device_node *child, u32 addr)
 {
-   struct mii_timestamper *mii_ts;
-   struct phy_device *phy;
-   bool is_c45;
-   int rc;
-   u32 phy_id;
-
-   mii_ts = of_find_mii_timestamper(child);
-   if (IS_ERR(mii_ts))
-   return PTR_ERR(mii_ts);
-
-   is_c45 = of_device_is_compatible(child,
-"ethernet-phy-ieee802.3-c45");
-
-   if (!is_c45 && !of_get_phy_id(child, &phy_id))
-   phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
-   else
-   phy = get_phy_device(mdio, addr, is_c45);
-   if (IS_ERR(phy)) {
-   unregister_mii_timestamper(mii_ts);
-   return PTR_ERR(phy);
-   }
-
-   rc = of_mdiobus_phy_device_register(mdio, phy, child, addr);
-   if (rc) {
-   unregister_mii_timestamper(mii_ts);
-   phy_device_free(phy);
-   return rc;
-   }
-
-   /* phy->mii_ts may already be defined by the PHY driver. A
-* mii_timestamper probed via the device tree will still have
-* precedence.
-*/
-   if (mii_ts)
-   phy->mii_ts = mii_ts;
-
-   return 0;
+   return fwnode_mdiobus_register_phy(mdio, of_fwnode_handle(child), addr);
 }
 
 static int of_mdiobus_register_device(struct mii_bus *mdio,
-- 
2.17.1


