Re: [PATCH 2/5] fscrypt: Export fscrypt_d_revalidate

2020-09-22 Thread Eric Biggers
On Wed, Sep 23, 2020 at 01:01:48AM +, Daniel Rosenberg wrote:
> This is in preparation for shifting the responsibility of setting the
> dentry_operations to the filesystem, allowing it to maintain its own
> operations.
> 
> Signed-off-by: Daniel Rosenberg 
> ---
>  fs/crypto/fname.c   | 3 ++-
>  include/linux/fscrypt.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
> index 011830f84d8d..d45db23ff6c4 100644
> --- a/fs/crypto/fname.c
> +++ b/fs/crypto/fname.c
> @@ -541,7 +541,7 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_siphash);
>   * Validate dentries in encrypted directories to make sure we aren't potentially
>   * caching stale dentries after a key has been added.
>   */
> -static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
> +int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>  {
>   struct dentry *dir;
>   int err;
> @@ -580,6 +580,7 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>  
>   return valid;
>  }
> +EXPORT_SYMBOL_GPL(fscrypt_d_revalidate);
>  
>  const struct dentry_operations fscrypt_d_ops = {
>   .d_revalidate = fscrypt_d_revalidate,
> diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
> index 991ff8575d0e..265b1e9119dc 100644
> --- a/include/linux/fscrypt.h
> +++ b/include/linux/fscrypt.h
> @@ -207,6 +207,7 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
>  bool fscrypt_match_name(const struct fscrypt_name *fname,
>   const u8 *de_name, u32 de_name_len);
>  u64 fscrypt_fname_siphash(const struct inode *dir, const struct qstr *name);
> +extern int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);

Please don't use 'extern' here.

Also FYI, Jeff Layton has sent this same patch as part of the ceph support for
fscrypt: 
https://lkml.kernel.org/linux-fscrypt/20200914191707.380444-4-jlay...@kernel.org

I'd like to apply one of them for 5.10 to get it out of the way for both
patchsets, but I'd like for the commit message to mention both users.

- Eric


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Christoph Hellwig
I suspect the patch below might be better.  Can you send me a full dmesg
with this one applied?  Preferably on top of Jens' for-next branch?


diff --git a/block/genhd.c b/block/genhd.c
index 9d060e79eb31d8..ef2784c69d59ee 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -832,7 +832,9 @@ static void __device_add_disk(struct device *parent, struct gendisk *disk,
 * Take an extra ref on queue which will be put on disk_release()
 * so that it sticks around as long as @disk is there.
 */
-   WARN_ON_ONCE(!blk_get_queue(disk->queue));
+   WARN_ON_ONCE(blk_queue_dying(disk->queue));
+   __blk_get_queue(disk->queue);
+   disk->flags |= GENHD_FL_QUEUE_REF;
 
disk_add_events(disk);
blk_integrity_add(disk);
@@ -1564,7 +1566,7 @@ static void disk_release(struct device *dev)
kfree(disk->random);
disk_replace_part_tbl(disk, NULL);
hd_free_part(&disk->part0);
-   if (disk->queue)
+   if (disk->flags & GENHD_FL_QUEUE_REF)
blk_put_queue(disk->queue);
kfree(disk);
 }
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 1c97cf84f011a7..822a619924e3b5 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -133,6 +133,7 @@ struct hd_struct {
 #define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE 0x0100
 #define GENHD_FL_NO_PART_SCAN  0x0200
 #define GENHD_FL_HIDDEN        0x0400
+#define GENHD_FL_QUEUE_REF 0x0800
 
 enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */
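A minimal userspace sketch of the reference pattern in the patch above, assuming simplified stand-in types (demo_queue, demo_disk and DEMO_FL_QUEUE_REF are hypothetical, not block-layer API). The reference is taken unconditionally at add time and recorded in a flag, so release only drops a reference the disk actually holds; a disk released without ever having been added leaves the queue's count alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; NOT the block layer's real types. */
#define DEMO_FL_QUEUE_REF 0x0800

struct demo_queue {
	int refcnt;
	bool dying;
};

struct demo_disk {
	struct demo_queue *queue;
	unsigned int flags;
};

/* Mirrors: WARN_ON_ONCE(blk_queue_dying(q)); __blk_get_queue(q); set flag.
 * The assert stands in for the warning (the real code only warns). */
void demo_add_disk(struct demo_disk *disk)
{
	assert(!disk->queue->dying);
	disk->queue->refcnt++;			/* unconditional get */
	disk->flags |= DEMO_FL_QUEUE_REF;	/* remember that we hold it */
}

/* Mirrors: if (disk->flags & GENHD_FL_QUEUE_REF) blk_put_queue(q); */
void demo_release_disk(struct demo_disk *disk)
{
	if (disk->flags & DEMO_FL_QUEUE_REF)
		disk->queue->refcnt--;
}
```

Keying the put on the flag rather than on "disk->queue is set" is what keeps the count balanced on error paths where the disk was allocated but never added.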



Re: [PATCH v2 0/9] Update to zstd-1.4.6

2020-09-22 Thread Christoph Hellwig
FYI, as mentioned last time:  clear NAK for letting these bad APIs
slip into the overall kernel code.  Please provide proper kernel style
wrappers to avoid these kinds of updates and in the future just change
APIs on an as-needed basis.


Re: [PATCH 1/5] ext4: Use generic casefolding support

2020-09-22 Thread Eric Biggers
On Wed, Sep 23, 2020 at 01:01:47AM +, Daniel Rosenberg wrote:
> This switches ext4 over to the generic support provided in
> the previous patch.
> 
> Since casefolded dentries behave the same in ext4 and f2fs, we decrease
> the maintenance burden by unifying them, and any optimizations will
> immediately apply to both.
> 
> Signed-off-by: Daniel Rosenberg 
> Reviewed-by: Eric Biggers 

You could also add Gabriel's Reviewed-by from last time:
https://lkml.kernel.org/linux-fsdevel/87lfh4djdq@collabora.com/

- Eric


RE: [PATCH V2 2/2] ata: ahci: ceva: Update the driver to support xilinx GT phy

2020-09-22 Thread Piyush Mehta
Hello Philipp,

Thanks for the review.

Regards,
Piyush Mehta

-----Original Message-----
From: Philipp Zabel  
Sent: Tuesday, September 22, 2020 5:36 PM
To: Piyush Mehta ; ax...@kernel.dk; robh...@kernel.org
Cc: linux-...@vger.kernel.org; devicet...@vger.kernel.org; 
linux-kernel@vger.kernel.org; git ; Srinivas Goud 
; Michal Simek 
Subject: Re: [PATCH V2 2/2] ata: ahci: ceva: Update the driver to support xilinx GT phy

On Tue, 2020-09-22 at 15:45 +0530, Piyush Mehta wrote:
> The SATA controller used in the Xilinx ZynqMP platform uses the Xilinx
> GT PHY, which has 4 GT lanes and can be used by 4 peripherals at a time.
> The SATA controller uses 1 of the 4 GT lanes. To configure the GT lane
> for the SATA controller, the following sequence is expected.
> 
> 1. Assert the SATA controller reset.
> 2. Configure the xilinx GT phy lane for SATA controller (phy_init).
> 3. De-assert the SATA controller reset.
> 4. Wait for PLL of the GT lane used by SATA to be locked (phy_power_on).
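The required ordering can be sketched as below. This is a hypothetical model only: the fake_* helpers stand in for reset_control_assert(), phy_init(), reset_control_deassert() and phy_power_on(), and merely record the order in which they run.

```c
#include <assert.h>

/* Hypothetical step IDs used only to check ordering. */
enum demo_step {
	DEMO_RESET_ASSERT,
	DEMO_PHY_INIT,
	DEMO_RESET_DEASSERT,
	DEMO_PHY_POWER_ON,
};

#define DEMO_MAX_STEPS 8
enum demo_step demo_steps[DEMO_MAX_STEPS];
int demo_nsteps;

static void log_step(enum demo_step s)
{
	assert(demo_nsteps < DEMO_MAX_STEPS);
	demo_steps[demo_nsteps++] = s;
}

/* Stand-ins for the real kernel calls (not real APIs). */
static void fake_reset_assert(void)   { log_step(DEMO_RESET_ASSERT); }
static int  fake_phy_init(void)       { log_step(DEMO_PHY_INIT); return 0; }
static void fake_reset_deassert(void) { log_step(DEMO_RESET_DEASSERT); }
static int  fake_phy_power_on(void)   { log_step(DEMO_PHY_POWER_ON); return 0; }

/* The four-step sequence from the commit message. */
int demo_sata_bringup(void)
{
	int rc;

	demo_nsteps = 0;
	fake_reset_assert();		/* 1. hold the controller in reset */
	rc = fake_phy_init();		/* 2. configure the GT lane */
	if (rc)
		return rc;
	fake_reset_deassert();		/* 3. release the controller reset */
	return fake_phy_power_on();	/* 4. wait for the PLL to lock */
}
```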
> 
> By default, ahci_platform_enable_resources() calls phy_init() and
> phy_power_on(), but that default sequence doesn't work on Xilinx
> platforms, so the driver is updated to support the new sequence.
> 
> Add an is_rst_ctrl flag for backward compatibility with the older
> sequence: if the reset controller is not available, the SATA
> controller is configured with the older sequence.
> 
> Signed-off-by: Piyush Mehta 
> ---
>  drivers/ata/ahci_ceva.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/ahci_ceva.c b/drivers/ata/ahci_ceva.c
> index b10fd4c..c704906 100644
> --- a/drivers/ata/ahci_ceva.c
> +++ b/drivers/ata/ahci_ceva.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "ahci.h"
>  
>  /* Vendor Specific Register Offsets */
> @@ -87,6 +88,7 @@ struct ceva_ahci_priv {
>   u32 axicc;
>   bool is_cci_enabled;
>   int flags;
> + struct reset_control *rst;
>  };
>  
>  static unsigned int ceva_ahci_read_id(struct ata_device *dev,
> @@ -194,7 +196,7 @@ static int ceva_ahci_probe(struct platform_device *pdev)
>   struct ahci_host_priv *hpriv;
>   struct ceva_ahci_priv *cevapriv;
>   enum dev_dma_attr attr;
> - int rc;
> + int rc, i, is_rst_ctrl = 1;
>  
>   cevapriv = devm_kzalloc(dev, sizeof(*cevapriv), GFP_KERNEL);
>   if (!cevapriv)
> @@ -202,14 +204,47 @@ static int ceva_ahci_probe(struct platform_device *pdev)
>  
>   cevapriv->ahci_pdev = pdev;
>  
> + cevapriv->rst = devm_reset_control_get(&pdev->dev, NULL);

Please use devm_reset_control_get_optional_exclusive()

> + if (IS_ERR(cevapriv->rst)) {
> + if (PTR_ERR(cevapriv->rst) != -EPROBE_DEFER)
> + dev_err(&pdev->dev, "failed to get reset: %ld\n",
> + PTR_ERR(cevapriv->rst));
> + is_rst_ctrl = 0;

is_rst_ctrl will not be required then.

> + }
> +
>   hpriv = ahci_platform_get_resources(pdev, 0);
>   if (IS_ERR(hpriv))
>   return PTR_ERR(hpriv);
> + if (is_rst_ctrl)
> + rc = ahci_platform_enable_clks(hpriv);
> + else
> + rc = ahci_platform_enable_resources(hpriv);
>  
> - rc = ahci_platform_enable_resources(hpriv);
>   if (rc)
>   return rc;
>  
> + if (is_rst_ctrl) {

This can just be "if (cevapriv->rst)"
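A rough model of this suggestion, under the assumption that an optional reset getter yields NULL when no reset is described for the device; all names below are hypothetical. With that convention the pointer itself carries the "is there a reset controller" information, so a separate flag is redundant. Real code must still check IS_ERR() for genuine failures such as -EPROBE_DEFER; that part is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reset handle; not the reset framework's type. */
struct demo_reset { int asserted; };

static struct demo_reset board_reset;

/* Stand-in for an optional getter: NULL means "no reset described". */
struct demo_reset *demo_get_optional_reset(bool described_in_dt)
{
	return described_in_dt ? &board_reset : NULL;
}

enum demo_path { DEMO_LEGACY_ENABLE_RESOURCES, DEMO_XILINX_SEQUENCE };

/* Which bring-up path probe would take; mirrors "if (cevapriv->rst)". */
enum demo_path demo_probe_path(const struct demo_reset *rst)
{
	if (rst)
		return DEMO_XILINX_SEQUENCE;
	return DEMO_LEGACY_ENABLE_RESOURCES;
}
```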

> + /* Assert the controller reset */
> + reset_control_assert(cevapriv->rst);
> +
> + for (i = 0; i < hpriv->nports; i++) {
> + rc = phy_init(hpriv->phys[i]);
> + if (rc)
> + return rc;
> + }
> +
> + /* De-assert the controller reset */
> + reset_control_deassert(cevapriv->rst);
> +
> + for (i = 0; i < hpriv->nports; i++) {
> + rc = phy_power_on(hpriv->phys[i]);
> + if (rc) {
> + phy_exit(hpriv->phys[i]);
> + return rc;
> + }
> + }
> + }
> +
>   if (of_property_read_bool(np, "ceva,broken-gen2"))
>   cevapriv->flags = CEVA_FLAG_BROKEN_GEN2;
>  

regards
Philipp


Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events

2020-09-22 Thread Jiri Olsa
On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> When running perf stat with armv8_pmu events and a workload, it
> segfaults.

please share the perf stat command line you see that segfault for

thanks,
jirka

> 
> (gdb) bt
> #0  0x00603fc8 in perf_evsel__close_fd_cpu (evsel=,
> cpu=) at evsel.c:122
> #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
> #2  0x004d4718 in evlist__close (evlist=0x70a7cb0) at 
> util/evlist.c:1242
> #3  0x00453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
> argv@entry=0xfaea2f90, run_idx=119, run_idx@entry=1701998435)
> at builtin-stat.c:929
> #4  0x00455058 in run_perf_stat (run_idx=1701998435, 
> argv=0xfaea2f90,
> argc=1) at builtin-stat.c:947
> #5  cmd_stat (argc=1, argv=0xfaea2f90) at builtin-stat.c:2357
> #6  0x004bb888 in run_builtin (p=p@entry=0x9764b8 ,
> argc=argc@entry=4, argv=argv@entry=0xfaea2f90) at perf.c:312
> #7  0x004bbb54 in handle_internal_command (argc=argc@entry=4,
> argv=argv@entry=0xfaea2f90) at perf.c:364
> #8  0x00435378 in run_argv (argcp=,
> argv=) at perf.c:408
> #9  main (argc=4, argv=0xfaea2f90) at perf.c:538
> 
> After debugging, I found that the root cause is that the xyarray fd is
> created by evsel__open_per_thread(), ignoring the cpu passed in
> create_perf_stat_counter(), while the evsel's cpumap is assigned the
> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is
> created with the ncpus of the dummy cpumap, and an out-of-bounds 'cpu'
> index is used in perf_evsel__close_fd_cpu().
> 
> To address this, add a flag to mark this situation and avoid using the
> affinity technique when closing/enabling/disabling events.
> 
> Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> Fixes: 704e2f5b700d ("perf stat: Use affinity for enabling/disabling events")
> Signed-off-by: Wei Li 
> ---
>  tools/lib/perf/include/internal/evlist.h |  1 +
>  tools/perf/builtin-stat.c|  3 +++
>  tools/perf/util/evlist.c | 23 ++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> index 2d0fa02b036f..c02d7e583846 100644
> --- a/tools/lib/perf/include/internal/evlist.h
> +++ b/tools/lib/perf/include/internal/evlist.h
> @@ -17,6 +17,7 @@ struct perf_evlist {
>   struct list_head entries;
>   int  nr_entries;
>   bool has_user_cpus;
> + bool open_per_thread;
>   struct perf_cpu_map *cpus;
>   struct perf_cpu_map *all_cpus;
>   struct perf_thread_map  *threads;
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index fddc97cac984..6e6ceacce634 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -725,6 +725,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>   if (group)
>   perf_evlist__set_leader(evsel_list);
>  
> + if (!(target__has_cpu(&target) && !target__has_per_thread(&target)))
> + evsel_list->core.open_per_thread = true;
> +
>   if (affinity__setup() < 0)
>   return -1;
>  
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e3fa3bf7498a..bf8a3ccc599f 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -383,6 +383,15 @@ void evlist__disable(struct evlist *evlist)
>   int cpu, i, imm = 0;
>   bool has_imm = false;
>  
> + if (evlist->core.open_per_thread) {
> + evlist__for_each_entry(evlist, pos) {
> + if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd)
> + continue;
> + evsel__disable(pos);
> + }
> + goto out;
> + }
> +
>   if (affinity__setup() < 0)
>   return;
>  
> @@ -414,6 +423,7 @@ void evlist__disable(struct evlist *evlist)
>   pos->disabled = true;
>   }
>  
> +out:
>   evlist->enabled = false;
>  }
>  
> @@ -423,6 +433,15 @@ void evlist__enable(struct evlist *evlist)
>   struct affinity affinity;
>   int cpu, i;
>  
> + if (evlist->core.open_per_thread) {
> + evlist__for_each_entry(evlist, pos) {
> + if (!evsel__is_group_leader(pos) || !pos->core.fd)
> + continue;
> + evsel__enable(pos);
> + }
> + goto out;
> + }
> +
>   if (affinity__setup() < 0)
>   return;
>  
> @@ -444,6 +463,7 @@ void evlist__enable(struct evlist *evlist)
>   pos->disabled = false;
>   }
>  
> +out:
>   evlist->enabled = true;
>  }
>  
> @@ -1223,9 +1243,10 @@ void evlist__close(struct evlist *evlist)
>  
>   /*
>* 

Re: [PATCH 1/2] staging: vchiq: fix __user annotations

2020-09-22 Thread Greg Kroah-Hartman
On Tue, Sep 22, 2020 at 10:21:43PM +0200, Arnd Bergmann wrote:
> My earlier patches caused some new sparse warnings, but it turns out
> that a number of those are actual bugs, or at least suspicious code.
> 
> Adding __user annotations to the data structures that are defined in
> uapi headers helps avoid the new warnings, but that causes a different
> set of warnings to show up, as some of these structures are used both
> inside the kernel and at the user interface, but store pointers to
> different things in each case.
> 
> Duplicating the vchiq_service_params and vchiq_completion_data structures
> in turn takes care of most of those, and then it turns out that there
> is a 'data' pointer that can at times be any of a __user address, a
> dma_addr_t, or a kernel pointer in vmalloc space.
> 
> I'm trying to annotate these as best I can without changing behavior,
> but there still seems to be a serious bug when user space passes
> a valid vmalloc space address instead of a user pointer. I'm adding
> comments in the code there and leaving in place the warnings that
> seem to correspond to actual bugs.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../include/linux/raspberrypi/vchiq.h | 11 ++-
>  .../interface/vchiq_arm/vchiq_2835_arm.c  |  2 +-
>  .../interface/vchiq_arm/vchiq_arm.c   | 95 ---
>  .../interface/vchiq_arm/vchiq_core.c  | 19 ++--
>  .../interface/vchiq_arm/vchiq_core.h  | 10 +-
>  .../interface/vchiq_arm/vchiq_ioctl.h | 29 --
>  6 files changed, 106 insertions(+), 60 deletions(-)

This patch series breaks the build for me:

drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c: In function ‘vc_vchi_audio_init’:
drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c:125:9: error: variable ‘params’ has initializer but incomplete type
  125 |  struct vchiq_service_params params = {
      | ^~~~
drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c:126:4: error: ‘struct vchiq_service_params’ has no member named ‘version’
  126 |   .version  = VC_AUDIOSERV_VER,
      |^~~
In file included from drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c:8:
drivers/staging/vc04_services/bcm2835-audio/vc_vchi_audioserv_defs.h:8:26: warning: excess elements in struct initializer
    8 | #define VC_AUDIOSERV_VER 2
      |  ^
drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c:126:15: note: in expansion of macro ‘VC_AUDIOSERV_VER’
  126 |   .version  = VC_AUDIOSERV_VER,
      |   ^~~~


and so on...

Care to try a v2?

thanks,

greg k-h


Re: [PATCH v3 0/6] Convert the intel iommu driver to the dma-iommu api

2020-09-22 Thread Lu Baolu

On 9/22/20 7:05 PM, Robin Murphy wrote:
With the previous version of the series I hit a problem on Ivybridge 
where apparently the dma engine width is not respected. At least 
that is my layman interpretation of the errors. From the older thread:


<3> [209.526605] DMAR: intel_iommu_map: iommu width (39) is not 
sufficient for the mapped address (008000)


Relevant iommu boot related messages are:

<6>[    0.184234] DMAR: Host address width 36
<6>[    0.184245] DMAR: DRHD base: 0x00fed9 flags: 0x0
<6>[    0.184288] DMAR: dmar0: reg_base_addr fed9 ver 1:0 cap 
c020e60262 ecap f0101a

<6>[    0.184308] DMAR: DRHD base: 0x00fed91000 flags: 0x1
<6>[    0.184337] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap 
c9008020660262 ecap f0105a
<6>[    0.184357] DMAR: RMRR base: 0x00d8d28000 end: 
0x00d8d46fff
<6>[    0.184377] DMAR: RMRR base: 0x00db00 end: 
0x00df1f
<6>[    0.184398] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 
IOMMU 1

<6>[    0.184414] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
<6>[    0.184428] DMAR-IR: Queued invalidation will be enabled to 
support x2apic and Intr-remapping.

<6>[    0.185173] DMAR-IR: Enabled IRQ remapping in x2apic mode

<6>[    0.878934] DMAR: No ATSR found
<6>[    0.878966] DMAR: dmar0: Using Queued invalidation
<6>[    0.879007] DMAR: dmar1: Using Queued invalidation

<6>[    0.915032] DMAR: Intel(R) Virtualization Technology for 
Directed I/O
<6>[    0.915060] PCI-DMA: Using software bounce buffering for IO 
(SWIOTLB)
<6>[    0.915084] software IO TLB: mapped [mem 
0xc80d4000-0xcc0d4000] (64MB)


(Full boot log at 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7054/fi-ivb-3770/boot0.txt, 
failures at 
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7054/fi-ivb-3770/igt@i915_selftest@l...@blt.html.) 



Does this look familiar or at least plausible to you? Is this 
something your new series has fixed?


This happens during attaching a domain to device. It has nothing to do
with this patch series. I will look into this issue, but not in this
email thread context.


I am not sure which step attaches the domain to the device, but these
types of messages:


<3> [209.526605] DMAR: intel_iommu_map: iommu width (39) is not
 >> sufficient for the mapped address (008000)

They definitely appear to happen at runtime, as i915 is getting 
exercised by userspace.


AFAICS this certainly might be related to this series - iommu-dma will 


Oh! I looked at the wrong function. prepare_domain_attach_device()
prints a similar message, which made me believe that it was not caused
by this patch series.

constrain IOVA allocation based on the domain geometry that the driver 
reports, which in this case is set only once when first allocating the 
domain. Thus it looks like both the dmar_domain->gaw adjustment in 
prepare_domain_attach_device() and the domain_use_first_level() business 
in intel_alloc_iova() effectively get lost in this conversion, since the 
domain geometry never gets updated to reflect those additional constraints.
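Robin's point can be illustrated with a heavily simplified model. All names are hypothetical, and the initial 48-bit width is an arbitrary example; only the 39-bit limit comes from the log in this thread. If the geometry that bounds IOVA allocation is recorded once and the usable width is narrowed later without updating it, the allocator can hand out addresses beyond the hardware limit, which matches the shape of the intel_iommu_map() error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; not the kernel's dmar_domain or iommu_domain. */
struct demo_domain {
	unsigned int gaw;	/* usable address width, in bits */
	uint64_t aperture_end;	/* geometry seen by the DMA layer */
};

static uint64_t width_limit(unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return (UINT64_C(1) << bits) - 1;
}

/* Geometry is set once, when the domain is allocated. */
void demo_domain_init(struct demo_domain *d, unsigned int gaw)
{
	d->gaw = gaw;
	d->aperture_end = width_limit(gaw);
}

/* Narrow the width later; 'sync' controls whether geometry follows. */
void demo_domain_narrow(struct demo_domain *d, unsigned int gaw, int sync)
{
	d->gaw = gaw;
	if (sync)
		d->aperture_end = width_limit(gaw);
}

/* Top-down allocator stand-in: highest range of 'size' bytes that ends
 * at the aperture end. */
uint64_t demo_alloc_iova(const struct demo_domain *d, uint64_t size)
{
	return d->aperture_end + 1 - size;
}

/* Does [iova, iova + size) fit within the hardware's address width? */
int demo_iova_fits(const struct demo_domain *d, uint64_t iova, uint64_t size)
{
	return iova + size - 1 <= width_limit(d->gaw);
}
```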


Sounds reasonable. I will look into the code and work out a fix.


> Robin.



Best regards,
baolu


Re: [RFC -V2] autonuma: Migrate on fault among multiple bound nodes

2020-09-22 Thread Huang, Ying
Phil Auld  writes:

> Hi,
>
> On Tue, Sep 22, 2020 at 02:54:01PM +0800 Huang Ying wrote:
>> Now, AutoNUMA can only optimize the page placement among the NUMA nodes
>> if the default memory policy is used, because an explicitly specified
>> memory policy should take precedence.  But this seems too strict in some
>> situations.  For example, on a system with 4 NUMA nodes, if the memory of
>> an application is bound to nodes 0 and 1, AutoNUMA can potentially
>> migrate the pages between nodes 0 and 1 to reduce cross-node accesses
>> without breaking the explicit memory binding policy.
>> 
>> So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used
>> to bind the memory of the application to multiple nodes, and in the hint
>> page fault handler both the faulting page node and the accessing node
>> are in the policy nodemask, the kernel will try to migrate the page to
>> the accessing node to reduce cross-node accesses.
>>
>
> Do you have any performance numbers that show the effects of this on
> a workload?

I have done some simple tests to confirm that NUMA balancing works in the
target configuration.

As for performance numbers, the effect is exactly the same as that of the
original NUMA balancing, only in a different configuration: it matches
the difference between running without memory binding and running with
memory bound to all NUMA nodes.

>
>> [Peter Zijlstra: provided the simplified implementation method.]
>> 
>> Questions:
>> 
>> The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
>> optimization globally.  But for memory areas that are bound to multiple
>> NUMA nodes, even if AutoNUMA is enabled globally via the sysctl knob, we
>> still need to enable AutoNUMA again with a special flag.  Why not just
>> optimize the page placement where possible as long as AutoNUMA is
>> enabled globally?  The interface would be simpler that way.
>
>
> I agree. I think it should try to do this if globally enabled.

Thanks!

>> 
>> Signed-off-by: "Huang, Ying" 
>> Cc: Andrew Morton 
>> Cc: Ingo Molnar 
>> Cc: Mel Gorman 
>> Cc: Rik van Riel 
>> Cc: Johannes Weiner 
>> Cc: "Matthew Wilcox (Oracle)" 
>> Cc: Dave Hansen 
>> Cc: Andi Kleen 
>> Cc: Michal Hocko 
>> Cc: David Rientjes 
>> ---
>>  mm/mempolicy.c | 17 +++--
>>  1 file changed, 11 insertions(+), 6 deletions(-)
>> 
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index eddbe4e56c73..273969204732 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>>  break;
>>  
>>  case MPOL_BIND:
>> -
>>  /*
>> - * allows binding to multiple nodes.
>> - * use current page if in policy nodemask,
>> - * else select nearest allowed node, if any.
>> - * If no allowed nodes, use current [!misplaced].
>> + * Allows binding to multiple nodes.  If both current and
>> + * accessing nodes are in policy nodemask, migrate to
>> + * accessing node to optimize page placement. Otherwise,
>> + * use current page if in policy nodemask, else select
>> + * nearest allowed node, if any.  If no allowed nodes, use
>> + * current [!misplaced].
>>   */
>> -if (node_isset(curnid, pol->v.nodes))
>> +if (node_isset(curnid, pol->v.nodes)) {
>> +if (node_isset(thisnid, pol->v.nodes))
>> +goto moron;
>
> Nice label :)

OK.  Since quite a few people have noticed this, I will rename all
"moron" to "mopron" as suggested by Matthew.  Although MPOL_F_MORON is
defined in include/uapi/linux/mempolicy.h, it is explicitly marked as an
internal flag.

Best Regards,
Huang, Ying

>>  goto out;
>> +}
>>  z = first_zones_zonelist(
>>  node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>  gfp_zone(GFP_HIGHUSER),
>> @@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>>  
>>  /* Migrate the page towards the node whose CPU is referencing it */
>>  if (pol->flags & MPOL_F_MORON) {
>> +moron:
>>  polnid = thisnid;
>>  
>>  if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
>> -- 
>> 2.28.0
>> 
>
>
> Cheers,
> Phil


[PATCH v2 1/2] arm64/mm: Introduce zero PGD table

2020-09-22 Thread Gavin Shan
The zero PGD table is used when TTBR_EL1 is changed. Currently it is
exactly the zero page. Since the zero page(s) will be allocated
dynamically when the colored zero page feature is enabled in the
subsequent patch, the zero page(s) aren't usable during the early boot
stage.

This introduces a zero PGD table, which is decoupled from the zero
page(s).

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/mmu_context.h | 6 +++---
 arch/arm64/include/asm/pgtable.h | 2 ++
 arch/arm64/kernel/setup.c| 2 +-
 arch/arm64/kernel/vmlinux.lds.S  | 4 
 arch/arm64/mm/proc.S | 2 +-
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index f2d7537d6f83..6dbc5726fd56 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -36,11 +36,11 @@ static inline void contextidr_thread_switch(struct task_struct *next)
 }
 
 /*
- * Set TTBR0 to empty_zero_page. No translations will be possible via TTBR0.
+ * Set TTBR0 to zero_pg_dir. No translations will be possible via TTBR0.
  */
 static inline void cpu_set_reserved_ttbr0(void)
 {
-   unsigned long ttbr = phys_to_ttbr(__pa_symbol(empty_zero_page));
+   unsigned long ttbr = phys_to_ttbr(__pa_symbol(zero_pg_dir));
 
write_sysreg(ttbr, ttbr0_el1);
isb();
@@ -189,7 +189,7 @@ static inline void update_saved_ttbr0(struct task_struct *tsk,
return;
 
if (mm == &init_mm)
-   ttbr = __pa_symbol(empty_zero_page);
+   ttbr = __pa_symbol(zero_pg_dir);
else
ttbr = virt_to_phys(mm->pgd) | ASID(mm) << 48;
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d5d3fbe73953..6953498f4d40 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -474,6 +474,8 @@ static inline bool pud_table(pud_t pud) { return true; }
 PUD_TYPE_TABLE)
 #endif
 
+extern pgd_t zero_pg_dir[PTRS_PER_PGD];
+extern pgd_t zero_pg_end[];
 extern pgd_t init_pg_dir[PTRS_PER_PGD];
 extern pgd_t init_pg_end[];
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 53acbeca4f57..7e83eaed641e 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -366,7 +366,7 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 * faults in case uaccess_enable() is inadvertently called by the init
 * thread.
 */
-   init_task.thread_info.ttbr0 = __pa_symbol(empty_zero_page);
+   init_task.thread_info.ttbr0 = __pa_symbol(zero_pg_dir);
 #endif
 
if (boot_args[1] || boot_args[2] || boot_args[3]) {
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7cba7623fcec..3d3c155d10a4 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -137,6 +137,10 @@ SECTIONS
/* everything from this point to __init_begin will be marked RO NX */
RO_DATA(PAGE_SIZE)
 
+   zero_pg_dir = .;
+   . += PAGE_SIZE;
+   zero_pg_end = .;
+
idmap_pg_dir = .;
. += IDMAP_DIR_SIZE;
idmap_pg_end = .;
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 796e47a571e6..90b135c366b3 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -163,7 +163,7 @@ SYM_FUNC_END(cpu_do_resume)
.pushsection ".idmap.text", "awx"
 
 .macro __idmap_cpu_set_reserved_ttbr1, tmp1, tmp2
-   adrp	\tmp1, empty_zero_page
+   adrp	\tmp1, zero_pg_dir
phys_to_ttbr \tmp2, \tmp1
offset_ttbr1 \tmp2, \tmp1
msr ttbr1_el1, \tmp2
-- 
2.23.0
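Why an all-zero table blocks translation can be sketched in userspace, assuming a 4 KiB granule (512 eight-byte descriptors per table). On arm64, bit 0 of a translation table descriptor is the valid bit, so a table of zeros contains no valid entry and every walk through it faults. demo_zero_pg_dir is a hypothetical stand-in for zero_pg_dir, and the alignment attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PTRS_PER_PGD 512	/* assumes 4 KiB granule */

/* Zero-initialized (static storage) and page-aligned. */
uint64_t demo_zero_pg_dir[DEMO_PTRS_PER_PGD] __attribute__((aligned(4096)));

bool demo_pgd_entry_valid(const uint64_t *pgd, unsigned int idx)
{
	assert(idx < DEMO_PTRS_PER_PGD);
	return pgd[idx] & 1;	/* descriptor bit 0: valid */
}

bool demo_table_has_valid_entry(const uint64_t *pgd)
{
	for (unsigned int i = 0; i < DEMO_PTRS_PER_PGD; i++)
		if (demo_pgd_entry_valid(pgd, i))
			return true;
	return false;
}
```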



[PATCH v2 2/2] arm64/mm: Enable color zero pages

2020-09-22 Thread Gavin Shan
This enables color zero pages by allocating contiguous page frames
for them. The number of pages is determined by the L1 dCache (or
iCache) size, which is probed from the hardware.

   * Export cache_setup_of_node() so that the cache topology could
 be parsed from device-tree.

   * Add cache_get_info() so that L1 dCache size can be retrieved.

   * Implement setup_zero_pages(), which is called after the page
 allocator begins to work, to allocate the contiguous pages
 needed by color zero page.

   * Rework ZERO_PAGE() and define __HAVE_COLOR_ZERO_PAGE.
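How ZERO_PAGE(vaddr) picks among the zero pages can be sketched as below. Assumptions: 4 KiB pages, 2^order contiguous zero pages, and a mask derived the way MIPS's setup_zero_pages() does it, since the arm64 setup_zero_pages() body is not quoted here. The in-page offset bits are masked away, so consecutive virtual pages map to different zero pages, cycling every 2^order pages.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Keep only the page-number bits that select among 2^order pages. */
uint64_t demo_zero_page_mask(unsigned int order)
{
	assert(order < 16);
	return ((DEMO_PAGE_SIZE << order) - 1) & DEMO_PAGE_MASK;
}

/* Index of the zero page backing vaddr: the offset ZERO_PAGE() adds to
 * empty_zero_page, divided by the page size. */
unsigned int demo_zero_page_index(uint64_t vaddr, uint64_t mask)
{
	return (unsigned int)((vaddr & mask) >> DEMO_PAGE_SHIFT);
}
```

With order 2 the mask is 0x3000, so 0x5000 selects page 1 and 0x4000 wraps back to page 0.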

Signed-off-by: Gavin Shan 
---
 arch/arm64/include/asm/cache.h   |  3 ++
 arch/arm64/include/asm/pgtable.h |  9 -
 arch/arm64/kernel/cacheinfo.c| 67 
 arch/arm64/mm/init.c | 37 ++
 arch/arm64/mm/mmu.c  |  7 
 drivers/base/cacheinfo.c |  3 +-
 include/linux/cacheinfo.h|  6 +++
 7 files changed, 121 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index a4d1b5f771f6..a42dbcc6b484 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -89,6 +89,9 @@ static inline int cache_line_size_of_cpu(void)
 }
 
 int cache_line_size(void);
+unsigned int cache_get_info(unsigned int level, unsigned int type,
+   unsigned int *sets, unsigned int *ways,
+   unsigned int *cl_size);
 
 /*
  * Read the effective value of CTR_EL0.
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 6953498f4d40..5cb5f8bb090d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -54,8 +54,13 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
-#define ZERO_PAGE(vaddr)   phys_to_page(__pa_symbol(empty_zero_page))
+extern unsigned long empty_zero_page;
+extern unsigned long zero_page_mask;
+
+#define __HAVE_COLOR_ZERO_PAGE
+#define ZERO_PAGE(vaddr)   \
+   (virt_to_page((void *)(empty_zero_page +\
+   (((unsigned long)(vaddr)) & zero_page_mask))))
 
 #define pte_ERROR(pte) __pte_error(__FILE__, __LINE__, pte_val(pte))
 
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 7fa6828bb488..c13b8897323f 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -43,6 +43,73 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
this_leaf->type = type;
 }
 
+unsigned int cache_get_info(unsigned int level, unsigned int type,
+   unsigned int *sets, unsigned int *ways,
+   unsigned int *cl_size)
+{
+   int ret, i, cpu = smp_processor_id();
+   enum cache_type t;
+   struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+   struct cacheinfo ci, *p = NULL;
+
+   /* Sanity check */
+   if (type != CACHE_TYPE_INST && type != CACHE_TYPE_DATA)
+   return 0;
+
+   /* Fetch the cache information if it has been populated */
+   if (this_cpu_ci->num_leaves) {
+   for (i = 0; i < this_cpu_ci->num_leaves; i++) {
+   p = &this_cpu_ci->info_list[i];
+   if (p->level == level &&
+   (p->type == type || p->type == CACHE_TYPE_UNIFIED))
+   break;
+   }
+
+   ret = (i < this_cpu_ci->num_leaves) ? 0 : -ENOENT;
+   goto out;
+   }
+
+   /*
+* The cache information isn't populated yet, we have to
+* retrieve it from ACPI or device tree.
+*/
+   t = get_cache_type(level);
+   if (t == CACHE_TYPE_NOCACHE ||
+   (t != CACHE_TYPE_SEPARATE && t != type)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   p = &ci;
+   p->type = type;
+   p->level = level;
+   this_cpu_ci->info_list = p;
+   this_cpu_ci->num_levels = 1;
+   this_cpu_ci->num_leaves = 1;
+   if (!acpi_disabled)
+   ret = cache_setup_acpi(cpu);
+   else if (of_have_populated_dt())
+   ret = cache_setup_of_node(cpu);
+   else
+   ret = -EPERM;
+
+   memset(this_cpu_ci, 0, sizeof(*this_cpu_ci));
+
+out:
+   if (!ret) {
+   if (sets)
+   *sets = p->number_of_sets;
+   if (ways)
+   *ways = p->ways_of_associativity;
+   if (cl_size)
+   *cl_size = p->coherency_line_size;
+
+   return p->size;
+   }
+
+   return 0;
+}
+
 static int __init_cache_level(unsigned int cpu)
 {
unsigned int ctype, level, leaves, fw_level;
diff --git 

[PATCH v2 0/2] arm64/mm: Enable color zero pages

2020-09-22 Thread Gavin Shan
The color zero pages feature isn't enabled on arm64, meaning all
read-only (anonymous) VM areas are backed by the same zero page. This
puts pressure on the L1 (data) cache when reading data from them. This
series enables color zero pages.
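As background, the usual page-coloring arithmetic is sketched below; this is an assumption about the approach, not the series' own sizing code. One cache way spans sets * line_size bytes, and addresses that differ by a multiple of that span index the same sets, so the number of distinct page colors is the way size divided by the page size.

```c
#include <assert.h>

/* Number of page colors for a cache with the given geometry. */
unsigned int demo_nr_colors(unsigned int sets, unsigned int line_size,
			    unsigned int page_size)
{
	unsigned int way_size = sets * line_size;

	assert(page_size != 0);
	if (way_size <= page_size)
		return 1;	/* one page already spans a whole way */
	return way_size / page_size;
}
```

For example, a 32 KiB, 4-way cache with 64-byte lines has 128 sets, an 8 KiB way, and therefore 2 colors with 4 KiB pages.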

PATCH[1/2] decouples the zero PGD table from zero page
PATCH[2/2] allocates the needed zero pages according to L1 cache size

Changelog
=
v2:
   * Rebased to 5.9.rc6  (Gavin)
   * Retrieve cache topology from ACPI/DT(Will/Robin)

Gavin Shan (2):
  arm64/mm: Introduce zero PGD table
  arm64/mm: Enable color zero pages

 arch/arm64/include/asm/cache.h   |  3 ++
 arch/arm64/include/asm/mmu_context.h |  6 +--
 arch/arm64/include/asm/pgtable.h | 11 -
 arch/arm64/kernel/cacheinfo.c| 67 
 arch/arm64/kernel/setup.c|  2 +-
 arch/arm64/kernel/vmlinux.lds.S  |  4 ++
 arch/arm64/mm/init.c | 37 +++
 arch/arm64/mm/mmu.c  |  7 ---
 arch/arm64/mm/proc.S |  2 +-
 drivers/base/cacheinfo.c |  3 +-
 include/linux/cacheinfo.h|  6 +++
 11 files changed, 132 insertions(+), 16 deletions(-)

-- 
2.23.0



Re: [PATCH 4/7] dmaengine: at_xdmac: adapt perid for mem2mem operations

2020-09-22 Thread Tudor.Ambarus
On 9/23/20 8:30 AM, Tudor Ambarus - M18064 wrote:
> On 9/14/20 5:09 PM, Eugen Hristev wrote:
>> The PERID in the CC register for mem2mem operations must match an unused
>> PERID.
>> The PERID field is 7 bits, but the selected value is 0x3f.
>> On later products we can have more reserved PERIDs for actual peripherals,
>> thus this needs to be increased to maximum size.
>> Change the value to 0x7f, which is the maximum for a 7-bit field.
>>
> 
> Maybe it is worth explaining that for memory-to-memory transfers, PERID
> should be set to an unused peripheral ID, and the maximum value seems the
> safest. Anyway, with or without this addressed, one can add:
> 

:) I somehow misread your commit message; you already described that, so it's fine.

> Reviewed-by: Tudor Ambarus 
> 
>> Signed-off-by: Eugen Hristev 
>> ---
>>  drivers/dma/at_xdmac.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>


Re: [PATCH v5 2/2] PHY: Ingenic: Add USB PHY driver using generic PHY framework.

2020-09-22 Thread Vinod Koul
On 22-09-20, 00:24, 周琰杰 (Zhou Yanjie) wrote:

> +#define USBPCR_IDPULLUP_LSB  28
> +#define USBPCR_IDPULLUP_MASK GENMASK(29, USBPCR_IDPULLUP_LSB)
> +#define USBPCR_IDPULLUP_ALWAYS   (0x2 << USBPCR_IDPULLUP_LSB)
> +#define USBPCR_IDPULLUP_SUSPEND  (0x1 << USBPCR_IDPULLUP_LSB)
> +#define USBPCR_IDPULLUP_OTG  (0x0 << USBPCR_IDPULLUP_LSB)

why not define these as 0, 1, 2 and then use
FIELD_PREP(USBPCR_IDPULLUP_MASK, value)? Please do this for the rest as
well.
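A userspace sketch of this suggestion follows. SKETCH_GENMASK() and SKETCH_FIELD_PREP() are hypothetical stand-ins that mimic the kernel's GENMASK() and FIELD_PREP(mask, val) from <linux/bitfield.h>; note that the mask comes first. The plain field values 0, 1, 2 are the ones proposed above. The asserts in the test check that they encode to the same bits as the patch's pre-shifted constants.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for GENMASK()/FIELD_PREP(); __builtin_ctz is GCC/Clang. */
#define SKETCH_GENMASK(h, l)   ((~0U << (l)) & (~0U >> (31 - (h))))
#define SKETCH_FIELD_PREP(mask, val) \
    (((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

#define USBPCR_IDPULLUP_MASK    SKETCH_GENMASK(29, 28)

/* Field values as plain numbers; the shift comes from the mask. */
#define USBPCR_IDPULLUP_OTG     0
#define USBPCR_IDPULLUP_SUSPEND 1
#define USBPCR_IDPULLUP_ALWAYS  2

/* Encode an ID pull-up mode into its register bits. */
static uint32_t idpullup_bits(unsigned int mode)
{
    assert(mode <= USBPCR_IDPULLUP_ALWAYS);
    return SKETCH_FIELD_PREP(USBPCR_IDPULLUP_MASK, mode);
}
```

Keeping only the mask shifted means a field's position is defined once, and an out-of-range value is clipped by the mask instead of spilling into neighbouring fields.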

> +static int ingenic_usb_phy_set_mode(struct phy *phy,
> +   enum phy_mode mode, int submode)
> +{
> + struct ingenic_usb_phy *priv = phy_get_drvdata(phy);
> + u32 reg;
> +
> + switch (mode) {
> + case PHY_MODE_USB_HOST:
> + reg = readl(priv->base + REG_USBPCR_OFFSET);
> + reg &= ~(USBPCR_VBUSVLDEXT | USBPCR_VBUSVLDEXTSEL | USBPCR_OTG_DISABLE);

use u32_encode_bits() or u32p_replace_bits() to program the registers
using the defined masks.
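The read-modify-write that u32p_replace_bits() performs can be modelled in userspace as follows. sketch_u32p_replace_bits() is a hypothetical re-implementation for illustration, and it assumes a non-zero contiguous mask. In the driver, one would call the real helper from <linux/bitfield.h> on the value returned by readl() and then writel() the result back.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of u32p_replace_bits(p, val, field): clear the bits covered by
 * 'field' in *p, then insert 'val' shifted into that field. Other bits
 * of *p are left untouched.
 */
static void sketch_u32p_replace_bits(uint32_t *p, uint32_t val, uint32_t field)
{
    assert(field != 0);
    *p = (*p & ~field) | ((val << __builtin_ctz(field)) & field);
}
```

For example, replacing a 2-bit field at bits 29:28 with 1 in an all-ones register value yields 0xDFFFFFFF. Only the field changes, which is exactly what the open-coded `reg &= ~...; reg |= ...;` sequence in the patch does by hand.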
-- 
~Vinod


Re: [PATCH 0/4] bootconfig: Fix a parser bug

2020-09-22 Thread Masami Hiramatsu
Hi Steve,

Thank you for merging the previous 3 series!
Could you also pick this series for the urgent-fix branch?

Thank you,

On Mon, 21 Sep 2020 18:44:33 +0900
Masami Hiramatsu  wrote:

> Hi,
> 
> Here are patches to fix 2 bugs in the parser. One issue happens
> when a key has siblings and the key is repeated with a brace after
> the sibling nodes. The other is that the parser keeps trailing
> spaces when we put a comment on the line.
> 
> For example, here is a minimal example of the 1st issue:
> 
> foo
> bar
> foo { buz }
> 
> This should be parsed as
> 
> foo.buz
> bar
> 
> But the bootconfig parser parses it as foo.buz (no bar node)
> because the foo->bar link is unlinked when the brace ("foo {") is
> found.
> 
> The second one is simpler, if we have
> 
> foo = val  # comment
> 
> The spaces after the value are not removed:
> 
> foo="val  "
> 
> But this also should be
> 
> foo="val"
> 
> If users need trailing spaces, they can use quotes, e.g.
> 
> foo = "val  " # comment
> 
> 
> Thank you,
> 
> ---
> 
> Masami Hiramatsu (4):
>   lib/bootconfig: Fix a bug of breaking existing tree nodes
>   lib/bootconfig: Fix to remove tailing spaces after value
>   tools/bootconfig: Add testcases for repeated key with brace
>   tools/bootconfig: Add testcase for tailing space
> 
> 
>  tools/bootconfig/test-bootconfig.sh |   25 +
>  1 file changed, 25 insertions(+)
> 
> --
> Masami Hiramatsu (Linaro) 
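A toy model of the first bug follows. It is not the lib/bootconfig.c data structure. It only assumes that each node has a first-child pointer and a next-sibling pointer. The key point is that a repeated key (`foo { ... }`) should be looked up among the existing siblings and reused. New nodes are appended at the tail, so the foo -> bar sibling link is never broken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy key tree: each node has a first child and a next sibling. */
struct knode {
    const char *name;
    struct knode *child;
    struct knode *next;
};

#define MAX_NODES 16
static struct knode pool[MAX_NODES];
static size_t pool_used;

/*
 * Return parent's child called 'name'. If there is none, append a new
 * node at the END of the sibling list, so existing siblings stay linked.
 */
static struct knode *find_or_add_child(struct knode *parent, const char *name)
{
    struct knode **link = &parent->child;

    for (; *link; link = &(*link)->next)
        if (strcmp((*link)->name, name) == 0)
            return *link;   /* reuse the node; do not relink siblings */

    assert(pool_used < MAX_NODES);
    *link = &pool[pool_used++];
    (*link)->name = name;
    (*link)->child = NULL;
    (*link)->next = NULL;
    return *link;
}
```

Feeding `foo`, `bar`, then `foo { buz }` through find_or_add_child() leaves the root with children foo and bar, and foo with child buz. That gives the expected keys foo.buz and bar.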


-- 
Masami Hiramatsu 
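The second fix amounts to cutting an unquoted value at the comment and dropping trailing whitespace. trim_unquoted_value() below is a hypothetical helper that sketches that rule; it is not the parser's code. Quoted values, which may legitimately keep spaces, are assumed to be handled before it is called.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * In place: cut an unquoted value at a '#' comment, then strip trailing
 * whitespace. Returns the (possibly shortened) value.
 */
static char *trim_unquoted_value(char *v)
{
    char *end;
    char *hash;

    assert(v != NULL);
    hash = strchr(v, '#');
    if (hash)
        *hash = '\0';
    end = v + strlen(v);
    while (end > v && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return v;
}
```

Applied to `val  # comment`, it yields `val`, matching the expected `foo="val"`.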


Re: [PATCH 4/7] dmaengine: at_xdmac: adapt perid for mem2mem operations

2020-09-22 Thread Tudor.Ambarus
On 9/14/20 5:09 PM, Eugen Hristev wrote:
> The PERID in the CC register for mem2mem operations must match an unused
> PERID.
> The PERID field is 7 bits, but the selected value is 0x3f.
> On later products we can have more reserved PERIDs for actual peripherals,
> thus this needs to be increased to maximum size.
> Change the value to 0x7f, which is the maximum for a 7-bit field.
> 

Maybe it is worth explaining that for memory-to-memory transfers, PERID
should be set to an unused peripheral ID, and the maximum value seems the
safest. Anyway, with or without this addressed, one can add:

Reviewed-by: Tudor Ambarus 

> Signed-off-by: Eugen Hristev 
> ---
>  drivers/dma/at_xdmac.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c
> index fab19e00a7be..81bb90206092 100644
> --- a/drivers/dma/at_xdmac.c
> +++ b/drivers/dma/at_xdmac.c
> @@ -726,7 +726,7 @@ at_xdmac_interleaved_queue_desc(struct dma_chan *chan,
>* match the one of another channel. If not, it could lead to spurious
>* flag status.
>*/
> - u32 chan_cc = AT_XDMAC_CC_PERID(0x3f)
> + u32 chan_cc = AT_XDMAC_CC_PERID(0x7f)
>   | AT_XDMAC_CC_DIF(0)
>   | AT_XDMAC_CC_SIF(0)
>   | AT_XDMAC_CC_MBSIZE_SIXTEEN
> @@ -908,7 +908,7 @@ at_xdmac_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
>* match the one of another channel. If not, it could lead to spurious
>* flag status.
>*/
> - u32 chan_cc = AT_XDMAC_CC_PERID(0x3f)
> + u32 chan_cc = AT_XDMAC_CC_PERID(0x7f)
>   | AT_XDMAC_CC_DAM_INCREMENTED_AM
>   | AT_XDMAC_CC_SAM_INCREMENTED_AM
>   | AT_XDMAC_CC_DIF(0)
> @@ -1014,7 +1014,7 @@ static struct at_xdmac_desc *at_xdmac_memset_create_desc(struct dma_chan *chan,
>* match the one of another channel. If not, it could lead to spurious
>* flag status.
>*/
> - u32 chan_cc = AT_XDMAC_CC_PERID(0x3f)
> + u32 chan_cc = AT_XDMAC_CC_PERID(0x7f)
>   | AT_XDMAC_CC_DAM_UBS_AM
>   | AT_XDMAC_CC_SAM_INCREMENTED_AM
>   | AT_XDMAC_CC_DIF(0)
> 
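The following illustrates why the top of the field is the safe choice. A 7-bit PERID field tops out at (1 << 7) - 1 = 0x7f. The old value 0x3f leaves half of the ID space above it, where newer SoCs may wire real peripherals. The helper below is hypothetical, since the driver simply hardcodes 0x7f. It picks the highest ID that fits the field and does not appear in an assumed list of peripheral IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PERID_BITS 7
#define SKETCH_PERID_MAX  ((1u << SKETCH_PERID_BITS) - 1)   /* 0x7f */

/*
 * Highest PERID that fits the 7-bit field and is not in 'used' (IDs wired
 * to real peripherals on some SoC, an assumption). Returns -1 if none.
 */
static int pick_mem2mem_perid(const uint8_t *used, size_t n_used)
{
    assert(used != NULL || n_used == 0);
    for (int id = SKETCH_PERID_MAX; id >= 0; id--) {
        bool taken = false;

        for (size_t i = 0; i < n_used; i++) {
            if (used[i] == id) {
                taken = true;
                break;
            }
        }
        if (!taken)
            return id;
    }
    return -1;
}
```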



Re: [PATCH] [v2] nvme: replace meaningless judgement by checking whether req is null

2020-09-22 Thread Christoph Hellwig
On Tue, Sep 22, 2020 at 03:47:40PM +, Tianxianting wrote:
> Finally, it was applied :)
> Thanks again for all your kind guidance.

Thanks a lot for the patch!


Re: [PATCH v2 1/2] dt-bindings: phy: cdns,torrent-phy: add reset-names

2020-09-22 Thread Vinod Koul
On 18-09-20, 11:37, Tomi Valkeinen wrote:
> Add reset-names as a required property.
> 
> There are no dts files using torrent phy yet, so it is safe to add a new
> required property.

Applied both, thanks

-- 
~Vinod


[PATCH] dt-bindings: Add LM81 and DS1780 as trivial devices

2020-09-22 Thread Chris Packham
The LM81 and DS1780 are close relatives of the ADM9240 and already
supported by the same driver. Document them as trivial devices.

Signed-off-by: Chris Packham 
---
I wasn't sure if I should put the LM81 under "national" or "ti". In the
end I went with "national" because of all the other existing lm8x variants.

 Documentation/devicetree/bindings/trivial-devices.yaml | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml 
b/Documentation/devicetree/bindings/trivial-devices.yaml
index 4ace8039840a..6cfeee1b4527 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -54,6 +54,8 @@ properties:
   - dallas,ds1682
 # Tiny Digital Thermometer and Thermostat
   - dallas,ds1775
+# CPU Peripheral Monitor
+  - dallas,ds1780
 # CPU Supervisor with Nonvolatile Memory and Programmable I/O
   - dallas,ds4510
 # Digital Thermometer and Thermostat
@@ -296,6 +298,8 @@ properties:
   - national,lm75
 # Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor
   - national,lm80
+# Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor
+  - national,lm81
 # Temperature sensor with integrated fan control
   - national,lm85
 # I2C ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator
-- 
2.28.0



Re: [PATCH 1/7] dmaengine: at_xdmac: separate register defines into header file

2020-09-22 Thread Tudor.Ambarus
Hi, Eugen,

On 9/14/20 5:09 PM, Eugen Hristev wrote:
> Separate register defines into header file.
> This is required to support a slightly different version of the register
> map in new hardware versions of the XDMAC.
> 
> Signed-off-by: Eugen Hristev 
> ---
>  drivers/dma/at_xdmac.c  | 143 +
>  drivers/dma/at_xdmac_regs.h | 154 
>  2 files changed, 155 insertions(+), 142 deletions(-)
>  create mode 100644 drivers/dma/at_xdmac_regs.h

Even with the sama7g5 support there is a single .c file that includes
the .h. I wouldn't split the register definitions into a dedicated file.

Cheers,
ta


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-22 Thread Huang, Ying
Rafael Aquini  writes:

> On Wed, Sep 23, 2020 at 10:21:36AM +0800, Huang, Ying wrote:
>> Hi, Rafael,
>> 
>> Rafael Aquini  writes:
>> 
>> > The swap area descriptor only gets struct swap_cluster_info *cluster_info
>> > allocated if the swapfile is backed by non-rotational storage.
>> > When the swap area is laid on top of ordinary disk spindles, lock_cluster()
>> > will naturally return NULL.
>> 
>> Thanks for reporting.  But the bug looks strange, because in a system
>> with only HDD swap devices, during THP swap out, the swap cluster
>> shouldn't be allocated, as in
>> 
>> shrink_page_list()
>>   add_to_swap()
>> get_swap_page()
>>   get_swap_pages()
>> swap_alloc_cluster()
>>
>
> The underlying problem is that swap_info_struct.cluster_info is always NULL 
> on the rotational storage case.

Yes.

> So it's easy to see that constructions
> like this one, in split_swap_cluster
>
> ...
> ci = lock_cluster(si, offset);
> cluster_clear_huge(ci);
> ...
>
> will go for a NULL pointer dereference, in that case, given that lock_cluster 
> reads:
>
> ...
>   struct swap_cluster_info *ci;
> ci = si->cluster_info;
> if (ci) {
> ci += offset / SWAPFILE_CLUSTER;
> spin_lock(&ci->lock);
> }
> return ci;
> ...

But on HDD, we shouldn't call split_swap_cluster() at all, because we
will not allocate a swap cluster in the first place.  So, if we run into
this, there must be some other bug, and we need to figure it out.
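For reference, the defensive pattern under discussion looks like the sketch below: a NULL return from lock_cluster() is treated as "no cluster to split". All types and names are simplified, hypothetical stand-ins, and a bool replaces the spinlock. As Huang points out, if split_swap_cluster() is reached on a rotational device at all, such a guard may only paper over another bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CLUSTER 256   /* stand-in for SWAPFILE_CLUSTER */

struct sketch_cluster { bool locked; bool huge; };
/* cluster_info is NULL for rotational (HDD) swap devices. */
struct sketch_swap_info { struct sketch_cluster *cluster_info; };

/* Mirrors the quoted lock_cluster(): NULL when there is no cluster_info. */
static struct sketch_cluster *sketch_lock_cluster(struct sketch_swap_info *si,
                                                  unsigned long offset)
{
    struct sketch_cluster *ci;

    assert(si != NULL);
    ci = si->cluster_info;
    if (ci) {
        ci += offset / SKETCH_CLUSTER;
        ci->locked = true;          /* stands in for spin_lock(&ci->lock) */
    }
    return ci;
}

static void sketch_unlock_cluster(struct sketch_cluster *ci)
{
    if (ci)
        ci->locked = false;
}

/* Returns 1 if a cluster was split, 0 if there was nothing to split. */
static int sketch_split_swap_cluster(struct sketch_swap_info *si,
                                     unsigned long offset)
{
    struct sketch_cluster *ci = sketch_lock_cluster(si, offset);

    if (!ci)
        return 0;                   /* no cluster_info: avoid the NULL deref */
    ci->huge = false;               /* stands in for cluster_clear_huge(ci) */
    sketch_unlock_cluster(ci);
    return 1;
}
```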

Best Regards,
Huang, Ying


Re: [PATCH 2/7] MAINTAINERS: add dma/at_xdmac_regs.h to XDMAC driver entry

2020-09-22 Thread Tudor.Ambarus
On 9/14/20 5:09 PM, Eugen Hristev wrote:
> Add new header file for the at_xdmac regs definition to the proper
> MAINTAINERS entry.
> 
> Signed-off-by: Eugen Hristev 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b5cfab015bd6..312ba6ae5fc7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11361,6 +11361,7 @@ F:
> Documentation/devicetree/bindings/dma/atmel-dma.txt
>  F:   drivers/dma/at_hdmac.c
>  F:   drivers/dma/at_hdmac_regs.h
>  F:   drivers/dma/at_xdmac.c
> +F:   drivers/dma/at_xdmac_regs.h

A dedicated entry for at_xdmac_regs.h will not be needed,
but while we're here, let's shrink these lines to a single one. A change
like the one below is welcome:
+F: drivers/dma/at_*

ta

>  F:   include/dt-bindings/dma/at91.h
>  F:   include/linux/platform_data/dma-atmel.h
>  
> 



linux-next: manual merge of the tip tree with the amdgpu tree

2020-09-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/gpu/drm/amd/amdkfd/kfd_priv.h

between commit:

  59d7115dae02 ("drm/amdkfd: Move process doorbell allocation into kfd device")

from the amdgpu tree and commit:

  c7b6bac9c72c ("drm, iommu: Change type of pasid to u32")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 739db04080d0,922ae138ab85..
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@@ -739,7 -723,8 +739,7 @@@ struct kfd_process 
/* We want to receive a notification when the mm_struct is destroyed */
struct mmu_notifier mmu_notifier;
  
-   uint16_t pasid;
+   u32 pasid;
 -  unsigned int doorbell_index;
  
/*
 * List of kfd_process_device structures,




Re: [PATCH 3/3] perf list: Add 'pfm' to list libpfm4 events

2020-09-22 Thread Jiri Olsa
On Wed, Sep 23, 2020 at 07:42:41AM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Wed, Sep 23, 2020 at 5:42 AM Jiri Olsa  wrote:
> >
> > On Wed, Sep 09, 2020 at 02:58:49PM +0900, Namhyung Kim wrote:
> >
> > SNIP
> >
> > >  int parse_events__is_hardcoded_term(struct parse_events_term *term)
> > > diff --git a/tools/perf/util/pfm.c b/tools/perf/util/pfm.c
> > > index d735acb6c29c..26ae2c8c0932 100644
> > > --- a/tools/perf/util/pfm.c
> > > +++ b/tools/perf/util/pfm.c
> > > @@ -12,6 +12,7 @@
> > >  #include "util/parse-events.h"
> > >  #include "util/pmu.h"
> > >  #include "util/pfm.h"
> > > +#include "util/string2.h"
> > >
> > >  #include 
> > >  #include 
> > > @@ -227,7 +228,7 @@ print_libpfm_events_raw(pfm_pmu_info_t *pinfo, pfm_event_info_t *info)
> > >   printf("%s::%s\n", pinfo->name, info->name);
> > >  }
> > >
> > > -void print_libpfm_events(bool name_only, bool long_desc)
> > > +void print_libpfm_events(const char *event_glob, bool name_only, bool long_desc)
> > >  {
> > >   pfm_event_info_t info;
> > >   pfm_pmu_info_t pinfo;
> > > @@ -265,6 +266,9 @@ void print_libpfm_events(bool name_only, bool long_desc)
> > >   if (ret != PFM_SUCCESS)
> > >   continue;
> > >
> > > + if (event_glob && !strglobmatch_nocase(info.name, event_glob))
> > > + continue;
> >
> > you could mention in the changelog that it also enables glob
> > matching for pfm events.. but other than that, it looks ok
> 
> Well, I have mentioned it in the changelog.. :)
> Do you want an example?

ugh.. sry, overlooked that

jirka

> 
> >
> > Acked/Tested-by: Jiri Olsa 
> 
> Thanks
> Namhyung
> 
> >
> > > +
> > >   if (!name_only && !printed_pmu) {
> > >   printf("%s:\n", pinfo.name);
> > >   printed_pmu = true;
> > > diff --git a/tools/perf/util/pfm.h b/tools/perf/util/pfm.h
> > > index 7d70dda87012..036e2d97b260 100644
> > > --- a/tools/perf/util/pfm.h
> > > +++ b/tools/perf/util/pfm.h
> > > @@ -13,7 +13,7 @@
> > >  int parse_libpfm_events_option(const struct option *opt, const char *str,
> > >   int unset);
> > >
> > > -void print_libpfm_events(bool name_only, bool long_desc);
> > > +void print_libpfm_events(const char *event_glob, bool name_only, bool long_desc);
> > >
> > >  #else
> > >  #include 
> > > @@ -26,7 +26,8 @@ static inline int parse_libpfm_events_option(
> > >   return 0;
> > >  }
> > >
> > > -static inline void print_libpfm_events(bool name_only __maybe_unused,
> > > +static inline void print_libpfm_events(const char *event_glob __maybe_unused,
> > > +bool name_only __maybe_unused,
> > >  bool long_desc __maybe_unused)
> > >  {
> > >  }
> > > --
> > > 2.28.0.526.ge36021eeef-goog
> > >
> >
> 
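The filtering added to print_libpfm_events() can be sketched with fnmatch(3) and its GNU FNM_CASEFOLD flag. These stand in for perf's strglobmatch_nocase(), whose glob syntax may not match exactly. event_matches() is hypothetical. A NULL glob keeps the old behaviour of listing every event.

```c
#define _GNU_SOURCE          /* FNM_CASEFOLD is a GNU extension */
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* True if 'name' should be printed for the given (optional) glob. */
static bool event_matches(const char *name, const char *event_glob)
{
    assert(name != NULL);
    return !event_glob || fnmatch(event_glob, name, FNM_CASEFOLD) == 0;
}
```

With this filter, `perf list pfm` with a glob such as `unhalted*` would keep names like UNHALTED_CORE_CYCLES regardless of case and skip the rest.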



Re: [RFC PATCH 07/11] drivers/android/binder: convert stats, transaction_log to counter_atomic

2020-09-22 Thread Greg KH
On Tue, Sep 22, 2020 at 07:43:36PM -0600, Shuah Khan wrote:
> counter_atomic is introduced to be used when a variable is used as
> a simple counter and doesn't guard object lifetimes. This clearly
> differentiates atomic_t usages that guard object lifetimes.
> 
> counter_atomic variables will wrap around to 0 when they overflow and
> should not be used to guard resource lifetimes, device usage and
> open counts that control state changes, and pm states.
> 
> stats tracks per-process binder statistics. Unsure if there is a chance
> of this overflowing, other than stats getting reset to 0. Convert it to
> use counter_atomic.
> 
> binder_transaction_log:cur is used to keep track of the current log entry
> location. Overflow is handled in the code. Since it is used as a
> counter, convert it to use counter_atomic.
> 
> This conversion doesn't change the overflow wraparound behavior.
> 
> Signed-off-by: Shuah Khan 
> ---
>  drivers/android/binder.c  | 41 ---
>  drivers/android/binder_internal.h |  3 ++-
>  2 files changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index f936530a19b0..11a0407c46df 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -66,6 +66,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -172,22 +173,22 @@ enum binder_stat_types {
>  };
>  
>  struct binder_stats {
> - atomic_t br[_IOC_NR(BR_FAILED_REPLY) + 1];
> - atomic_t bc[_IOC_NR(BC_REPLY_SG) + 1];
> - atomic_t obj_created[BINDER_STAT_COUNT];
> - atomic_t obj_deleted[BINDER_STAT_COUNT];
> + struct counter_atomic br[_IOC_NR(BR_FAILED_REPLY) + 1];
> + struct counter_atomic bc[_IOC_NR(BC_REPLY_SG) + 1];
> + struct counter_atomic obj_created[BINDER_STAT_COUNT];
> + struct counter_atomic obj_deleted[BINDER_STAT_COUNT];

These are just debugging statistics, no reason they have to be atomic
variables at all and they should be able to just be "struct counter"
variables instead.

thanks for looking into all of these!

greg k-h


Re: [PATCH] nvme: fix use-after-free during booting

2020-09-22 Thread Christoph Hellwig
On Tue, Sep 22, 2020 at 04:34:45PM -0400, Tong Zhang wrote:
> Hi Christoph,
> I modified the patch a bit and now it works.

So you're still hitting the WARN_ON_ONCE?  I think we need to fix that
as well, but all the ideas I have will turn into a bigger project,
so I think I'll submit this one to Jens, and then do things
incrementally.

Can you share your reproducer?


Re: [PATCH 1/2] KVM: Fix the build error

2020-09-22 Thread Haiwei Li

On 20/9/20 21:09, Paolo Bonzini wrote:

On 14/09/20 11:11, lihaiwei.ker...@gmail.com wrote:

From: Haiwei Li 

When CONFIG_SMP is not set, a build error occurs with message "error:
use of undeclared identifier 'kvm_send_ipi_mask_allbutself'"

Fixes: 0f990222108d ("KVM: Check the allocation of pv cpu mask", 2020-09-01)
Reported-by: kernel test robot 
Signed-off-by: Haiwei Li 
---
  arch/x86/kernel/kvm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b51b727b140..7e8be0421720 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -797,7 +797,9 @@ static __init int kvm_alloc_cpumask(void)
}
}
  
+#if defined(CONFIG_SMP)
apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
+#endif
pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
return 0;
  



If CONFIG_SMP is not set you don't need kvm_alloc_cpumask or
pv_ops.mmu.flush_tlb_others at all.  Can you squash these two into the
original patch and re-submit for 5.10?


Hi, Paolo

I'm a little confused. The function kvm_flush_tlb_others doesn't seem to be
related to CONFIG_SMP.


My patch looks like this:

---
 arch/x86/kernel/kvm.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 9663ba31347c..1e5da6db519c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -553,7 +553,6 @@ static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)

 static void kvm_setup_pv_ipi(void)
 {
apic->send_IPI_mask = kvm_send_ipi_mask;
-   apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
pr_info("setup PV IPIs\n");
 }

@@ -619,6 +618,11 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,

struct kvm_steal_time *src;
struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);

+   if (unlikely(!flushmask)) {
+   native_flush_tlb_others(cpumask, info);
+   return;
+   }
+
cpumask_copy(flushmask, cpumask);
/*
 * We have to call flush only on online vCPUs. And
@@ -765,6 +769,14 @@ static __init int activate_jump_labels(void)
 }
 arch_initcall(activate_jump_labels);

+static void kvm_free_cpumask(void)
+{
+   unsigned int cpu;
+
+   for_each_possible_cpu(cpu)
+   free_cpumask_var(per_cpu(__pv_cpu_mask, cpu));
+}
+
 static __init int kvm_alloc_cpumask(void)
 {
int cpu;
@@ -783,11 +795,20 @@ static __init int kvm_alloc_cpumask(void)

if (alloc)
for_each_possible_cpu(cpu) {
-   zalloc_cpumask_var_node(per_cpu_ptr(&__pv_cpu_mask, cpu),
-   GFP_KERNEL, cpu_to_node(cpu));
+   if (!zalloc_cpumask_var_node(
+   per_cpu_ptr(&__pv_cpu_mask, cpu),
+   GFP_KERNEL, cpu_to_node(cpu)))
+   goto zalloc_cpumask_fail;
}

+#if defined(CONFIG_SMP)
+   apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
+#endif
return 0;
+
+zalloc_cpumask_fail:
+   kvm_free_cpumask();
+   return -ENOMEM;
 }
 arch_initcall(kvm_alloc_cpumask);

--
2.18.4

Do you have any suggestions? Thanks.

Haiwei
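The allocate-then-unwind shape of kvm_alloc_cpumask() in the patch above can be modelled in userspace as shown below. This is a sketch under assumptions. calloc() stands in for zalloc_cpumask_var_node(), and a fixed SKETCH_NR_CPUS replaces for_each_possible_cpu(). The fail_at parameter is a hypothetical hook to force an allocation failure. Because free(NULL) is a no-op, the cleanup loop can safely run over every slot, including ones that were never allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 4   /* hypothetical number of possible CPUs */

/* Stand-in for the per-CPU __pv_cpu_mask pointers. */
static void *pv_mask[SKETCH_NR_CPUS];

static void free_all_masks(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        free(pv_mask[cpu]);         /* free(NULL) is a no-op */
        pv_mask[cpu] = NULL;
    }
}

/*
 * Allocate one zeroed mask per CPU. On the first failure, release the
 * ones already allocated and return -ENOMEM. fail_at < 0 never fails.
 */
static int alloc_all_masks(size_t mask_bytes, int fail_at)
{
    assert(mask_bytes > 0);
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        pv_mask[cpu] = (cpu == fail_at) ? NULL : calloc(1, mask_bytes);
        if (!pv_mask[cpu])
            goto fail;
    }
    return 0;

fail:
    free_all_masks();
    return -ENOMEM;
}
```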


Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

2020-09-22 Thread Darrick J. Wong
On Wed, Sep 23, 2020 at 03:48:59AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > > Size the uptodate array dynamically to support larger pages in the
> > > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > > per page.  Add a few debugging assertions.
> > > > > 
> > > > > Signed-off-by: Matthew Wilcox (Oracle) 
> > > > > Reviewed-by: Dave Chinner 
> > > > 
> > > > Some syscall fuzzing will trigger this on powerpc:
> > > > 
> > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > > 
> > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
> > > 
> > > Well, I'm glad it triggered.  That warning is:
> > > WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > > PageUptodate(page));
> > > so there was definitely a problem of some kind.
> > > 
> > > truncate_cleanup_page() calls
> > > do_invalidatepage() calls
> > > iomap_invalidatepage() calls
> > > iomap_page_release()
> > > 
> > > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > > reproduce, perhaps you could try something like this?
> > > 
> > > +void dump_iomap_page(struct page *page, const char *reason)
> > > +{
> > > +   struct iomap_page *iop = to_iomap_page(page);
> > > +   unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > > +
> > > +   dump_page(page, reason);
> > > +   if (iop)
> > > +   printk("iop:reads %d writes %d uptodate %*pb\n",
> > > +   atomic_read(&iop->read_bytes_pending),
> > > +   atomic_read(&iop->write_bytes_pending),
> > > +   nr_blocks, iop->uptodate);
> > > +   else
> > > +   printk("iop:none\n");
> > > +}
> > > 
> > > and then do something like:
> > > 
> > >   if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > >   dump_iomap_page(page, NULL);
> > 
> > This:
> > 
> > [ 1683.158254][T164965] page:4a6c16cd refcount:2 mapcount:0 mapping:ea017dc5 index:0x2 pfn:0xc365c
> > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> > [ 1683.158354][T164965] flags: 0x7fff800015(locked|uptodate|lru)
> > [ 1683.158392][T164965] raw: 007fff800015 c00c019c4b08 c00c019a53c8 c000201c8362c1e8
> > [ 1683.158430][T164965] raw: 0002  0002 c000201c54db4000
> > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> > [ 1683.158506][T164965] iop:none
> 
> Oh, I'm a fool.  This is after the call to detach_page_private() so
> page->private is NULL and we don't get the iop dumped.
> 
> Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
> but the bitmap is deemed not full.  There are three places where we set
> an iomap page Uptodate:
> 
> 1.  if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> SetPageUptodate(page);
> 
> 2.  if (page_has_private(page))
> iomap_iop_set_range_uptodate(page, off, len);
> else
> SetPageUptodate(page);
> 
> 3.  BUG_ON(page->index);
> ...
> SetPageUptodate(page);
> 
> It can't be #2 because the page has an iop.  It can't be #3 because the
> page->index is not 0.  So at some point in the past, the bitmap was full.
> 
> I don't think it's possible for inode->i_blkbits to change, and you
> aren't running with THPs, so it's definitely not possible for thp_size()
> to change.  So i_blocks_per_page() isn't going to change.
> 
> We seem to have allocated enough memory for ->iop because that's also
> based on i_blocks_per_page().
> 
> I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
> I've been trying to reproduce this on x86 with a 1kB block size
> filesystem, and haven't been able to yet.  Maybe I'll try to setup a
> powerpc cross-compilation environment tomorrow.

FWIW I managed to reproduce it with the following fstests configuration
on a 1k block size fs on an x86 machine:

SECTION          -- -no-sections-
FSTYP            -- xfs
MKFS_OPTIONS     --  -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024
MOUNT_OPTIONS    --  -o usrquota,grpquota,prjquota
HOST_OPTIONS     -- local.config
CHECK_OPTIONS    -- -g auto
XFS_MKFS_OPTIONS -- -bsize=4096
TIME_FACTOR      -- 1
LOAD_FACTOR      -- 1
TEST_DIR         -- /mnt
TEST_DEV         -- /dev/sde
SCRATCH_DEV      -- /dev/sdd
SCRATCH_MNT      -- /opt
OVL_UPPER        --
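The invariant behind the warning can be modelled in a few lines. Everything here is a hypothetical userspace stand-in: a 64kB page with 1kB blocks (so 64 blocks, one bit each), a bool for PageUptodate(), and a set_range_uptodate() that sets the page flag once the bitmap is full, as in case 1 of the list above. The WARN_ON_ONCE() corresponds to invariant_holds() returning false. One way that can happen is a bitmap bit being cleared after the page flag was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16   /* illustrative 64kB page */
#define SKETCH_MAX_BLOCKS 64   /* bitmap below holds at most 64 blocks */

struct sketch_page {
    bool uptodate;             /* stands in for PageUptodate() */
    uint64_t bitmap;           /* one bit per block, like iop->uptodate */
};

/* Like i_blocks_per_page(): page size divided by block size. */
static unsigned int blocks_per_page(unsigned int blkbits)
{
    unsigned int n = 1u << (SKETCH_PAGE_SHIFT - blkbits);

    assert(blkbits <= SKETCH_PAGE_SHIFT && n <= SKETCH_MAX_BLOCKS);
    return n;
}

static bool bitmap_is_full(const struct sketch_page *p, unsigned int nr)
{
    uint64_t full = (nr >= 64) ? ~0ULL : ((1ULL << nr) - 1);

    return (p->bitmap & full) == full;
}

/* Mark blocks [first, first + count) uptodate; flag the page when all are. */
static void set_range_uptodate(struct sketch_page *p, unsigned int blkbits,
                               unsigned int first, unsigned int count)
{
    unsigned int nr = blocks_per_page(blkbits);

    for (unsigned int b = first; b < first + count && b < nr; b++)
        p->bitmap |= 1ULL << b;
    if (bitmap_is_full(p, nr))
        p->uptodate = true;
}

/* What the WARN_ON_ONCE() in iomap_page_release() compares. */
static bool invariant_holds(const struct sketch_page *p, unsigned int blkbits)
{
    return bitmap_is_full(p, blocks_per_page(blkbits)) == p->uptodate;
}
```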

Dear Friend (Assalamu Alaikum),?

2020-09-22 Thread Aisha Gaddafi
-- 
Dear Friend (Assalamu Alaikum),

I came across your e-mail contact in the course of a private search for
your help. My name is Aisha Al-Qaddafi, a single mother and a widow
with three children. I am the only biological daughter of the late Libyan
President (the late Colonel Muammar Gaddafi).

I have investment funds worth twenty-seven million five hundred thousand
United States dollars ($27,500,000.00), and I need a trustworthy
investment manager/partner because of my current refugee status. However,
you may be interested in supporting investment projects in your country,
from where we can build business relationships in the near future.

I am willing to negotiate with you the ratio between investment and
company profit as the basis for making profits from the future investment.

If you are willing to handle this project on my behalf, please reply
urgently so that I can give you more information about the investment funds.

Your urgent reply will be appreciated. Write to me at this email address (
ayishagdda...@mail.ru ) for further discussion.

Kind regards,
Mrs. Aisha Al-Qaddafi


Re: [PATCH] csky: Fix a size determination in gpr_get()

2020-09-22 Thread Al Viro
On Wed, Sep 23, 2020 at 10:37:31AM +0800, Guo Ren wrote:

> > What's going on there?  The mapping is really weird - assuming
> > you had v0..v31 in the first 32 elements of regs->vr[], you
> > end up with
> >
> > v0 v1 v2 v3 v2 v3 v6 v7 v4 v5 v10 v11 v6 v7 v14 v15
> > v8 v9 v18 v19 v10 v11 v22 v23 v12 v13 v26 v27 v14 v15 v30 v31
> >
> > in the beginning of the output.  Assuming it is the intended
> > behaviour, it's probably worth some comments...
> FPU & VDSP use the same regs. The 32 FPU regs are 64 bits wide and the 16 VDSP
> regs are 128 bits wide.
> 
> vr[0], vr[1] = fp[0] & vr[0] vr[1], vr[2], vr[3] = vdsp reg[0]
> ...
> vr[60], vr[61] = fp[15] & vr[60] vr[61], vr[62], vr[63] = vdsp reg[15]
> vr[64], vr[65] = fp[16]
> vr[66], vr[67] = fp[17]
> ...
> vr[94], vr[95] = fp[31]
> 
> Yeah, this is confusing and I'll add a comment later.

Umm...  It would help if you described these 3 layouts:
1) kernel-side with VDSP
2) userland (identical to (1)?)
3) kernel-side without VDSP
Still confused...

PS: my apologies re commit message - I left a note to myself when doing
that series and then forgot about it ;-/

Anyway, which tree should it go through?  In any case, that fix is
Acked-by: Al Viro 
and I can take it through vfs.git or you guys can pick in csky tree;
up to you.
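Taking Guo Ren's description at face value, the vr[] layout can be written down as two index functions. This is an assumption derived from the quoted text, not from the csky sources. The assumed layout is as follows: vr[] holds 32-bit words, fp[0..15] occupy the first two words of each 128-bit VDSP register (four words apart), and fp[16..31] follow contiguously from vr[64].

```c
#include <assert.h>

/* First vr[] word holding FPU register fp[i], i in 0..31. */
static int fp_to_vr_index(int i)
{
    assert(i >= 0 && i < 32);
    return (i < 16) ? 4 * i : 64 + 2 * (i - 16);
}

/* First vr[] word of 128-bit VDSP register vdsp[i], i in 0..15. */
static int vdsp_to_vr_index(int i)
{
    assert(i >= 0 && i < 16);
    return 4 * i;
}
```

Under this reading, fp[15] and vdsp[15] both start at vr[60], fp[16] starts at vr[64], and fp[31] starts at vr[94], which matches the table in the quoted reply.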


Re: [PATCH V3 2/3] arm64/mm/hotplug: Enable MEM_OFFLINE event handling

2020-09-22 Thread Anshuman Khandual



On 09/21/2020 05:35 PM, Anshuman Khandual wrote:
> This enables MEM_OFFLINE memory event handling. It will help intercept any
> possible error condition such as if boot memory somehow still got offlined
> even after an explicit notifier failure, potentially by a future change in
> the generic hotplug framework. This would help detect such scenarios and help
> debug further.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: Marc Zyngier 
> Cc: Steve Capper 
> Cc: Mark Brown 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/mm/mmu.c | 37 -
>  1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index df3b7415b128..6b171bd88bcf 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1482,13 +1482,40 @@ static int prevent_bootmem_remove_notifier(struct notifier_block *nb,
>   unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
>   unsigned long pfn = arg->start_pfn;
>  
> - if (action != MEM_GOING_OFFLINE)
> + if ((action != MEM_GOING_OFFLINE) && (action != MEM_OFFLINE))
>   return NOTIFY_OK;
>  
> - for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> - ms = __pfn_to_section(pfn);
> - if (early_section(ms))
> - return NOTIFY_BAD;
> + if (action == MEM_GOING_OFFLINE) {
> + for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> + ms = __pfn_to_section(pfn);
> + if (early_section(ms)) {
> + pr_warn("Boot memory offlining attempted\n");
> + return NOTIFY_BAD;
> + }
> + }
> + } else if (action == MEM_OFFLINE) {
> + for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
> + ms = __pfn_to_section(pfn);
> + if (early_section(ms)) {
> +
> + /*
> +  * This should have never happened. Boot memory
> +  * offlining should have been prevented by this
> +  * very notifier. Probably some memory removal
> +  * procedure might have changed which would then
> +  * require further debug.
> +  */
> + pr_err("Boot memory offlined\n");

It returns at the first instance where a section inside the offline
range happens to be part of boot memory. So I am wondering whether it
would be better to report the entire attempted offline range here, or
just the first section inside it that overlaps with boot memory. Either
way, some range information here would be helpful.
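The sketch below shows what a more informative message could report: the whole attempted range plus the first overlapping section. It models the notifier's section walk in userspace. SKETCH_PAGES_PER_SECTION, the "boot memory is everything below boot_end_pfn" test, and the message format are all hypothetical stand-ins for PAGES_PER_SECTION, early_section() and pr_err().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_PAGES_PER_SECTION 0x8000UL   /* hypothetical section size */

/* Toy stand-in for early_section(__pfn_to_section(pfn)). */
static bool sketch_is_boot_section(unsigned long pfn, unsigned long boot_end_pfn)
{
    return pfn < boot_end_pfn;
}

/*
 * Walk [start_pfn, start_pfn + nr_pages) section by section. On the first
 * boot section, report the full range and that section, and return its
 * pfn. Return end_pfn if nothing overlaps.
 */
static unsigned long report_boot_overlap(unsigned long start_pfn,
                                         unsigned long nr_pages,
                                         unsigned long boot_end_pfn)
{
    unsigned long end_pfn = start_pfn + nr_pages;

    assert(nr_pages > 0);
    for (unsigned long pfn = start_pfn; pfn < end_pfn;
         pfn += SKETCH_PAGES_PER_SECTION) {
        if (sketch_is_boot_section(pfn, boot_end_pfn)) {
            fprintf(stderr,
                    "Boot memory offlined: range [%#lx-%#lx], first boot section at pfn %#lx\n",
                    start_pfn, end_pfn - 1, pfn);
            return pfn;
        }
    }
    return end_pfn;
}
```

Printing the range costs nothing extra, because start_pfn and nr_pages are already at hand in the notifier's memory_notify argument.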


Re: [PATCH v3 6/6] stm class: ftrace: use different channel according to CPU

2020-09-22 Thread Tingwei Zhang
On Fri, Sep 18, 2020 at 08:45:52PM +0800, Alexander Shishkin wrote:
> Tingwei Zhang  writes:
> 
> > @@ -63,6 +65,7 @@ static int __init stm_ftrace_init(void)
> >  {
> > int ret;
> >  
> > +   stm_ftrace.data.nr_chans = num_possible_cpus();
> 
> Not a problem with this patch necessarily, but this made me realize that
> .nr_chans may be larger than:
> 
>  (1) what the policy permits,
>  (2) what the stm device can handle.
> 
> While (1) the user can fix in the policy, they won't be able to fix (2),
> in which case they won't be able to use stm_ftrace at all. I'm wondering
> whether a link-time callback would be good enough.
>

Hi Alex,

I'm not sure if I understand this correctly. If the nr_chans requested by
stm_ftrace is larger than policy permits or stm device can handle,
stm_assign_first_policy() returns with error so stm_source_link_add()
will fail. The user would notice that when the link happens.  There's not
much we can do if there are not enough resources.
 
> Another thing is that .nr_chans needs to be a power of 2 at the moment.
> 
I'll change it to the following:
  stm_ftrace.data.nr_chans = roundup_pow_of_two(num_possible_cpus());
> Regards,
> --
> Alex
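The following gives context on the power-of-two requirement. roundup_pow_of_two() returns the smallest power of two that is not below its argument. On a machine with 6 possible CPUs, stm_ftrace would therefore request 8 channels, two of which simply go unused. The helper below is a hypothetical userspace stand-in for the kernel macro.

```c
#include <assert.h>

/* Smallest power of two >= n; n must be non-zero (as for the kernel macro). */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
    unsigned long p = 1;

    assert(n != 0);
    while (p < n)
        p <<= 1;
    return p;
}
```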


[tip:x86/cleanups] BUILD SUCCESS 900ffe39fec908e0aa26a30612e43ebc7140db79

2020-09-22 Thread kernel test robot
 lpc18xx_defconfig
mips    ip27_defconfig
sh      sh7724_generic_defconfig
arm     mv78xx0_defconfig
ia64    allmodconfig
ia64    defconfig
ia64    allyesconfig
m68k    allmodconfig
m68k    defconfig
nios2   defconfig
arc     allyesconfig
nds32   allnoconfig
c6x     allyesconfig
nds32   defconfig
nios2   allyesconfig
csky    defconfig
alpha   defconfig
alpha   allyesconfig
xtensa  allyesconfig
h8300   allyesconfig
arc     defconfig
sh      allmodconfig
parisc  defconfig
s390    allyesconfig
parisc  allyesconfig
s390    defconfig
i386    allyesconfig
sparc   allyesconfig
sparc   defconfig
i386    defconfig
mips    allyesconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386    randconfig-a002-20200921
i386    randconfig-a006-20200921
i386    randconfig-a003-20200921
i386    randconfig-a004-20200921
i386    randconfig-a005-20200921
i386    randconfig-a001-20200921
i386    randconfig-a002-20200922
i386    randconfig-a006-20200922
i386    randconfig-a003-20200922
i386    randconfig-a004-20200922
i386    randconfig-a005-20200922
i386    randconfig-a001-20200922
x86_64  randconfig-a011-20200921
x86_64  randconfig-a013-20200921
x86_64  randconfig-a014-20200921
x86_64  randconfig-a015-20200921
x86_64  randconfig-a012-20200921
x86_64  randconfig-a016-20200921
i386    randconfig-a012-20200921
i386    randconfig-a014-20200921
i386    randconfig-a016-20200921
i386    randconfig-a013-20200921
i386    randconfig-a011-20200921
i386    randconfig-a015-20200921
i386    randconfig-a012-20200920
i386    randconfig-a014-20200920
i386    randconfig-a016-20200920
i386    randconfig-a013-20200920
i386    randconfig-a011-20200920
i386    randconfig-a015-20200920
i386    randconfig-a012-20200923
i386    randconfig-a014-20200923
i386    randconfig-a016-20200923
i386    randconfig-a013-20200923
i386    randconfig-a011-20200923
i386    randconfig-a015-20200923
riscv   allyesconfig
riscv   nommu_virt_defconfig
riscv   allnoconfig
riscv   defconfig
riscv   rv32_defconfig
x86_64  rhel
x86_64  allyesconfig
x86_64  rhel-7.6-kselftests
x86_64  rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a005-20200923
x86_64   randconfig-a003-20200923
x86_64   randconfig-a004-20200923
x86_64   randconfig-a002-20200923
x86_64   randconfig-a006-20200923
x86_64   randconfig-a001-20200923
x86_64   randconfig-a005-20200921
x86_64   randconfig-a003-20200921
x86_64   randconfig-a004-20200921
x86_64   randconfig-a002-20200921
x86_64   randconfig-a006-20200921
x86_64   randconfig-a001-20200921
x86_64   randconfig-a011-20200922
x86_64   randconfig-a013-20200922
x86_64   randconfig-a014-20200922
x86_64   randconfig-a015-20200922
x86_64   randconfig-a012-20200922
x86_64   randconfig-a016-20200922
x86_64   randconfig-a011-20200920
x86_64   randconfig-a013-20200920
x86_64   randconfig-a014-20200920
x86_64   randconfig-a015-20200920
x86_64   randconfig-a012-20200920
x86_64   randconfig-a016-20200920

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-22 Thread Rafael Aquini
On Wed, Sep 23, 2020 at 10:21:36AM +0800, Huang, Ying wrote:
> Hi, Rafael,
> 
> Rafael Aquini  writes:
> 
> > The swap area descriptor only gets struct swap_cluster_info *cluster_info
> > allocated if the swapfile is backed by non-rotational storage.
> > When the swap area is laid on top of ordinary disk spindles, lock_cluster()
> > will naturally return NULL.
> 
> Thanks for reporting.  But the bug looks strange, because in a system
> with only HDD swap devices, during THP swap out, the swap cluster
> shouldn't be allocated, as in
> 
> shrink_page_list()
>   add_to_swap()
> get_swap_page()
>   get_swap_pages()
> swap_alloc_cluster()
>

The underlying problem is that swap_info_struct.cluster_info is always NULL
in the rotational storage case. So it's easy to see that constructions
like this one, in split_swap_cluster()

...
ci = lock_cluster(si, offset);
cluster_clear_huge(ci);
...

will lead to a NULL pointer dereference in that case, given that lock_cluster()
reads:

...
struct swap_cluster_info *ci;
ci = si->cluster_info;
if (ci) {
ci += offset / SWAPFILE_CLUSTER;
spin_lock(&ci->lock);
}
return ci;
...
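A userspace model of the dereference described above, with one possible guard. The structures are pared down to a single field, the spinlock is omitted, SWAPFILE_CLUSTER is given an illustrative value, and split_swap_cluster_sketch() is a hypothetical name rather than the kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWAPFILE_CLUSTER 256 /* illustrative value only */

/* Heavily simplified, hypothetical stand-ins for the kernel structures. */
struct swap_cluster_info { bool huge; };
struct swap_info_struct { struct swap_cluster_info *cluster_info; };

/* Same shape as lock_cluster(), minus the spinlock: NULL when there is
 * no cluster_info, which is always the case for rotational devices. */
static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
					      unsigned long offset)
{
	struct swap_cluster_info *ci;

	assert(si != NULL);
	ci = si->cluster_info;
	if (ci)
		ci += offset / SWAPFILE_CLUSTER;
	return ci;
}

/* Guarded variant of the split: never dereference a NULL cluster. */
static int split_swap_cluster_sketch(struct swap_info_struct *si,
				     unsigned long offset)
{
	struct swap_cluster_info *ci = lock_cluster(si, offset);

	if (!ci)
		return 0;	/* HDD-backed swap: no clusters to split */
	ci->huge = false;	/* stands in for cluster_clear_huge(ci) */
	return 0;
}
```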




linux-next: build failure after merge of the block tree

2020-09-22 Thread Stephen Rothwell
Hi all,

After merging the block tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

fs/io_uring.c: In function 'io_resubmit_prep':
fs/io_uring.c:2357:10: error: 'struct io_kiocb' has no member named 'io'
 2357 |  if (!req->io) {
  |  ^~

Caused by commit

  8f3d749685e4 ("io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there")

from Linus' tree interacting with commit

  76c917267129 ("io_uring: get rid of req->io/io_async_ctx union")

from the block tree.

I added the following merge resolution:

From: Stephen Rothwell 
Date: Wed, 23 Sep 2020 14:30:01 +1000
Subject: [PATCH] fix up for "io_uring: get rid of req->io/io_async_ctx union"

Signed-off-by: Stephen Rothwell 
---
 fs/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5aefea5bb383..0a72f4eed845 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2354,7 +2354,7 @@ static bool io_resubmit_prep(struct io_kiocb *req, int error)
goto end_req;
}
 
-   if (!req->io) {
+   if (!req->async_data) {
ret = io_import_iovec(rw, req, &iovec, &iter, false);
if (ret < 0)
goto end_req;
-- 
2.28.0

-- 
Cheers,
Stephen Rothwell




IMPORTANT NOTICE !!! 46.28.37.15

2020-09-22 Thread DR. DONALD MOORE
Dear friend,

How are you today? Hope all is well with you and your family? I hope This
mail meets you in a perfect condition.

I am using this opportunity to thank you for your great effort to our
unfinished transfer of fund into your account due to one reason or the
other best known to you.
 
But I want to inform you that I have successfully transferred the Cheque
out of the company to someone else who was capable of assisting me in
this great venture.

Due to your effort, sincerity, courage and trust worthiness you showed at
the course of the transaction I want to compensate you and show my
gratitude to you with the sum of 20,000.000.00(Twenty Million United States
Of American Dollars) in respect to your lottery winnings Compensation.

I have authorized the finance house in the Ghana where I deposited my money
to issue you international certified bank draft cashable at your bank.

My dear friend I will like you to contact the finance house for the
collection of this international certified bank draft. The name and
contact address of the Person with your Cheque is as follows.

COMPENSATION OFFICER
CONTACT AGENT
BARRISTER. JOSHUA AKUABATA
PHONE NUM: +233573629956
EMAIL: akuabatajoshu...@gmail.com

Contact him with the following information

1. Full Name:
2. Residential Address:
3. Phone Number:
4. Fax Number:
5. Occupation:
6. Sex:
7. Age:
8. Nationality:
9. Country:

At the moment, I am very busy here because of the investment projects
which I and my new partner are having at hand.

Finally, remember that I have forwarded instruction to the finance house
on your behalf to send the bank draft to you as soon as you contact them
without delay. Please I will like you to accept this token with good
faith as this is from the bottom of my heart.

Thanks and God bless you and your family. Hope to hear from you soon.

Best Regards,
Dr. Donald Moore
Controller General


Re: WARNING in ex_handler_uaccess

2020-09-22 Thread Al Viro
On Mon, Sep 21, 2020 at 12:22:19PM +0200, Rasmus Villemoes wrote:

> So, not sure how the above got triggered, but I notice there might be an
> edge case in check_zeroed_user():
> 
>   from -= align;
>   size += align;
> 
>   if (!user_read_access_begin(from, size))
>   return -EFAULT;
> 
>   unsafe_get_user(val, (unsigned long __user *) from, err_fault);
> 
> 
> Suppose size is (size_t)-3 and align is 3. What's the convention for
> access_ok(whatever, 0)? Is that equivalent to access_ok(whatever, 1), or
> is it always true (or $ARCH-dependent)?

It's usually true...

> But, AFAICT, no current caller of check_zeroed_user can end up passing
> in a size that can overflow to 0. E.g. for the case at hand, size cannot
> be more than SIZE_MAX-24.

Might be worth slapping if (unlikely(!size)) return -EFAULT; // overflow
just before user_read_access_begin() to be sure...
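The wrap-around Rasmus describes, and the guard Al suggests, can be reproduced in userspace. This is a sketch of only the range-widening step; it assumes align is computed as the pointer's misalignment within an unsigned long, and widen_user_range() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the range widening in check_zeroed_user():
 * move 'from' down to a word boundary, grow 'size' by the same amount,
 * and refuse a size that wrapped around to zero before it would be
 * handed to user_read_access_begin().
 */
static int widen_user_range(uintptr_t *from, size_t *size)
{
	size_t align = *from % sizeof(unsigned long);

	*from -= align;
	*size += align;		/* e.g. (size_t)-3 + 3 == 0 */
	if (*size == 0)		/* kernel code would write unlikely(!size) */
		return -EFAULT;
	return 0;
}
```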


Re: [PATCH net-next 4/5] bonding: make Kconfig toggle to disable legacy interfaces

2020-09-22 Thread kernel test robot
Hi Jarod,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Jarod-Wilson/bonding-rename-bond-components/20200922-214046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 92ec804f3dbf0d986f8e10850bfff14f316d7aaf
config: i386-randconfig-s031-20200921 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-201-g24bdaac6-dirty
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

>> drivers/net/bonding/bond_procfs.c:11:12: sparse: sparse: symbol 'linkdesc' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[RFC PATCH] bonding: linkdesc can be static

2020-09-22 Thread kernel test robot


Signed-off-by: kernel test robot 
---
 bond_procfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index 91ece68607b23..9b1b37a682728 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -8,7 +8,7 @@
 #include "bonding_priv.h"
 
 #ifdef CONFIG_BONDING_LEGACY_INTERFACES
-const char *linkdesc = "Slave";
+static const char *linkdesc = "Slave";
 #else
 const char *linkdesc = "Link";
 #endif
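The followup above only marks the CONFIG_BONDING_LEGACY_INTERFACES definition static; the #else definition keeps external linkage, so sparse would presumably report the same warning in configs with that option disabled. A sketch with both branches static; bond_link_desc() is a hypothetical accessor added only so the example stands on its own:

```c
#include <assert.h>
#include <string.h>

/* Both variants get internal linkage, so neither leaks a global symbol. */
#ifdef CONFIG_BONDING_LEGACY_INTERFACES
static const char *linkdesc = "Slave";
#else
static const char *linkdesc = "Link";
#endif

/* Hypothetical accessor: lets code in this file read the description. */
static const char *bond_link_desc(void)
{
	return linkdesc;
}
```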


[PATCH v4] debugobjects: install CPU hotplug callback

2020-09-22 Thread qiang.zhang
From: Zqiang 

Due to CPU hotplug, a CPU may never come back online after it goes
offline, in which case some objects in its percpu pool are never freed.
To avoid this, install a CPU hotplug callback that frees the objects in
the percpu pool when the CPU goes offline.

Signed-off-by: Zqiang 
Acked-by: Waiman Long 
Cc: Andrew Morton 
Cc: "Joel Fernandes (Google)" 
Cc: Qian Cai 
---
 v1->v2:
 Modify submission information.

 v2->v3:
 In the CPU hotplug callback, also clear the percpu pool's "obj_free" counter.
 Capitalize 'CPU', and use a shorter preprocessor sequence.

 v3->v4:
 Add Cc and Acked-by tags

 include/linux/cpuhotplug.h |  1 +
 lib/debugobjects.c | 24 
 2 files changed, 25 insertions(+)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 3215023d4852..0c39d57e5342 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -36,6 +36,7 @@ enum cpuhp_state {
CPUHP_X86_MCE_DEAD,
CPUHP_VIRT_NET_DEAD,
CPUHP_SLUB_DEAD,
+   CPUHP_DEBUG_OBJ_DEAD,
CPUHP_MM_WRITEBACK_DEAD,
CPUHP_MM_VMSTAT_DEAD,
CPUHP_SOFTIRQ_DEAD,
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index fe4557955d97..bb69a02c3e7b 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define ODEBUG_HASH_BITS   14
 #define ODEBUG_HASH_SIZE   (1 << ODEBUG_HASH_BITS)
@@ -433,6 +434,24 @@ static void free_object(struct debug_obj *obj)
}
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static int object_cpu_offline(unsigned int cpu)
+{
+   struct debug_percpu_free *percpu_pool;
+   struct hlist_node *tmp;
+   struct debug_obj *obj;
+
+   percpu_pool = per_cpu_ptr(&percpu_obj_pool, cpu);
+   hlist_for_each_entry_safe(obj, tmp, &percpu_pool->free_objs, node) {
+   hlist_del(&obj->node);
+   kmem_cache_free(obj_cache, obj);
+   }
+   percpu_pool->obj_free = 0;
+
+   return 0;
+}
+#endif
+
 /*
  * We run out of memory. That means we probably have tons of objects
  * allocated.
@@ -1367,6 +1386,11 @@ void __init debug_objects_mem_init(void)
} else
debug_objects_selftest();
 
+#ifdef CONFIG_HOTPLUG_CPU
+   cpuhp_setup_state_nocalls(CPUHP_DEBUG_OBJ_DEAD, "object:offline", NULL,
+   object_cpu_offline);
+#endif
+
/*
 * Increase the thresholds for allocating and freeing objects
 * according to the number of possible CPUs available in the system.
-- 
2.17.1
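The callback's logic can be modelled in userspace as below. A singly linked list stands in for the hlist and malloc()/free() stand in for the kmem_cache; all names are hypothetical. Reading ->next before freeing is the same reason the patch uses hlist_for_each_entry_safe():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of one CPU's debug-object free pool. */
struct demo_obj {
	struct demo_obj *next;
};

struct demo_percpu_pool {
	struct demo_obj *free_objs;
	int obj_free;
};

/* Cache one more free object in the pool (illustration only). */
static int demo_pool_add(struct demo_percpu_pool *pool)
{
	struct demo_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return -1;
	obj->next = pool->free_objs;
	pool->free_objs = obj;
	pool->obj_free++;
	return 0;
}

/*
 * Models object_cpu_offline(): free every cached object and reset the
 * counter, so nothing stays stranded on a CPU that may never come back.
 */
static int demo_cpu_offline(struct demo_percpu_pool *pool)
{
	struct demo_obj *obj = pool->free_objs;

	while (obj) {
		struct demo_obj *next = obj->next; /* read before free */

		free(obj);
		obj = next;
	}
	pool->free_objs = NULL;
	pool->obj_free = 0;
	return 0;
}
```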



linux-next: manual merge of the block tree with Linus' tree

2020-09-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the block tree got a conflict in:

  fs/io_uring.c

between commits:

  4eb8dded6b82 ("io_uring: fix openat/openat2 unified prep handling")
  f5cac8b156e8 ("io_uring: don't use retry based buffered reads for non-async bdev")

from Linus' tree and commit:

  76c917267129 ("io_uring: get rid of req->io/io_async_ctx union")
  8f95cf7f28bf ("io_uring: enable file table usage for SQPOLL rings")

from the block tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/io_uring.c
index c9aea6c44372,7ee5e18218c2..
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@@ -3128,12 -3172,12 +3187,13 @@@ static int io_read(struct io_kiocb *req
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct kiocb *kiocb = &req->rw.kiocb;
struct iov_iter __iter, *iter = &__iter;
+   struct io_async_rw *rw = req->async_data;
ssize_t io_size, ret, ret2;
size_t iov_count;
 +  bool no_async;
  
-   if (req->io)
-   iter = &req->io->rw.iter;
+   if (rw)
+   iter = &rw->iter;
  
ret = io_import_iovec(READ, req, &iovec, iter, !force_nonblock);
if (ret < 0)
@@@ -3193,8 -3236,7 +3253,9 @@@ copy_iov
ret = ret2;
goto out_free;
}
 +  if (no_async)
 +  return -EAGAIN;
+   rw = req->async_data;
/* it's copied and will be cleaned with ->io */
iovec = NULL;
/* now use our persistent iterator, if we aren't already */




Re: [PATCH 2/2] locktorture: call percpu_free_rwsem() to do percpu-rwsem cleanup

2020-09-22 Thread Paul E. McKenney
On Wed, Sep 23, 2020 at 10:24:20AM +0800, Hou Tao wrote:
> Hi Paul,
> 
> > On 2020/9/23 7:24, Paul E. McKenney wrote:
> snip
> 
> >> Fix it by adding an exit hook in lock_torture_ops and
> >> use it to call percpu_free_rwsem() for percpu rwsem torture
> >> before the module is removed, so we can ensure rcu_sync_func()
> >> completes before module exits.
> >>
> >> Also needs to call exit hook if lock_torture_init() fails half-way,
> >> so use ctx->cur_ops != NULL to signal that init hook has been called.
> > 
> > Good catch, but please see below for comments and questions.
> > 
> >> Signed-off-by: Hou Tao 
> >> ---
> >>  kernel/locking/locktorture.c | 28 ++--
> >>  1 file changed, 22 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
> >> index bebdf98e6cd78..e91033e9b6f95 100644
> >> --- a/kernel/locking/locktorture.c
> >> +++ b/kernel/locking/locktorture.c
> >> @@ -74,6 +74,7 @@ static void lock_torture_cleanup(void);
> >>   */
> >>  struct lock_torture_ops {
> >>void (*init)(void);
> >> +  void (*exit)(void);
> > 
> > This is fine, but why not also add a flag to the lock_torture_cxt
> > structure that is set when the ->init() function is called?  Perhaps
> > something like this in lock_torture_init():
> > 
> > if (cxt.cur_ops->init) {
> > cxt.cur_ops->init();
> > cxt.initcalled = true;
> > }
> > 
> 
> You are right. Adding a new field to indicate that the init hook has been
> called is much better than reusing ctx->cur_ops != NULL to do that.
> 
> >>int (*writelock)(void);
> >>void (*write_delay)(struct torture_random_state *trsp);
> >>void (*task_boost)(struct torture_random_state *trsp);
> >> @@ -571,6 +572,11 @@ void torture_percpu_rwsem_init(void)
> >>BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> >>  }
> >>  
> >> +static void torture_percpu_rwsem_exit(void)
> >> +{
> >> +  percpu_free_rwsem(&pcpu_rwsem);
> >> +}
> >> +
> snip
> 
> >> @@ -828,6 +836,12 @@ static void lock_torture_cleanup(void)
> >>cxt.lrsa = NULL;
> >>  
> >>  end:
> >> +  /* If init() has been called, then do exit() accordingly */
> >> +  if (cxt.cur_ops) {
> >> +  if (cxt.cur_ops->exit)
> >> +  cxt.cur_ops->exit();
> >> +  cxt.cur_ops = NULL;
> >> +  }
> > 
> > The above can then be:
> > 
> > if (cxt.initcalled && cxt.cur_ops->exit)
> > cxt.cur_ops->exit();
> > 
> > Maybe you also need to clear cxt.initcalled at this point, but I don't
> > immediately see why that would be needed.
> > 
> Because we are doing cleanup, I think resetting initcalled to false is OK
> after the cleanup is done.

Maybe best to try it both ways and see how each really works?

We might each have our opinions, but the computer's opinion is the one
that really counts.  ;-)

> >>torture_cleanup_end();
> >>  }
> >>  
> >> @@ -835,6 +849,7 @@ static int __init lock_torture_init(void)
> >>  {
> >>int i, j;
> >>int firsterr = 0;
> >> +  struct lock_torture_ops *cur_ops;
> > 
> > And then you don't need this extra pointer.  Not that this pointer is bad
> > in and of itself, but using (!cxt.cur_ops) to indicate that the ->init()
> > function has not been called is an accident waiting to happen.
> > 
> > And the changes below are no longer needed.
> > 
> > Or am I missing something subtle?
> > 
> Thanks for your suggestion. Will send v2.

Looking forward to seeing it!

Thanx, Paul
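Paul's suggested flag, together with the reset Hou Tao proposes, can be exercised in userspace with a sketch like this. The structures are cut down to the two hooks and the flag, and the demo_* functions are hypothetical. It shows that ->exit() runs only after ->init() did, and that a second cleanup does not call ->exit() again:

```c
#include <assert.h>
#include <stdbool.h>

/* Pared-down, hypothetical versions of the locktorture structures. */
struct lock_torture_ops {
	void (*init)(void);
	void (*exit)(void);
};

struct lock_torture_cxt {
	struct lock_torture_ops *cur_ops;
	bool initcalled;	/* true only once ->init() has actually run */
};

static int demo_exit_calls;	/* counts ->exit() invocations */

static void demo_init(void) { }
static void demo_exit(void) { demo_exit_calls++; }

static void demo_torture_init(struct lock_torture_cxt *cxt)
{
	if (cxt->cur_ops->init) {
		cxt->cur_ops->init();
		cxt->initcalled = true;
	}
}

static void demo_torture_cleanup(struct lock_torture_cxt *cxt)
{
	/* Pair ->exit() with a completed ->init() only. */
	if (cxt->initcalled && cxt->cur_ops->exit)
		cxt->cur_ops->exit();
	cxt->initcalled = false;	/* the reset Hou Tao proposes */
}
```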


Re: [PATCH 2/5] fs,nfs: lift compat nfs4 mount data handling into the nfs code

2020-09-22 Thread Al Viro
On Mon, Sep 21, 2020 at 08:11:23PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 12:05:52PM -0400, Anna Schumaker wrote:
> > This is for the binary mount stuff? That was already legacy code when
> > I first started, and mount uses text options now. My preference is for
> > keeping it as close to the original code as possible.
> 
> Ok.  Al, are you fine with the series as-is then?

I can live with that.  I'm not fond of in_compat_syscall() proliferation,
but in this case it's reasonably sane...

OK, applied.


Re: [PATCH v3] x86/uaccess: Use pointer masking to limit uaccess speculation

2020-09-22 Thread Al Viro
On Mon, Sep 14, 2020 at 02:53:54PM -0500, Josh Poimboeuf wrote:
> Al,
> 
> This depends on Christoph's set_fs() removal patches.  Would you be
> willing to take this in your tree?

in #uaccess.x86 and #for-next


RE: [PATCH RFC 0/5] Introduced new Cadence USBSSP DRD Driver.

2020-09-22 Thread Pawel Laszczak
>
>
>On 20-09-22 13:06:26, Pawel Laszczak wrote:
>> Hi,
>>
>> >
>> >On Mon, Jun 29, 2020 at 03:41:49AM +, Peter Chen wrote:
>> >> On 20-06-26 07:19:56, Pawel Laszczak wrote:
>> >> > Hi Felipe,
>> >> >
>> >> > >
>> >> > >Hi,
>> >> > >
>> >> > >Pawel Laszczak  writes:
>> >> > >> This patch introduces a new Cadence USBSS DRD driver to the Linux kernel.
>> >> > >>
>> >> > >> The Cadence USBSS DRD Controller is a highly configurable IP Core which
>> >> > >> can be instantiated as Dual-Role Device (DRD), Peripheral Only and
>> >> > >> Host Only (XHCI) configurations.
>> >> > >>
>> >> > >> The current driver has been validated on an FPGA. We have support
>> >> > >> for the PCIe bus, which is used for FPGA prototyping.
>> >> > >>
>> >> > >> The host side of the USBSS-DRD controller complies with the XHCI
>> >> > >> specification, so it works with the standard XHCI Linux driver.
>> >> > >>
>> >> > >> The architecture of the device side is almost the same as that of the
>> >> > >> host side, and most of the XHCI specification can be used to understand
>> >> > >> how this controller operates.
>> >> > >>
>> >> > >> This controller and driver support the Full Speed, High Speed, SuperSpeed
>> >> > >> and SuperSpeed Plus USB protocols.
>> >> > >>
>> >> > >> The prefix cdnsp used in the driver was chosen by analogy to the cdns3
>> >> > >> driver. The last letter of this acronym means PLUS. The formal name of
>> >> > >> the controller is USBSSP, but it's too generic, so I've decided to use CDNSP.
>> >> > >>
>> >> > >> Patch 1: adds the DT binding.
>> >> > >> Patch 2: adds a PCI to platform wrapper used on the Cadence testing
>> >> > >>  platform. It is an FPGA-based platform.
>> >> > >> Patches 3-5: add the main part of the driver, which has been intentionally
>> >> > >>  split into 3 parts. In my opinion such a division should not
>> >> > >>  affect understanding and reviewing the driver, and it makes the
>> >> > >>  main patch (4/5) a little smaller. Patch 3 introduces the main
>> >> > >>  header file for the driver, 4 is the main part that implements all
>> >> > >>  functionality of the driver and 5 introduces tracepoints.
>> >> > >
>> >> > >I'm more interested in how this is different from CDNS3. Aren't they
>> >> > >SW compatible?
>> >> >
>> >> > In general, the controller can be split into 2 parts: the DRD part and the
>> >> > rest (UDC).
>> >> >
>> >> > The second part, UDC, which consists of the gadget.c, ring.c and mem.c
>> >> > files, is completely different.
>> >> >
>> >> > The DRD part contains drd.c and core.c.
>> >> > cdnsp drd.c is similar to cdns3 drd.c, but a little different. CDNSP has a
>> >> > similar but different register space: some registers were moved, some
>> >> > were removed and some were added.
>> >> >
>> >> > core.c is very similar and could eventually be common to both drivers.
>> >> > I thought about this, but I wanted to avoid interfering with the cdns3
>> >> > driver, because at this point CDNSP is still under testing and CDNS3 is
>> >> > used by some products on the market.
>> >>
>> >> Pawel, I suggest adding CDNSP to drivers/staging first since it is still
>> >> under testing. When you think the driver (as well as the hardware) is
>> >> mature, you could try to add the gadget part (e.g., gadget-v2) and make
>> >> the necessary changes to core.c.
>> >
>> >I only take code for drivers/staging/ that for some reason is not
>> >meeting the normal coding style/rules/whatever.  Not for stuff that is an
>> >obvious duplicate of existing code like this and needs to be
>> >rearchitected.  It is much more work to try to convert code once it is
>> >in the tree than to just do it out of the tree on your own and resubmit
>> >it, as you don't have to follow the in-kernel rules of "one patch does
>> >one thing" that you would if it was in staging.
>> >
>> >So I don't think staging is the right place for this; just spend a
>> >few weeks to get it right and then resubmit it.
>> >
>>
>> I had the idea to indirectly reuse core.c and drd.c in the cdnsp driver. Of
>> course, I made the necessary changes to make reusing this code possible.
>> My approach was to add these files to the cdnsp Makefile, but this concept
>> failed. It worked until I started testing cdns3 and cdnsp built into the
>> kernel :)
>>
>> With this approach I get "multiple definition of ..." errors.
>>
>> What should such reusable code look like?
>>
>> After my experience with the above concept, I think the only way is to move
>> the common code to a separate module, similar to the drivers/usb/common
>> directory or the libcomposite.ko module.
>>
>
>Could you use the compatible string or IP revision number to dynamically decide
>which part of the code you should use? That is to say, there is only one
>Cadence 3 USB driver folder -- cdns3; you only add one gadget file for
>cdnsp
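Peter's suggestion could look roughly like the userspace sketch below: one driver, with the gadget implementation chosen at probe time through an ops table, so the shared core and DRD code is built once. "cdns,usb3" is the existing cdns3 compatible; "cdns,usbssp" and all struct and function names are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gadget entry points; the bodies are stubs for the example. */
struct cdns_gadget_ops {
	const char *name;
	int (*gadget_init)(void);
};

static int cdns3_gadget_init_stub(void) { return 3; }
static int cdnsp_gadget_init_stub(void) { return 4; }

static const struct cdns_gadget_ops cdns3_ops = { "cdns3", cdns3_gadget_init_stub };
static const struct cdns_gadget_ops cdnsp_ops = { "cdnsp", cdnsp_gadget_init_stub };

/*
 * Choose the gadget implementation from a DT-style compatible string. The
 * shared core.c/drd.c code would call only through the returned table, so
 * it is compiled once and no symbol ends up defined twice.
 */
static const struct cdns_gadget_ops *cdns_select_gadget_ops(const char *compatible)
{
	if (strcmp(compatible, "cdns,usbssp") == 0)
		return &cdnsp_ops;
	if (strcmp(compatible, "cdns,usb3") == 0)
		return &cdns3_ops;
	return NULL;
}
```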

linux-next: build warning after merge of the drm tree

2020-09-22 Thread Stephen Rothwell
Hi all,

After merging the drm tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c: In function 'cdns_mhdp_fw_activate':
drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c:751:10: warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '18446744073709551613' to '4294967293' [-Woverflow]
  751 |   writel(~CDNS_APB_INT_MASK_SW_EVENT_INT,
drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c: In function 'cdns_mhdp_attach':
drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c:1692:10: warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '18446744073709551613' to '4294967293' [-Woverflow]
 1692 |   writel(~CDNS_APB_INT_MASK_SW_EVENT_INT,
drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c: In function 'cdns_mhdp_bridge_hpd_enable':
drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c:2125:10: warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '18446744073709551613' to '4294967293' [-Woverflow]
 2125 |   writel(~CDNS_APB_INT_MASK_SW_EVENT_INT,

Introduced by commit

  fb43aa0acdfd ("drm: bridge: Add support for Cadence MHDP8546 DPI/DP bridge")

-- 
Cheers,
Stephen Rothwell
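The warning comes from complementing a 64-bit unsigned long constant and passing it where a 32-bit value is expected: ~2UL is 18446744073709551613, which truncates to 4294967293. A userspace sketch of one way to keep the expression 32 bits wide; the DEMO_* and demo_* names are hypothetical, and the mask value (bit 1) is inferred from the numbers in the warning:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the reported values imply the mask is bit 1 (value 2) and is
 * an unsigned long constant, e.g. one defined with BIT(1). */
#define DEMO_APB_INT_MASK_SW_EVENT_INT (1UL << 1)

static uint32_t demo_reg; /* stands in for the MMIO register */

/* Stand-in for writel(): stores a 32-bit value. */
static void demo_writel(uint32_t val, uint32_t *addr)
{
	*addr = val;
}

static void demo_mask_sw_event(void)
{
	/* Narrow first, then complement: the expression is 32 bits wide, so
	 * there is no 64-bit constant to truncate and no -Woverflow. */
	demo_writel(~(uint32_t)DEMO_APB_INT_MASK_SW_EVENT_INT, &demo_reg);
}
```

The value written to the register is the same either way; the cast only makes the intended 32-bit width explicit.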




Re: [PATCH] SUNRPC: Fix svc_flush_dcache()

2020-09-22 Thread He Zhe



On 9/22/20 10:14 PM, Chuck Lever wrote:
>
>> On Sep 22, 2020, at 3:13 AM, He Zhe  wrote:
>>
>>
>>
>> On 9/21/20 3:51 AM, Chuck Lever wrote:
>>> On platforms that implement flush_dcache_page(), a large NFS WRITE
>>> triggers the WARN_ONCE in bvec_iter_advance():
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc]
>>>
>>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace:
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? del_timer_sync+0x4b/0x55
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  svc_recv+0x1193/0x15e4 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_xprt_put+0x1e/0x29f [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? svc_send+0x39f/0x3c1 [sunrpc]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  nfsd+0x282/0x345 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? __kthread_parkme+0x74/0xba
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  kthread+0x2ad/0x2bc
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? nfsd_destroy+0x124/0x124 [nfsd]
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? test_bit+0x1d/0x27
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ? kthread_mod_delayed_work+0x115/0x115
>>> Sep 20 14:01:05 klimt.1015granger.net kernel:  ret_from_fork+0x22/0x30
>>>
>>> Reported-by: He Zhe 
>>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()")
>>> Signed-off-by: Chuck Lever 
>>> ---
>>> net/sunrpc/svcsock.c |2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Hi Zhe-
>>>
>>> If you confirm this fixes your issue and there are no other
>>> objections or regressions, I can submit this for v5.9-rc.
>> I don't quite get why we add "seek" to "size". It seems this action does not
>> reflect the actual scenario and forcibly neutralizes the WARN_ONCE check in
>> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus
>> introduce an overflow.
>> Why don't we avoid this problem at the very beginning, like my v1 did? That is,
>> call svc_flush_bvec only when we have received more than we want to seek.
>>
>> len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT);
>> -   if (len > 0)
>> +   if (len > 0 && (size_t)len > (seek & PAGE_MASK))
>> svc_flush_bvec(bvec, len, seek);
> Because this doesn't fix the underlying bug that triggered the
> WARN_ONCE.
>
> svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call
> from a sequence of sock_recvmsg's.
>
> @seek is the running number of bytes that have been received so
> far for the RPC Call we are assembling. @size is the number of
> bytes that were just received in the most recent sock_recvmsg.
>
> We want svc_flush_bvec to flush just the area of @bvec that
> hasn't been flushed yet.
>
> Thus: the current size of the partial Call message in @bvec is
> @seek + @size. The starting location of the flush is
> @seek & PAGE_MASK. This aligns the flush so it starts on a page
> boundary.
>
> This:
>
>  230 struct bvec_iter bi = {
>  231 .bi_size= size + seek,
>  232 };
>
>  235 bvec_iter_advance(bvec, &bi, seek & PAGE_MASK);
>
> advances the bvec_iter to the part of @bvec that hasn't been
> flushed yet.
>
> This loop:
>
>  236 for_each_bvec(bv, bvec, bi, bi)
>  237 flush_dcache_page(bv.bv_page);
>
> flushes each page, starting at that point, up to the end of the bytes
> that have been received so far.
>
> In other words, ca07eda33e01 was wrong because it always flushed
> the first section of @bvec, never the later parts of it.

Thanks for the clarification. I just tested the patch. It works well.

Zhe

>
>
>> Regards,
>> Zhe
>>
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index d5805fa1d066..c2752e2b9ce3 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
>>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek)
>>> {
>>> struct bvec_iter bi = {
>>> -   .bi_size= size,
>>> +   .bi_size= size + seek,
>>> };
>>> struct bio_vec bv;
> --
> Chuck Lever
>
>
>
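Chuck's explanation can be checked with a small userspace model of the iterator arithmetic. It assumes 4 KiB pages and a bvec made of whole pages, and demo_pages_flushed() is a hypothetical name. For seek = 5000 and size = 6000 the newly received bytes span pages 1 and 2: the old .bi_size = size flushes only page 1, and for seek = 8192, size = 100 it trips the WARN; with .bi_size = size + seek both cases come out right:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * Model of svc_flush_bvec() over a bvec made of whole pages.
 * bi_size is the iterator size, seek the bytes already assembled.
 * Returns how many pages the loop would flush, or -1 where
 * bvec_iter_advance() would hit its "advance past end" WARN_ONCE.
 */
static long demo_pages_flushed(size_t bi_size, size_t seek)
{
	size_t start = seek & DEMO_PAGE_MASK;	/* page-aligned flush start */

	if (start > bi_size)
		return -1;
	if (start == bi_size)
		return 0;
	/* pages covering bytes [start, bi_size) */
	return (long)((bi_size - 1) / DEMO_PAGE_SIZE - start / DEMO_PAGE_SIZE + 1);
}
```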



[PATCH] MIPS: irq: Add missing prototypes for init_IRQ()

2020-09-22 Thread Pujin Shi
init_IRQ() has no prototype; add one in irq.h.

Fix the following warning (treated as an error with W=1):
arch/mips/kernel/irq.c:52:13: error: no previous prototype for 'init_IRQ' [-Werror=missing-prototypes]

Signed-off-by: Pujin Shi 
---
 arch/mips/include/asm/irq.h | 1 +
 arch/mips/kernel/irq.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/irq.h b/arch/mips/include/asm/irq.h
index c5d351786416..992f8040d3d9 100644
--- a/arch/mips/include/asm/irq.h
+++ b/arch/mips/include/asm/irq.h
@@ -21,6 +21,7 @@
 #define IRQ_STACK_START(IRQ_STACK_SIZE - 16)
 
 extern void *irq_stack[NR_CPUS];
+void init_IRQ(void);
 
 /*
  * The highest address on the IRQ stack contains a dummy frame put down in
diff --git a/arch/mips/kernel/irq.c b/arch/mips/kernel/irq.c
index 85b6c60f285d..07d2c86e7ff5 100644
--- a/arch/mips/kernel/irq.c
+++ b/arch/mips/kernel/irq.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
-- 
2.18.1



[PATCH -next v2] ath9k: Remove set but not used variable

2020-09-22 Thread Li Heng
This addresses the following gcc warnings with "make W=1":

drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h:1331:18: warning: ‘ar9580_1p0_pcie_phy_clkreq_enable_L1’ defined but not used [-Wunused-const-variable=]

drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h:1338:18: warning: ‘ar9580_1p0_pcie_phy_clkreq_disable_L1’ defined but not used [-Wunused-const-variable=]

drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h:1345:18: warning: ‘ar9580_1p0_pcie_phy_pll_on_clkreq’ defined but not used [-Wunused-const-variable=]

Reported-by: Hulk Robot 
Signed-off-by: Li Heng 
---
 .../net/wireless/ath/ath9k/ar9580_1p0_initvals.h| 21 -
 1 file changed, 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h b/drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h
index f4c9bef..fab14e0 100644
--- a/drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h
+++ b/drivers/net/wireless/ath/ath9k/ar9580_1p0_initvals.h
@@ -1328,27 +1328,6 @@ static const u32 ar9580_1p0_baseband_postamble[][5] = {
{0xc284, 0x, 0x, 0x0150, 0x0150},
 };

-static const u32 ar9580_1p0_pcie_phy_clkreq_enable_L1[][2] = {
-   /* Addr  allmodes  */
-   {0x4040, 0x0835365e},
-   {0x4040, 0x0008003b},
-   {0x4044, 0x},
-};
-
-static const u32 ar9580_1p0_pcie_phy_clkreq_disable_L1[][2] = {
-   /* Addr  allmodes  */
-   {0x4040, 0x0831365e},
-   {0x4040, 0x0008003b},
-   {0x4044, 0x},
-};
-
-static const u32 ar9580_1p0_pcie_phy_pll_on_clkreq[][2] = {
-   /* Addr  allmodes  */
-   {0x4040, 0x0831265e},
-   {0x4040, 0x0008003b},
-   {0x4044, 0x},
-};
-
 static const u32 ar9580_1p0_baseband_postamble_dfs_channel[][3] = {
/* Addr  5G  2G*/
{0x9814, 0x3400c00f, 0x3400c00f},
--
2.7.4



[PATCH net-next v2] net: microchip: Make `lan743x_pm_suspend` function return right value

2020-09-22 Thread Zheng Yongjun
drivers/net/ethernet/microchip/lan743x_main.c: In function lan743x_pm_suspend:

`ret` is set but never used. The return value of `pci_prepare_to_sleep()`
should be the return value of `lan743x_pm_suspend()`, so return it directly.

Signed-off-by: Zheng Yongjun 
---
 drivers/net/ethernet/microchip/lan743x_main.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index de93cc6ebc1a..7e236c9ee4b1 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -3038,7 +3038,6 @@ static int lan743x_pm_suspend(struct device *dev)
struct pci_dev *pdev = to_pci_dev(dev);
struct net_device *netdev = pci_get_drvdata(pdev);
struct lan743x_adapter *adapter = netdev_priv(netdev);
-   int ret;
 
lan743x_pcidev_shutdown(pdev);
 
@@ -3051,9 +3050,7 @@ static int lan743x_pm_suspend(struct device *dev)
lan743x_pm_set_wol(adapter);
 
/* Host sets PME_En, put D3hot */
-   ret = pci_prepare_to_sleep(pdev);
-
-   return 0;
+   return pci_prepare_to_sleep(pdev);
 }
 
 static int lan743x_pm_resume(struct device *dev)
-- 
2.17.1



[PATCH net-next 1/2] net: dsa: untag the bridge pvid from rx skbs

2020-09-22 Thread Florian Fainelli
From: Vladimir Oltean 

Currently the bridge untags VLANs present in its VLAN groups in
__allowed_ingress() only when VLAN filtering is enabled.

But when a skb is seen on the RX path as tagged with the bridge's pvid,
and that bridge has vlan_filtering=0, and there isn't any 8021q upper
with that VLAN either, then we have a problem. The bridge will not untag
it (since it is supposed to remain VLAN-unaware), and pvid-tagged
communication will be broken.

There are 2 situations where we can end up like that:

1. When installing a pvid in egress-tagged mode, like this:

ip link add dev br0 type bridge vlan_filtering 0
ip link set swp0 master br0
bridge vlan del dev swp0 vid 1
bridge vlan add dev swp0 vid 1 pvid

This happens because DSA configures the VLAN membership of the CPU port
using the same flags as swp0 (in this case "pvid and not untagged"), in
an attempt to copy the frame as-is from ingress to the CPU.

However, in this case, the packet may arrive untagged on ingress; it
will then be pvid-tagged by the ingress port and sent egress-tagged
towards the CPU. In other words, the CPU will see a VLAN tag where
there was none to speak of on ingress.

When vlan_filtering is 1, this is not a problem, as stated in the first
paragraph, because __allowed_ingress() will pop it. But currently, when
vlan_filtering is 0 and we have such a VLAN configuration, we need an
8021q upper (br0.1) to be able to ping over that VLAN, which is not
symmetrical with the vlan_filtering=1 case, and therefore, confusing for
users.

Basically what DSA attempts to do is simply an approximation: try to
copy the skb with (or without) the same VLAN all the way up to the CPU.
But DSA drivers treat CPU port VLAN membership in various ways (which is
a good segue into situation 2). And some of those drivers simply tell
the CPU port to copy the frame unmodified, which is the gold standard
when it comes to VLAN processing (therefore, any driver that can
configure the hardware to do that should do so, and discard the VLAN
flags requested by DSA on the CPU port).

2. Some DSA drivers always configure the CPU port as egress-tagged, in
an attempt to recover the classified VLAN from the skb. These drivers
cannot work at all with untagged traffic when bridged in
vlan_filtering=0 mode. And they can't go for the easy "just keep the
pvid as egress-untagged towards the CPU" route, because each front port
can have its own pvid, and that might require conflicting VLAN
membership settings on the CPU port (swp1 is pvid for VID 1 and
egress-tagged for VID 2; swp2 is egress-tagged for VID 1 and pvid for
VID 2; with this simplistic approach, the CPU port, which is really a
separate hardware entity and has its own VLAN membership settings, would
end up being egress-untagged in both VID 1 and VID 2, therefore losing
the VLAN tags of ingress traffic).

So the only thing we can do is to create a helper function for resolving
the problematic case (that is, a function which untags the bridge pvid
when the bridge is in vlan_filtering=0 mode), which taggers that need it
should call. It isn't called from the generic DSA receive path because
there are drivers that fall into neither the first nor the second category.

Signed-off-by: Vladimir Oltean 
Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h  |  8 ++
 net/dsa/dsa.c  |  9 +++
 net/dsa/dsa_priv.h | 66 ++
 3 files changed, 83 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index d16057c5987a..b539241a7533 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -301,6 +301,14 @@ struct dsa_switch {
 */
 	bool			configure_vlan_while_not_filtering;
 
+	/* If the switch driver always programs the CPU port as egress tagged
+	 * despite the VLAN configuration indicating otherwise, then setting
+	 * @untag_bridge_pvid will force the DSA receive path to pop the bridge's
+	 * default_pvid VLAN tagged frames to offer a consistent behavior
+	 * between a vlan_filtering=0 and vlan_filtering=1 bridge device.
+	 */
+	bool			untag_bridge_pvid;
+
/* In case vlan_filtering_is_global is set, the VLAN awareness state
 * should be retrieved from here and not from the per-port settings.
 */
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 5c18c0214aac..dec4ab59b7c4 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -225,6 +225,15 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, skb->dev);
 
+   if (unlikely(cpu_dp->ds->untag_bridge_pvid)) {
+   nskb = dsa_untag_bridge_pvid(skb);
+   if (!nskb) {
+   kfree_skb(skb);
+   return 0;
+   }
+   skb = nskb;
+   }
+
s = this_cpu_ptr(p->stats64);
u64_stats_update_begin(>syncp);

[PATCH net-next 2/2] net: dsa: b53: Configure VLANs while not filtering

2020-09-22 Thread Florian Fainelli
Update the B53 driver to support VLANs while not filtering. This
requires us to enable VLAN globally within the switch upon initial
driver configuration (dev->vlan_enabled).

We also need to remove the code that dealt with PVID re-configuration in
b53_vlan_filtering() since that function worked under the assumption
that it would only be called to make a bridge VLAN filtering, or not
filtering, and we would attempt to move the port's PVID accordingly.

Now that VLANs are programmed all the time, even in the case of a
non-VLAN filtering bridge, we would be programming a default_pvid for
the bridged switch ports.

We need the DSA receive path to pop the VLAN tag if it is the bridge's
default_pvid because the CPU port is always programmed tagged in the
programmed VLANs. In order to do so, we use the
dsa_untag_bridge_pvid() helper introduced in the previous commit by
setting ds->untag_bridge_pvid to true.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 20 +++-
 drivers/net/dsa/b53/b53_priv.h   |  1 -
 2 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 6a5796c32721..ce18ba0b74eb 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1377,23 +1377,6 @@ EXPORT_SYMBOL(b53_phylink_mac_link_up);
 int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering)
 {
struct b53_device *dev = ds->priv;
-   u16 pvid, new_pvid;
-
-   b53_read16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port), );
-   if (!vlan_filtering) {
-   /* Filtering is currently enabled, use the default PVID since
-* the bridge does not expect tagging anymore
-*/
-   dev->ports[port].pvid = pvid;
-   new_pvid = b53_default_pvid(dev);
-   } else {
-   /* Filtering is currently disabled, restore the previous PVID */
-   new_pvid = dev->ports[port].pvid;
-   }
-
-   if (pvid != new_pvid)
-   b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_PORT_DEF_TAG(port),
-   new_pvid);
 
b53_enable_vlan(dev, dev->vlan_enabled, vlan_filtering);
 
@@ -2619,6 +2602,9 @@ struct b53_device *b53_switch_alloc(struct device *base,
dev->priv = priv;
dev->ops = ops;
ds->ops = _switch_ops;
+   ds->configure_vlan_while_not_filtering = true;
+   ds->untag_bridge_pvid = true;
+   dev->vlan_enabled = ds->configure_vlan_while_not_filtering;
mutex_init(>reg_mutex);
mutex_init(>stats_mutex);
 
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index c55c0a9f1b47..24893b592216 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -91,7 +91,6 @@ enum {
 struct b53_port {
u16 vlan_ctl_mask;
struct ethtool_eee eee;
-   u16 pvid;
 };
 
 struct b53_vlan {
-- 
2.25.1



[PATCH net-next 0/2] net: dsa: b53: Configure VLANs while not filtering

2020-09-22 Thread Florian Fainelli
Hi David, Jakub,

These two patches allow the b53 driver, which always configures its CPU
port as egress tagged, to behave correctly with VLANs being always
configured whenever a port is added to a bridge.

Vladimir provides a patch that makes the receive path of a bridge with
vlan_filtering=0 behave the same as with vlan_filtering=1. Per discussion
with Nikolay, this behavior is deemed too DSA-specific to be done in
the bridge proper.

This is a preliminary series for Vladimir to make
configure_vlan_while_not_filtering the default behavior for all DSA drivers
in the future.

Thanks!

Florian Fainelli (1):
  net: dsa: b53: Configure VLANs while not filtering

Vladimir Oltean (1):
  net: dsa: untag the bridge pvid from rx skbs

 drivers/net/dsa/b53/b53_common.c | 20 ++
 drivers/net/dsa/b53/b53_priv.h   |  1 -
 include/net/dsa.h|  8 
 net/dsa/dsa.c|  9 +
 net/dsa/dsa_priv.h   | 66 
 5 files changed, 86 insertions(+), 18 deletions(-)

-- 
2.25.1



Re: [PATCH v2 3/3] ARM: dts: Add i2c0 pinctrl information for 98dx3236

2020-09-22 Thread Chris Packham
Hi Jason, Andrew, Gregory,

On 13/09/20 4:16 am, Linus Walleij wrote:
> On Mon, Sep 7, 2020 at 11:17 PM Chris Packham
>  wrote:
>
>> Add pinctrl information for the 98dx3236 (and variants). There is only
>> one choice for i2c0 MPP14 and MPP15.
>>
>> Signed-off-by: Chris Packham 
>> Reviewed-by: Andrew Lunn 
> Reviewed-by: Linus Walleij 
>
> Please merge this through the ARM SoC maintenance path.

Are you able to pick this up via the mvebu tree or should I send it via 
RMK's patch tracking system?

Incidentally, I notice there is no longer a linux-mvebu.git on 
git.infradead.org. Is there a pending update to MAINTAINERS?


Re: [mm/debug_vm_pgtable/locks] e2aad6f1d2: BUG:unable_to_handle_page_fault_for_address

2020-09-22 Thread Anshuman Khandual



On 09/22/2020 02:50 PM, Aneesh Kumar K.V wrote:
> On 9/22/20 2:22 PM, Anshuman Khandual wrote:
>>
>>
>> On 09/22/2020 09:33 AM, Aneesh Kumar K.V wrote:
>>> On 9/21/20 2:51 PM, kernel test robot wrote:
 Greeting,

 FYI, we noticed the following commit (built with gcc-9):

 commit: e2aad6f1d232b457ea6a3194992dd4c0a83534a5 
 ("mm/debug_vm_pgtable/locks: take correct page table lock")
 https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


 in testcase: trinity
 version: trinity-i386
 with following parameters:

         runtime: 300s

 test-description: Trinity is a linux system call fuzz tester.
 test-url: http://codemonkey.org.uk/projects/trinity/


 on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

 caused below changes (please refer to attached dmesg/kmsg for entire 
 log/backtrace):


 +----------------------------------------------------------------------+------------+------------+
 |                                                                      | c50eb1ed65 | e2aad6f1d2 |
 +----------------------------------------------------------------------+------------+------------+
 | boot_successes                                                       | 0          | 0          |
 | boot_failures                                                        | 61         | 17         |
 | BUG:workqueue_lockup-pool                                            | 1          |            |
 | BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c | 60         | 17         |
 | BUG:unable_to_handle_page_fault_for_address                          | 0          | 17         |
 | Oops:#[##]                                                           | 0          | 17         |
 | EIP:ptep_get                                                         | 0          | 17         |
 | Kernel_panic-not_syncing:Fatal_exception                             | 0          | 17         |
 +----------------------------------------------------------------------+------------+------------+


 If you fix the issue, kindly add following tag
 Reported-by: kernel test robot 


 [   28.726464] BUG: sleeping function called from invalid context at mm/page_alloc.c:4822
 [   28.727835] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper
 [   28.729221] no locks held by swapper/1.
 [   28.729954] CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-rc3-00324-ge2aad6f1d232b4 #1
 [   28.731484] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
 [   28.732891] Call Trace:
 [   28.733295]  ? show_stack+0x48/0x50
 [   28.733943]  dump_stack+0x1b/0x1d
 [   28.734569]  ___might_sleep+0x205/0x219
 [   28.735292]  __might_sleep+0x106/0x10f
 [   28.736022]  __alloc_pages_nodemask+0xe0/0x2c8
 [   28.736845]  swap_migration_tests+0x62/0x295
 [   28.737639]  debug_vm_pgtable+0x587/0x9b5
 [   28.738374]  ? pte_advanced_tests+0x267/0x267
 [   28.739318]  do_one_initcall+0x129/0x31c
 [   28.740023]  ? rcu_read_lock_sched_held+0x46/0x74
 [   28.740944]  kernel_init_freeable+0x201/0x250
 [   28.741763]  ? rest_init+0xf8/0xf8
 [   28.742401]  kernel_init+0xe/0x15d
 [   28.743040]  ? rest_init+0xf8/0xf8
 [   28.743694]  ret_from_fork+0x1c/0x30
>>>
>>>
>>> This should be fixed by
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/debug_vm_pgtable.c?id=3a4f9a45eadb6ed5fc04686e8db4dc7bb1caec44
>>>
 [   28.744364] BUG: 

Re: [PATCH RFC 0/5] Introduced new Cadence USBSSP DRD Driver.

2020-09-22 Thread Peter Chen
On 20-09-22 13:06:26, Pawel Laszczak wrote:
> Hi,
> 
> >
> >On Mon, Jun 29, 2020 at 03:41:49AM +, Peter Chen wrote:
> >> On 20-06-26 07:19:56, Pawel Laszczak wrote:
> >> > Hi Felipe,
> >> >
> >> > >
> >> > >Hi,
> >> > >
> >> > >Pawel Laszczak  writes:
> >> > >> This patch introduces a new Cadence USBSS DRD driver to the Linux kernel.
> >> > >>
> >> > >> The Cadence USBSS DRD Controller is a highly configurable IP Core which
> >> > >> can be instantiated in Dual-Role Device (DRD), Peripheral Only and
> >> > >> Host Only (XHCI) configurations.
> >> > >>
> >> > >> The current driver has been validated on FPGA. We have support
> >> > >> for the PCIe bus, which is used for FPGA prototyping.
> >> > >>
> >> > >> The host side of the USBSS-DRD controller is compliant with the XHCI
> >> > >> specification, so it works with the standard XHCI Linux driver.
> >> > >>
> >> > >> The host side of the USBSS DRD controller is compliant with XHCI.
> >> > >> The architecture for the device side is almost the same as for the host side,
> >> > >> and most of the XHCI specification can be used to understand how
> >> > >> this controller operates.
> >> > >>
> >> > >> This controller and driver support the Full Speed, High Speed, SuperSpeed
> >> > >> and SuperSpeed Plus USB protocols.
> >> > >>
> >> > >> The prefix cdnsp used in the driver was chosen by analogy with the cdns3 driver.
> >> > >> The last letter of this acronym means PLUS. The formal name of the controller
> >> > >> is USBSSP, but it's too generic, so I've decided to use CDNSP.
> >> > >>
> >> > >> Patch 1 adds the DT binding.
> >> > >> Patch 2 adds the PCI-to-platform wrapper used on the Cadence testing
> >> > >>  platform, which is FPGA based.
> >> > >> Patches 3-5 add the main part of the driver and have been intentionally
> >> > >>  split into 3 parts. In my opinion such a division should not
> >> > >>  affect understanding and reviewing the driver, and it makes the
> >> > >>  main patch (4/5) a little smaller. Patch 3 introduces the main
> >> > >>  header file for the driver, 4 is the main part that implements all
> >> > >>  functionality of the driver and 5 introduces tracepoints.
> >> > >
> >> > >I'm more interested in how this is different from CDNS3. Aren't they SW
> >> > >compatible?
> >> >
> >> > In general, the controller can be split into 2 parts: the DRD part and the
> >> > rest (UDC).
> >> >
> >> > The second part, UDC, which consists of the gadget.c, ring.c and mem.c files,
> >> > is completely different.
> >> >
> >> > The DRD part contains drd.c and core.c.
> >> > cdnsp drd.c is similar to cdns3 drd.c but a little different: CDNSP
> >> > has a similar but different register space.
> >> > Some registers were moved, some were removed and some were added.
> >> >
> >> > core.c is very similar and eventually could be common for both drivers.
> >> > I thought about this, but I wanted to avoid interfering with the cdns3 driver
> >> > at this point, since CDNSP is still under testing and
> >> > CDNS3 is used by some products on the market.
> >>
> >> Pawel, I suggest adding CDNSP to drivers/staging first since it is still
> >> under testing. When you think the driver (as well as the hardware) is
> >> mature, you could try to add the gadget part (e.g., gadget-v2) and make
> >> the necessary changes to core.c.
> >
> >I only take code into drivers/staging/ that for some reason does not
> >meet the normal coding style/rules/whatever, not stuff like this that
> >is an obvious duplicate of existing code and needs to be rearchitected.
> >It is much more work to try to convert code once it is in the tree
> >than to just do it out of the tree on your own and resubmit it, as you
> >don't have to follow the in-kernel rules of "one patch does one thing"
> >that you would if it was in staging.
> >
> >So I don't think that staging is the right place for this; just spend a
> >few weeks to get it right and then resubmit it.
> >
> 
> I had the idea to reuse core.c and drd.c indirectly in the cdnsp driver.
> Of course, I made the necessary changes to make reusing this code possible.
> My approach was to add these files to the cdnsp Makefile, but this concept
> failed.
> It even worked until I started testing cdns3 and cdnsp built into the kernel :)
> 
> With this approach I get "multiple definition of ..." errors.
> 
> What should such reusable code look like?
> 
> After my experience with the above concept, I think the only way is to move
> the common code to a separate module, similar to the drivers/usb/common
> directory or the libcomposite.ko module.
> 

Could you use the compatible string or the IP revision number to decide
dynamically which part of the code to use? That is to say, there would be
only one Cadence USB3 driver folder, cdns3, and you would only add one
gadget file for the cdnsp revision?

-- 

Thanks,
Peter Chen

Re: [PATCH v5 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs

2020-09-22 Thread Sean V Kelley
On Mon, Sep 21, 2020 at 4:26 AM Jonathan Cameron
 wrote:
>
> On Fri, 18 Sep 2020 13:45:59 -0700
> Sean V Kelley  wrote:
>
> > A Root Complex Event Collector provides support for
> > terminating error and PME messages from associated RCiEPs.
> >
> > Make use of the RCEC Endpoint Association Extended Capability
> > to identify associated RCiEPs. Link the associated RCiEPs as
> > the RCECs are enumerated.
> >
> > Co-developed-by: Qiuxu Zhuo 
> > Signed-off-by: Qiuxu Zhuo 
> > Signed-off-by: Sean V Kelley 
> A couple of minor things inline plus follow through on not
> special casing the older versions of the capability.
>
> Otherwise looks good to me.
>
> Reviewed-by: Jonathan Cameron 

Thanks again for your feedback on v5.  I will be sure to add in v7.
Apologies again for the email server trouble resulting in a partial
patch series landing on the list.

Sean

>
> > ---
> >  drivers/pci/pci.h  |  2 +
> >  drivers/pci/pcie/portdrv_pci.c |  3 ++
> >  drivers/pci/pcie/rcec.c| 96 ++
> >  include/linux/pci.h|  1 +
> >  4 files changed, 102 insertions(+)
> >
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 7b547fc3679a..ddb5872466fb 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -474,9 +474,11 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
> >  #ifdef CONFIG_PCIEPORTBUS
> >  void pci_rcec_init(struct pci_dev *dev);
> >  void pci_rcec_exit(struct pci_dev *dev);
> > +void pcie_link_rcec(struct pci_dev *rcec);
> >  #else
> >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > +static inline void pcie_link_rcec(struct pci_dev *rcec) {}
> >  #endif
> >
> >  #ifdef CONFIG_PCI_ATS
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index 4d880679b9b1..dbeb0155c2c3 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -110,6 +110,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
> >(pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
> >   return -ENODEV;
> >
> > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> > + pcie_link_rcec(dev);
> > +
> >   status = pcie_port_device_register(dev);
> >   if (status)
> >   return status;
> > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > index 519ae086ff41..5630480a6659 100644
> > --- a/drivers/pci/pcie/rcec.c
> > +++ b/drivers/pci/pcie/rcec.c
> > @@ -17,6 +17,102 @@
> >
> >  #include "../pci.h"
> >
> > +struct walk_rcec_data {
> > + struct pci_dev *rcec;
> > + int (*user_callback)(struct pci_dev *dev, void *data);
> > + void *user_data;
> > +};
> > +
> > +static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev *rciep)
> > +{
> > + unsigned long bitmap = rcec->rcec_ext->bitmap;
> > + unsigned int devn;
> > +
> > + /* An RCiEP found on bus in range */
> Perhaps adjust the comment to say:
> /* An RCiEP found on a different bus in range */
>
> as the actual rcec bus can be in the range as I understand it.
>
> > + if (rcec->bus->number != rciep->bus->number)
> > + return true;
> > +
> > + /* Same bus, so check bitmap */
> > + for_each_set_bit(devn, , 32)
> > + if (devn == rciep->devfn)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static int link_rcec_helper(struct pci_dev *dev, void *data)
> > +{
> > + struct walk_rcec_data *rcec_data = data;
> > + struct pci_dev *rcec = rcec_data->rcec;
> > +
> > + if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) && rcec_assoc_rciep(rcec, dev)) {
> > + dev->rcec = rcec;
> > + pci_dbg(dev, "PME & error events reported via %s\n", pci_name(rcec));
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void *userdata)
>
> static, or declare it in a header if we are going to need it elsewhere
> later in the series.
>
> > +{
> > + struct walk_rcec_data *rcec_data = userdata;
> > + struct pci_dev *rcec = rcec_data->rcec;
> > + u8 nextbusn, lastbusn;
> > + struct pci_bus *bus;
> > + unsigned int bnr;
> > +
> > + if (!rcec->rcec_cap)
> > + return;
> > +
> > + /* Walk own bus for bitmap based association */
> > + pci_walk_bus(rcec->bus, cb, rcec_data);
> > +
> > + /* Check whether RCEC BUSN register is present */
> > + if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> > + return;
>
> If you make the earlier suggested change to fill in nextbusn = 0xFF
> for the earlier versions of the capability, you can avoid special
> casing here.
>
> > +
> > + nextbusn = rcec->rcec_ext->nextbusn;
> > + lastbusn = rcec->rcec_ext->lastbusn;
> > +
> > + /* All RCiEP devices are on the same bus as the RCEC */
> > + if (nextbusn == 0xff && lastbusn == 

Re: [PATCH v5 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR

2020-09-22 Thread Sean V Kelley
On Mon, Sep 21, 2020 at 4:33 AM Jonathan Cameron
 wrote:
>
> On Fri, 18 Sep 2020 13:46:00 -0700
> Sean V Kelley  wrote:
>
> > From: Qiuxu Zhuo 
> >
> > When attempting error recovery for an RCiEP associated with an RCEC device,
> > there needs to be a way to update the Root Error Status, the Uncorrectable
> > Error Status and the Uncorrectable Error Severity of the parent RCEC.
> > In some non-native cases in which there is no OS visible device
> > associated with the RCiEP, there is nothing to act upon as the firmware
> > is acting before the OS. So add handling for the linked 'rcec' in AER/ERR
> > while taking into account non-native cases.
> >
> > Co-developed-by: Sean V Kelley 
> > Signed-off-by: Sean V Kelley 
> > Signed-off-by: Qiuxu Zhuo 
> I'll give this a test run later to check I'm not missing anything, but LGTM.
>
> Reviewed-by: Jonathan Cameron 
>
> Thanks,

Appreciate it.

Thanks,

Sean

>
> > ---
> >  drivers/pci/pcie/aer.c |  9 +
> >  drivers/pci/pcie/err.c | 38 --
> >  2 files changed, 29 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 65dff5f3457a..dccdba60b5d9 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev)
> >  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> >  {
> >   int aer = dev->aer_cap;
> > + int rc = 0;
> >   u32 reg32;
> > - int rc;
> > -
> >
> >   /* Disable Root's interrupt in response to error messages */
> >   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, );
> >   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> >   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> >
> > - rc = pci_bus_error_reset(dev);
> > - pci_info(dev, "Root Port link has been reset\n");
> > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
> > + rc = pci_bus_error_reset(dev);
> > + pci_info(dev, "Root Port link has been reset\n");
> > + }
> >
> >   /* Clear Root Error Status */
> >   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, );
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 5380ecc41506..a61a2518163a 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, void *data)
> >  /**
> >   * pci_bridge_walk - walk bridges potentially AER affected
> >   * @bridge   bridge which may be an RCEC with associated RCiEPs,
> > - *   an RCiEP associated with an RCEC, or a Port.
> > + *   or a Port.
> > + * @dev  an RCiEP lacking an associated RCEC.
> >   * @cb   callback to be called for each device found
> >   * @userdata arbitrary pointer to be passed to callback.
> >   *
> > @@ -160,13 +161,16 @@ static int report_resume(struct pci_dev *dev, void *data)
> >   * If the device provided has no subordinate bus, call the provided
> >   * callback on the device itself.
> >   */
> > -static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
> > +static void pci_bridge_walk(struct pci_dev *bridge, struct pci_dev *dev,
> > + int (*cb)(struct pci_dev *, void *),
> >   void *userdata)
> >  {
> > - if (bridge->subordinate)
> > + if (bridge && bridge->subordinate)
> >   pci_walk_bus(bridge->subordinate, cb, userdata);
> > - else
> > + else if (bridge)
> >   cb(bridge, userdata);
> > + else
> > + cb(dev, userdata);
> >  }
> >
> >  static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
> > @@ -196,16 +200,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   type = pci_pcie_type(dev);
> >   if (type == PCI_EXP_TYPE_ROOT_PORT ||
> >   type == PCI_EXP_TYPE_DOWNSTREAM ||
> > - type == PCI_EXP_TYPE_RC_EC ||
> > - type == PCI_EXP_TYPE_RC_END)
> > + type == PCI_EXP_TYPE_RC_EC)
> >   bridge = dev;
> > + else if (type == PCI_EXP_TYPE_RC_END)
> > + bridge = dev->rcec;
> >   else
> >   bridge = pci_upstream_bridge(dev);
> >
> >   pci_dbg(dev, "broadcast error_detected message\n");
> >   if (state == pci_channel_io_frozen) {
> > - pci_bridge_walk(bridge, report_frozen_detected, );
> > + pci_bridge_walk(bridge, dev, report_frozen_detected, );
> >   if (type == PCI_EXP_TYPE_RC_END) {
> > + /*
> > +  * The callback only clears the Root Error Status
> > +  * of the RCEC (see aer.c).
> > +  */
> > + if (bridge)
> > + reset_subordinate_devices(bridge);
> > +
> >   status = flr_on_rciep(dev);
> >   if (status != PCI_ERS_RESULT_RECOVERED) 

Re: [PATCH v5 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error

2020-09-22 Thread Sean V Kelley
On Mon, Sep 21, 2020 at 4:15 AM Jonathan Cameron
 wrote:
>
> On Fri, 18 Sep 2020 13:45:58 -0700
> Sean V Kelley  wrote:
>
> > From: Qiuxu Zhuo 
> >
> > Attempt to do function level reset for an RCiEP associated with an
> > RCEC device on fatal error.
>
> I'm not sure the description is correct. Looks like it will do
> the reset even if not associated with an RCEC.
> I'd just cut this down to:
>
> "Attempt to do a function level reset for an RCiEP on fatal error."

Agree. Will change.

>
> I'm not 100% sure doing an flr will actually help in most cases if you've
> reported a fatal error, but I suppose it does no harm!
>
> So with description changed.
> Reviewed-by: Jonathan Cameron 

Will do, thanks.

Sean

>
> >
> > Signed-off-by: Qiuxu Zhuo 
> > ---
> >  drivers/pci/pcie/err.c | 31 ++-
> >  1 file changed, 22 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index e575fa6cee63..5380ecc41506 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -169,6 +169,17 @@ static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *,
> >   cb(bridge, userdata);
> >  }
> >
> > +static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
> > +{
> > + if (!pcie_has_flr(dev))
> > + return PCI_ERS_RESULT_NONE;
> > +
> > + if (pcie_flr(dev))
> > + return PCI_ERS_RESULT_DISCONNECT;
> > +
> > + return PCI_ERS_RESULT_RECOVERED;
> > +}
> > +
> >  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   pci_channel_state_t state,
> >   pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev))
> > @@ -195,15 +206,17 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   if (state == pci_channel_io_frozen) {
> >   pci_bridge_walk(bridge, report_frozen_detected, );
> >   if (type == PCI_EXP_TYPE_RC_END) {
> > - pci_warn(dev, "link reset not possible for RCiEP\n");
> > - status = PCI_ERS_RESULT_NONE;
> > - goto failed;
> > - }
> > -
> > - status = reset_subordinate_devices(bridge);
> > - if (status != PCI_ERS_RESULT_RECOVERED) {
> > - pci_warn(dev, "subordinate device reset failed\n");
> > - goto failed;
> > + status = flr_on_rciep(dev);
> > + if (status != PCI_ERS_RESULT_RECOVERED) {
> > + pci_warn(dev, "function level reset failed\n");
> > + goto failed;
> > + }
> > + } else {
> > + status = reset_subordinate_devices(bridge);
> > + if (status != PCI_ERS_RESULT_RECOVERED) {
> > + pci_warn(dev, "subordinate device reset failed\n");
> > + goto failed;
> > + }
> >   }
> >   } else {
> >   pci_bridge_walk(bridge, report_normal_detected, );
>
>


Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page

2020-09-22 Thread Matthew Wilcox
On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > Size the uptodate array dynamically to support larger pages in the
> > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > per page.  Add a few debugging assertions.
> > > > 
> > > > Signed-off-by: Matthew Wilcox (Oracle) 
> > > > Reviewed-by: Dave Chinner 
> > > 
> > > Some syscall fuzzing will trigger this on powerpc:
> > > 
> > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > 
> > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
> > 
> > Well, I'm glad it triggered.  That warning is:
> > WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > PageUptodate(page));
> > so there was definitely a problem of some kind.
> > 
> > truncate_cleanup_page() calls
> > do_invalidatepage() calls
> > iomap_invalidatepage() calls
> > iomap_page_release()
> > 
> > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > reproduce, perhaps you could try something like this?
> > 
> > +void dump_iomap_page(struct page *page, const char *reason)
> > +{
> > +    struct iomap_page *iop = to_iomap_page(page);
> > +    unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > +
> > +    dump_page(page, reason);
> > +    if (iop)
> > +        printk("iop:reads %d writes %d uptodate %*pb\n",
> > +                atomic_read(&iop->read_bytes_pending),
> > +                atomic_read(&iop->write_bytes_pending),
> > +                nr_blocks, iop->uptodate);
> > +    else
> > +        printk("iop:none\n");
> > +}
> > 
> > and then do something like:
> > 
> > if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > dump_iomap_page(page, NULL);
> 
> This:
> 
> [ 1683.158254][T164965] page:4a6c16cd refcount:2 mapcount:0 mapping:ea017dc5 index:0x2 pfn:0xc365c
> [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> [ 1683.158354][T164965] flags: 0x7fff800015(locked|uptodate|lru)
> [ 1683.158392][T164965] raw: 007fff800015 c00c019c4b08 c00c019a53c8 c000201c8362c1e8
> [ 1683.158430][T164965] raw: 0002  0002 c000201c54db4000
> [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> [ 1683.158506][T164965] iop:none

Oh, I'm a fool.  This is after the call to detach_page_private() so
page->private is NULL and we don't get the iop dumped.

Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
but the bitmap is deemed not full.  There are three places where we set
an iomap page Uptodate:

1.  if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
            SetPageUptodate(page);

2.  if (page_has_private(page))
            iomap_iop_set_range_uptodate(page, off, len);
    else
            SetPageUptodate(page);

3.  BUG_ON(page->index);
    ...
    SetPageUptodate(page);

It can't be #2 because the page has an iop.  It can't be #3 because the
page->index is not 0.  So at some point in the past, the bitmap was full.

I don't think it's possible for inode->i_blkbits to change, and you
aren't running with THPs, so it's definitely not possible for thp_size()
to change.  So i_blocks_per_page() isn't going to change.

We seem to have allocated enough memory for ->iop because that's also
based on i_blocks_per_page().

I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
I've been trying to reproduce this on x86 with a 1kB block size
filesystem, and haven't been able to yet.  Maybe I'll try to setup a
powerpc cross-compilation environment tomorrow.
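The invariant behind this warning, that the per-block uptodate bitmap is full exactly when the page is marked Uptodate, can be modelled in a few lines of user-space C. This is a simplified sketch, not the iomap code: the page is a plain struct, the bitmap is a single 64-bit word (so at most 64 blocks per page), and `model_set_range_uptodate()` is a hypothetical stand-in for the first of the three SetPageUptodate() sites listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified page: up to 64 blocks, one bit per block in 'uptodate'. */
struct model_page {
	unsigned int nr_blocks; /* like i_blocks_per_page(): page size >> block bits */
	uint64_t uptodate;      /* per-block bitmap, like iop->uptodate */
	bool page_uptodate;     /* stands in for PageUptodate(page) */
};

static bool bitmap_is_full(const struct model_page *p)
{
	uint64_t all = p->nr_blocks == 64 ? UINT64_MAX
					  : (UINT64_C(1) << p->nr_blocks) - 1;
	return (p->uptodate & all) == all;
}

/* Mark blocks [first, first + count) uptodate; set the page flag only
 * once every block in the page is uptodate. */
static void model_set_range_uptodate(struct model_page *p,
				     unsigned int first, unsigned int count)
{
	assert(p->nr_blocks <= 64 && first + count <= p->nr_blocks);
	for (unsigned int i = first; i < first + count; i++)
		p->uptodate |= UINT64_C(1) << i;
	if (bitmap_is_full(p))
		p->page_uptodate = true;
}

/* The condition the WARN_ON_ONCE() in iomap_page_release() fires on. */
static bool invariant_broken(const struct model_page *p)
{
	return bitmap_is_full(p) != p->page_uptodate;
}
```

With a 4kB page and 1kB blocks (nr_blocks = 4), marking blocks 0-1 leaves the page not Uptodate and marking 2-3 sets it; clearing a block afterwards without clearing the page flag produces the "Uptodate but bitmap not full" state reported in the dump.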


Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)

2020-09-22 Thread Dave Chinner
On Tue, Sep 22, 2020 at 12:46:05PM -0400, Mikulas Patocka wrote:
> Thanks for reviewing NVFS.

Not a review - I've just had a cursory look and not looked any
deeper after I'd noticed various red flags...

> On Tue, 22 Sep 2020, Dave Chinner wrote:
> > IOWs, extent based trees were chosen because of scalability,
> > efficiency, and flexibility reasons before the actual tree structure
> > that it would be implemented with was decided on.  b+trees were used
> > in the implementation because one tree implementation could do
> > everything as all that needed to change btree trees was the pointer
> > and record format.
> 
> I agree that the b+tree were a good choice for XFS.
> 
> In RAM-based maps, red-black trees or avl trees are used often. In 
> disk-based maps, btrees or b+trees are used. That's because in RAM, you 
> are optimizing for the number of cache lines accessed, and on the disk, 
> you are optimizing for the number of blocks accessed.

https://lore.kernel.org/linux-fsdevel/20190416122240.gn29...@dread.disaster.area/

"FWIW, I'm not convinced about the scalability of the rb/interval
tree, to tell you the truth. We got rid of the rbtree in XFS for
cache indexing because the multi-level pointer chasing was just too
expensive to do under a spinlock - it's just not a cache efficient
structure for random index object storage."

All the work I've done since has reinforced this - small node
RCU-aware btrees (4 cachelines per node) scale much, much better
than rbtrees, and they can be made lockless, too.

One of the reasons that btrees are more efficient in memory is the
behaviour of modern CPUs and their hardware prefetchers. It is
actually more time efficient to do a linear search of a small node
and then move to another small node than it is to do a binary search
of a large node in memory.  The CPU usage trade-off between linear
search overhead and chasing another pointer is currently somewhere
between 4 and 8 cachelines or pointers/records in a node on modern
x86-64 CPUs.

So, yeah, btrees are actually very efficient for in-memory indexes
for the same reasons they are efficient for on-disk structures -
they pack more information per node than a binary structure, and
it's faster to search within a node than it is to fetch another node...
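The "4 cachelines per node" figure above corresponds, for example, to 16 eight-byte keys plus 16 eight-byte pointers (256 bytes). A minimal in-memory sketch, assuming that hypothetical node layout (it is not an XFS structure), of a lookup that linearly scans each small node and then follows a single pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 x 8-byte keys + 16 x 8-byte pointers = 256 bytes = 4 x 64-byte cache lines */
#define NODE_SLOTS 16

struct node {
	unsigned int nr;           /* slots in use */
	int leaf;                  /* 1: ptrs are values, 0: ptrs are child nodes */
	uint64_t keys[NODE_SLOTS]; /* sorted; interior keys[i] = lowest key under ptrs[i] */
	void *ptrs[NODE_SLOTS];
};

/* Linear scan: index of the last key <= k, or -1 if k is below every key.
 * Stops at the first larger key, so it touches only the cache lines it needs. */
static int node_scan(const struct node *n, uint64_t k)
{
	int i = -1;

	assert(n->nr <= NODE_SLOTS);
	for (unsigned int j = 0; j < n->nr && n->keys[j] <= k; j++)
		i = (int)j;
	return i;
}

static void *btree_lookup(const struct node *n, uint64_t k)
{
	while (n) {
		int i = node_scan(n, k);

		if (i < 0)
			return NULL;
		if (n->leaf)
			return n->keys[i] == k ? n->ptrs[i] : NULL;
		n = n->ptrs[i]; /* one pointer chase per level */
	}
	return NULL;
}
```

Each level costs a short sequential scan that the hardware prefetcher handles well, plus one pointer dereference; a binary tree of the same size would need a dependent pointer chase per comparison.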

> > The result of this is that we have made -zero- changes to the XFS
> > structure and algorithms for SSDs. We don't do different things
> > based on the blkdev rotational flag, or anything like that. XFS
> > behaves exactly the same on spinning disks as it does SSDs as it
> > does PMEM and it performs well on all of them. And that performance
> > doesn't drop away as you increase the scale and capability of the
> > underlying storage.
> > 
> > That's what happens when storage algorithms are designed for
> > concurrency and efficiency at scale rather than optimising for a
> > specific storage characteristic.
> > 
> > NVFS is optimised for a specific storage characteristic (i.e. low
> > latency synchronous storage), so I would absolutely expect it to be
> > faster than XFS on that specific storage. However, claims like this:
> > 
> > > On persistent memory, each access has its own cost, so NVFS uses metadata
> > > structures that minimize the number of cache lines accessed (rather than
> > > the number of blocks accessed). For block mapping, NVFS uses the classic
> > > unix direct/indirect blocks - if a file block is mapped by a 3rd-level
> > > indirect block, we do just three memory accesses and we are done. If we
> > > used b+trees, the number of accesses would be much larger than 3 (we
> > > would have to do binary search in the b+tree nodes).
> > 
> > ... are kinda naive, because you're clearly optimising the wrong
> > aspect of block mapping. Extents solve the block indexing overhead
> > problem; optimising the type of tree you use to index the indirect
> > blocks doesn't avoid the overhead of having to iterate every block
> > for range operations.
> > 
> > IOWs, we use extents because they are space and time efficient for
> > the general use cases. XFS can map 2^21 blocks into a single 16 byte
> > extent record (8GiB file mapping for 4k block size) and so the vast
> > majority of files in a filesystem are mapped with a single extent.
> 
> BTW. How does XFS "predict" the file size? - so that it allocates extent 
> of proper size without knowing how big the file will be?

Oh, there's probably 10-15,000 lines of code involved in getting
that right. There's delayed allocation, speculative preallocation,
extent size hints, about 10 distinct allocation policies including
"allocate exactly at this block or fail" that allow complex
policies with multiple fallback conditions to select the best
possible allocation for the given state, there's locality separation
that tries to keep individual workloads in different large
contiguous free spaces, etc.
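The 16-byte extent record mentioned above packs a whole extent into two 64-bit words. A sketch of that packing, based on my understanding of the XFS on-disk bmbt record layout (1 unwritten-flag bit, 54-bit file offset, 52-bit start block, 21-bit block count). The real code also stores the two words big-endian, which this host-endian sketch ignores, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MASK(n) ((UINT64_C(1) << (n)) - 1)

struct extent {
	uint64_t startoff;   /* file offset in blocks (54 bits) */
	uint64_t startblock; /* filesystem block number (52 bits) */
	uint32_t blockcount; /* length in blocks (21 bits) */
	int unwritten;       /* preallocated-but-unwritten flag (1 bit) */
};

struct packed_extent { uint64_t l0, l1; }; /* 16 bytes */

static struct packed_extent extent_pack(const struct extent *e)
{
	assert(e->startoff <= MASK(54) && e->startblock <= MASK(52) &&
	       e->blockcount <= MASK(21));
	struct packed_extent p;
	/* l0: flag (bit 63) | startoff (bits 62..9) | startblock high 9 bits */
	p.l0 = ((uint64_t)(e->unwritten != 0) << 63) | (e->startoff << 9) |
	       (e->startblock >> 43);
	/* l1: startblock low 43 bits (bits 63..21) | blockcount (bits 20..0) */
	p.l1 = ((e->startblock & MASK(43)) << 21) | e->blockcount;
	return p;
}

static struct extent extent_unpack(struct packed_extent p)
{
	struct extent e;
	e.unwritten = (int)(p.l0 >> 63);
	e.startoff = (p.l0 & MASK(63)) >> 9;
	e.startblock = ((p.l0 & MASK(9)) << 43) | (p.l1 >> 21);
	e.blockcount = (uint32_t)(p.l1 & MASK(21));
	return e;
}
```

With 4 KiB blocks, a single record holding the maximum block count of 2^21 - 1 covers just under 8 GiB, which matches the figure quoted above.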

> > The NVFS indirect block tree has a fan-out of 16,
> 
> No. The top level in the inode contains 16 blocks (11 

Re: [RFC PATCH v2] tools/x86: add kcpuid tool to show raw CPU features

2020-09-22 Thread Feng Tang
Hi Arvind,

On Tue, Sep 22, 2020 at 06:15:23PM -0400, Arvind Sankar wrote:
> On Tue, Sep 22, 2020 at 10:10:24PM +0200, Borislav Petkov wrote:
> > + AMD folks.
> > 
> > On Tue, Sep 22, 2020 at 01:27:50PM +0800, Feng Tang wrote:
> > > End users frequently want to know what features their processor
> > > supports, independent of what the kernel supports.
> > > 
> > > /proc/cpuinfo is great. It is omnipresent and since it is provided by
> > > the kernel it is always as up to date as the kernel. But, it could be
> > > ambiguous about processor features which can be disabled by the kernel
> > > at boot-time or compile-time.
> > > 
> > > There are some user space tools showing more raw features, but they are
> > > not bound with kernel, and go with distros. Many end users are still
> > > using old distros with new kernels (upgraded by themselves), and may
> > > not upgrade the distros only to get a newer tool.
> > > 
> > > So here arise the need for a new tool, which
> > >   * Shows raw cpu features got from running cpuid
> > >   * Be easier to obtain updates for compared to existing userspace
> > > tooling (perhaps distributed like perf)
> > >   * Inherits "modern" kernel development process, in contrast to some
> > > of the existing userspace cpuid tools which are still being developed
> > > without git and distributed in tarballs from non-https sites.
> > >   * Can produce output consistent with /proc/cpuinfo to make comparison
> > > easier.
> 
> Rather than a tool, would additional file(s) in, say,
> /sys/devices/system/cpu/cpu be nicer? They could show the raw CPUID
> features, one file per leaf or sub-leaf, maybe even along with whether
> they were disabled at boot-time.

My thinking is that we already have the powerful in-kernel /proc/cpuinfo,
while a user-space tool can be more flexible in text parsing and layout,
and can show different information depending on the user's options.

> > >   * Be in-kernel, could leverage kernel enabling, and even
> > > theoretically consume arch/x86/boot/cpustr.h so it could pick up
> > > new features directly from one-line X86_FEATURE_* definitions.
> 
> That's arch/x86/include/asm/cpufeatures.h right -- cpustr.h is generated
> from that. The table there already has comments which could be extracted
> as the one-line description.

Thanks for the hint! I found the comments in cpufeatures.h are much better
than what I extracted from the SDM :), so I should use them instead.

One other thing, as Boris has mentioned: the CPU feature list is a mixture
of raw silicon features and kernel software ones. Also, cpufeatures.h only
contains one-bit boolean flags, while CPUID also has multi-bit fields
containing numbers.
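To illustrate the difference between a one-bit flag and a multi-bit field: CPUID leaf 1 reports features such as SSE2 as single bits in EDX (bit 26), but the family and model as multi-bit fields in EAX. A minimal sketch, assuming GCC or Clang for `<cpuid.h>` on x86; the decoding helpers follow the same rules as the kernel's x86_family()/x86_model() and are plain bit arithmetic, so they can be exercised on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* One-bit flag, e.g. SSE2 is bit 26 of CPUID.1:EDX. */
static bool cpuid_bit(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return (reg >> bit) & 1;
}

/* Multi-bit fields in CPUID.1:EAX: the extended family is added only
 * when the base family is 0xf... */
static unsigned int sig_family(uint32_t eax)
{
	unsigned int fam = (eax >> 8) & 0xf;

	if (fam == 0xf)
		fam += (eax >> 20) & 0xff;
	return fam;
}

/* ...and the extended model is prepended for family 6 and above. */
static unsigned int sig_model(uint32_t eax)
{
	unsigned int model = (eax >> 4) & 0xf;

	if (sig_family(eax) >= 0x6)
		model += ((eax >> 16) & 0xf) << 4;
	return model;
}

void print_leaf1(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		printf("family 0x%x model 0x%x sse2 %d\n",
		       sig_family(eax), sig_model(eax), cpuid_bit(edx, 26));
#else
	puts("not an x86 CPU");
#endif
}
```

A boolean table like cpufeatures.h can describe the SSE2 bit, but the family/model values only make sense once the fields are extracted and combined as above.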

Thanks,
Feng




Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to generic intercepts

2020-09-22 Thread Paolo Bonzini
On 22/09/20 21:11, Babu Moger wrote:
> 
> 
>> -Original Message-
>> From: Paolo Bonzini 
>> Sent: Tuesday, September 22, 2020 8:39 AM
>> To: Sean Christopherson 
>> Cc: Moger, Babu ; vkuzn...@redhat.com;
>> jmatt...@google.com; wanpen...@tencent.com; k...@vger.kernel.org;
>> j...@8bytes.org; x...@kernel.org; linux-kernel@vger.kernel.org;
>> mi...@redhat.com; b...@alien8.de; h...@zytor.com; t...@linutronix.de
>> Subject: Re: [PATCH v6 04/12] KVM: SVM: Modify intercept_exceptions to
>> generic intercepts
>>
>> On 14/09/20 17:06, Sean Christopherson wrote:
 I think these should take a vector instead, and add 64 in the functions.
>>>
>>> And "s/int bit/u32 vector" + BUILD_BUG_ON(vector > 32)?
>>
>> Not sure if we can assume it to be constant, but WARN_ON_ONCE is good
>> enough as far as performance is concerned.  The same int->u32 +
>> WARN_ON_ONCE should be done in patch 1.
> 
> Paolo, Ok sure. Will change "int bit" to "u32 vector". I will send a new
> patch to address this. This needs to be addressed in all these functions,
> vmcb_set_intercept, vmcb_clr_intercept, vmcb_is_intercept,
> set_exception_intercept, clr_exception_intercept, svm_set_intercept,
> svm_clr_intercept, svm_is_intercept.
> 
> Also will add WARN_ON_ONCE(vector > 32); on set_exception_intercept,
> clr_exception_intercept.  Does that sound good?

I can do the fixes myself, no worries.  It should get to kvm/next this week.

Paolo
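For reference, a user-space sketch of the layout being discussed, under stated assumptions: intercept bits are kept in an array of 32-bit words and exception intercepts start at bit 64 (the "add 64" suggested above). Since x86 exception vectors are 0-31, the bound check that keeps vector 32 out of the next word is `vector >= 32`; a `vector > 32` check would let it through. All names here are hypothetical, and the function returns false where the kernel version would use WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: intercept bits in u32 words; word 2 (bits 64-95)
 * holds the exception intercepts. */
#define SKETCH_INTERCEPT_WORDS 5
#define SKETCH_EXCEPTION_OFFSET 64

struct sketch_control {
	uint32_t intercepts[SKETCH_INTERCEPT_WORDS];
};

static void sketch_set_intercept(struct sketch_control *c, uint32_t bit)
{
	assert(bit < 32 * SKETCH_INTERCEPT_WORDS);
	c->intercepts[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static bool sketch_is_intercept(const struct sketch_control *c, uint32_t bit)
{
	assert(bit < 32 * SKETCH_INTERCEPT_WORDS);
	return (c->intercepts[bit / 32] >> (bit % 32)) & 1;
}

/* Takes a vector and adds the offset internally, as suggested in the thread.
 * Returns false (instead of WARN_ON_ONCE) for vectors outside 0..31. */
static bool sketch_set_exception_intercept(struct sketch_control *c, uint32_t vector)
{
	if (vector >= 32)
		return false;
	sketch_set_intercept(c, SKETCH_EXCEPTION_OFFSET + vector);
	return true;
}
```

For example, intercepting #PF (vector 14) sets bit 14 of word 2, while vector 32 is rejected instead of silently setting bit 0 of word 3.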



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-22 Thread Dave Young
+ more people who may care about this param 
On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk  writes:
> 
> > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young  wrote:
> >> 
> >> > crash_kexec_post_notifiers enables running various panic notifier
> >> > before kdump kernel booting. This increases risks of kdump failure.
> >> > It is well documented in kernel-parameters.txt. We do not suggest
> >> > people to enable it together with kdump unless he/she is really sure.
> >> > This is also not suggested to be enabled by default when users are
> >> > not aware in distributions.
> >> > 
> >> > But unfortunately it is enabled by default in systemd, see below
> >> > discussions in a systemd report, we can not convince systemd to change
> >> > it:
> >> > https://github.com/systemd/systemd/issues/16661
> >> > 
> >> > Actually we have got reports about kdump kernel hangs in both s390x
> >> > and powerpcle cases caused by the systemd change,  also some x86 cases
> >> > could also be caused by the same (although that is in Hyper-V code
> >> > instead of systemd, that need to be addressed separately).
> >
> > Perhaps it may be better to fix the issues on s390x and PowerPC as well?
> >
> >> > 
> >> > Thus to avoid the auto enablement here just disable the param writable
> >> > permission in sysfs.
> >> > 
> >> 
> >> Well.  I don't think this is at all a desirable way of resolving a
> >> disagreement with the systemd developers
> >> 
> >> At the above github address I'm seeing "ryncsn added a commit to
> >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> >> enable crash_kexec_post_notifiers by default".  So didn't that address
> >> the issue?
> >
> > It does in systemd, but there is a strong interest in making this on
> > by default.
> 
> There is also a strong interest in removing this code entirely from the
> kernel.

Added the Hyper-V people and the people who created the param, which was
introduced by the commit below. I would also like to remove it if possible;
let's see what people think. At the very least, we should disable the
automatic setting in both systemd and the kernel:

commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
Author: Masami Hiramatsu 
Date:   Fri Jun 6 14:37:07 2014 -0700

kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers

Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg.  This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.

Note that this actually increases risks of the failure of kdump.  This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

> 
> This failure is a case in point.
> 
> I think I am at my I told you so point.  This is what all of the testing
> over all the years has said.  Leaving functionality to the peculiarities
> of firmware when you don't have to, and can actually control what is
> going on doesn't work.
> 
> Eric
> 
> 

Thanks
Dave



Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2

2020-09-22 Thread Chunxin Zang
On Wed, Sep 23, 2020 at 3:57 AM Shakeel Butt  wrote:
>
> On Tue, Sep 22, 2020 at 5:37 AM Chunxin Zang wrote:
> >
> > On Tue, Sep 22, 2020 at 6:42 PM Chris Down  wrote:
> > >
> > > Chunxin Zang writes:
> > > >On Tue, Sep 22, 2020 at 5:51 PM Chris Down  wrote:
> > > >>
> > > >> Chunxin Zang writes:
> > > >> >My usecase is that there are two types of services in one server. They
> > > >> >have different priorities. Type_A has the highest priority; we need to
> > > >> >ensure its schedule latency, I/O latency, and memory are sufficient.
> > > >> >Type_B has the lowest priority; we expect it will not affect Type_A
> > > >> >when executed.
> > > >> >So Type_A could use memory without any limit. Type_B could use memory
> > > >> >only when the memory is absolutely sufficient. But we cannot estimate
> > > >> >how much memory Type_B should use, because everything is dynamic. So
> > > >> >we can't set Type_B's memory.high.
> > > >> >
> > > >> >So we want to release the memory of Type_B when global memory is
> > > >> >insufficient in order to ensure the quality of service of Type_A. In
> > > >> >the past, we used the 'force_empty' interface of cgroup v1.
> > > >>
> > > >> This sounds like a perfect use case for memory.low on Type_A, and it's pretty
> > > >> much exactly what we invented it for. What's the problem with that?
> > > >
> > > >But we cannot estimate how much memory Type_A uses at least.
> > >
> > > memory.low allows ballparking, you don't have to know exactly how much it uses.
> > > Any amount of protection biases reclaim away from that cgroup.
> > >
> > > >For example:
> > > >total memory: 100G
> > > >At the beginning, Type_A was in an idle state, and it only used 10G of memory.
> > > >The load is very low. We want to run Type_B to avoid wasting machine resources.
> > > >When Type_B runs for a while, it used 80G of memory.
> > > >At this time Type_A is busy, it needs more memory.
> > >
> > > Ok, so set memory.low for Type_A close to your maximum expected value.
> >
> > Please forgive me for not being able to understand why setting
> > memory.low for Type_A can solve the problem.
> > In my scene, Type_A is the most important, so I will set 100G to memory.low.
> > But 'memory.low' only takes effect passively when the kernel is
> > reclaiming memory. It means that Type_B's memory is reclaimed only when
> > Type_A is in the memory allocation slow path. This will affect Type_A's
> > performance.
> > We want to reclaim Type_B's memory in advance when A is expected to be busy.
> >
>
> How will you know when to reclaim from B? Are you polling /proc/meminfo?
>

We monitor global memory usage with a daemon. When memory usage reaches
80% or 90%, it reclaims B's memory.

> From what I understand, you want to proactively reclaim from B, so
> that A does not go into global reclaim and in the worst case kill B,
> right?

Yes, it is.

>
> BTW you can use memory.high to reclaim from B by setting it lower than
> memory.current of B and reset it to 'max' once the reclaim is done.
> Since 'B' is not high priority (I am assuming not a latency sensitive
> workload), B hitting temporary memory.high should not be an issue.
> Also I am assuming you don't much care about the amount of memory to
> be reclaimed from B, so I think memory.high can fulfil your use-case.
> However if in future you decide to proactively reclaim from all the
> jobs based on their priority i.e. more aggressive reclaim from B and a
> little bit reclaim from A then memory.high is not a good interface.
>
> Shakeel

Thanks for these suggestions, I will give it a try.

Best wishes
Chunxin
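A minimal sketch of the memory.high approach Shakeel describes, assuming cgroup v2 is mounted and `cgdir` names the (hypothetical) directory of Type_B's cgroup, writable by the monitoring daemon. It also assumes the kernel does the reclaim while handling the memory.high write, so memory.high can be restored to "max" straight afterwards; a real daemon might re-read memory.current to see how much was actually reclaimed.

```c
#include <assert.h>
#include <stdio.h>

/* Target for memory.high: 'bytes' below current usage, clamped at 0. */
static unsigned long long reclaim_target(unsigned long long usage,
					 unsigned long long bytes)
{
	return usage > bytes ? usage - bytes : 0;
}

static int read_u64(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

static int write_str(const char *path, const char *s)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(s, f) >= 0;
	if (fclose(f) != 0) /* stdio buffers, so write errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}

/* Lower memory.high below memory.current, then restore it to "max". */
int reclaim_from_cgroup(const char *cgdir, unsigned long long bytes)
{
	char path[4096], buf[32];
	unsigned long long usage;

	assert(cgdir != NULL);
	snprintf(path, sizeof(path), "%s/memory.current", cgdir);
	if (read_u64(path, &usage))
		return -1;
	snprintf(path, sizeof(path), "%s/memory.high", cgdir);
	snprintf(buf, sizeof(buf), "%llu", reclaim_target(usage, bytes));
	if (write_str(path, buf))
		return -1;
	return write_str(path, "max");
}
```

For example, `reclaim_from_cgroup("/sys/fs/cgroup/type_b", 10ULL << 30)` would ask for roughly 10 GiB to be reclaimed from that cgroup; the path is only an illustration.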


Re: [PATCH 10/10] rpmsg: ns: Make Name service module transport agnostic

2020-09-22 Thread kernel test robot
Hi Mathieu,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20200921]
[cannot apply to linux/master linus/master rpmsg/for-next v5.9-rc6 v5.9-rc5 v5.9-rc4 v5.9-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mathieu-Poirier/rpmsg-Make-RPMSG-name-service-modular/20200922-081745
base:   b10b8ad862118bf42c28a98b0f067619aadcfb23
config: i386-randconfig-s001-20200921 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-201-g24bdaac6-dirty
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

   drivers/rpmsg/virtio_rpmsg_bus.c:165:43: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted __virtio16 [usertype] val @@ got unsigned short [usertype] val @@
   drivers/rpmsg/virtio_rpmsg_bus.c:165:43: sparse: expected restricted __virtio16 [usertype] val
   drivers/rpmsg/virtio_rpmsg_bus.c:165:43: sparse: got unsigned short [usertype] val
   drivers/rpmsg/virtio_rpmsg_bus.c:173:31: sparse: sparse: incorrect type in return expression (different base types) @@ expected unsigned short @@ got restricted __virtio16 @@
   drivers/rpmsg/virtio_rpmsg_bus.c:173:31: sparse: expected unsigned short
   drivers/rpmsg/virtio_rpmsg_bus.c:173:31: sparse: got restricted __virtio16
   drivers/rpmsg/virtio_rpmsg_bus.c:181:43: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted __virtio32 [usertype] val @@ got unsigned int [usertype] val @@
   drivers/rpmsg/virtio_rpmsg_bus.c:181:43: sparse: expected restricted __virtio32 [usertype] val
   drivers/rpmsg/virtio_rpmsg_bus.c:181:43: sparse: got unsigned int [usertype] val
   drivers/rpmsg/virtio_rpmsg_bus.c:189:31: sparse: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __virtio32 @@
   drivers/rpmsg/virtio_rpmsg_bus.c:189:31: sparse: expected unsigned int
   drivers/rpmsg/virtio_rpmsg_bus.c:189:31: sparse: got restricted __virtio32
>> drivers/rpmsg/virtio_rpmsg_bus.c:267:26: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] addr @@ got restricted __virtio32 @@
>> drivers/rpmsg/virtio_rpmsg_bus.c:267:26: sparse: expected unsigned int [addressable] [usertype] addr
   drivers/rpmsg/virtio_rpmsg_bus.c:267:26: sparse: got restricted __virtio32
>> drivers/rpmsg/virtio_rpmsg_bus.c:268:27: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] flags @@ got restricted __virtio32 @@
>> drivers/rpmsg/virtio_rpmsg_bus.c:268:27: sparse: expected unsigned int [addressable] [usertype] flags
   drivers/rpmsg/virtio_rpmsg_bus.c:268:27: sparse: got restricted __virtio32
   drivers/rpmsg/virtio_rpmsg_bus.c:291:26: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] addr @@ got restricted __virtio32 @@
   drivers/rpmsg/virtio_rpmsg_bus.c:291:26: sparse: expected unsigned int [addressable] [usertype] addr
   drivers/rpmsg/virtio_rpmsg_bus.c:291:26: sparse: got restricted __virtio32
   drivers/rpmsg/virtio_rpmsg_bus.c:292:27: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] flags @@ got restricted __virtio32 @@
   drivers/rpmsg/virtio_rpmsg_bus.c:292:27: sparse: expected unsigned int [addressable] [usertype] flags
   drivers/rpmsg/virtio_rpmsg_bus.c:292:27: sparse: got restricted __virtio32

# https://github.com/0day-ci/linux/commit/ab159ea48198df2ab06ff9fe97e63cca354bff20
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Mathieu-Poirier/rpmsg-Make-RPMSG-name-service-modular/20200922-081745
git checkout ab159ea48198df2ab06ff9fe97e63cca354bff20
vim +267 drivers/rpmsg/virtio_rpmsg_bus.c

dd032e0b67fcd61 Mathieu Poirier   2020-09-21  167  
dd032e0b67fcd61 Mathieu Poirier   2020-09-21  168  static u16 virtio_rpmsg_cpu_to_transport16(struct rpmsg_device *rpdev, u16 val)
dd032e0b67fcd61 Mathieu Poirier   2020-09-21  169  {
dd032e0b67fcd61 Mathieu Poirier   2020-09-21  170   struct virtio_rpmsg_channel *vch = to_virtio_rpmsg_channel(rpdev);
dd032e0b67fcd61 Mathieu Poirier   2020-09-21  17

Re: [PATCH] csky: Fix a size determination in gpr_get()

2020-09-22 Thread Guo Ren
On Wed, Sep 23, 2020 at 8:23 AM Al Viro  wrote:
>
> On Wed, Sep 23, 2020 at 08:03:20AM +0800, Guo Ren wrote:
> > Thx Duan,
> >
> > Acked-by: Guo Ren 
> >
> > Hi Al,
> >
> > I found the broken commit still has a question:
> >
> > > commit dcad7854fcce6a2d49b6a3ead5bbefeff047e559
> > > Author: Al Viro 
> > > Date:   Tue Jun 16 15:28:29 2020 -0400
> >
> > >csky: switch to ->regset_get()
> >
> > >NB: WTF "- what the fuck :(" is fpregs_get() playing at???
> > The fpregs_get() is for REGSET_FPR regset used by ptrace (gdb) and all
> > fp regs are stored in threads' context.
> > So, WTF question for?
>
> The part under
> #if defined(CONFIG_CPU_HAS_FPUV2) && !defined(CONFIG_CPU_HAS_VDSP)
>
> What's going on there?  The mapping is really weird - assuming
> you had v0..v31 in the first 32 elements of regs->vr[], you
> end up with
>
> v0 v1 v2 v3 v2 v3 v6 v7 v4 v5 v10 v11 v6 v7 v14 v15
> v8 v9 v18 v19 v10 v11 v22 v23 v12 v13 v26 v27 v14 v15 v30 v31
>
> in the beginning of the output.  Assuming it is the intended
> behaviour, it's probably worth some comments...
The FPU and VDSP share the same registers: there are 32 FPU registers,
each 64 bits wide, and 16 VDSP registers, each 128 bits wide.

vr[0], vr[1] = fp[0];  vr[0], vr[1], vr[2], vr[3] = vdsp reg[0]
...
vr[60], vr[61] = fp[15];  vr[60], vr[61], vr[62], vr[63] = vdsp reg[15]
vr[64], vr[65] = fp[16]
vr[66], vr[67] = fp[17]
...
vr[94], vr[95] = fp[31]

Yeah, this is confusing and I'll add a comment later.
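The table above can be summarised as an index formula. A minimal sketch, assuming vr[] is an array of 32-bit words as the table implies; `fp_to_vr_index()` is a hypothetical helper, not a function in arch/csky.

```c
#include <assert.h>

/* Index of the first of the two vr[] words holding FPU register fp[k]:
 * fp[0..15] share the 128-bit VDSP registers (4 words apart),
 * fp[16..31] are packed 2 words apart starting at vr[64]. */
static unsigned int fp_to_vr_index(unsigned int k)
{
	assert(k < 32);
	return k < 16 ? 4 * k : 64 + 2 * (k - 16);
}
```

For example, fp[1] lives in vr[4] and vr[5], not vr[2] and vr[3], because each of the first 16 FPU registers occupies the first two words of a 128-bit VDSP register.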




--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [External] Re: [PATCH] mm/memcontrol: Add the drop_cache interface for cgroup v2

2020-09-22 Thread Chunxin Zang
On Tue, Sep 22, 2020 at 8:43 PM Chris Down  wrote:
>
> Chunxin Zang writes:
> >Please forgive me for not being able to understand why setting
> >memory.low for Type_A can solve the problem.
> >In my scene, Type_A is the most important, so I will set 100G to memory.low.
> >But 'memory.low' only takes effect passively when the kernel is
> >reclaiming memory. It means that Type_B's memory is reclaimed only when
> >Type_A is in the memory allocation slow path. This will affect Type_A's
> >performance.
> >We want to reclaim Type_B's memory in advance when A is expected to be busy.
>
> That's what kswapd reclaim is for, so this distinction is meaningless without
> measurements :-)

Thanks for these suggestions, I will give it a try.

Best wishes
Chunxin


Re: [PATCH] Revert "iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions"

2020-09-22 Thread Baoquan He
Forgot to CC Jerry; adding him.

On 09/23/20 at 10:26am, Baoquan He wrote:
> A regression failure of kdump kernel boot was reported on an HPE system.
> Bisect points at commit 387caf0b759ac43 ("iommu/amd: Treat per-device
> exclusion ranges as r/w unity-mapped regions") as the culprit. Reverting it
> fixes the failure.
> 
> With the commit, the kdump kernel always prints the error message below, and
> naturally the AMD IOMMU can't function normally during kdump kernel bootup.
> 
>   ~
>   AMD-Vi: [Firmware Bug]: IVRS invalid checksum
> 
> It is not yet clear why commit 387caf0b759ac43 causes it.

Hi Joerg, Adrian

We only have one machine that can reproduce the issue, an HPE gen10-01
system. If any logs or other information are needed, please let me know
and I can attach them here.

Thanks
Baoquan

> 
> The commit log links to a discussion thread. In that thread, Adrian said
> the fix is for a system with an already broken BIOS, and Joerg suggested
> two options. Option 2) was finally taken. Maybe option 1) is the right
> approach?
> 
>   1) Bail out and disable the IOMMU as the BIOS screwed up
>   2) Treat per-device exclusion ranges just as r/w unity-mapped
>  regions.
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2019-November/040117.html
> Signed-off-by: Baoquan He 
> ---
>  drivers/iommu/amd/init.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 9aa1eae26634..bbe7ceae5949 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -1109,17 +1109,22 @@ static int __init add_early_maps(void)
>   */
>  static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
>  {
> + struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
> +
>   if (!(m->flags & IVMD_FLAG_EXCL_RANGE))
>   return;
>  
> - /*
> -  * Treat per-device exclusion ranges as r/w unity-mapped regions
> -  * since some buggy BIOSes might lead to the overwritten exclusion
> -  * range (exclusion_start and exclusion_length members). This
> -  * happens when there are multiple exclusion ranges (IVMD entries)
> -  * defined in ACPI table.
> -  */
> - m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP);
> + if (iommu) {
> + /*
> +  * We only can configure exclusion ranges per IOMMU, not
> +  * per device. But we can enable the exclusion range per
> +  * device. This is done here
> +  */
> + set_dev_entry_bit(devid, DEV_ENTRY_EX);
> + iommu->exclusion_start = m->range_start;
> + iommu->exclusion_length = m->range_length;
> + }
> +
>  }
>  
>  /*
> -- 
> 2.17.2
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 



Re: [PATCH 1/5] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr1-alt-addr' property

2020-09-22 Thread Rob Herring
On Wed, Sep 16, 2020 at 04:18:27PM +0800, Ran Wang wrote:
> From: Biwen Li 
> 
> The 'fsl,ippdexpcr1-alt-addr' property is used to handle an errata A-008646
> on LS1021A
> 
> Signed-off-by: Biwen Li 
> Signed-off-by: Ran Wang 
> ---
>  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> index 5a33619..1be58a3 100644
> --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> @@ -34,6 +34,11 @@ Chassis VersionExample Chips
>  Optional properties:
>   - little-endian : RCPM register block is Little Endian. Without it RCPM
> will be Big Endian (default case).
> + - fsl,ippdexpcr1-alt-addr : The property is related to a hardware issue
> +   on SoC LS1021A and only needed on SoC LS1021A.
> +   Must include 2 entries:
> +   The first entry must be a link to the SCFG device node.
> +   The 2nd entry must be offset of register IPPDEXPCR1 in SCFG.

You don't need a DT change for this. You can find SCFG node by its 
compatible string and then the offset should be known given this issue 
is only on 1 SoC.

Rob


[PATCH] Revert "iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions"

2020-09-22 Thread Baoquan He
A kdump kernel boot regression was reported on an HPE system.
Bisection points to commit 387caf0b759ac43 ("iommu/amd: Treat per-device
exclusion ranges as r/w unity-mapped regions") as the culprit. Reverting
it fixes the failure.

With the commit applied, the kdump kernel always prints the error message
below, and the AMD IOMMU then cannot function normally while the kdump
kernel boots.

  ~
  AMD-Vi: [Firmware Bug]: IVRS invalid checksum

It has not yet been determined why commit 387caf0b759ac43 causes this.

The commit log links to a discussion thread. In that thread, Adrian said
the fix was for a system with an already broken BIOS, and Joerg suggested
two options, of which option 2) was eventually taken. Perhaps option 1)
is the right approach?

  1) Bail out and disable the IOMMU as the BIOS screwed up
  2) Treat per-device exclusion ranges just as r/w unity-mapped
 regions.

https://lists.linuxfoundation.org/pipermail/iommu/2019-November/040117.html
Signed-off-by: Baoquan He 
---
 drivers/iommu/amd/init.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9aa1eae26634..bbe7ceae5949 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1109,17 +1109,22 @@ static int __init add_early_maps(void)
  */
 static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
 {
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
if (!(m->flags & IVMD_FLAG_EXCL_RANGE))
return;
 
-   /*
-* Treat per-device exclusion ranges as r/w unity-mapped regions
-* since some buggy BIOSes might lead to the overwritten exclusion
-* range (exclusion_start and exclusion_length members). This
-* happens when there are multiple exclusion ranges (IVMD entries)
-* defined in ACPI table.
-*/
-   m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP);
+   if (iommu) {
+   /*
+* We only can configure exclusion ranges per IOMMU, not
+* per device. But we can enable the exclusion range per
+* device. This is done here
+*/
+   set_dev_entry_bit(devid, DEV_ENTRY_EX);
+   iommu->exclusion_start = m->range_start;
+   iommu->exclusion_length = m->range_length;
+   }
+
 }
 
 /*
-- 
2.17.2
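The comment removed by the reverted commit describes the hazard that this revert reintroduces: exclusion ranges can only be programmed per IOMMU, so if the IVRS table holds several IVMD exclusion entries for devices behind the same IOMMU, each call to set_device_exclusion_range() overwrites the previous range. The user-space sketch below models only that overwrite. `struct model_iommu`, `dev_excl_enabled` and the `model_*` functions are hypothetical simplifications, not the driver's real types, and the sketch makes no claim about the root cause of the kdump failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one IOMMU: it holds a single
 * exclusion range shared by every device behind it. */
struct model_iommu {
	uint64_t exclusion_start;
	uint64_t exclusion_length;
};

/* Hypothetical per-device "exclusion enabled" bits (models DEV_ENTRY_EX). */
#define MODEL_MAX_DEVS 8
static bool dev_excl_enabled[MODEL_MAX_DEVS];

/* Mirrors the restored logic: enable exclusion for the device, then
 * program the range into the IOMMU-wide (shared) registers. */
static void model_set_device_exclusion_range(struct model_iommu *iommu,
					     uint16_t devid,
					     uint64_t start, uint64_t length)
{
	if (!iommu || devid >= MODEL_MAX_DEVS)
		return;
	dev_excl_enabled[devid] = true;
	iommu->exclusion_start = start;   /* overwrites any earlier range */
	iommu->exclusion_length = length;
}

/* Returns true if addr lies inside the IOMMU's current exclusion range. */
static bool model_is_excluded(const struct model_iommu *iommu, uint64_t addr)
{
	return addr >= iommu->exclusion_start &&
	       addr - iommu->exclusion_start < iommu->exclusion_length;
}
```

With two entries, both devices end up with exclusion enabled, but only the last range survives in the shared registers; this is why the reverted commit chose to treat such ranges as r/w unity-mapped regions instead.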



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-22 Thread Dave Young
On 09/21/20 at 04:18pm, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young  wrote:
> > 
> > > crash_kexec_post_notifiers enables running various panic notifiers
> > > before the kdump kernel boots. This increases the risk of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not recommend
> > > enabling it together with kdump unless the user is really sure.
> > > Nor should distributions enable it by default without users being
> > > aware of it.
> > > 
> > > Unfortunately it is enabled by default in systemd; see the discussion
> > > in the systemd report below. We could not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > > 
> > > We have actually received reports of kdump kernel hangs on both s390x
> > > and powerpcle caused by the systemd change, and some x86 cases may
> > > have the same cause (although those are in Hyper-V code rather than
> > > systemd, and need to be addressed separately).
> 
> Perhaps it may be better to fix the issues on s390x and PowerPC as well?
> 
> > > 
> > > Thus, to avoid the automatic enablement, just remove the parameter's
> > > write permission in sysfs.
> > > 
> > 
> > Well.  I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> > 
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default".  So didn't that address
> > the issue?
> 
> It does in systemd, but there is a strong interest in making this on by 
> default.

I understand there could be such interest, but we have to keep in mind
that anything extra done after a system crash can make kdump unreliable.

I do not object to people using pstore, but I do object to enabling the
notifiers by default.

BTW, crash notifiers are not limited to pstore; there are quite a lot of
other users, such as LED triggers.

Thanks
Dave



Re: [PATCH 2/2] locktorture: call percpu_free_rwsem() to do percpu-rwsem cleanup

2020-09-22 Thread Hou Tao
Hi Paul,

> On 2020/9/23 7:24, Paul E. McKenney wrote:
snip

>> Fix it by adding an exit hook in lock_torture_ops and
>> use it to call percpu_free_rwsem() for percpu rwsem torture
>> before the module is removed, so we can ensure rcu_sync_func()
>> completes before module exits.
>>
>> Also needs to call exit hook if lock_torture_init() fails half-way,
>> so use ctx->cur_ops != NULL to signal that init hook has been called.
> 
> Good catch, but please see below for comments and questions.
> 
>> Signed-off-by: Hou Tao 
>> ---
>>  kernel/locking/locktorture.c | 28 ++--
>>  1 file changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
>> index bebdf98e6cd78..e91033e9b6f95 100644
>> --- a/kernel/locking/locktorture.c
>> +++ b/kernel/locking/locktorture.c
>> @@ -74,6 +74,7 @@ static void lock_torture_cleanup(void);
>>   */
>>  struct lock_torture_ops {
>>  void (*init)(void);
>> +void (*exit)(void);
> 
> This is fine, but why not also add a flag to the lock_torture_cxt
> structure that is set when the ->init() function is called?  Perhaps
> something like this in lock_torture_init():
> 
>   if (cxt.cur_ops->init) {
>   cxt.cur_ops->init();
>   cxt.initcalled = true;
>   }
> 

You are right. Adding a new field to indicate that the init hook has been
called is much better than reusing cxt.cur_ops != NULL for that.

>>  int (*writelock)(void);
>>  void (*write_delay)(struct torture_random_state *trsp);
>>  void (*task_boost)(struct torture_random_state *trsp);
>> @@ -571,6 +572,11 @@ void torture_percpu_rwsem_init(void)
>>  BUG_ON(percpu_init_rwsem(_rwsem));
>>  }
>>  
>> +static void torture_percpu_rwsem_exit(void)
>> +{
>> +percpu_free_rwsem(_rwsem);
>> +}
>> +
snip

>> @@ -828,6 +836,12 @@ static void lock_torture_cleanup(void)
>>  cxt.lrsa = NULL;
>>  
>>  end:
>> +/* If init() has been called, then do exit() accordingly */
>> +if (cxt.cur_ops) {
>> +if (cxt.cur_ops->exit)
>> +cxt.cur_ops->exit();
>> +cxt.cur_ops = NULL;
>> +}
> 
> The above can then be:
> 
>   if (cxt.initcalled && cxt.cur_ops->exit)
>   cxt.cur_ops->exit();
> 
> Maybe you also need to clear cxt.initcalled at this point, but I don't
> immediately see why that would be needed.
> 
Since we are doing cleanup, I think resetting initcalled to false is OK
once the cleanup is done.

>>  torture_cleanup_end();
>>  }
>>  
>> @@ -835,6 +849,7 @@ static int __init lock_torture_init(void)
>>  {
>>  int i, j;
>>  int firsterr = 0;
>> +struct lock_torture_ops *cur_ops;
> 
> And then you don't need this extra pointer.  Not that this pointer is bad
> in and of itself, but using (!cxt.cur_ops) to indicate that the ->init()
> function has not been called is an accident waiting to happen.
> 
> And the changes below are no longer needed.
> 
> Or am I missing something subtle?
> 
Thanks for your suggestion. Will send v2.

Thanks.
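Paul's suggestion above can be modelled in user space as follows. This is a sketch, not locktorture code: `struct model_ops` and `struct model_cxt` are hypothetical stand-ins for `struct lock_torture_ops` and the torture context, and the early-failure path is assumed for illustration. It shows why a dedicated `initcalled` flag is safer than testing `cur_ops != NULL`: `cur_ops` is already set on a failure path that returns before `->init()` runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_torture_ops: optional hooks. */
struct model_ops {
	void (*init)(void);
	void (*exit)(void);
};

/* Hypothetical stand-in for the torture context. */
struct model_cxt {
	struct model_ops *cur_ops;
	bool initcalled;   /* set only after ->init() has actually run */
};

static int resources;  /* counts "allocated" resources for the demo */

static void model_init(void) { resources++; }
static void model_exit(void) { resources--; }

static struct model_ops model_rwsem_ops = { model_init, model_exit };

static void model_torture_init(struct model_cxt *cxt, struct model_ops *ops,
			       bool fail_before_init)
{
	cxt->cur_ops = ops;            /* cur_ops is set before init() runs, */
	if (fail_before_init)
		return;                /* so it cannot signal that init() ran */
	if (cxt->cur_ops->init) {
		cxt->cur_ops->init();
		cxt->initcalled = true;
	}
}

static void model_torture_cleanup(struct model_cxt *cxt)
{
	/* Undo init() only if it really ran. */
	if (cxt->initcalled && cxt->cur_ops->exit)
		cxt->cur_ops->exit();
	cxt->initcalled = false;       /* reset once cleanup is done */
}
```

Had the cleanup tested `cur_ops != NULL` instead, the early-failure case would call `->exit()` without a matching `->init()`.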




Re: [PATCH] csky: Fix a size determination in gpr_get()

2020-09-22 Thread Zhenzhong Duan
On Wed, Sep 23, 2020 at 12:29 AM Al Viro  wrote:
>
> On Tue, Sep 22, 2020 at 05:15:05PM +0800, Zhenzhong Duan wrote:
> > "*" is missed  in size determination as we are passing register set
> > rather than a pointer.
>
> Ack.  I can push it to Linus today, unless you want it to go through
> csky tree.  Preferences?

I'd prefer that you push it to Linus.

Regards
Zhenzhong


Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-22 Thread Huang, Ying
Hi, Rafael,

Rafael Aquini  writes:

> The swap area descriptor only gets struct swap_cluster_info *cluster_info
> allocated if the swapfile is backed by non-rotational storage.
> When the swap area is laid on top of ordinary disk spindles, lock_cluster()
> will naturally return NULL.

Thanks for reporting. But the bug looks strange, because in a system
with only HDD swap devices, the swap cluster shouldn't be allocated
during THP swap out, as in

shrink_page_list()
  add_to_swap()
get_swap_page()
  get_swap_pages()
swap_alloc_cluster()

where si->free_clusters is checked, and it should be empty for HDD. So
the THP should have been split in shrink_page_list(). Meanwhile,
split_huge_page_to_list() checks PageSwapCache() before calling
split_swap_cluster(). So this appears strange.

All in all, it appears that we need to find the real root cause of the
bug.

Did you test with the latest upstream kernel?  Can you help trace the
return value of swap_alloc_cluster()?  Can you share the swap device
information?

Best Regards,
Huang, Ying
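The quoted patch below guards the cluster_{is,clear}_huge() wrappers against the NULL that lock_cluster() returns when a swap area has no cluster_info array. A minimal user-space sketch of that guard follows; the `model_*` names and the flag value are hypothetical, and locking is omitted. As noted above, such a guard only avoids the crash; it does not explain how a THP reached split_swap_cluster() on an HDD-only system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CLUSTER_FLAG_HUGE 4  /* hypothetical flag value */

struct model_cluster_info {
	unsigned int flags;
};

/* Models lock_cluster(): returns NULL when the swap area has no
 * cluster_info array (e.g. swap on rotational storage). */
static struct model_cluster_info *
model_lock_cluster(struct model_cluster_info *cluster_info, size_t idx)
{
	return cluster_info ? &cluster_info[idx] : NULL;
}

/* NULL-safe wrappers, in the spirit of the proposed patch. */
static bool model_cluster_is_huge(const struct model_cluster_info *info)
{
	return info && (info->flags & MODEL_CLUSTER_FLAG_HUGE);
}

static void model_cluster_clear_huge(struct model_cluster_info *info)
{
	if (info)
		info->flags &= ~MODEL_CLUSTER_FLAG_HUGE;
}
```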

> CONFIG_THP_SWAP exposes cluster_info infrastructure to a broader number of
> use cases, and split_swap_cluster(), which is the counterpart of split_huge_page()
> for the THPs in the swapcache, misses checking the return of lock_cluster before
> operating on the cluster_info pointer.
>
> This patch addresses that issue by adding a proper check for the pointer
> not being NULL in the wrappers cluster_{is,clear}_huge(), in order to avoid
> crashes similar to the one below:
>
> [ 5758.157556] BUG: kernel NULL pointer dereference, address: 0007
> [ 5758.165331] #PF: supervisor write access in kernel mode
> [ 5758.171161] #PF: error_code(0x0002) - not-present page
> [ 5758.176894] PGD 0 P4D 0
> [ 5758.179721] Oops: 0002 [#1] SMP PTI
> [ 5758.183614] CPU: 10 PID: 316 Comm: kswapd1 Kdump: loaded Tainted: G S  
>  - ---  5.9.0-0.rc3.1.tst.el8.x86_64 #1
> [ 5758.196717] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS 
> SE5C600.86B.02.01.0002.082220131453 08/22/2013
> [ 5758.208176] RIP: 0010:split_swap_cluster+0x47/0x60
> [ 5758.213522] Code: c1 e3 06 48 c1 eb 0f 48 8d 1c d8 48 89 df e8 d0 20 6a 00 
> 80 63 07 fb 48 85 db 74 16 48 89 df c6 07 00 66 66 66 90 31 c0 5b c3 <80> 24 
> 25 07 00 00 00 fb 31 c0 5b c3 b8 f0 ff ff ff 5b c3 66 0f 1f
> [ 5758.234478] RSP: 0018:b147442d7af0 EFLAGS: 00010246
> [ 5758.240309] RAX:  RBX: 0014b217 RCX: 
> b14779fd9000
> [ 5758.248281] RDX: 0014b217 RSI: 9c52f2ab1400 RDI: 
> 0014b217
> [ 5758.256246] RBP: e00c51168080 R08: e00c5116fe08 R09: 
> 9c52fffd3000
> [ 5758.264208] R10: e00c511537c8 R11: 9c52fffd3c90 R12: 
> 
> [ 5758.272172] R13: e00c5117 R14: e00c5117 R15: 
> e00c51168040
> [ 5758.280134] FS:  () GS:9c52f2a8() 
> knlGS:
> [ 5758.289163] CS:  0010 DS:  ES:  CR0: 80050033
> [ 5758.295575] CR2: 0007 CR3: 22a0e003 CR4: 
> 000606e0
> [ 5758.303538] Call Trace:
> [ 5758.306273]  split_huge_page_to_list+0x88b/0x950
> [ 5758.311433]  deferred_split_scan+0x1ca/0x310
> [ 5758.316202]  do_shrink_slab+0x12c/0x2a0
> [ 5758.320491]  shrink_slab+0x20f/0x2c0
> [ 5758.324482]  shrink_node+0x240/0x6c0
> [ 5758.328469]  balance_pgdat+0x2d1/0x550
> [ 5758.332652]  kswapd+0x201/0x3c0
> [ 5758.336157]  ? finish_wait+0x80/0x80
> [ 5758.340147]  ? balance_pgdat+0x550/0x550
> [ 5758.344525]  kthread+0x114/0x130
> [ 5758.348126]  ? kthread_park+0x80/0x80
> [ 5758.352214]  ret_from_fork+0x22/0x30
> [ 5758.356203] Modules linked in: fuse zram rfkill sunrpc intel_rapl_msr 
> intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp 
> mgag200 iTCO_wdt crct10dif_pclmul iTCO_vendor_support drm_kms_helper 
> crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt 
> fb_sys_fops cec rapl joydev intel_cstate ipmi_si ipmi_devintf drm 
> intel_uncore i2c_i801 ipmi_msghandler pcspkr lpc_ich mei_me i2c_smbus mei 
> ioatdma ip_tables xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg igb ahci 
> libahci i2c_algo_bit crc32c_intel libata dca wmi dm_mirror dm_region_hash 
> dm_log dm_mod
> [ 5758.412673] CR2: 0007
> [0.00] Linux version 5.9.0-0.rc3.1.tst.el8.x86_64 
> (mockbu...@x86-vm-15.build.eng.bos.redhat.com) (gcc (GCC) 8.3.1 20191121 (Red 
> Hat 8.3.1-5), GNU ld version 2.30-79.el8) #1 SMP Wed Sep 9 16:03:34 EDT 2020
>
> Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out")
> Signed-off-by: Rafael Aquini 
> ---
>  mm/swapfile.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 12f59e641b5e..37ddf5e5c53b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -324,14 +324,15 @@ static inline void cluster_set_null(struct 
> swap_cluster_info *info)
>  
>  static inline bool 

[PATCH v20 3/3] Input: new da7280 haptic driver

2020-09-22 Thread Roy Im
Add support for the Dialog DA7280 LRA/ERM haptic driver IC, which offers
multiple operating modes, integrated waveform memory and wideband support.
The driver communicates with the device over an I2C bus.

Reviewed-by: Jes Sorensen .

Signed-off-by: Roy Im 

---
v20:
- Simplified the code with dev_err_probe().
- Removed some work queues.
v19:
- Corrected some errors and replaced some code with more appropriate code.
- Improved work queues and removed sysfs attributes
- Added changes for gpix id in upload effect.
v18:
- Corrected comments in Kconfig
- Updated to preferred style for multi line comments in c file.
v17:
- fixed an issue.
v16:
- Corrected some code and updated description in Kconfig.
v15:
- Removed some defines and updated some comments.
v14:
- Updated pwm related code, alignments and comments.
v13:
- Updated some conditions in pwm function and alignments.
v12: No changes.
v11: 
- Updated the pwm related code, comments and typo.
v10: 
- Updated the pwm related function and added some comments.
v9: 
- Removed the header file and put the definitions into the c file.
- Updated the pwm code and error logs with %pE
v8: 
- Added changes to support FF_PERIODIC/FF_CUSTOM and FF_CONSTANT.
- Updated the dt-related code.
- Removed memless related functions.
v7: 
- Added more attributes to handle one value per file.
- Replaced and updated the dt-related code and functions called.
- Fixed error/functions.
v6: No changes.
v5: Fixed errors in Kconfig file.
v4: Updated code as dt-bindings are changed.
v3: No changes.
v2: Fixed kbuild error/warning


 drivers/input/misc/Kconfig  |   12 +
 drivers/input/misc/Makefile |1 +
 drivers/input/misc/da7280.c | 1375 +++
 3 files changed, 1388 insertions(+)
 create mode 100644 drivers/input/misc/da7280.c

diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
index 362e8a0..d38b466 100644
--- a/drivers/input/misc/Kconfig
+++ b/drivers/input/misc/Kconfig
@@ -869,4 +869,16 @@ config INPUT_STPMIC1_ONKEY
  To compile this driver as a module, choose M here: the
  module will be called stpmic1_onkey.
 
+config INPUT_DA7280_HAPTICS
+   tristate "Dialog Semiconductor DA7280 haptics support"
+   depends on INPUT && I2C
+   select REGMAP_I2C
+   help
+ Say Y to enable support for the Dialog DA7280 haptics driver.
+ The haptics can be controlled by PWM or GPIO
+ with I2C communication.
+
+ To compile this driver as a module, choose M here: the
+ module will be called da7280.
+
 endif
diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile
index a48e5f2..9cfd6ab 100644
--- a/drivers/input/misc/Makefile
+++ b/drivers/input/misc/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_INPUT_CMA3000)   += cma3000_d0x.o
 obj-$(CONFIG_INPUT_CMA3000_I2C)+= cma3000_d0x_i2c.o
 obj-$(CONFIG_INPUT_COBALT_BTNS)+= cobalt_btns.o
 obj-$(CONFIG_INPUT_CPCAP_PWRBUTTON)+= cpcap-pwrbutton.o
+obj-$(CONFIG_INPUT_DA7280_HAPTICS) += da7280.o
 obj-$(CONFIG_INPUT_DA9052_ONKEY)   += da9052_onkey.o
 obj-$(CONFIG_INPUT_DA9055_ONKEY)   += da9055_onkey.o
 obj-$(CONFIG_INPUT_DA9063_ONKEY)   += da9063_onkey.o
diff --git a/drivers/input/misc/da7280.c b/drivers/input/misc/da7280.c
new file mode 100644
index 000..21d4d37
--- /dev/null
+++ b/drivers/input/misc/da7280.c
@@ -0,0 +1,1375 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * DA7280 Haptic device driver
+ *
+ * Copyright (c) 2020 Dialog Semiconductor.
+ * Author: Roy Im 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Registers */
+#define DA7280_IRQ_EVENT1 0x03
+#define DA7280_IRQ_EVENT_WARNING_DIAG 0x04
+#define DA7280_IRQ_EVENT_SEQ_DIAG 0x05
+#define DA7280_IRQ_STATUS10x06
+#define DA7280_IRQ_MASK1  0x07
+#define DA7280_FRQ_LRA_PER_H  0x0A
+#define DA7280_FRQ_LRA_PER_L  0x0B
+#define DA7280_ACTUATOR1  0x0C
+#define DA7280_ACTUATOR2  0x0D
+#define DA7280_ACTUATOR3  0x0E
+#define DA7280_CALIB_V2I_H0x0F
+#define DA7280_CALIB_V2I_L0x10
+#define DA7280_TOP_CFG1   0x13
+#define DA7280_TOP_CFG2   0x14
+#define DA7280_TOP_CFG4   0x16
+#define DA7280_TOP_INT_CFG1   0x17
+#define DA7280_TOP_CTL1   0x22
+#define DA7280_TOP_CTL2   0x23
+#define DA7280_SEQ_CTL2   0x28
+#define DA7280_GPI_0_CTL  0x29
+#define DA7280_GPI_1_CTL  0x2A
+#define DA7280_GPI_2_CTL  0x2B

[PATCH v20 1/3] MAINTAINERS: da7280 updates to the Dialog Semiconductor search terms

2020-09-22 Thread Roy Im
This patch adds the da7280 bindings doc and driver to the Dialog
Semiconductor support list.

Signed-off-by: Roy Im 

---
v20: No changes.
v19: No changes.
v18: No changes.
v17: No changes.
v16: No changes.
v15: No changes.
v14: No changes.
v13: No changes.
v12: Corrected file list order.
v11: No changes.
v10: No changes.
v9: No changes.
v8: No changes.
v7: No changes.
v6: No changes.
v5: No changes.
v4: No changes.
v3: No changes.
v2: No changes.


 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d746519..6eff440 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5081,6 +5081,7 @@ M:Support Opensource 

 S: Supported
 W: http://www.dialog-semiconductor.com/products
 F: Documentation/devicetree/bindings/input/da90??-onkey.txt
+F: Documentation/devicetree/bindings/input/dlg,da72??.txt
 F: Documentation/devicetree/bindings/mfd/da90*.txt
 F: Documentation/devicetree/bindings/regulator/da92*.txt
 F: Documentation/devicetree/bindings/regulator/slg51000.txt
@@ -5091,6 +5092,7 @@ F:Documentation/hwmon/da90??.rst
 F: drivers/gpio/gpio-da90??.c
 F: drivers/hwmon/da90??-hwmon.c
 F: drivers/iio/adc/da91??-*.c
+F: drivers/input/misc/da72??.[ch]
 F: drivers/input/misc/da90??_onkey.c
 F: drivers/input/touchscreen/da9052_tsi.c
 F: drivers/leds/leds-da90??.c
-- 
end-of-patch for PATCH v20



[PATCH v20 0/3] da7280: haptic driver submission

2020-09-22 Thread Roy Im
This patch adds support for the Dialog DA7280 Haptic driver IC.

In this patch set the following is provided:

[PATCH v20 1/3] MAINTAINERS file update for DA7280
[PATCH v20 2/3] DA7280 DT Binding
[PATCH v20 3/3] DA7280 Driver

This patch applies against linux-mainline and v5.9-rc6

Thank you,
Roy Im, Dialog Semiconductor Ltd.

Roy Im (3):
  MAINTAINERS: da7280 updates to the Dialog Semiconductor search terms
  dt-bindings: input: Add document bindings for DA7280
  Input: new da7280 haptic driver

 .../devicetree/bindings/input/dlg,da7280.txt   |  109 ++
 MAINTAINERS|2 +
 drivers/input/misc/Kconfig |   12 +
 drivers/input/misc/Makefile|1 +
 drivers/input/misc/da7280.c| 1375 
 5 files changed, 1499 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/dlg,da7280.txt
 create mode 100644 drivers/input/misc/da7280.c

-- 
end-of-patch for PATCH v20



[PATCH v20 2/3] dt-bindings: input: Add document bindings for DA7280

2020-09-22 Thread Roy Im
Add device tree binding information for DA7280 haptic driver.
Example bindings for DA7280 are added.

Reviewed-by: Rob Herring .

Signed-off-by: Roy Im 

---
v20: No changes.
v19: No changes.
v18: No changes.
v17: No changes.
v16: No changes.
v15: No changes.
v14: No changes.
v13: No changes.
v12: No changes.
v11: No changes.
v10: No changes.
v9: No changes.
v8: Updated descriptions for new properties.
v7: No changes.
v6: No changes.
v5: Updated descriptions and fixed errors.
v4: Fixed commit message, properties.
v3: Fixed subject format.
v2: No changes


 .../devicetree/bindings/input/dlg,da7280.txt   | 109 +
 1 file changed, 109 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/input/dlg,da7280.txt

diff --git a/Documentation/devicetree/bindings/input/dlg,da7280.txt 
b/Documentation/devicetree/bindings/input/dlg,da7280.txt
new file mode 100644
index 000..e6b719d
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/dlg,da7280.txt
@@ -0,0 +1,109 @@
+Dialog Semiconductor DA7280 Haptics bindings
+
+Required properties:
+- compatible: Should be "dlg,da7280".
+- reg: Specifies the I2C slave address.
+
+- interrupt-parent : Specifies the phandle of the interrupt controller to
+  which the IRQs from DA7280 are delivered.
+
+- dlg,actuator-type: Actuator type. It should be one of:
+  "LRA" - Linear Resonance Actuator type.
+  "ERM-bar" - Bar type Eccentric Rotating Mass.
+  "ERM-coin" - Coin type Eccentric Rotating Mass.
+
+- dlg,const-op-mode: Haptic operation mode for FF_CONSTANT.
+  Possible values:
+   1 - Direct register override (DRO) mode triggered by I2C (default),
+   2 - PWM data source mode controlled by PWM duty.
+- dlg,periodic-op-mode: Haptic operation mode for FF_PERIODIC.
+  Possible values:
+   1 - Register triggered waveform memory (RTWM) mode: the pattern
+   assigned to PS_SEQ_ID is played as many times as PS_SEQ_LOOP,
+   2 - Edge triggered waveform memory (ETWM) mode: external GPI(N)
+   control is required to enable/disable playback, and the device
+   must be kept enabled by sending a magnitude (X > 0);
+   the pattern is the one assigned to GPI(N)_SEQUENCE_ID below.
+   The default value is 1 for both of the operation modes.
+   For more details, please see the datasheet.
+
+- dlg,nom-microvolt: Nominal actuator voltage rating.
+  Valid values: 0 - 600.
+- dlg,abs-max-microvolt: Absolute actuator maximum voltage rating.
+  Valid values: 0 - 600.
+- dlg,imax-microamp: Actuator max current rating.
+  Valid values: 0 - 252000.
+  Default: 13.
+- dlg,impd-micro-ohms: Impedance of the actuator in micro-ohms.
+  Valid values: 0 - 15.
+
+Optional properties:
+- pwms : phandle to the physical PWM (Pulse Width Modulation) device.
+  PWM properties should be named "pwms". The number of cells differs
+  for each PWM device.
+  (See Documentation/devicetree/bindings/pwm/pwm.txt
+   for further information relating to pwm properties)
+
+- dlg,ps-seq-id: the PS_SEQ_ID (pattern ID in the waveform memory inside
+  the chip) to play back when RTWM mode is enabled.
+  Valid range: 0 - 15.
+- dlg,ps-seq-loop: the PS_SEQ_LOOP, i.e. the number of times the pre-stored
+  sequence pointed to by PS_SEQ_ID or GPI(N)_SEQUENCE_ID is repeated.
+  Valid range: 0 - 15.
+- dlg,gpiN-seq-id: the GPI(N)_SEQUENCE_ID, i.e. the pattern to play
+  when GPI(N) is triggered; 'N' must be 0 - 2.
+  Valid range: 0 - 15.
+- dlg,gpiN-mode: the pattern mode, either
+  "Single-pattern" or "Multi-pattern"; 'N' must be 0 - 2.
+- dlg,gpiN-polarity: the GPI(N) polarity, one of
+  "Rising-edge", "Falling-edge" and "Both-edge";
+  'N' must be 0 - 2.
+  In ETWM mode, the haptic is triggered on the selected edge.
+
+- dlg,resonant-freq-hz: resonant frequency, used for LRA actuators only.
+  Valid range: 50 - 300.
+  Default: 205.
+
+- dlg,bemf-sens-enable: Enable for internal loop computations.
+- dlg,freq-track-enable: Enable for resonant frequency tracking.
+- dlg,acc-enable: Enable for active acceleration.
+- dlg,rapid-stop-enable: Enable for rapid stop.
+- dlg,amp-pid-enable: Enable for the amplitude PID.
+- dlg,mem-array: Customized waveform memory (pattern) data downloaded to
+  the device during initialization. This is an array of 100 u8 values.
+
+For further information, see device datasheet.
+
+==
+
+Example:
+
+   haptics: da7280-haptics@4a {
+   compatible = "dlg,da7280";
+   reg = <0x4a>;
+   interrupt-parent = <>;
+   interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
+   dlg,actuator-type = "LRA";
   dlg,const-op-mode = <1>;
   dlg,periodic-op-mode = <1>;
+   dlg,nom-microvolt = <200>;
+   dlg,abs-max-microvolt = <200>;
+   dlg,imax-microamp = <17>;
+   dlg,resonant-freq-hz = <180>;
+   dlg,impd-micro-ohms = <1050>;
+   dlg,freq-track-enable;
+  

Re: [PATCH -next] powerpc/perf: Fix symbol undeclared warning

2020-09-22 Thread Athira Rajeev



> On 21-Sep-2020, at 4:55 PM, Wang Wensheng  wrote:
> 
> Build kernel with `C=2`:
> arch/powerpc/perf/isa207-common.c:24:18: warning: symbol
> 'isa207_pmu_format_attr' was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:101:5: warning: symbol 'p9_dd21_bl_ev'
> was not declared. Should it be static?
> arch/powerpc/perf/power9-pmu.c:115:5: warning: symbol 'p9_dd22_bl_ev'
> was not declared. Should it be static?

Hi, 

It would be good to say in the commit message what the fix is,
e.g. "Declare p9_dd21_bl_ev/p9_dd22_bl_ev as static variables."

Thanks
Athira
> 
> Signed-off-by: Wang Wensheng 
> ---
> arch/powerpc/perf/isa207-common.c | 2 +-
> arch/powerpc/perf/power9-pmu.c| 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/perf/isa207-common.c 
> b/arch/powerpc/perf/isa207-common.c
> index 964437adec18..85dc860b265b 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop,"config:32-35");
> PMU_FORMAT_ATTR(thresh_start, "config:36-39");
> PMU_FORMAT_ATTR(thresh_cmp,   "config:40-49");
> 
> -struct attribute *isa207_pmu_format_attr[] = {
> +static struct attribute *isa207_pmu_format_attr[] = {
>   _attr_event.attr,
>   _attr_pmcxsel.attr,
>   _attr_mark.attr,
> diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
> index 2a57e93a79dc..4a315fad1f99 100644
> --- a/arch/powerpc/perf/power9-pmu.c
> +++ b/arch/powerpc/perf/power9-pmu.c
> @@ -98,7 +98,7 @@ extern u64 PERF_REG_EXTENDED_MASK;
> /* PowerISA v2.07 format attribute structure*/
> extern struct attribute_group isa207_pmu_format_group;
> 
> -int p9_dd21_bl_ev[] = {
> +static int p9_dd21_bl_ev[] = {
>   PM_MRK_ST_DONE_L2,
>   PM_RADIX_PWC_L1_HIT,
>   PM_FLOP_CMPL,
> @@ -112,7 +112,7 @@ int p9_dd21_bl_ev[] = {
>   PM_DISP_HELD_SYNC_HOLD,
> };
> 
> -int p9_dd22_bl_ev[] = {
> +static int p9_dd22_bl_ev[] = {
>   PM_DTLB_MISS_16G,
>   PM_DERAT_MISS_2M,
>   PM_DTLB_MISS_2M,
> -- 
> 2.25.0
> 



[PATCH v7 0/2] Add Intel LGM soc DMA support

2020-09-22 Thread Amireddy Mallikarjuna reddy
Add a DMA controller driver for the Lightning Mountain (LGM) family of SoCs.

The main function of the DMA controller is the transfer of data from/to any
DPlus compliant peripheral to/from the memory. A memory to memory copy
capability can also be configured.
This ldma driver is used to configure the device and channels for the data
and control paths.

These controllers provide DMA capabilities for a variety of on-chip
devices such as SSC, HSNAND and GSWIP.

-
Future Plans:
-
The LGM SoC also supports a hardware memory copy engine, whose role is to
offload memory copy operations from the CPU.

Amireddy Mallikarjuna reddy (2):
  dt-bindings: dma: Add bindings for intel LGM SOC
  Add Intel LGM soc DMA support.

 .../devicetree/bindings/dma/intel,ldma.yaml|  135 ++
 drivers/dma/Kconfig|2 +
 drivers/dma/Makefile   |1 +
 drivers/dma/lgm/Kconfig|9 +
 drivers/dma/lgm/Makefile   |2 +
 drivers/dma/lgm/lgm-dma.c  | 1765 
 include/linux/dma/lgm_dma.h|   27 +
 7 files changed, 1941 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/intel,ldma.yaml
 create mode 100644 drivers/dma/lgm/Kconfig
 create mode 100644 drivers/dma/lgm/Makefile
 create mode 100644 drivers/dma/lgm/lgm-dma.c
 create mode 100644 include/linux/dma/lgm_dma.h

-- 
2.11.0



Re: [PATCH V4] drm/dp_mst: Retrieve extended DPCD caps for topology manager

2020-09-22 Thread Koba Ko
Thanks for the review.
Sorry, I thought I was supposed to append the review tag myself.
One thing to confirm with you: will you or I push this patch to drm-misc-next?

Thanks a lot.

On Wed, Sep 23, 2020 at 2:01 AM Lyude Paul  wrote:
>
> One last change I realized we should do is print the name of the AUX adapter
> in question. I don't mind just adding that myself before I push it though so
> you don't need to send a respin.
>
> Going to go push this to drm-misc-next, thanks!
>
> On Tue, 2020-09-22 at 14:53 +0800, Koba Ko wrote:
> > As per DP 1.3, first check DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT.
> > If DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is 1, read DP_DP13_DPCD_REV to
> > get the faster capability.
> > If DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is 0, read DP_DPCD_REV.
> >
> > Signed-off-by: Koba Ko 
> > Reviewed-by: Lyude Paul 
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index e87542533640..63f8809b9aa4 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -3686,9 +3686,9 @@ int drm_dp_mst_topology_mgr_set_mst(struct
> > drm_dp_mst_topology_mgr *mgr, bool ms
> >   WARN_ON(mgr->mst_primary);
> >
> >   /* get dpcd info */
> > - ret = drm_dp_dpcd_read(mgr->aux, DP_DPCD_REV, mgr->dpcd,
> > DP_RECEIVER_CAP_SIZE);
> > - if (ret != DP_RECEIVER_CAP_SIZE) {
> > - DRM_DEBUG_KMS("failed to read DPCD\n");
> > + ret = drm_dp_read_dpcd_caps(mgr->aux, mgr->dpcd);
> > + if (ret < 0) {
> > + drm_dbg_kms(mgr->dev, "failed to read DPCD, ret %d\n",
> > ret);
> >   goto out_unlock;
> >   }
> >
> --
> Cheers,
> Lyude Paul (she/her)
> Software Engineer at Red Hat
>
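The DP 1.3 rule described in the quoted commit message can be sketched in user space against a simulated DPCD. The register offsets and the bit follow the names used in `include/drm/drm_dp_helper.h`; `model_dpcd_read()` and `model_read_dpcd_caps()` are hypothetical and are not the DRM helper's implementation. Keeping the base capabilities when the extended block reports a lower DPCD revision is a sanity check chosen for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* DPCD offsets and bit, named as in include/drm/drm_dp_helper.h. */
#define DP_DPCD_REV                            0x000
#define DP_TRAINING_AUX_RD_INTERVAL            0x00e
#define DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT (1 << 7)
#define DP_DP13_DPCD_REV                       0x2200
#define DP_RECEIVER_CAP_SIZE                   0xf

/* Hypothetical stand-in for an AUX read: copies from a simulated DPCD. */
static int model_dpcd_read(const uint8_t *dpcd_mem, unsigned int offset,
			   uint8_t *buf, size_t size)
{
	memcpy(buf, dpcd_mem + offset, size);
	return (int)size;
}

/* Read the base receiver caps; if the extended-caps bit is set, prefer
 * the copy at DP_DP13_DPCD_REV unless it reports a lower revision. */
static int model_read_dpcd_caps(const uint8_t *dpcd_mem,
				uint8_t dpcd[DP_RECEIVER_CAP_SIZE])
{
	uint8_t ext[DP_RECEIVER_CAP_SIZE];
	int ret;

	ret = model_dpcd_read(dpcd_mem, DP_DPCD_REV, dpcd, DP_RECEIVER_CAP_SIZE);
	if (ret != DP_RECEIVER_CAP_SIZE)
		return -1;

	if (!(dpcd[DP_TRAINING_AUX_RD_INTERVAL] &
	      DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT))
		return 0;  /* no extended block: keep the base caps */

	ret = model_dpcd_read(dpcd_mem, DP_DP13_DPCD_REV, ext, sizeof(ext));
	if (ret != DP_RECEIVER_CAP_SIZE)
		return -1;

	if (ext[DP_DPCD_REV] >= dpcd[DP_DPCD_REV])
		memcpy(dpcd, ext, sizeof(ext));
	return 0;
}
```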


[PATCH v3 4/4] drm_dp_cec: add MST support

2020-09-22 Thread Sam McNally
With DP v2.0 errata E5, CEC tunneling can be supported through an MST
topology.

There are some minor differences for CEC tunneling through an MST
topology compared to CEC tunneling to an SST port:
- CEC IRQs are delivered via a sink event notify message
- CEC-related DPCD registers are accessed via remote DPCD reads and
  writes.

This results in the MST implementation diverging from the existing SST
implementation:
- sink event notify messages with CEC_IRQ ID set indicate CEC IRQ rather
  than ESI1
- setting the EDID and handling CEC IRQs, which can be triggered from
  contexts where the locks held preclude HPD handling, are deferred to
  avoid remote DPCD accesses, which would block until HPD handling is
  performed or a timeout expires

Register and unregister for all MST connectors, ensuring their
drm_dp_aux_cec struct won't be accessed uninitialized.
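The dispatch described above (handle CEC IRQs inline on SST ports, defer them to a work item on MST ports) can be modelled single-threaded as follows. `struct model_aux` and the `model_*` functions are hypothetical; a boolean stands in for a queued work item and, like schedule_work() on an already-pending item, coalesces repeated requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a DP AUX channel's CEC state. */
struct model_aux {
	bool is_remote;        /* true for ports behind an MST branch */
	bool irq_work_pending; /* models a queued work item */
	int irqs_handled;
};

static void model_handle_cec_irq(struct model_aux *aux)
{
	aux->irqs_handled++;   /* real code would do (remote) DPCD reads here */
}

/* Called from IRQ/HPD context: remote DPCD access could block there,
 * so for MST ports the work is only queued. */
static void model_cec_irq(struct model_aux *aux)
{
	if (aux->is_remote) {
		aux->irq_work_pending = true;  /* stands in for schedule_work() */
		return;
	}
	model_handle_cec_irq(aux);
}

/* Models the workqueue running later, outside the locked context. */
static void model_run_pending_work(struct model_aux *aux)
{
	if (aux->irq_work_pending) {
		aux->irq_work_pending = false;
		model_handle_cec_irq(aux);
	}
}
```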

Reviewed-by: Hans Verkuil 
Signed-off-by: Sam McNally 
---

Changes in v3:
- Fixed whitespace in drm_dp_cec_mst_irq_work()
- Moved drm_dp_cec_mst_set_edid_work() with the other set_edid functions

Changes in v2:
- Used aux->is_remote instead of aux->cec.is_mst, removing the need for
  the previous patch in the series
- Added a defensive check for null edid in the deferred set_edid work,
  in case the edid is no longer valid at that point

 drivers/gpu/drm/drm_dp_cec.c  | 68 +--
 drivers/gpu/drm/drm_dp_mst_topology.c | 24 ++
 include/drm/drm_dp_helper.h   |  4 ++
 3 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
index 3ab2609f9ec7..1020b2cffdf0 100644
--- a/drivers/gpu/drm/drm_dp_cec.c
+++ b/drivers/gpu/drm/drm_dp_cec.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Unfortunately it turns out that we have a chicken-and-egg situation
@@ -248,6 +249,10 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
if (!aux->transfer)
return;
 
+   if (aux->is_remote) {
+   schedule_work(&aux->cec.mst_irq_work);
+   return;
+   }
mutex_lock(&aux->cec.lock);
if (!aux->cec.adap)
goto unlock;
@@ -276,6 +281,23 @@ static bool drm_dp_cec_cap(struct drm_dp_aux *aux, u8 
*cec_cap)
return true;
 }
 
+static void drm_dp_cec_mst_irq_work(struct work_struct *work)
+{
+   struct drm_dp_aux *aux = container_of(work, struct drm_dp_aux,
+ cec.mst_irq_work);
+   struct drm_dp_mst_port *port =
+   container_of(aux, struct drm_dp_mst_port, aux);
+
+   port = drm_dp_mst_topology_get_port_validated(port->mgr, port);
+   if (!port)
+   return;
+   mutex_lock(&aux->cec.lock);
+   if (aux->cec.adap)
+   drm_dp_cec_handle_irq(aux);
+   mutex_unlock(&aux->cec.lock);
+   drm_dp_mst_topology_put_port(port);
+}
+
 /*
  * Called if the HPD was low for more than drm_dp_cec_unregister_delay
  * seconds. This unregisters the CEC adapter.
@@ -297,7 +319,8 @@ static void drm_dp_cec_unregister_work(struct work_struct *work)
  * were unchanged and just update the CEC physical address. Otherwise
  * unregister the old CEC adapter and create a new one.
  */
-void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
+static void drm_dp_cec_handle_set_edid(struct drm_dp_aux *aux,
+  const struct edid *edid)
 {
struct drm_connector *connector = aux->cec.connector;
u32 cec_caps = CEC_CAP_DEFAULTS | CEC_CAP_NEEDS_HPD |
@@ -306,10 +329,6 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
unsigned int num_las = 1;
u8 cap;
 
-   /* No transfer function was set, so not a DP connector */
-   if (!aux->transfer)
-   return;
-
 #ifndef CONFIG_MEDIA_CEC_RC
/*
 * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by
@@ -320,6 +339,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
 */
cec_caps &= ~CEC_CAP_RC;
 #endif
+   cancel_work_sync(&aux->cec.mst_irq_work);
cancel_delayed_work_sync(&aux->cec.unregister_work);
 
mutex_lock(&aux->cec.lock);
@@ -375,8 +395,40 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
 unlock:
mutex_unlock(&aux->cec.lock);
 }
+
+void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
+{
+   /* No transfer function was set, so not a DP connector */
+   if (!aux->transfer)
+   return;
+
+   if (aux->is_remote)
+   schedule_work(&aux->cec.mst_set_edid_work);
+   else
+   drm_dp_cec_handle_set_edid(aux, edid);
+}
 EXPORT_SYMBOL(drm_dp_cec_set_edid);
 
+static void drm_dp_cec_mst_set_edid_work(struct work_struct *work)
+{
+   struct drm_dp_aux *aux =
+   container_of(work, struct drm_dp_aux, cec.mst_set_edid_work);
+   struct drm_dp_mst_port *port =
+   

[PATCH v3 2/4] drm_dp_mst_topology: use correct AUX channel

2020-09-22 Thread Sam McNally
From: Hans Verkuil 

For adapters behind an MST hub use the correct AUX channel.
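As a rough illustration of the dispatch that drm_dp_mst_aux_transfer() performs in the hunk below, the request classification can be modeled in user space. This is a minimal sketch, not kernel code: the DP_AUX_* values are copied from include/drm/drm_dp_helper.h, while enum mst_aux_route and classify_mst_aux_request() are hypothetical names standing in for the calls to drm_dp_send_dpcd_write() and drm_dp_send_dpcd_read().

```c
#include <assert.h>

/* Request codes, as defined in include/drm/drm_dp_helper.h. */
#define DP_AUX_I2C_WRITE               0x0
#define DP_AUX_I2C_READ                0x1
#define DP_AUX_I2C_WRITE_STATUS_UPDATE 0x2
#define DP_AUX_I2C_MOT                 0x4
#define DP_AUX_NATIVE_WRITE            0x8
#define DP_AUX_NATIVE_READ             0x9

/* Hypothetical stand-in for "which sideband message would be sent". */
enum mst_aux_route {
	MST_ROUTE_REMOTE_DPCD_WRITE,
	MST_ROUTE_REMOTE_DPCD_READ,
	MST_ROUTE_INVALID,
};

/*
 * Same case structure as drm_dp_mst_aux_transfer(): strip the I2C
 * middle-of-transaction (MOT) bit, then route writes and reads.
 */
static enum mst_aux_route classify_mst_aux_request(unsigned int request)
{
	switch (request & ~DP_AUX_I2C_MOT) {
	case DP_AUX_NATIVE_WRITE:
	case DP_AUX_I2C_WRITE:
	case DP_AUX_I2C_WRITE_STATUS_UPDATE:
		return MST_ROUTE_REMOTE_DPCD_WRITE;
	case DP_AUX_NATIVE_READ:
	case DP_AUX_I2C_READ:
		return MST_ROUTE_REMOTE_DPCD_READ;
	default:
		/* The kernel function returns -EINVAL here. */
		return MST_ROUTE_INVALID;
	}
}
```

Masking off DP_AUX_I2C_MOT first means a request such as DP_AUX_I2C_READ | DP_AUX_I2C_MOT lands in the same case as a plain DP_AUX_I2C_READ.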

Signed-off-by: Hans Verkuil 
[sa...@chromium.org: rebased, removing redundant changes]
Signed-off-by: Sam McNally 
---

(no changes since v1)

 drivers/gpu/drm/drm_dp_mst_topology.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 15b6cc39a754..0d753201adbd 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2255,6 +2255,9 @@ drm_dp_mst_topology_unlink_port(struct drm_dp_mst_topology_mgr *mgr,
drm_dp_mst_topology_put_port(port);
 }
 
+static ssize_t
+drm_dp_mst_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg);
+
 static struct drm_dp_mst_port *
 drm_dp_mst_add_port(struct drm_device *dev,
struct drm_dp_mst_topology_mgr *mgr,
@@ -2271,9 +2274,13 @@ drm_dp_mst_add_port(struct drm_device *dev,
port->port_num = port_number;
port->mgr = mgr;
port->aux.name = "DPMST";
+   mutex_init(&port->aux.hw_mutex);
+   mutex_init(&port->aux.cec.lock);
port->aux.dev = dev->dev;
port->aux.is_remote = true;
 
+   port->aux.transfer = drm_dp_mst_aux_transfer;
+
/* initialize the MST downstream port's AUX crc work queue */
drm_dp_remote_aux_init(&port->aux);
 
@@ -3503,6 +3510,35 @@ static int drm_dp_send_up_ack_reply(struct drm_dp_mst_topology_mgr *mgr,
return 0;
 }
 
+static ssize_t
+drm_dp_mst_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg)
+{
+   struct drm_dp_mst_port *port =
+   container_of(aux, struct drm_dp_mst_port, aux);
+   int ret;
+
+   switch (msg->request & ~DP_AUX_I2C_MOT) {
+   case DP_AUX_NATIVE_WRITE:
+   case DP_AUX_I2C_WRITE:
+   case DP_AUX_I2C_WRITE_STATUS_UPDATE:
+   ret = drm_dp_send_dpcd_write(port->mgr, port, msg->address,
+msg->size, msg->buffer);
+   break;
+
+   case DP_AUX_NATIVE_READ:
+   case DP_AUX_I2C_READ:
+   ret = drm_dp_send_dpcd_read(port->mgr, port, msg->address,
+   msg->size, msg->buffer);
+   break;
+
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   return ret;
+}
+
 static int drm_dp_get_vc_payload_bw(u8 dp_link_bw, u8  dp_link_count)
 {
if (dp_link_bw == 0 || dp_link_count == 0)
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v3 3/4] drm_dp_mst_topology: export two functions

2020-09-22 Thread Sam McNally
From: Hans Verkuil 

These are required for the CEC MST support.

Signed-off-by: Hans Verkuil 
Signed-off-by: Sam McNally 
---

(no changes since v1)

 drivers/gpu/drm/drm_dp_mst_topology.c | 6 ++
 include/drm/drm_dp_mst_helper.h   | 4 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 0d753201adbd..c783a2a1c114 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -62,8 +62,6 @@ struct drm_dp_pending_up_req {
 static bool dump_dp_payload_table(struct drm_dp_mst_topology_mgr *mgr,
  char *buf);
 
-static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
-
 static int drm_dp_dpcd_write_payload(struct drm_dp_mst_topology_mgr *mgr,
 int id,
 struct drm_dp_payload *payload);
@@ -1864,7 +1862,7 @@ static void drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
  * drm_dp_mst_topology_try_get_port()
  * drm_dp_mst_topology_get_port()
  */
-static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
+void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
 {
topology_ref_history_lock(port->mgr);
 
@@ -1935,7 +1933,7 @@ drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch *mstb,
return NULL;
 }
 
-static struct drm_dp_mst_port *
+struct drm_dp_mst_port *
 drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
   struct drm_dp_mst_port *port)
 {
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index c7c79e0ced18..d036222e0d64 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -754,6 +754,10 @@ drm_dp_mst_detect_port(struct drm_connector *connector,
   struct drm_dp_mst_topology_mgr *mgr,
   struct drm_dp_mst_port *port);
 
+struct drm_dp_mst_port *drm_dp_mst_topology_get_port_validated
+(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port);
+void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port);
+
 struct edid *drm_dp_mst_get_edid(struct drm_connector *connector, struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port);
 
 
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH v3 1/4] dp/dp_mst: Add support for sink event notify messages

2020-09-22 Thread Sam McNally
Sink event notify messages are used for MST CEC IRQs. Add parsing
support for sink event notify messages in preparation for handling MST
CEC IRQs.
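For reference, the request body that drm_dp_sideband_parse_sink_event_notify() decodes in the hunk below can be parsed with a standalone sketch like the following. It assumes the layout implied by that function (request identifier at byte 0, port number in the high nibble of byte 1, a 16-byte GUID, then a big-endian 16-bit event ID); struct sink_event_notify and the helper names are hypothetical. The sketch also bounds-checks the two event-ID bytes before reading them, and tests the CEC IRQ bit that the header hunk defines as BIT(5).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BIT(5), matching DP_SINK_EVENT_CEC_IRQ_EVENT in the patch. */
#define DP_SINK_EVENT_CEC_IRQ_EVENT (1u << 5)

/* Hypothetical user-space stand-in for struct drm_dp_sink_event_notify. */
struct sink_event_notify {
	uint8_t port_number;
	uint8_t guid[16];
	uint16_t event_id;
};

/* Returns 0 on success, -1 if the buffer is too short. */
static int parse_sink_event_notify(const uint8_t *msg, size_t len,
				   struct sink_event_notify *out)
{
	size_t idx = 1;	/* msg[0] holds the request identifier */

	if (len < idx + 1)
		return -1;
	out->port_number = (msg[idx] & 0xf0) >> 4;
	idx++;

	if (len < idx + sizeof(out->guid))
		return -1;
	memcpy(out->guid, &msg[idx], sizeof(out->guid));
	idx += sizeof(out->guid);

	/* Both event ID bytes must be present before they are read. */
	if (len < idx + 2)
		return -1;
	out->event_id = (uint16_t)((msg[idx] << 8) | msg[idx + 1]);
	return 0;
}

static int sink_event_is_cec_irq(const struct sink_event_notify *ev)
{
	return (ev->event_id & DP_SINK_EVENT_CEC_IRQ_EVENT) != 0;
}
```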

Signed-off-by: Sam McNally 
---

(no changes since v1)

 drivers/gpu/drm/drm_dp_mst_topology.c | 37 ++-
 include/drm/drm_dp_mst_helper.h   | 14 ++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 17dbed0a9800..15b6cc39a754 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -1027,6 +1027,30 @@ static bool drm_dp_sideband_parse_resource_status_notify(struct drm_dp_sideband_
return false;
 }
 
+static bool drm_dp_sideband_parse_sink_event_notify(
+   struct drm_dp_sideband_msg_rx *raw,
+   struct drm_dp_sideband_msg_req_body *msg)
+{
+   int idx = 1;
+
+   msg->u.sink_event.port_number = (raw->msg[idx] & 0xf0) >> 4;
+   idx++;
+   if (idx > raw->curlen)
+   goto fail_len;
+
+   memcpy(msg->u.sink_event.guid, &raw->msg[idx], 16);
+   idx += 16;
+   if (idx > raw->curlen)
+   goto fail_len;
+
+   msg->u.sink_event.event_id = (raw->msg[idx] << 8) | (raw->msg[idx + 1]);
+   idx++;
+   return true;
+fail_len:
+   DRM_DEBUG_KMS("sink event notify parse length fail %d %d\n", idx, raw->curlen);
+   return false;
+}
+
 static bool drm_dp_sideband_parse_req(struct drm_dp_sideband_msg_rx *raw,
  struct drm_dp_sideband_msg_req_body *msg)
 {
@@ -1038,6 +1062,8 @@ static bool drm_dp_sideband_parse_req(struct drm_dp_sideband_msg_rx *raw,
return drm_dp_sideband_parse_connection_status_notify(raw, msg);
case DP_RESOURCE_STATUS_NOTIFY:
return drm_dp_sideband_parse_resource_status_notify(raw, msg);
+   case DP_SINK_EVENT_NOTIFY:
+   return drm_dp_sideband_parse_sink_event_notify(raw, msg);
default:
DRM_ERROR("Got unknown request 0x%02x (%s)\n", msg->req_type,
  drm_dp_mst_req_type_str(msg->req_type));
@@ -3875,6 +3901,8 @@ drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
guid = msg->u.conn_stat.guid;
else if (msg->req_type == DP_RESOURCE_STATUS_NOTIFY)
guid = msg->u.resource_stat.guid;
+   else if (msg->req_type == DP_SINK_EVENT_NOTIFY)
+   guid = msg->u.sink_event.guid;
 
if (guid)
mstb = drm_dp_get_mst_branch_device_by_guid(mgr, guid);
@@ -3948,7 +3976,8 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
drm_dp_sideband_parse_req(&mgr->up_req_recv, &up_req->msg);
 
if (up_req->msg.req_type != DP_CONNECTION_STATUS_NOTIFY &&
-   up_req->msg.req_type != DP_RESOURCE_STATUS_NOTIFY) {
+   up_req->msg.req_type != DP_RESOURCE_STATUS_NOTIFY &&
+   up_req->msg.req_type != DP_SINK_EVENT_NOTIFY) {
DRM_DEBUG_KMS("Received unknown up req type, ignoring: %x\n",
  up_req->msg.req_type);
kfree(up_req);
@@ -3976,6 +4005,12 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
DRM_DEBUG_KMS("Got RSN: pn: %d avail_pbn %d\n",
  res_stat->port_number,
  res_stat->available_pbn);
+   } else if (up_req->msg.req_type == DP_SINK_EVENT_NOTIFY) {
+   const struct drm_dp_sink_event_notify *sink_event =
+   &up_req->msg.u.sink_event;
+
+   DRM_DEBUG_KMS("Got SEN: pn: %d event_id %d\n",
+ sink_event->port_number, sink_event->event_id);
}
 
up_req->hdr = mgr->up_req_recv.initial_hdr;
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index 6ae5860d8644..c7c79e0ced18 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -402,6 +402,19 @@ struct drm_dp_resource_status_notify {
u16 available_pbn;
 };
 
+#define DP_SINK_EVENT_PANEL_REPLAY_ACTIVE_FRAME_CRC_ERROR  BIT(0)
+#define DP_SINK_EVENT_PANEL_REPLAY_RFB_STORAGE_ERROR   BIT(1)
+#define DP_SINK_EVENT_DSC_RC_BUFFER_UNDER_RUN  BIT(2)
+#define DP_SINK_EVENT_DSC_RC_BUFFER_OVERFLOW   BIT(3)
+#define DP_SINK_EVENT_DSC_CHUNK_LENGTH_ERROR   BIT(4)
+#define DP_SINK_EVENT_CEC_IRQ_EVENT            BIT(5)
+
+struct drm_dp_sink_event_notify {
+   u8 port_number;
+   u8 guid[16];
+   u16 event_id;
+};
+
 struct drm_dp_query_payload_ack_reply {
u8 port_number;
u16 allocated_pbn;
@@ -413,6 +426,7 @@ struct drm_dp_sideband_msg_req_body {
struct drm_dp_connection_status_notify conn_stat;
struct drm_dp_port_number_req port_num;

Re: [PATCH] docs: admin-guide: update kdump documentation due to change of crash URL

2020-09-22 Thread lijiang
On 2020-09-18 16:09, Lianbo Jiang wrote:
> Since crash utility has moved to github, the original URL is no longer
   ^
  has been moved to github

Because of the above mistake, I'd like to correct it and send a v2.

Thanks.

> available. Let's update it accordingly.
> 
> Suggested-by: Dave Young 
> Signed-off-by: Lianbo Jiang 
> ---
>  Documentation/admin-guide/kdump/kdump.rst | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..75a9dd98e76e 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers kernel option on the
>  dump kernel.
>  
>  You can also use the Crash utility to analyze dump files in Kdump
> -format. Crash is available on Dave Anderson's site at the following URL:
> +format. Crash is available at the following URL:
>  
> -   http://people.redhat.com/~anderson/
> +   https://github.com/crash-utility/crash
> +
> +Crash documentation can be found at:
> +   https://crash-utility.github.io/
>  
>  Trigger Kdump on WARN()
>  ===
> 



[PATCH -next] virtiofs: Move the assignment to ret outside the loop

2020-09-22 Thread Jing Xiangfeng
There is no need to assign ret on every iteration, so move the
assignment outside the loop.
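The same pattern can be sketched in user space: ret is set to -ENOMEM once before the loop, since the only failure inside the loop is an allocation failure. struct range and alloc_ranges() are hypothetical stand-ins for struct fuse_dax_mapping and the loop in fuse_dax_mem_range_init(), with calloc() in place of kzalloc() and a simple unwind in place of the real error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fuse_dax_mapping. */
struct range {
	long idx;
	struct range *next;
};

/* Push nr_ranges freshly allocated nodes onto *head. */
static int alloc_ranges(long nr_ranges, struct range **head)
{
	struct range *range;
	long i;
	int ret;

	/* Loop-invariant: every failure below is an allocation failure. */
	ret = -ENOMEM;
	for (i = 0; i < nr_ranges; i++) {
		range = calloc(1, sizeof(*range));
		if (!range)
			goto out_err;
		range->idx = i;
		range->next = *head;
		*head = range;
	}
	return 0;

out_err:
	/* Free everything allocated so far. */
	while (*head) {
		range = *head;
		*head = range->next;
		free(range);
	}
	return ret;
}
```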

Signed-off-by: Jing Xiangfeng 
---
 fs/fuse/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e394dba08cc4..f18cd7b53ec7 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1259,9 +1259,9 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
pr_debug("%s: dax mapped %ld pages. nr_ranges=%ld\n",
__func__, nr_pages, nr_ranges);
 
+   ret = -ENOMEM;
for (i = 0; i < nr_ranges; i++) {
range = kzalloc(sizeof(struct fuse_dax_mapping), GFP_KERNEL);
-   ret = -ENOMEM;
if (!range)
goto out_err;
 
-- 
2.26.0.106.g9fadedd



Re: [PATCH] perf stat: Skip duration_time in setup_system_wide

2020-09-22 Thread Jin, Yao

Hi Arnaldo,

On 9/23/2020 2:02 AM, Arnaldo Carvalho de Melo wrote:

On Tue, Sep 22, 2020 at 02:56:30PM -0300, Arnaldo Carvalho de Melo wrote:

On Tue, Sep 22, 2020 at 09:50:04AM +0800, Jin Yao wrote:

Some metrics (such as DRAM_BW_Use) consist of uncore events and
duration_time. For uncore events, counter->core.system_wide is
true. But for duration_time, counter->core.system_wide is false,
so target.system_wide is set to false.

Then 'enable_on_exec' is set in the perf_event_attr of the uncore event,
and the kernel returns an error when trying to open the uncore event.

This patch skips duration_time in setup_system_wide(), so
target.system_wide is set to true for an evlist consisting of uncore
events plus duration_time.
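The decision this patch changes can be sketched in isolation. This is a simplified model of the check in setup_system_wide() for the case where no target was given and a workload is forked, not perf's actual code: struct event and all_events_system_wide() are hypothetical. The workload is treated as system wide only when every event other than duration_time is a system-wide event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct evsel. */
struct event {
	const char *name;
	bool system_wide;	/* e.g. true for uncore events */
};

/*
 * True only if the list is non-empty and every event other than the
 * duration_time tool event is a system-wide event.
 */
static bool all_events_system_wide(const struct event *ev, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* Skip duration_time, as the patch does. */
		if (!ev[i].system_wide &&
		    strcmp(ev[i].name, "duration_time") != 0)
			return false;
	}
	return n > 0;
}
```

With the DRAM_BW_Use event list (two uncore arb events plus duration_time) this returns true, whereas before the patch the duration_time entry alone would have made the result false.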

Before (tested on skylake desktop):

  # perf stat -M DRAM_BW_Use -- sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (arb/event=0x84,umask=0x1/).
  /bin/dmesg | grep -i perf may provide additional information.

After:

  # perf stat -M DRAM_BW_Use -- sleep 1

   Performance counter stats for 'system wide':

 169  arb/event=0x84,umask=0x1/ # 0.00 DRAM_BW_Use
  40,427  arb/event=0x81,umask=0x1/
   1,000,902,197 ns   duration_time

 1.000902197 seconds time elapsed

Fixes: 648b5af3f3ae ("libperf: Move 'system_wide' from 'struct evsel' to 'struct perf_evsel'")


Humm, what makes you think that this cset was the one introducing this
problem? It just moves evsel->system_wide to evsel->core.system_wide.


Apart from that I reproduced the problem and after applying your patch
it seems cured:

   [acme@quaco perf]$ grep 'model name' -m1 /proc/cpuinfo
   model name   : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz

Before (with -v to see details):

   [root@quaco ~]# perf stat -v -M DRAM_BW_Use -- sleep 1
   Using CPUID GenuineIntel-6-8E-A
   metric expr 64 * ( arb@event\=0x81\,umask\=0x1@ + arb@event\=0x84\,umask\=0x1@ ) / 100 / duration_time / 1000 for DRAM_BW_Use
   found event duration_time
   found event arb/event=0x84,umask=0x1/
   found event arb/event=0x81,umask=0x1/
   adding {arb/event=0x84,umask=0x1/,arb/event=0x81,umask=0x1/}:W,duration_time
   Control descriptor is not initialized
   Warning:
   arb/event=0x84,umask=0x1/ event is not supported by the kernel.
   Error:
   The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (arb/event=0x84,umask=0x1/).
   /bin/dmesg | grep -i perf may provide additional information.
   
   [root@quaco ~]#


After:

   [root@quaco ~]# perf stat -M DRAM_BW_Use -- sleep 1
   
Performance counter stats for 'system wide':
   
2,806  arb/event=0x84,umask=0x1/ # 0.63 DRAM_BW_Use

   10,001,820  arb/event=0x81,umask=0x1/
1,016,875,686 ns   duration_time
   
  1.016875686 seconds time elapsed
   
   [root@quaco ~]#


So I'm removing that fixes and adding this one, that I think is where
"duration_time" was being considered...

Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring")



Yes, this Fixes tag is much better, thanks.


Also, wouldn't it be better to have the duration_time event with its
evsel->core.system_wide set to true?



That looks like another solution; it should be OK too, I think. :)

But anyway we need a test.

Thanks
Jin Yao


- Arnaldo



Re: [PATCH v18 25/32] mm/mlock: remove lru_lock on TestClearPageMlocked in munlock_vma_page

2020-09-22 Thread Alex Shi



On 2020/9/22 2:13 PM, Hugh Dickins wrote:
> On Mon, 24 Aug 2020, Alex Shi wrote:
> 
>> In munlock_vma_page(), the page must be PageLocked, as must the
>> pages in the split_huge_page family of functions. Thus PageLocked is
>> enough to serialize both.
>>
>> So we can drop the lru lock around TestClearPageMlocked/hpage_nr_pages,
>> which do not need it.
>>
>> As for the other munlock function, __munlock_pagevec(), it has no
>> PageLocked protection and should remain protected by the lru lock.
>>
>> Signed-off-by: Alex Shi 
> I made some comments on the mlock+munlock situation last week:
> I won't review this 24/32 and 25/32 now, but will take a look
> at your github tree tomorrow instead.  Perhaps I'll find you have
> already done the fixes, perhaps I'll find you have merged these back
> into earlier patches.  And I won't be reviewing beyond this point:
> this is enough for now, I think.
> 

Yes, these 2 patches were fixed as you suggested, on
https://github.com/alexshi/linux.git lruv19.5

83f8582dcd5a mm/mlock: remove lru_lock on TestClearPageMlocked
20836d10f0ed mm/mlock: remove __munlock_isolate_lru_page

Thanks!
Alex


[PATCH 5/5] perf test: Add expand cgroup event test

2020-09-22 Thread Namhyung Kim
It expands the given events for cgroups A, B and C.

  $ ./perf test -v expansion
  69: Event expansion for cgroups  :
  --- start ---
  test child forked, pid 983140
  metric expr 1 / IPC for CPI
  metric expr instructions / cycles for IPC
  found event instructions
  found event cycles
  adding {instructions,cycles}:W
  copying metric event for cgroup 'A': instructions (idx=0)
  copying metric event for cgroup 'B': instructions (idx=0)
  copying metric event for cgroup 'C': instructions (idx=0)
  test child finished with 0
   end 
  Event expansion for cgroups: Ok

Cc: John Garry 
Signed-off-by: Namhyung Kim 
---
 tools/perf/tests/Build   |   1 +
 tools/perf/tests/builtin-test.c  |   4 +
 tools/perf/tests/expand-cgroup.c | 241 +++
 tools/perf/tests/tests.h |   1 +
 4 files changed, 247 insertions(+)
 create mode 100644 tools/perf/tests/expand-cgroup.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 69bea7996f18..4d15bf6041fb 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -61,6 +61,7 @@ perf-y += demangle-java-test.o
 perf-y += pfm.o
 perf-y += parse-metric.o
 perf-y += pe-file-parsing.o
+perf-y += expand-cgroup.o
 
 $(OUTPUT)tests/llvm-src-base.c: tests/bpf-script-example.c tests/Build
$(call rule_mkdir)
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 651b8ea3354a..132bdb3e6c31 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -345,6 +345,10 @@ static struct test generic_tests[] = {
.desc = "PE file support",
.func = test__pe_file_parsing,
},
+   {
+   .desc = "Event expansion for cgroups",
+   .func = test__expand_cgroup_events,
+   },
{
.func = NULL,
},
diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c
new file mode 100644
index ..d5771e4d094f
--- /dev/null
+++ b/tools/perf/tests/expand-cgroup.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "tests.h"
+#include "debug.h"
+#include "evlist.h"
+#include "cgroup.h"
+#include "rblist.h"
+#include "metricgroup.h"
+#include "parse-events.h"
+#include "pmu-events/pmu-events.h"
+#include "pfm.h"
+#include 
+#include 
+#include 
+#include 
+
+static int test_expand_events(struct evlist *evlist,
+ struct rblist *metric_events)
+{
+   int i, ret = TEST_FAIL;
+   int nr_events;
+   bool was_group_event;
+   int nr_members;  /* for the first evsel only */
+   const char cgrp_str[] = "A,B,C";
+   const char *cgrp_name[] = { "A", "B", "C" };
+   int nr_cgrps = ARRAY_SIZE(cgrp_name);
+   char **ev_name;
+   struct evsel *evsel;
+
+   TEST_ASSERT_VAL("evlist is empty", !perf_evlist__empty(evlist));
+
+   nr_events = evlist->core.nr_entries;
+   ev_name = calloc(nr_events, sizeof(*ev_name));
+   if (ev_name == NULL) {
+   pr_debug("memory allocation failure\n");
+   return TEST_FAIL;
+   }
+   i = 0;
+   evlist__for_each_entry(evlist, evsel) {
+   ev_name[i] = strdup(evsel->name);
+   if (ev_name[i] == NULL) {
+   pr_debug("memory allocation failure\n");
+   goto out;
+   }
+   i++;
+   }
+   /* remember grouping info */
+   was_group_event = evsel__is_group_event(evlist__first(evlist));
+   nr_members = evlist__first(evlist)->core.nr_members;
+
+   ret = evlist__expand_cgroup(evlist, cgrp_str, metric_events, false);
+   if (ret < 0) {
+   pr_debug("failed to expand events for cgroups\n");
+   goto out;
+   }
+
+   ret = TEST_FAIL;
+   if (evlist->core.nr_entries != nr_events * nr_cgrps) {
+   pr_debug("event count doesn't match\n");
+   goto out;
+   }
+
+   i = 0;
+   evlist__for_each_entry(evlist, evsel) {
+   if (strcmp(evsel->name, ev_name[i % nr_events])) {
+   pr_debug("event name doesn't match:\n");
+   pr_debug("  evsel[%d]: %s\n  expected: %s\n",
+i, evsel->name, ev_name[i % nr_events]);
+   goto out;
+   }
+   if (strcmp(evsel->cgrp->name, cgrp_name[i / nr_events])) {
+   pr_debug("cgroup name doesn't match:\n");
+   pr_debug("  evsel[%d]: %s\n  expected: %s\n",
+i, evsel->cgrp->name, cgrp_name[i / nr_events]);
+   goto out;
+   }
+
+   if ((i % nr_events) == 0) {
+   if (evsel__is_group_event(evsel) != was_group_event) {
+   pr_debug("event group doesn't match: got %s, expect %s\n",
+

[PATCH 4/5] perf tools: Allow creation of cgroup without open

2020-09-22 Thread Namhyung Kim
This is a preparation for a test case of expanding events for multiple
cgroups.  Instead of using real system cgroups, the test will use fake
cgroups, so it needs a way to create them without an open file descriptor.
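A minimal sketch of the idea, assuming a simplified object: the descriptor is only opened on request, and -1 marks "no descriptor" so the release path knows not to close it. struct fake_cgroup, open_path() and the other helpers are hypothetical; perf's real open_cgroup() resolves the name under the cgroup filesystem mount point rather than opening it directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for perf's struct cgroup. */
struct fake_cgroup {
	char *name;
	int fd;		/* -1 means no descriptor was opened */
};

/* Hypothetical placeholder for perf's open_cgroup(). */
static int open_path(const char *name)
{
	return open(name, O_RDONLY);
}

static struct fake_cgroup *fake_cgroup_new(const char *name, bool do_open)
{
	struct fake_cgroup *cgrp = calloc(1, sizeof(*cgrp));

	if (!cgrp)
		return NULL;
	cgrp->name = strdup(name);
	if (!cgrp->name)
		goto out_err;

	if (do_open) {
		cgrp->fd = open_path(name);
		if (cgrp->fd == -1)
			goto out_free_name;
	} else {
		cgrp->fd = -1;	/* fake cgroup: nothing to open */
	}
	return cgrp;

out_free_name:
	free(cgrp->name);
out_err:
	free(cgrp);
	return NULL;
}

static void fake_cgroup_put(struct fake_cgroup *cgrp)
{
	if (cgrp->fd >= 0)
		close(cgrp->fd);
	free(cgrp->name);
	free(cgrp);
}
```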

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-stat.c |  2 +-
 tools/perf/util/cgroup.c  | 19 ---
 tools/perf/util/cgroup.h  |  2 +-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 66a33d97192d..d9d5de6f3108 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2256,7 +2256,7 @@ int cmd_stat(int argc, const char **argv)
}
 
if (evlist__expand_cgroup(evsel_list, stat_config.cgroup_list,
- &stat_config.metric_events) < 0)
+ &stat_config.metric_events, true) < 0)
goto out;
}
 
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index dcd18ef268a1..d82f4cad762c 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -51,7 +51,7 @@ static struct cgroup *evlist__find_cgroup(struct evlist *evlist, const char *str
return NULL;
 }
 
-static struct cgroup *cgroup__new(const char *name)
+static struct cgroup *cgroup__new(const char *name, bool do_open)
 {
struct cgroup *cgroup = zalloc(sizeof(*cgroup));
 
@@ -61,9 +61,14 @@ static struct cgroup *cgroup__new(const char *name)
cgroup->name = strdup(name);
if (!cgroup->name)
goto out_err;
-   cgroup->fd = open_cgroup(name);
-   if (cgroup->fd == -1)
-   goto out_free_name;
+
+   if (do_open) {
+   cgroup->fd = open_cgroup(name);
+   if (cgroup->fd == -1)
+   goto out_free_name;
+   } else {
+   cgroup->fd = -1;
+   }
}
 
return cgroup;
@@ -79,7 +84,7 @@ struct cgroup *evlist__findnew_cgroup(struct evlist *evlist, const char *name)
 {
struct cgroup *cgroup = evlist__find_cgroup(evlist, name);
 
-   return cgroup ?: cgroup__new(name);
+   return cgroup ?: cgroup__new(name, true);
 }
 
 static int add_cgroup(struct evlist *evlist, const char *str)
@@ -197,7 +202,7 @@ int parse_cgroups(const struct option *opt, const char *str,
 }
 
 int evlist__expand_cgroup(struct evlist *evlist, const char *str,
- struct rblist *metric_events)
+ struct rblist *metric_events, bool open_cgroup)
 {
struct evlist *orig_list, *tmp_list;
struct evsel *pos, *evsel, *leader;
@@ -235,7 +240,7 @@ int evlist__expand_cgroup(struct evlist *evlist, const char *str,
if (!name)
goto out_err;
 
-   cgrp = cgroup__new(name);
+   cgrp = cgroup__new(name, open_cgroup);
free(name);
if (cgrp == NULL)
goto out_err;
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index eea6df8ee373..162906f3412a 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -26,7 +26,7 @@ struct rblist;
 
 struct cgroup *evlist__findnew_cgroup(struct evlist *evlist, const char *name);
 int evlist__expand_cgroup(struct evlist *evlist, const char *cgroups,
- struct rblist *metric_events);
+ struct rblist *metric_events, bool open_cgroup);
 
 void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
-- 
2.28.0.681.g6f77f65b4e-goog



[PATCH 2/5] perf stat: Add --for-each-cgroup option

2020-09-22 Thread Namhyung Kim
The --for-each-cgroup option is syntactic sugar for monitoring a large
number of cgroups easily.  The current command line requires listing all
the events and cgroups even if users want to monitor the same events for
each cgroup.  This patch addresses that usage by copying the given events
for each cgroup on the user's behalf.

For instance, if they want to monitor 6 events for each of 200 cgroups,
they have to write 1200 event names (with -e) AND 1200 cgroup names
(with -G) on the command line.  With this change, they can just
specify 6 events and 200 cgroups with the new option.
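The expansion can be pictured as a cross product laid out cgroup by cgroup, which matches the order of the example output (all events for A, then all events for B). A minimal sketch, assuming events are plain strings: expand_for_each_cgroup() and its "event:cgroup" output format are hypothetical, while perf itself clones evsel objects. With 6 events and 200 cgroups the loops would produce 6 x 200 = 1200 entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand nr_events names across a comma-separated cgroup list,
 * cgroup-major. Writes "event:cgroup" strings into out[] and returns
 * the count, or -1 on overflow of max_out or allocation failure.
 */
static int expand_for_each_cgroup(const char *const *events, int nr_events,
				  const char *cgroup_list,
				  char out[][64], int max_out)
{
	char *list = strdup(cgroup_list);	/* strtok_r() modifies it */
	char *saveptr = NULL;
	char *cgrp;
	int n = 0;

	if (!list)
		return -1;

	for (cgrp = strtok_r(list, ",", &saveptr); cgrp;
	     cgrp = strtok_r(NULL, ",", &saveptr)) {
		for (int i = 0; i < nr_events; i++) {
			if (n == max_out) {
				free(list);
				return -1;
			}
			snprintf(out[n++], 64, "%s:%s", events[i], cgrp);
		}
	}
	free(list);
	return n;
}
```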

A simpler example is shown below: it measures 3 events for 2 cgroups
('A' and 'B'), so a total of 6 events are counted.

  $ ./perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1

   Performance counter stats for 'system wide':

            988.18 msec cpu-clock         A  #    0.987 CPUs utilized
     3,153,761,702      cycles            A  #    3.200 GHz                      (100.00%)
     8,067,769,847      instructions      A  #    2.57  insn per cycle           (100.00%)
            982.71 msec cpu-clock         B  #    0.982 CPUs utilized
     3,136,093,298      cycles            B  #    3.182 GHz                      (99.99%)
     8,109,619,327      instructions      B  #    2.58  insn per cycle           (99.99%)

 1.001228054 seconds time elapsed

Signed-off-by: Namhyung Kim 
---
 tools/perf/Documentation/perf-stat.txt |  5 ++
 tools/perf/builtin-stat.c  | 27 -
 tools/perf/util/cgroup.c   | 79 ++
 tools/perf/util/cgroup.h   |  1 +
 tools/perf/util/stat.h |  1 +
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 7d18694e592a..bb17c9caec78 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -166,6 +166,11 @@ use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
 If wanting to monitor, say, 'cycles' for a cgroup and also for system wide, this
 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
 
+--for-each-cgroup name::
+Expand event list for each cgroup in "name" (multiple cgroups can be
+separated by commas).  This has the same effect as repeating the -e and -G
+options for each event x name combination.  This option cannot be used
+with the -G/--cgroup option.
+
 -o file::
 --output file::
 Print the output into the designated file.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7f8d756d9408..23abf14b6e16 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1051,6 +1051,17 @@ static int parse_control_option(const struct option *opt,
return evlist__parse_control(str, >ctl_fd, >ctl_fd_ack, >ctl_fd_close);
 }
 
+static int parse_stat_cgroups(const struct option *opt,
+ const char *str, int unset)
+{
+   if (stat_config.cgroup_list) {
+   pr_err("--cgroup and --for-each-cgroup cannot be used together\n");
+   return -1;
+   }
+
+   return parse_cgroups(opt, str, unset);
+}
+
 static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", _run,
"hardware transaction statistics"),
@@ -1094,7 +1105,9 @@ static struct option stat_options[] = {
OPT_STRING('x', "field-separator", &stat_config.csv_sep, "separator",
   "print counts with custom separator"),
OPT_CALLBACK('G', "cgroup", _list, "name",
-"monitor event in cgroup name only", parse_cgroups),
+"monitor event in cgroup name only", parse_stat_cgroups),
+   OPT_STRING(0, "for-each-cgroup", &stat_config.cgroup_list, "name",
+   "expand events for each cgroup"),
OPT_STRING('o', "output", _name, "file", "output file name"),
OPT_BOOLEAN(0, "append", _file, "append to the output file"),
OPT_INTEGER(0, "log-fd", _fd,
@@ -2234,6 +2247,18 @@ int cmd_stat(int argc, const char **argv)
if (add_default_attributes())
goto out;
 
+   if (stat_config.cgroup_list) {
+   if (nr_cgroups > 0) {
+   pr_err("--cgroup and --for-each-cgroup cannot be used together\n");
+   parse_options_usage(stat_usage, stat_options, "G", 1);
+   parse_options_usage(NULL, stat_options, "for-each-cgroup", 0);
+   goto out;
+   }
+
+   if (evlist__expand_cgroup(evsel_list, stat_config.cgroup_list) < 0)
+   goto out;
+   }
+
target__validate();
 
if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 050dea9f1e88..8b6a4fa49082 100644
--- 

[PATCH 1/5] perf evsel: Add evsel__clone() function

2020-09-22 Thread Namhyung Kim
evsel__clone() creates an identical evsel from the same attributes.
The function assumes the given evsel is not configured yet, so it only
cares about fields set during event parsing.  Those fields are now
grouped together, as Jiri suggested.  Note that metric events will be
handled by a later patch.

It will be used by perf stat to generate separate events for each
cgroup.
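Most of evsel__clone() is a deep copy: owned strings are duplicated with strdup() so the clone and the original can be freed independently, and the list of config terms is copied entry by entry, duplicating only strings whose free_str flag marks them as owned. A minimal sketch of that list copy follows, with a hypothetical struct term in place of struct evsel_config_term. Unlike evsel__copy_config_terms(), it builds the copy in a separate list and frees the partial copy itself on failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct evsel_config_term. */
struct term {
	int type;
	bool free_str;		/* val.str is owned and must be freed */
	union {
		long num;
		char *str;
	} val;
	struct term *next;
};

static void terms_free(struct term *t)
{
	while (t) {
		struct term *next = t->next;

		if (t->free_str)
			free(t->val.str);
		free(t);
		t = next;
	}
}

/* Deep-copy src into *out, preserving order. Returns 0 or -ENOMEM. */
static int terms_clone(const struct term *src, struct term **out)
{
	struct term *head = NULL, **tail = &head;

	for (; src; src = src->next) {
		struct term *t = malloc(sizeof(*t));

		if (!t)
			goto err;
		*t = *src;	/* shallow copy of type, flag and value */
		t->next = NULL;
		if (t->free_str) {
			t->val.str = strdup(src->val.str);
			if (!t->val.str) {
				free(t);
				goto err;
			}
		}
		*tail = t;
		tail = &t->next;
	}
	*out = head;
	return 0;
err:
	terms_free(head);
	return -ENOMEM;
}
```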

Signed-off-by: Namhyung Kim 
---
 tools/perf/util/evsel.c | 104 
 tools/perf/util/evsel.h |  93 ---
 2 files changed, 158 insertions(+), 39 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index fd865002cbbd..c63dd9f7e9fe 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -331,6 +331,110 @@ struct evsel *evsel__new_cycles(bool precise)
goto out;
 }
 
+static int evsel__copy_config_terms(struct evsel *dst, struct evsel *src)
+{
+   struct evsel_config_term *pos, *tmp;
+
+   list_for_each_entry(pos, &src->config_terms, list) {
+   tmp = malloc(sizeof(*tmp));
+   if (tmp == NULL)
+   return -ENOMEM;
+
+   *tmp = *pos;
+   if (tmp->free_str) {
+   tmp->val.str = strdup(pos->val.str);
+   if (tmp->val.str == NULL) {
+   free(tmp);
+   return -ENOMEM;
+   }
+   }
+   list_add_tail(&tmp->list, &dst->config_terms);
+   }
+   return 0;
+}
+
+/**
+ * evsel__clone - create a new evsel copied from @orig
+ * @orig: original evsel
+ *
+ * The assumption is that @orig is neither configured nor opened yet.
+ * So we only care about the attributes that can be set while it's parsed.
+ */
+struct evsel *evsel__clone(struct evsel *orig)
+{
+   struct evsel *evsel;
+
+   BUG_ON(orig->core.fd);
+   BUG_ON(orig->counts);
+   BUG_ON(orig->priv);
+   BUG_ON(orig->per_pkg_mask);
+
+   /* cannot handle BPF objects for now */
+   if (orig->bpf_obj)
+   return NULL;
+
+   evsel = evsel__new(&orig->core.attr);
+   if (evsel == NULL)
+   return NULL;
+
+   evsel->core.cpus = perf_cpu_map__get(orig->core.cpus);
+   evsel->core.own_cpus = perf_cpu_map__get(orig->core.own_cpus);
+   evsel->core.threads = perf_thread_map__get(orig->core.threads);
+   evsel->core.nr_members = orig->core.nr_members;
+   evsel->core.system_wide = orig->core.system_wide;
+
+   if (orig->name) {
+   evsel->name = strdup(orig->name);
+   if (evsel->name == NULL)
+   goto out_err;
+   }
+   if (orig->group_name) {
+   evsel->group_name = strdup(orig->group_name);
+   if (evsel->group_name == NULL)
+   goto out_err;
+   }
+   if (orig->pmu_name) {
+   evsel->pmu_name = strdup(orig->pmu_name);
+   if (evsel->pmu_name == NULL)
+   goto out_err;
+   }
+   if (orig->filter) {
+   evsel->filter = strdup(orig->filter);
+   if (evsel->filter == NULL)
+   goto out_err;
+   }
+   evsel->cgrp = cgroup__get(orig->cgrp);
+   evsel->tp_format = orig->tp_format;
+   evsel->handler = orig->handler;
+   evsel->leader = orig->leader;
+
+   evsel->max_events = orig->max_events;
+   evsel->tool_event = orig->tool_event;
+   evsel->unit = orig->unit;
+   evsel->scale = orig->scale;
+   evsel->snapshot = orig->snapshot;
+   evsel->per_pkg = orig->per_pkg;
+   evsel->percore = orig->percore;
+   evsel->precise_max = orig->precise_max;
+   evsel->use_uncore_alias = orig->use_uncore_alias;
+   evsel->is_libpfm_event = orig->is_libpfm_event;
+
+   evsel->exclude_GH = orig->exclude_GH;
+   evsel->sample_read = orig->sample_read;
+   evsel->auto_merge_stats = orig->auto_merge_stats;
+   evsel->collect_stat = orig->collect_stat;
+   evsel->weak_group = orig->weak_group;
+
+   if (evsel__copy_config_terms(evsel, orig) < 0)
+   goto out_err;
+
+   return evsel;
+
+out_err:
+   evsel__delete(evsel);
+   return NULL;
+}
+
 /*
  * Returns pointer with encoded error via  interface.
  */
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35e3f6d66085..79a860d8e3ee 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -42,65 +42,79 @@ enum perf_tool_event {
  */
 struct evsel {
struct perf_evsel   core;
-   struct evlist   *evlist;
-   char            *filter;
+   struct evlist   *evlist;
+   off_t   id_offset;
+   int idx;
+   int id_pos;
+   int is_pos;
+   unsigned int    sample_size;
+
+   /*
+* These fields can be set in the parse-events 

[PATCH 3/5] perf tools: Copy metric events properly when expand cgroups

2020-09-22 Thread Namhyung Kim
metricgroup__copy_metric_events() handles metric events when
expanding events for cgroups.  As the metric events keep pointers to
evsels, they must be refreshed when the events are cloned during the
operation.

perf_stat__collect_metric_expr() is also called to handle the case
where an event has a metric attached directly.

During the copy, evsels are looked up by index, since the evlist now
holds the cloned evsels for the given cgroup.
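A minimal, self-contained sketch of the index-based remapping, assuming simplified stand-in types: `struct ev`, `struct metric_ref`, `find_ev()` and `copy_metric_ref()` are hypothetical and are not perf's real `struct evsel` or metric structures. The metric keeps pointers into the original list; the copy re-resolves each one by `idx` in the cloned list, in the spirit of `evlist__find_evsel()` from this patch.

```c
#include <stddef.h>

#define MAX_REFS 4	/* hypothetical fixed capacity for the sketch */

/* Hypothetical stand-in for struct evsel: only the index matters here. */
struct ev {
	int idx;
};

/* Hypothetical stand-in for a metric that points at the events it uses. */
struct metric_ref {
	struct ev *events[MAX_REFS];
	size_t nr;
};

/* Linear lookup by index, similar in spirit to evlist__find_evsel(). */
static struct ev *find_ev(struct ev *list, size_t n, int idx)
{
	for (size_t i = 0; i < n; i++) {
		if (list[i].idx == idx)
			return &list[i];
	}
	return NULL;
}

/*
 * Copy @src into @dst, replacing each pointer into the original list
 * with the clone that has the same idx.  Returns 0 on success, or -1
 * if @src is too large or some event has no matching clone.
 */
static int copy_metric_ref(struct metric_ref *dst, const struct metric_ref *src,
			   struct ev *clones, size_t nr_clones)
{
	if (src->nr > MAX_REFS)
		return -1;

	dst->nr = src->nr;
	for (size_t i = 0; i < src->nr; i++) {
		dst->events[i] = find_ev(clones, nr_clones, src->events[i]->idx);
		if (dst->events[i] == NULL)
			return -1;
	}
	return 0;
}
```

Looking events up by index rather than by pointer is what lets the same metric description be reused for every cgroup's set of clones.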

Cc: John Garry 
Cc: Kajol Jain 
Cc: Ian Rogers 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-stat.c |  3 +-
 tools/perf/util/cgroup.c  | 15 ++-
 tools/perf/util/cgroup.h  |  4 +-
 tools/perf/util/evlist.c  | 11 +
 tools/perf/util/evlist.h  |  1 +
 tools/perf/util/metricgroup.c | 85 +++
 tools/perf/util/metricgroup.h |  6 +++
 7 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 23abf14b6e16..66a33d97192d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2255,7 +2255,8 @@ int cmd_stat(int argc, const char **argv)
goto out;
}
 
-   if (evlist__expand_cgroup(evsel_list, stat_config.cgroup_list) < 0)
+   if (evlist__expand_cgroup(evsel_list, stat_config.cgroup_list,
+ &stat_config.metric_events) < 0)
goto out;
}
 
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
index 8b6a4fa49082..dcd18ef268a1 100644
--- a/tools/perf/util/cgroup.c
+++ b/tools/perf/util/cgroup.c
@@ -3,6 +3,9 @@
 #include "evsel.h"
 #include "cgroup.h"
 #include "evlist.h"
+#include "rblist.h"
+#include "metricgroup.h"
+#include "stat.h"
 #include 
 #include 
 #include 
@@ -193,10 +196,12 @@ int parse_cgroups(const struct option *opt, const char *str,
return 0;
 }
 
-int evlist__expand_cgroup(struct evlist *evlist, const char *str)
+int evlist__expand_cgroup(struct evlist *evlist, const char *str,
+ struct rblist *metric_events)
 {
struct evlist *orig_list, *tmp_list;
struct evsel *pos, *evsel, *leader;
+   struct rblist orig_metric_events;
struct cgroup *cgrp = NULL;
const char *p, *e, *eos = str + strlen(str);
int ret = -1;
@@ -216,6 +221,8 @@ int evlist__expand_cgroup(struct evlist *evlist, const char *str)
/* save original events and init evlist */
perf_evlist__splice_list_tail(orig_list, &evlist->core.entries);
evlist->core.nr_entries = 0;
+   orig_metric_events = *metric_events;
+   rblist__init(metric_events);
 
for (;;) {
p = strchr(str, ',');
@@ -255,6 +262,11 @@ int evlist__expand_cgroup(struct evlist *evlist, const char *str)
cgroup__put(cgrp);
nr_cgroups++;
 
+   perf_stat__collect_metric_expr(tmp_list);
+   if (metricgroup__copy_metric_events(tmp_list, cgrp, metric_events,
+   &orig_metric_events) < 0)
+   break;
+
perf_evlist__splice_list_tail(evlist, &tmp_list->core.entries);
tmp_list->core.nr_entries = 0;
 
@@ -268,6 +280,7 @@ int evlist__expand_cgroup(struct evlist *evlist, const char *str)
 out_err:
evlist__delete(orig_list);
evlist__delete(tmp_list);
+   rblist__exit(&orig_metric_events);
 
return ret;
 }
diff --git a/tools/perf/util/cgroup.h b/tools/perf/util/cgroup.h
index 32893018296f..eea6df8ee373 100644
--- a/tools/perf/util/cgroup.h
+++ b/tools/perf/util/cgroup.h
@@ -22,9 +22,11 @@ struct cgroup *cgroup__get(struct cgroup *cgroup);
 void cgroup__put(struct cgroup *cgroup);
 
 struct evlist;
+struct rblist;
 
 struct cgroup *evlist__findnew_cgroup(struct evlist *evlist, const char *name);
-int evlist__expand_cgroup(struct evlist *evlist, const char *cgroups);
+int evlist__expand_cgroup(struct evlist *evlist, const char *cgroups,
+ struct rblist *metric_events);
 
 void evlist__set_default_cgroup(struct evlist *evlist, struct cgroup *cgroup);
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ee7b576d3b12..aae79b2b5041 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1964,3 +1964,14 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
 
return err;
 }
+
+struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
+{
+   struct evsel *evsel;
+
+   evlist__for_each_entry(evlist, evsel) {
+   if (evsel->idx == idx)
+   return evsel;
+   }
+   return NULL;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index bc38a53f6a1a..e1a450322bc5 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -386,4 +386,5 @@ int evlist__ctlfd_ack(struct evlist *evlist);
 #define EVLIST_ENABLED_MSG "Events enabled\n"
 #define EVLIST_DISABLED_MSG 

[PATCHSET v4 0/5] perf stat: Expand events for each cgroup

2020-09-22 Thread Namhyung Kim
Hello,

When we profile cgroup events with perf stat, it's very annoying to
specify events and cgroups on the command line, as it requires spelling
out the mapping between events and cgroups.  (Note that perf record can
use cgroup sampling, but that is not usable for perf stat.)

I guess in most cases we just want to use the same set of events (N)
for all cgroups (M), but we need to specify NxM events and NxM cgroups.
This is tedious, especially when profiling a large number of cgroups,
say M=200.

So I added the --for-each-cgroup option to make that case easy.  It
creates NxM events from N events and M cgroups.  Another upside is
that it handles metrics too.
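Conceptually, the expansion is a nested loop over cgroups and events. The sketch below is only an illustration: `struct expanded_event` and `expand_for_each_cgroup()` are hypothetical names that pair an event name with a cgroup name, whereas the real code clones each evsel and attaches the cgroup to the clone.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pairing of an event name with the cgroup it is counted in. */
struct expanded_event {
	const char *event;
	const char *cgroup;
};

/*
 * Expand N events for each of M cgroups into N*M (event, cgroup) pairs,
 * grouped by cgroup in the order given.  Returns a malloc'ed array of
 * n_events * n_cgroups entries (caller frees), or NULL on failure.
 */
static struct expanded_event *
expand_for_each_cgroup(const char *const *events, size_t n_events,
		       const char *const *cgroups, size_t n_cgroups)
{
	struct expanded_event *out;
	size_t k = 0;

	if (n_events == 0 || n_cgroups == 0)
		return NULL;

	out = calloc(n_events * n_cgroups, sizeof(*out));
	if (out == NULL)
		return NULL;

	/* Outer loop over cgroups, inner loop over the same N events. */
	for (size_t c = 0; c < n_cgroups; c++) {
		for (size_t e = 0; e < n_events; e++) {
			out[k].event = events[e];
			out[k].cgroup = cgroups[c];
			k++;
		}
	}
	return out;
}
```

With N=2 events and M=3 cgroups this yields 6 entries; grouping by cgroup matches the example output in this mail, where all events for A are printed before those for B and C.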

For example, the following script measures the IPC metric for 3 cgroups:

  $ cat perf-expand-cgrp.sh
  #!/bin/sh
  
  METRIC=${1:-IPC}
  CGROUP_DIR=/sys/fs/cgroup/perf_event
  
  sudo mkdir $CGROUP_DIR/A $CGROUP_DIR/B $CGROUP_DIR/C
  
  # add background workload for each cgroup
  echo $$ | sudo tee $CGROUP_DIR/A/cgroup.procs > /dev/null
  yes > /dev/null &
  echo $$ | sudo tee $CGROUP_DIR/B/cgroup.procs > /dev/null
  yes > /dev/null &
  echo $$ | sudo tee $CGROUP_DIR/C/cgroup.procs > /dev/null
  yes > /dev/null &

  # run 'perf stat' in the root cgroup
  echo $$ | sudo tee $CGROUP_DIR/cgroup.procs > /dev/null
  perf stat -a -M $METRIC --for-each-cgroup A,B,C sleep 1
  
  kill %1 %2 %3
  sudo rmdir $CGROUP_DIR/A $CGROUP_DIR/B $CGROUP_DIR/C

  
  $ ./perf-expand-cgrp.sh IPC
  
   Performance counter stats for 'system wide':
  
  11,284,850,010  inst_retired.any         A  #  2.71 IPC
   4,157,915,982  cpu_clk_unhalted.thread  A
  11,342,188,640  inst_retired.any         B  #  2.72 IPC
   4,173,014,732  cpu_clk_unhalted.thread  B
  11,135,863,604  inst_retired.any         C  #  2.67 IPC
   4,171,375,184  cpu_clk_unhalted.thread  C
  
 1.011948803 seconds time elapsed
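The IPC column is the ratio of the two counters within each cgroup. Assuming the IPC metric is defined as inst_retired.any divided by cpu_clk_unhalted.thread (which the paired counters above suggest), the reported values check out:

```latex
\begin{aligned}
\mathrm{IPC}_A &= \frac{11{,}284{,}850{,}010}{4{,}157{,}915{,}982} \approx 2.714 \to 2.71 \\
\mathrm{IPC}_B &= \frac{11{,}342{,}188{,}640}{4{,}173{,}014{,}732} \approx 2.718 \to 2.72 \\
\mathrm{IPC}_C &= \frac{11{,}135{,}863{,}604}{4{,}171{,}375{,}184} \approx 2.670 \to 2.67
\end{aligned}
```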


* Changes from v3:
 - rename to evlist__find_evsel  (Jiri)
 - add documentation  (Arnaldo)
 - check -G option together with --for-each-cgroup

* Changes from v2:
 - put relevant fields in evsel together  (Jiri)
 - add various error checks  (Jiri)
 - split cgroup open patch  (Jiri)

* Changes from v1:
 - rename the option to --for-each-cgroup  (Jiri)
 - copy evsel fields explicitly  (Jiri)
 - add libpfm4 test  (Ian)


The code is available in the 'perf/cgroup-multiply-v4' branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks
Namhyung


Namhyung Kim (5):
  perf evsel: Add evsel__clone() function
  perf stat: Add --for-each-cgroup option
  perf tools: Copy metric events properly when expand cgroups
  perf tools: Allow creation of cgroup without open
  perf test: Add expand cgroup event test

 tools/perf/Documentation/perf-stat.txt |   5 +
 tools/perf/builtin-stat.c  |  28 ++-
 tools/perf/tests/Build |   1 +
 tools/perf/tests/builtin-test.c|   4 +
 tools/perf/tests/expand-cgroup.c   | 241 +
 tools/perf/tests/tests.h   |   1 +
 tools/perf/util/cgroup.c   | 107 ++-
 tools/perf/util/cgroup.h   |   3 +
 tools/perf/util/evlist.c   |  11 ++
 tools/perf/util/evlist.h   |   1 +
 tools/perf/util/evsel.c| 104 +++
 tools/perf/util/evsel.h|  93 ++
 tools/perf/util/metricgroup.c  |  85 +
 tools/perf/util/metricgroup.h  |   6 +
 tools/perf/util/stat.h |   1 +
 15 files changed, 646 insertions(+), 45 deletions(-)
 create mode 100644 tools/perf/tests/expand-cgroup.c

-- 
2.28.0.681.g6f77f65b4e-goog



Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-22 Thread David Ahern
On 9/22/20 7:52 AM, Michael Jeanson wrote:
>>>
>>> the test setup is bad. You have r1 dropping the MTU in VRF red, but not
>>> telling VRF red how to send back the ICMP. e.g., for IPv4 add:
>>>
>>>ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue
>>>
>>> do the same for v6.
>>>
>>> Also, I do not see a reason for r2; I suggest dropping it. What you are
>>> testing is icmp crossing VRF with route leaking, so there should not be
>>> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and
>>> the return via r2).
> 
> The objective of the test was to replicate a client's environment where
> packets are crossing from a VRF which has a route back to the source to
> one which doesn't while reaching a TTL of 0. If the route lookup for the
> ICMP error is done on the interface in the first VRF, it can be routed to
> the source, but not on the interface in the second VRF, which is the
> current behaviour for ICMP errors generated while crossing between VRFs.
> 
> There may be a better test case that doesn't involve asymmetric routing
> to test this but it's the only way I found to replicate this.
> 

It should work without asymmetric routing; adding the return route to
the second VRF as I mentioned above fixes the FRAG_NEEDED problem. It
should work for TTL as well.

Adding a second pass on the tests with the return through r2 is fine,
but add a first pass for the more typical case.


Re: [PATCH v18 24/32] mm/pgdat: remove pgdat lru_lock

2020-09-22 Thread Alex Shi



On 2020/9/22 1:53 PM, Hugh Dickins wrote:
>> Now that pgdat.lru_lock has been replaced by the lruvec lock, it is no longer used.
>>
>> Signed-off-by: Alex Shi 
>> Reviewed-by: Alexander Duyck 
> I don't take pleasure in spoiling your celebrations and ceremonies,
> but I strongly agree with AlexD that this should simply be merged
> into the big one, 20/32.  That can be ceremony enough.
> 

Folded into that patch.
Thanks!


Re: [PATCH 1/2] dt-bindings: crypto: update ccree optional params

2020-09-22 Thread Rob Herring
On Wed, Sep 16, 2020 at 10:19:49AM +0300, Gilad Ben-Yossef wrote:
> Document ccree driver supporting new optional parameters allowing to
> customize the DMA transaction cache parameters and ACE bus shareability
> properties.
> 
> Signed-off-by: Gilad Ben-Yossef 
> ---
>  Documentation/devicetree/bindings/crypto/arm-cryptocell.txt | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/crypto/arm-cryptocell.txt b/Documentation/devicetree/bindings/crypto/arm-cryptocell.txt
> index 6130e6eb4af8..1a1603e457a8 100644
> --- a/Documentation/devicetree/bindings/crypto/arm-cryptocell.txt
> +++ b/Documentation/devicetree/bindings/crypto/arm-cryptocell.txt
> @@ -13,6 +13,10 @@ Required properties:
>  Optional properties:
>  - clocks: Reference to the crypto engine clock.
>  - dma-coherent: Present if dma operations are coherent.
> +- awcache: Set write transactions cache attributes
> +- arcache: Set read transactions cache attributes

dma-coherent already implies these are 011x, 101x or 111x. In my limited 
experience configuring these (Calxeda SATA and ethernet), writeback, 
write-allocate was pretty much always optimal. 
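To make the bit patterns concrete, here is a small sketch assuming the AXI3 naming of the four AxCACHE bits (bit 0 Bufferable, bit 1 Cacheable, bit 2 Read-Allocate, bit 3 Write-Allocate), written MSB first as in the patterns above. Under that assumption, 011x, 101x and 111x are exactly the cacheable encodings with at least one allocate hint, and for cacheable transactions the Bufferable bit selects write-back over write-through. The macro and helper names are hypothetical.

```c
#include <stdbool.h>

/* AXI3-style AxCACHE bit assignments (assumption; see the lead-in). */
#define AXCACHE_BUFFERABLE	0x1u
#define AXCACHE_CACHEABLE	0x2u
#define AXCACHE_READ_ALLOC	0x4u
#define AXCACHE_WRITE_ALLOC	0x8u

/* True for 011x, 101x and 111x: cacheable with at least one allocate hint. */
static bool axcache_matches_coherent(unsigned int v)
{
	v &= 0xfu;
	return (v & AXCACHE_CACHEABLE) &&
	       (v & (AXCACHE_READ_ALLOC | AXCACHE_WRITE_ALLOC));
}

/* True for write-back, write-allocate: 1x11, i.e. 1011 or 1111. */
static bool axcache_is_wb_walloc(unsigned int v)
{
	unsigned int need = AXCACHE_WRITE_ALLOC | AXCACHE_CACHEABLE |
			    AXCACHE_BUFFERABLE;

	return (v & need) == need;
}
```

Only 6 of the 16 encodings pass the first check, and the write-back, write-allocate setting mentioned above is the subset 1011/1111.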

> +- awdomain: Set write transactions ACE shareability domain (712, 703, 713 only)
> +- ardomain: Set read transactions ACE shareability domain (712, 703, 713 only)

This probably needs something common. We may need something for Mali, 
too. I don't think different settings for read and write make much 
sense, nor does anything beyond IS or OS (inner/outer shareable). 

These could also just be implied by the compatible string (and requiring 
an SoC-specific one).

Rob


Re: [PATCH v18 22/32] mm/vmscan: use relock for move_pages_to_lru

2020-09-22 Thread Alex Shi



On 2020/9/22 1:44 PM, Hugh Dickins wrote:
> On Mon, 24 Aug 2020, Alex Shi wrote:
> 
>> From: Hugh Dickins 
>>
>> Use the relock function to replace the open-coded relocking, and try to
>> save a few lock operations.
>>
>> Signed-off-by: Hugh Dickins 
>> Signed-off-by: Alex Shi 
>> Reviewed-by: Alexander Duyck 
> NAK. Who wrote this rubbish? Oh, did I? Maybe something you extracted
> from my tarball. No, we don't need any of this now, as explained when
> going through 20/32.
> 

removed in lruv19.5

Thanks!

