Re: [PATCH for v5.9] ARM: dts: efm32: Replace HTTP links with HTTPS ones

2020-07-19 Thread Uwe Kleine-König
Hello,

On Sun, Jul 19, 2020 at 11:59:58AM +0200, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
> 
> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
>             If both the HTTP and HTTPS versions
>             return 200 OK and serve the same content:
>               Replace HTTP with HTTPS.

I would have described this in prose instead. (Something like: Replace the
http: URL scheme with https:. As of today the old and new URLs serve the
same content.)
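
For what it's worth, a minimal userspace sketch of the per-link check the
quoted algorithm describes, assuming libcurl and simplifying "serve the same
content" down to both schemes answering 200 (the algorithm additionally
compares the served content, which is omitted here):

/* build with: cc check-https.c -lcurl */
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

static long status_of(const char *url)
{
	CURL *c = curl_easy_init();
	long code = -1;

	if (!c)
		return -1;
	curl_easy_setopt(c, CURLOPT_URL, url);
	curl_easy_setopt(c, CURLOPT_NOBODY, 1L);	/* HEAD request only */
	if (curl_easy_perform(c) == CURLE_OK)
		curl_easy_getinfo(c, CURLINFO_RESPONSE_CODE, &code);
	curl_easy_cleanup(c);
	return code;
}

int main(int argc, char **argv)
{
	char https_url[4096];

	if (argc != 2 || strncmp(argv[1], "http://", 7) != 0) {
		fprintf(stderr, "usage: %s http://...\n", argv[0]);
		return 2;
	}
	snprintf(https_url, sizeof(https_url), "https://%s", argv[1] + 7);

	curl_global_init(CURL_GLOBAL_DEFAULT);
	int ok = status_of(argv[1]) == 200 && status_of(https_url) == 200;
	curl_global_cleanup();

	printf("%s: %s\n", argv[1], ok ? "safe to rewrite" : "leave alone");
	return !ok;
}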

Other than that:

Acked-by: Uwe Kleine-König 

> diff --git a/arch/arm/mach-efm32/Makefile.boot 
> b/arch/arm/mach-efm32/Makefile.boot
> index cec195d4fcba..5dde7328a7a9 100644
> --- a/arch/arm/mach-efm32/Makefile.boot
> +++ b/arch/arm/mach-efm32/Makefile.boot
> @@ -1,4 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  # Empty file waiting for deletion once Makefile.boot isn't needed any more.
>  # Patch waits for application at
> -# http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7889/1 .
> +# https://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7889/1 .

:-|

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |


signature.asc
Description: PGP signature


Re: [PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary

2020-07-19 Thread Jordan Niethe
On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria wrote:
>
> Power10 has removed the 512-byte boundary from the match criteria, i.e. the
> watch range can cross a 512-byte boundary.
It looks like this change is not mentioned in ISA v3.1 Book III 9.4
Data Address Watchpoint. It could be useful to mention that in the
commit message.
Also I wonder if we could add a test for this to the ptrace-hwbreak selftest?

>
> Signed-off-by: Ravi Bangoria 
> ---
>  arch/powerpc/kernel/hw_breakpoint.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
> b/arch/powerpc/kernel/hw_breakpoint.c
> index c55e67bab271..1f4a1efa0074 100644
> --- a/arch/powerpc/kernel/hw_breakpoint.c
> +++ b/arch/powerpc/kernel/hw_breakpoint.c
> @@ -418,8 +418,9 @@ static int hw_breakpoint_validate_len(struct 
> arch_hw_breakpoint *hw)
>
> if (dawr_enabled()) {
> max_len = DAWR_MAX_LEN;
> -   /* DAWR region can't cross 512 bytes boundary */
> -   if (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 
> 1, SZ_512))
> +   /* DAWR region can't cross 512 bytes boundary on p10 
> predecessors */
> +   if (!cpu_has_feature(CPU_FTR_ARCH_31) &&
> +   (ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 
> 1, SZ_512)))
> return -EINVAL;
> } else if (IS_ENABLED(CONFIG_PPC_8xx)) {
> /* 8xx can setup a range without limitation */
> --
> 2.26.2
>
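
For reference, a small userspace illustration of the boundary test in the
quoted hunk; ALIGN_DOWN and SZ_512 are local re-definitions (not the kernel
headers) and crosses_512() only models the condition, which Power10
(CPU_FTR_ARCH_31) now skips:

#include <stdio.h>

#define SZ_512			512UL
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))	/* power-of-2 align */

static int crosses_512(unsigned long start_addr, unsigned long len)
{
	unsigned long end_addr = start_addr + len;

	return ALIGN_DOWN(start_addr, SZ_512) != ALIGN_DOWN(end_addr - 1, SZ_512);
}

int main(void)
{
	/* 0x1f0..0x20f spans the 0x200 boundary, 0x100..0x11f does not */
	printf("0x1f0 len 32: %s\n", crosses_512(0x1f0, 32) ? "crosses" : "ok");
	printf("0x100 len 32: %s\n", crosses_512(0x100, 32) ? "crosses" : "ok");
	return 0;
}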


Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Oleg Nesterov
On 07/20, Jiri Slaby wrote:
>
> On 18. 07. 20, 19:14, Oleg Nesterov wrote:
> >
> > This is already wrong. But
> >
> > Where does this __might_sleep() come from ??? I see no blocking calls
> > in ptrace_stop(). Not to mention it is called with ->siglock held and
> > right after this lock is dropped we take tasklist_lock.
>
> Decoded stacktrace:
>
> > ptrace_stop (include/linux/freezer.h:57 include/linux/freezer.h:67 
> > include/linux/freezer.h:128 include/linux/freezer.h:173 
> > kernel/signal.c:2217)
> > ptrace_do_notify (kernel/signal.c:2272)
> > ptrace_notify (arch/x86/include/asm/paravirt.h:656 
> > arch/x86/include/asm/qspinlock.h:55 include/linux/spinlock.h:211 
> > include/linux/spinlock_api_smp.h:167 include/linux/spinlock.h:403 
> > kernel/signal.c:2282)
> > syscall_trace_enter (include/linux/tracehook.h:73 
> > include/linux/tracehook.h:104 arch/x86/entry/common.c:159)
> > do_syscall_64 (arch/x86/entry/common.c:380)
> > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:124)
>
> It is try_to_freeze_unsafe in try_to_freeze in freezable_schedule in
> ptrace_stop.

Aha, thanks a lot!

try_to_freeze_unsafe() is called after schedule(), which must return with
->state = RUNNING, so this matches another WARN_ON(current->state) added
by the debugging patch after freezable_schedule().

Somehow I decided __might_sleep() was called before read_unlock/schedule.

> >
> >   How does this connect to the debugging patch I sent? Did you see this
> >   warning without that patch?
>
> I suppose this made it appear:
> +CONFIG_PREEMPT_COUNT=y
> -# CONFIG_DEBUG_ATOMIC_SLEEP is not set
> +CONFIG_DEBUG_ATOMIC_SLEEP=y

I see,

> Please see my other e-mail, all this is with dbfb089d360b applied. Maybe
> it makes more sense now?

Yes. Thanks Jiri!

Oleg.



Re: [PATCH v6 2/6] mm/vmscan: protect the workingset on anonymous LRU

2020-07-19 Thread Joonsoo Kim
On Fri, Jul 17, 2020 at 10:59 PM, Johannes Weiner wrote:
>
> On Wed, Jun 17, 2020 at 02:26:19PM +0900, js1...@gmail.com wrote:
> > From: Joonsoo Kim 
> >
> > In the current implementation, newly created or swapped-in anonymous pages
> > start on the active list. Growing the active list results in rebalancing
> > the active/inactive lists, so old pages on the active list are demoted to
> > the inactive list. Hence, pages on the active list aren't protected at all.
> >
> > Following is an example of this situation.
> >
> > Assume that there are 50 hot pages on the active list. Numbers denote the
> > number of pages on the active/inactive list (active | inactive).
> >
> > 1. 50 hot pages on active list
> > 50(h) | 0
> >
> > 2. workload: 50 newly created (used-once) pages
> > 50(uo) | 50(h)
> >
> > 3. workload: another 50 newly created (used-once) pages
> > 50(uo) | 50(uo), swap-out 50(h)
> >
> > This patch tries to fix this issue.
> > As with the file LRU, newly created or swapped-in anonymous pages will be
> > inserted into the inactive list. They are promoted to the active list if
> > enough references happen. This simple modification changes the above
> > example as follows.
> >
> > 1. 50 hot pages on active list
> > 50(h) | 0
> >
> > 2. workload: 50 newly created (used-once) pages
> > 50(h) | 50(uo)
> >
> > 3. workload: another 50 newly created (used-once) pages
> > 50(h) | 50(uo), swap-out 50(uo)
> >
> > As you can see, hot pages on active list would be protected.
> >
> > Note that this implementation has a drawback: a page cannot be promoted
> > and will be swapped out if its re-access interval is greater than the size
> > of the inactive list but less than the size of the total (active + inactive).
> > To solve this potential issue, a following patch will apply the workingset
> > detection that was applied to the file LRU some time before.
> >
> > v6: Before this patch, all anon pages (inactive + active) are considered
> > as workingset. However, with this patch, only active pages are considered
> > as workingset. So, file refault formula which uses the number of all
> > anon pages is changed to use only the number of active anon pages.
>
> I can see that also from the code, but it doesn't explain why.
>
> And I'm not sure this is correct. I can see two problems with it.
>
> After your patch series, there is still one difference between anon
> and file: cache trim mode. If the "use-once" anon dominate most of
> memory and you have a small set of heavily thrashing files, it would
> not get recognized. File refaults *have* to compare their distance to
> the *entire* anon set, or we could get trapped in cache trimming mode
> even as file pages with access frequencies <= RAM are thrashing.
>
> On the anon side, there is no cache trimming mode. But even if we're
> not in cache trimming mode and active file is already being reclaimed,
> we have to recognize thrashing on the anon side when reuse frequencies
> are within available RAM. Otherwise we treat an inactive file that is
> not being reused as having the same value as an anon page that is
> being reused. And then we may reclaim file and anon at the same rate
> even as anon is thrashing and file is not. That's not right.
>
> We need to activate everything with a reuse frequency <= RAM. Reuse
> frequency is refault distance plus size of the inactive list the page
> was on. This means anon distances should be compared to active anon +
> inactive file + active file, and file distances should be compared to
> active file + inactive_anon + active anon.

You're right. Maybe I was confused about something at that time. I will change
it as you suggested.

Thanks.
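
For reference, a minimal userspace model of the comparison described above:
a refault gets activated when its reuse distance (refault distance plus the
inactive list it was evicted from) still fits in RAM. Names and numbers are
illustrative only, not the mm/workingset.c symbols:

#include <stdbool.h>
#include <stdio.h>

struct lru_sizes {
	unsigned long active_anon, inactive_anon;
	unsigned long active_file, inactive_file;
};

/*
 * refault_distance: pages evicted from this LRU type between this page's
 * eviction and its refault (i.e. the shadow entry delta).
 */
static bool should_activate(bool is_anon, unsigned long refault_distance,
			    const struct lru_sizes *s)
{
	/*
	 * reuse distance = refault distance + own inactive list; activate
	 * when that fits within everything else that is resident, which
	 * reduces to the comparisons quoted above.
	 */
	unsigned long workingset = is_anon ?
		s->active_anon + s->inactive_file + s->active_file :
		s->active_file + s->inactive_anon + s->active_anon;

	return refault_distance <= workingset;
}

int main(void)
{
	struct lru_sizes s = {
		.active_anon = 200, .inactive_anon = 100,
		.active_file = 150, .inactive_file = 50,
	};

	printf("anon refault, distance 250: %s\n",
	       should_activate(true, 250, &s) ? "activate" : "keep inactive");
	printf("file refault, distance 500: %s\n",
	       should_activate(false, 500, &s) ? "activate" : "keep inactive");
	return 0;
}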


Re: [patch V3 01/13] entry: Provide generic syscall entry functionality

2020-07-19 Thread Thomas Gleixner
Andy Lutomirski  writes:
>> On Jul 19, 2020, at 3:17 AM, Thomas Gleixner  wrote:
>> 
>> Andy Lutomirski  writes:
 On Sat, Jul 18, 2020 at 7:16 AM Thomas Gleixner  wrote:
 Andy Lutomirski  writes:
> FWIW, TIF_USER_RETURN_NOTIFY is a bit of an odd duck: it's an
> entry/exit word *and* a context switch word.  The latter is because
> it's logically a per-cpu flag, not a per-task flag, and the context
> switch code moves it around so it's always set on the running task.
 
 Gah, I missed the context switch thing of that. That stuff is hideous.
>>> 
>>> It's also delightful because anything that screws up that dance (such
>>> as failure to do the exit-to-usermode path exactly right) likely
>>> results in an insta-root-hole.  If we fail to run user return
>>> notifiers, we can run user code with incorrect syscall MSRs, etc.
>> 
>> Looking at it deeper, having that thing in the loop is a pointless
>> exercise. This really wants to be done _after_ the loop.
>> 
> As long as we’re confident that nothing after the loop can set the flag again.

Yes, because that's the direct way off to user space.

Thanks,

tglx


Re: [PATCHv3 00/19] perf metric: Add support to reuse metric

2020-07-19 Thread kajoljain



On 7/19/20 11:43 PM, Jiri Olsa wrote:
> hi,
> this patchset is adding the support to reused metric in
> another metric.
> 
> For example, to define IPC by using CPI with change like:
> 
>  {
>  "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
> -"MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
> +"MetricExpr": "1/CPI",
>  "MetricGroup": "TopDownL1",
>  "MetricName": "IPC"
>  },
> 
> I won't be able to find all the possible places we could
> use this at, so I wonder if you guys (who were asking for this)
> would try it and come up with comments if there's something
> missing or if we could already use it in some places.
> 
> It's based on Arnaldo's tmp.perf/core.
> 
> v3 changes:
>   - added some acks
>   - some patches got merged
>   - added missing zalloc include [John Garry]
>   - added ids array outside the egroup object [Ian]
>   - removed wrong m->has_constraint assignment [Ian]
>   - renamed 'list' to 'metric_list' [Ian]
>   - fixed group metric and added test for it [Paul A. Clarke]
>   - fixed memory leak [Arnaldo]
>   - using lowercase keys for metrics in hashmap, because jevents
> converts metric_expr to lowercase
> 
> Also available in here:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/metric
> 

Hi Jiri,
   I am trying to review these patches and also test them on a Power box. I am
testing your `perf/metric` branch.
With your current patches, some of the hv-24x7 events are not giving the
expected results when using the "--metric-only" option. I can't see the
corresponding output for all chips.

This is output on power9 machine:

Without your patches on upstream kernel:

command# ./perf stat --metric-only -M PowerBUS_Frequency -I 1000 -C 0
#   time GHz  PowerBUS_Frequency_0 GHz  PowerBUS_Frequency_1 
 1.738772.02.0 
 2.0002405512.01.9 
^C 2.4525905320.90.9 

 Performance counter stats for 'CPU(s) 0':

   4.84.8 

   2.452654834 seconds time elapsed

With your patches on perf/metric branch:

command# ./perf stat --metric-only -M PowerBUS_Frequency -I 1000 -C 0
assertion failed at util/metricgroup.c:709
#   time GHz  PowerBUS_Frequency_0 
 1.738752.0 
 2.0003807062.0 
^C 2.6589621821.3 

 Performance counter stats for 'CPU(s) 0':

   5.2 

Please let me know if I am missing something.

Thanks,
Kajol Jain

> thanks,
> jirka
> 
> 
> ---
> Jiri Olsa (19):
>   perf metric: Fix memory leak in expr__add_id function
>   perf metric: Add expr__add_id function
>   perf metric: Change expr__get_id to return struct expr_id_data
>   perf metric: Add expr__del_id function
>   perf metric: Add macros for iterating map events
>   perf metric: Add add_metric function
>   perf metric: Rename __metricgroup__add_metric to __add_metric
>   perf metric: Collect referenced metrics in struct metric_ref_node
>   perf metric: Collect referenced metrics in struct metric_expr
>   perf metric: Add referenced metrics to hash data
>   perf metric: Compute referenced metrics
>   perf metric: Add events for the current list
>   perf metric: Add cache_miss_cycles to metric parse test
>   perf metric: Add DCache_L2 to metric parse test
>   perf metric: Add recursion check when processing nested metrics
>   perf metric: Make compute_single function more precise
>   perf metric: Add metric group test
>   perf metric: Rename struct egroup to metric
>   perf metric: Rename group_list to metric_list
> 
>  tools/perf/tests/parse-metric.c | 206 
> +
>  tools/perf/util/expr.c  | 143 -
>  tools/perf/util/expr.h  |  30 +++-
>  tools/perf/util/expr.y  |  16 +++--
>  tools/perf/util/metricgroup.c   | 466 
> +++
>  tools/perf/util/metricgroup.h   |   6 ++
>  tools/perf/util/stat-shadow.c   |  20 --
>  7 files changed, 751 insertions(+), 136 deletions(-)
> 


Re: [PATCH v5 4/4] printk: use the lockless ringbuffer

2020-07-19 Thread Marco Elver
On Sun, Jul 19, 2020 at 12:43PM +0900, Sergey Senozhatsky wrote:
> Hi Marco,
> 
> On (20/07/18 14:10), Marco Elver wrote:
> > 
> > It seems this causes a regression observed at least with newline-only
> > printks. I noticed this during -next testing because various debugging
> > tools (K*SAN, lockdep, etc.) use e.g. pr_{err,warn,info}("\n") to format
> > reports.
> > 
> > Without wanting to wait for a report from one of these debugging tools,
> > a simple reproducer is below. Without this patch, the expected newline
> > is printed.
> 
> Empty/blank lines carry no valuable payload, so could you please explain
> why you consider this to be a regression?

Empty/blank lines are visually valuable.

Did I miss a discussion somewhere that this change is acceptable?
Unfortunately, I can't find it mentioned in the commit message, and
therefore assumed it's a regression.

As I said, a number of debugging tools use them to format reports to be
more readable (visually separate title and report body, and separate
parts of the report). Also, such reports are often parsed by CI systems,
and by changing the reports, these CI systems may break. But those are
just the usecases I'm acutely aware of -- please see a full list of
newline-print users below.

Breaking the observable behaviour of a widely used interface such as
printk doesn't seem right. Where the newline-print is inappropriate,
wouldn't removing that newline-print be more appropriate (instead of
forcing this behaviour on everyone)?

Thanks,
-- Marco

-- >8 --

$> git grep -En '\<(printk|pr_err|pr_warn|pr_info)\>\("\\n"\)'
arch/alpha/kernel/core_wildfire.c:650:  printk("\n");
arch/alpha/kernel/core_wildfire.c:658:  printk("\n");
arch/alpha/kernel/traps.c:120:  printk("\n");
arch/arm/mach-footbridge/dc21285.c:168: printk("\n");
arch/arm/mach-footbridge/dc21285.c:185: printk("\n");
arch/arm/mach-footbridge/dc21285.c:214: printk("\n");
arch/arm/mach-footbridge/dc21285.c:229: printk("\n");
arch/arm/plat-orion/mpp.c:44:   printk("\n");
arch/arm/plat-orion/mpp.c:81:   printk("\n");
arch/arm/probes/kprobes/test-core.c:576:pr_info("\n");
arch/c6x/kernel/traps.c:30: pr_err("\n");
arch/h8300/kernel/process.c:101:printk("\n");
arch/h8300/kernel/traps.c:84:   pr_info("\n");
arch/h8300/kernel/traps.c:93:   pr_info("\n");
arch/ia64/kernel/acpi.c:514:printk("\n");
arch/ia64/kernel/process.c:105: printk("\n");
arch/ia64/kernel/sal.c:173: printk("\n");
arch/ia64/kernel/time.c:306:printk("\n");
arch/ia64/kernel/unaligned.c:48:printk("\n");
arch/m68k/mm/kmap.c:283:printk("\n");
arch/m68k/mm/motorola.c:383:printk("\n");
arch/microblaze/pci/xilinx_pci.c:109:   pr_info("\n");
arch/mips/jazz/jazzdma.c:170:   printk("\n");
arch/mips/jazz/jazzdma.c:276:   printk("\n");
arch/mips/jazz/jazzdma.c:345:   printk("\n");
arch/mips/jazz/jazzdma.c:351:   printk("\n");
arch/mips/kernel/sysrq.c:31:pr_info("\n");
arch/mips/kernel/sysrq.c:33:pr_info("\n");
arch/mips/kernel/traps.c:378:   printk("\n");
arch/mips/kernel/traps.c:1588:  pr_info("\n");
arch/mips/kvm/tlb.c:81: pr_info("\n");
arch/mips/lib/dump_tlb.c:187:   printk("\n");
arch/mips/lib/r3k_dump_tlb.c:67:printk("\n");
arch/mips/math-emu/ieee754d.c:58:   printk("\n");
arch/mips/math-emu/ieee754d.c:97:   printk("\n");
arch/mips/mm/cerr-sb1.c:74: printk("\n");
arch/mips/mm/cerr-sb1.c:85: printk("\n");
arch/mips/mm/cerr-sb1.c:126:printk("\n");
arch/mips/mm/cerr-sb1.c:416:printk("\n");
arch/mips/mm/cerr-sb1.c:561:printk("\n");
arch/mips/netlogic/xlr/fmn-config.c:64: pr_info("\n");
arch/mips/netlogic/xlr/fmn-config.c:78: pr_info("\n");
arch/mips/sni/setup.c:106:  printk("\n");
arch/openrisc/kernel/traps.c:120:   printk("\n");
arch/openrisc/kernel/traps.c:168:   printk("\n");
arch/openrisc/kernel/traps.c:183:   printk("\n");
arch/openrisc/kernel/traps.c:196:   printk("\n");
arch/parisc/kernel/drivers.c:1007:  pr_info("\n");
arch/parisc/kernel/pci-dma.c:70:printk("\n");
arch/parisc/kernel/traps.c:452: printk("\n");
arch/parisc/mm/fault.c:245: pr_warn("\n");
arch/powerpc/kernel/traps.c:213:printk("\n");
arch/powerpc/math-emu/fabs.c:15:printk("\n");
arch/powerpc/math-emu/fctiw.c:24:   printk("\n");
arch/powerpc/math-emu/fctiwz.c:31:  printk("\n");
arch/powerpc/math-emu/fmr.c:15: printk("\n");
arch/powerpc/math-emu/fnabs.c:15:   printk("\n");
arch/powerpc/math-emu/fneg.c:15:printk("\n");
arch/powerpc/math-emu/lfd.c:17: printk("\n");
arch/powerpc/math-emu/stfd.c:13:printk("\n");
arch/powerpc/mm/book3s64/slb.c:271: pr_err("\n");
arch/powerpc/mm/nohash/44x.c:194:   printk("\n");
arch/powerpc/platforms/52xx/efika.c:111:printk("\n");
arch/powerpc/platforms/chrp/pci.c:257:  printk("\n");
arch/powerpc/platfo

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Oleg Nesterov
On 07/20, Jiri Slaby wrote:
>
> You tackled it, we cherry-picked dbfb089d360 to our kernels. Ccing more
> people.

Thanks... so with this patch __schedule() does

prev_state = prev->state;

...

if (!preempt && prev_state && prev_state == prev->state) {
if (signal_pending_state(prev_state, prev)) {
prev->state = TASK_RUNNING;
} else {

and ptrace_freeze_traced() can change ->state in between. This means
that this task can return from __schedule() with ->state != RUNNING,
which can explain the BUG_ON(task_is_stopped_or_traced) in
do_notify_parent() you reported.

Oleg.



[PATCH v2] staging: dpaa2-ethsw: fix switch/case fallthrough warning

2020-07-19 Thread Marian Posteuca
Fix the fallthrough warning that is reported by checkpatch.

Signed-off-by: Marian Posteuca 
---
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c 
b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index 2fb75a7c9314..db10fd18914d 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -1362,7 +1362,7 @@ static int port_switchdev_blocking_event(struct 
notifier_block *unused,
return NOTIFY_DONE;
 
switch (event) {
-   case SWITCHDEV_PORT_OBJ_ADD: /* fall through */
+   case SWITCHDEV_PORT_OBJ_ADD:
case SWITCHDEV_PORT_OBJ_DEL:
return ethsw_switchdev_port_obj_event(event, dev, ptr);
case SWITCHDEV_PORT_ATTR_SET:
-- 
2.26.2



Re: [PATCH net-next v2] net: ena: Fix using plain integer as NULL pointer in ena_init_napi_in_range

2020-07-19 Thread Shay Agroskin



Joe Perches  writes:


On Mon, 2020-07-20 at 10:53 +0800, Wang Hai wrote:

Fix sparse build warning:

drivers/net/ethernet/amazon/ena/ena_netdev.c:2193:34: warning:
 Using plain integer as NULL pointer

[]
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c

[]
@@ -2190,11 +2190,10 @@ static void ena_del_napi_in_range(struct ena_adapter *adapter,
 static void ena_init_napi_in_range(struct ena_adapter *adapter,
				   int first_index, int count)
 {
-	struct ena_napi *napi = {0};
 	int i;
 
 	for (i = first_index; i < first_index + count; i++) {
-		napi = &adapter->ena_napi[i];
+		struct ena_napi *napi = &adapter->ena_napi[i];
 
 		netif_napi_add(adapter->netdev,
			       &adapter->ena_napi[i].napi,


Another possible change is to this statement:

netif_napi_add(adapter->netdev,
   &napi->napi,
   etc...);


Yup, missed that myself. Wang, if you don't mind please apply 
Joe's change as well.


Thanks, Shay


[PATCH v4 8/8] scsi: ufs: Fix a racing problem btw error handler and runtime PM ops

2020-07-19 Thread Can Guo
The current IRQ handler blocks scsi requests before scheduling eh_work. When
the error handler calls pm_runtime_get_sync(), if ufshcd_suspend/resume sends
a scsi cmd, most likely the SSU cmd, then, since scsi requests are blocked,
pm_runtime_get_sync() will never return, because ufshcd_suspend/resume is
blocked by that scsi cmd. Some changes and code re-arrangement can be made to
resolve it.

o In the queuecommand path, the hba->ufshcd_state check and
  ufshcd_send_command should stay inside the same spin lock. This is to make
  sure that no more commands leak into the doorbell after hba->ufshcd_state
  is changed (see the sketch after this list).
o Don't block scsi requests before scheduling eh_work; let the error handler
  block scsi requests when it is ready to start error recovery.
o Don't let the scsi layer keep requeuing the scsi cmds sent from hba runtime
  PM ops; either let them pass or fail them. Let them pass if eh_work is
  scheduled due to non-fatal errors. Fail them if eh_work is scheduled due to
  fatal errors, otherwise the cmds may eventually time out since UFS is in a
  bad state, which keeps the error handler blocked for too long. If we fail
  the scsi cmds sent from hba runtime PM ops, the hba runtime PM ops fail
  too, but that does not hurt since the error handler can recover the hba
  runtime PM error.
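
For illustration, a deterministic userspace model of the first item above
(the names are made up and this is not the driver code; build with
-lpthread): if the state check and the command submission live in separate
locked sections, the error handler can change the state in the window
between them and a command still leaks into the doorbell, while keeping both
under one lock hold closes that window.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;
static enum { OPERATIONAL, RESET } ufshcd_state = OPERATIONAL;
static int commands_in_doorbell;

static void error_handler_runs(void)		/* stands in for eh_work */
{
	pthread_mutex_lock(&host_lock);
	ufshcd_state = RESET;
	pthread_mutex_unlock(&host_lock);
}

static void queuecommand_old(void)		/* check and send split */
{
	int ok;

	pthread_mutex_lock(&host_lock);
	ok = (ufshcd_state == OPERATIONAL);
	pthread_mutex_unlock(&host_lock);

	error_handler_runs();			/* the racy window */

	if (ok)
		commands_in_doorbell++;		/* leaked after the state change */
}

static void queuecommand_new(void)		/* check and send together */
{
	pthread_mutex_lock(&host_lock);
	if (ufshcd_state == OPERATIONAL)
		commands_in_doorbell++;		/* eh_work would block on host_lock here */
	pthread_mutex_unlock(&host_lock);
}

int main(void)
{
	queuecommand_old();
	printf("old: %d command(s) issued although state is RESET\n",
	       commands_in_doorbell);

	commands_in_doorbell = 0;
	ufshcd_state = OPERATIONAL;
	queuecommand_new();			/* issued while still OPERATIONAL */
	error_handler_runs();
	printf("new: %d command(s) issued, all before RESET\n",
	       commands_in_doorbell);
	return 0;
}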

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 84 +++
 1 file changed, 49 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index ae78d5d..e9d8c4f 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -126,7 +126,8 @@ enum {
UFSHCD_STATE_RESET,
UFSHCD_STATE_ERROR,
UFSHCD_STATE_OPERATIONAL,
-   UFSHCD_STATE_EH_SCHEDULED,
+   UFSHCD_STATE_EH_SCHEDULED_FATAL,
+   UFSHCD_STATE_EH_SCHEDULED_NON_FATAL,
 };
 
 /* UFSHCD error handling flags */
@@ -2515,34 +2516,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *cmd)
if (!down_read_trylock(&hba->clk_scaling_lock))
return SCSI_MLQUEUE_HOST_BUSY;
 
-   spin_lock_irqsave(hba->host->host_lock, flags);
-   switch (hba->ufshcd_state) {
-   case UFSHCD_STATE_OPERATIONAL:
-   break;
-   case UFSHCD_STATE_EH_SCHEDULED:
-   case UFSHCD_STATE_RESET:
-   err = SCSI_MLQUEUE_HOST_BUSY;
-   goto out_unlock;
-   case UFSHCD_STATE_ERROR:
-   set_host_byte(cmd, DID_ERROR);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   default:
-   dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
-   __func__, hba->ufshcd_state);
-   set_host_byte(cmd, DID_BAD_TARGET);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   }
-
-   /* if error handling is in progress, don't issue commands */
-   if (ufshcd_eh_in_progress(hba)) {
-   set_host_byte(cmd, DID_ERROR);
-   cmd->scsi_done(cmd);
-   goto out_unlock;
-   }
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
-
hba->req_abort_count = 0;
 
err = ufshcd_hold(hba, true);
@@ -2578,11 +2551,50 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *cmd)
/* Make sure descriptors are ready before ringing the doorbell */
wmb();
 
-   /* issue command to the controller */
spin_lock_irqsave(hba->host->host_lock, flags);
+   switch (hba->ufshcd_state) {
+   case UFSHCD_STATE_OPERATIONAL:
+   case UFSHCD_STATE_EH_SCHEDULED_NON_FATAL:
+   break;
+   case UFSHCD_STATE_EH_SCHEDULED_FATAL:
+   /*
+* If we are here, eh_work is either scheduled or running.
+* Before eh_work sets ufshcd_state to STATE_RESET, it flushes
+* runtime PM ops by calling pm_runtime_get_sync(). If a scsi
+* cmd, e.g. the SSU cmd, is sent by PM ops, it can never be
+* finished if we let SCSI layer keep retrying it, which gets
+* eh_work stuck forever. Neither can we let it pass, because
+* ufs now is not in good status, so the SSU cmd may eventually
+* time out, blocking eh_work for too long. So just let it fail.
+*/
+   if (hba->pm_op_in_progress) {
+   hba->force_reset = true;
+   set_host_byte(cmd, DID_BAD_TARGET);
+   goto out_compl_cmd;
+   }
+   case UFSHCD_STATE_RESET:
+   err = SCSI_MLQUEUE_HOST_BUSY;
+   goto out_compl_cmd;
+   case UFSHCD_STATE_ERROR:
+   set_host_byte(cmd, DID_ERROR);
+   goto out_compl_cmd;
+   default:
+   dev_WARN_ONCE(hba->dev, 1, "%s: invalid state %d\n",
+   __func__, hba->ufshcd_state);
+   set_host_byte(cmd, DID_BAD_TARGET);
+   goto out_

[PATCH v4 1/8] scsi: ufs: Add checks before setting clk-gating states

2020-07-19 Thread Can Guo
The clock gating feature can be turned on/off selectively, which means its
state information is only meaningful when it is enabled. This change makes
sure that we only look at the clk-gating state if it is enabled.

Signed-off-by: Can Guo 
Reviewed-by: Avri Altman 
---
 drivers/scsi/ufs/ufshcd.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index cdff7e5..99bd3e4 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1839,6 +1839,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba)
if (!ufshcd_is_clkgating_allowed(hba))
return;
 
+   hba->clk_gating.state = CLKS_ON;
+
hba->clk_gating.delay_ms = 150;
INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work);
INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work);
@@ -2541,7 +2543,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *cmd)
err = SCSI_MLQUEUE_HOST_BUSY;
goto out;
}
-   WARN_ON(hba->clk_gating.state != CLKS_ON);
+   WARN_ON(ufshcd_is_clkgating_allowed(hba) &&
+   (hba->clk_gating.state != CLKS_ON));
 
lrbp = &hba->lrb[tag];
 
@@ -8315,8 +8318,11 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
/* If link is active, device ref_clk can't be switched off */
__ufshcd_setup_clocks(hba, false, true);
 
-   hba->clk_gating.state = CLKS_OFF;
-   trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state);
+   if (ufshcd_is_clkgating_allowed(hba)) {
+   hba->clk_gating.state = CLKS_OFF;
+   trace_ufshcd_clk_gating(dev_name(hba->dev),
+   hba->clk_gating.state);
+   }
 
/* Put the host controller in low power mode if possible */
ufshcd_hba_vreg_set_lpm(hba);
@@ -8456,6 +8462,11 @@ static int ufshcd_resume(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
if (hba->clk_scaling.is_allowed)
ufshcd_suspend_clkscaling(hba);
ufshcd_setup_clocks(hba, false);
+   if (ufshcd_is_clkgating_allowed(hba)) {
+   hba->clk_gating.state = CLKS_OFF;
+   trace_ufshcd_clk_gating(dev_name(hba->dev),
+   hba->clk_gating.state);
+   }
 out:
hba->pm_op_in_progress = 0;
if (ret)
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH v4 7/8] scsi: ufs: Move dumps in IRQ handler to error handler

2020-07-19 Thread Can Guo
Sometimes the dumps in the IRQ handler are heavy enough to cause system
stability issues; move them to the error handler.

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 68705a1..ae78d5d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5681,6 +5681,21 @@ static void ufshcd_err_handler(struct work_struct *work)
UFSHCD_UIC_DL_TCx_REPLAY_ERROR
needs_reset = true;
 
+   if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR |
+ UFSHCD_UIC_HIBERN8_MASK)) {
+   bool pr_prdt = !!(hba->saved_err & SYSTEM_BUS_FATAL_ERROR);
+
+   dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 0x%x\n",
+   __func__, hba->saved_err, hba->saved_uic_err);
+   spin_unlock_irqrestore(hba->host->host_lock, flags);
+   ufshcd_print_host_state(hba);
+   ufshcd_print_pwr_info(hba);
+   ufshcd_print_host_regs(hba);
+   ufshcd_print_tmrs(hba, hba->outstanding_tasks);
+   ufshcd_print_trs(hba, hba->outstanding_reqs, pr_prdt);
+   spin_lock_irqsave(hba->host->host_lock, flags);
+   }
+
/*
 * if host reset is required then skip clearing the pending
 * transfers forcefully because they will get cleared during
@@ -5899,22 +5914,6 @@ static irqreturn_t ufshcd_check_errors(struct ufs_hba 
*hba)
 
/* block commands from scsi mid-layer */
ufshcd_scsi_block_requests(hba);
-
-   /* dump controller state before resetting */
-   if (hba->saved_err & (INT_FATAL_ERRORS | UIC_ERROR)) {
-   bool pr_prdt = !!(hba->saved_err &
-   SYSTEM_BUS_FATAL_ERROR);
-
-   dev_err(hba->dev, "%s: saved_err 0x%x saved_uic_err 
0x%x\n",
-   __func__, hba->saved_err,
-   hba->saved_uic_err);
-
-   ufshcd_print_host_regs(hba);
-   ufshcd_print_pwr_info(hba);
-   ufshcd_print_tmrs(hba, hba->outstanding_tasks);
-   ufshcd_print_trs(hba, hba->outstanding_reqs,
-   pr_prdt);
-   }
ufshcd_schedule_eh_work(hba);
retval |= IRQ_HANDLED;
}
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



[PATCH v4 6/8] scsi: ufs: Recover hba runtime PM error in error handler

2020-07-19 Thread Can Guo
The current error handler cannot work well or recover the hba runtime PM
error if ufshcd_suspend/resume has failed due to UFS errors, e.g. a hibern8
enter/exit error or an SSU cmd error. When this happens, the error handler
may fail doing a full reset and restore because it always assumes that
powers, IRQs and clocks are ready after pm_runtime_get_sync() returns, but
actually they are not if ufshcd_resume fails [1]. Besides, if
ufshcd_suspend/resume fails due to a UFS error, the runtime PM framework
saves the error value to dev.power.runtime_error. After that, hba dev
runtime suspend/resume is not invoked anymore unless runtime_error is
cleared [2].

In case ufshcd_suspend/resume fails due to UFS errors, for scenario [1], the
error handler cannot assume anything about pm_runtime_get_sync(), meaning it
should explicitly turn ON powers, IRQs and clocks again. To get hba runtime
PM working again for scenario [2], the error handler can clear runtime_error
by calling pm_runtime_set_active() if the full reset and restore succeeds.
And, more importantly, if pm_runtime_set_active() returns no error, which
means runtime_error has been cleared, we also need to resume those scsi
devices under the hba in case any of them has failed to be resumed due to
the hba runtime resume failure. This is to unblock blk_queue_enter in case
there are bios waiting inside it.

In addition, if ufshcd_resume errors out, ufshcd_release in ufshcd_resume
would be skipped. After the hba runtime PM error is recovered in the error
handler, we should do ufshcd_release once to get clock gating back to work.

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 90 +++
 drivers/scsi/ufs/ufshcd.h |  1 +
 2 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c01743a..68705a1 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ufshcd.h"
 #include "ufs_quirks.h"
 #include "unipro.h"
@@ -229,6 +230,10 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 struct ufs_pa_layer_attr *pwr_mode);
 static void ufshcd_schedule_eh_work(struct ufs_hba *hba);
+static int ufshcd_setup_hba_vreg(struct ufs_hba *hba, bool on);
+static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on);
+static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
+struct ufs_vreg *vreg);
 static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba);
 static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable);
@@ -5553,6 +5558,77 @@ static inline void ufshcd_schedule_eh_work(struct 
ufs_hba *hba)
}
 }
 
+static void ufshcd_err_handler_prepare(struct ufs_hba *hba)
+{
+   pm_runtime_get_sync(hba->dev);
+   /*
+* Don't assume anything of pm_runtime_get_sync(), if resume fails,
+* irq and clocks can be OFF, and powers can be OFF or in LPM.
+*/
+   ufshcd_setup_vreg(hba, true);
+   ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq);
+   ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq2);
+   ufshcd_setup_hba_vreg(hba, true);
+   ufshcd_enable_irq(hba);
+
+   ufshcd_hold(hba, false);
+   if (!ufshcd_is_clkgating_allowed(hba))
+   ufshcd_setup_clocks(hba, true);
+
+   if (ufshcd_is_clkscaling_supported(hba)) {
+   cancel_work_sync(&hba->clk_scaling.suspend_work);
+   cancel_work_sync(&hba->clk_scaling.resume_work);
+   ufshcd_suspend_clkscaling(hba);
+   }
+}
+
+static void ufshcd_err_handler_unprepare(struct ufs_hba *hba)
+{
+   /* If clk_gating is held by pm ops, release it */
+   if (pm_runtime_active(hba->dev) && hba->clk_gating.held_by_pm) {
+   hba->clk_gating.held_by_pm = false;
+   ufshcd_release(hba);
+   }
+   ufshcd_release(hba);
+   if (ufshcd_is_clkscaling_supported(hba))
+   ufshcd_resume_clkscaling(hba);
+   pm_runtime_put(hba->dev);
+}
+
+#ifdef CONFIG_PM
+static void ufshcd_recover_pm_error(struct ufs_hba *hba)
+{
+   struct Scsi_Host *shost = hba->host;
+   struct scsi_device *sdev;
+   struct request_queue *q;
+   int ret;
+
+   /*
+* Set RPM status of hba device to RPM_ACTIVE,
+* this also clears its runtime error.
+*/
+   ret = pm_runtime_set_active(hba->dev);
+   /*
+* If hba device had runtime error, we also need to resume those
+* scsi devices under hba in case any of them has failed to be
+* resumed due to hba runtime resume failure. This is to unblock
+* blk_queue_enter in case there are bios waiting inside it.
+*/
+   if (!ret) {
+   list_for_each_entry(sdev, &shost->__devices, siblings) {
+

[PATCH v4 5/8] scsi: ufs: Fix concurrency of error handler and other error recovery paths

2020-07-19 Thread Can Guo
Error recovery can be invoked from multiple paths, including hibern8
enter/exit, some vendor vops, ufshcd_eh_host_reset_handler(), resume and
eh_work scheduled from IRQ context. Ultimately, these paths are trying to
invoke ufshcd_reset_and_restore(), in either sync or async manner.

Having both sync and async manners at the same time has some problems

- If link recovery happens during clock scaling work, acquiring scaling_lock
  in ufshcd_exec_dev_cmd() would cause a deadlock, because scaling_lock is
  already held by the scaling work before link recovery happens.

- If link recovery happens during ungate work, ufshcd_hold() would be
  called recursively. Although commit 53c12d0ef6fcb
  ("scsi: ufs: fix error recovery after the hibern8 exit failure") [1]
  fixed a deadlock due to recursive calls of ufshcd_hold() by adding a
  check of eh_in_progress into ufshcd_hold, this check allows eh_work to
  run in parallel when link recovery is running.

- Similar concurrency can also happen when error recovery is invoked from
  eh_host_reset_handler. Although it tries to avoid that by flushing eh_work
  before invoking reset and restore, after the flush returns, eh_work can
  still be scheduled and run in parallel.

- Concurrency can even happen between eh_works. eh_work, currently queued
  on system_wq, is allowed to have multiple instances running in parallel,
  but we don't have proper protection for that.

If any of the above concurrency happens, error recovery would fail and leave
the UFS device and host in bad states. To fix the concurrency problem, this
change queues eh_work on a single-threaded workqueue and removes the link
recovery calls from the hibern8 enter/exit path. Meanwhile, make use of
eh_work in eh_host_reset_handler instead of calling ufshcd_reset_and_restore.
This unifies the UFS error recovery mechanism.

In addition, according to the UFSHCI JEDEC spec, hibern8 enter/exit error
occurs when the link is broken. This essentially applies to any power mode
change operations (since they all use PACP_PWR cmds in UniPro layer). So,
in this change, if a power mode change operation (including AH8 enter/exit)
fails, mark link state as UIC_LINK_BROKEN_STATE and schedule the eh_work.
In this case, error handler needs to do a full reset and restore to recover
the link back to active. Before the link state is recovered to active,
ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors.

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufs-sysfs.c |   1 +
 drivers/scsi/ufs/ufshcd.c| 268 +++
 drivers/scsi/ufs/ufshcd.h|   9 ++
 3 files changed, 151 insertions(+), 127 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 2d71d23..02d379f00 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -16,6 +16,7 @@ static const char *ufschd_uic_link_state_to_string(
case UIC_LINK_OFF_STATE:return "OFF";
case UIC_LINK_ACTIVE_STATE: return "ACTIVE";
case UIC_LINK_HIBERN8_STATE:return "HIBERN8";
+   case UIC_LINK_BROKEN_STATE: return "BROKEN";
default:return "UNKNOWN";
}
 }
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a22414b..c01743a 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -228,6 +228,7 @@ static int ufshcd_scale_clks(struct ufs_hba *hba, bool 
scale_up);
 static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 struct ufs_pa_layer_attr *pwr_mode);
+static void ufshcd_schedule_eh_work(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_enable(struct ufs_hba *hba);
 static int ufshcd_wb_buf_flush_disable(struct ufs_hba *hba);
 static int ufshcd_wb_ctrl(struct ufs_hba *hba, bool enable);
@@ -1571,11 +1572,6 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
spin_lock_irqsave(hba->host->host_lock, flags);
hba->clk_gating.active_reqs++;
 
-   if (ufshcd_eh_in_progress(hba)) {
-   spin_unlock_irqrestore(hba->host->host_lock, flags);
-   return 0;
-   }
-
 start:
switch (hba->clk_gating.state) {
case CLKS_ON:
@@ -1653,6 +1649,7 @@ static void ufshcd_gate_work(struct work_struct *work)
struct ufs_hba *hba = container_of(work, struct ufs_hba,
clk_gating.gate_work.work);
unsigned long flags;
+   int ret;
 
spin_lock_irqsave(hba->host->host_lock, flags);
/*
@@ -1679,8 +1676,11 @@ static void ufshcd_gate_work(struct work_struct *work)
 
/* put the link into hibern8 mode before turning off clocks */
if (ufshcd_can_hibern8_during_gating(hba)) {
-   if (ufshcd_uic_hibern8_enter(hba)) {
+   ret = ufshcd_uic_hibern8_enter(hba);
+   if (ret) {
hba->clk_gating.state = CLKS_ON;
+   

[PATCH v4 2/8] scsi: ufs: Fix imbalanced scsi_block_reqs_cnt caused by ufshcd_hold()

2020-07-19 Thread Can Guo
The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be
decreased back in ufshcd_ungate_work() in a paired way. However, if
specific ufshcd_hold/release sequences are met, it is possible that
scsi_block_reqs_cnt is increased twice but only one ungate work is
queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold() and
ufshcd_ungate_work() in a paired way, increase it only if queue_work()
returns true.
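
A tiny userspace model of the imbalance being fixed; the helpers below only
mimic queue_work()'s return value (false when the work item is already
pending) and are not the driver code:

#include <stdbool.h>
#include <stdio.h>

static bool work_pending;
static int scsi_block_reqs_cnt;

static bool queue_work(void)		/* false if the work is already queued */
{
	if (work_pending)
		return false;
	work_pending = true;
	return true;
}

static void hold_old(void)		/* unconditional increment */
{
	scsi_block_reqs_cnt++;
	queue_work();
}

static void hold_new(void)		/* increment only if actually queued */
{
	if (queue_work())
		scsi_block_reqs_cnt++;
}

static void ungate_work(void)		/* the queued work runs exactly once */
{
	work_pending = false;
	scsi_block_reqs_cnt--;		/* models ufshcd_scsi_unblock_requests() */
}

int main(void)
{
	hold_old();
	hold_old();
	ungate_work();
	printf("old scheme leaves cnt = %d\n", scsi_block_reqs_cnt);	/* 1, imbalanced */

	scsi_block_reqs_cnt = 0;
	hold_new();
	hold_new();
	ungate_work();
	printf("new scheme leaves cnt = %d\n", scsi_block_reqs_cnt);	/* 0, balanced */
	return 0;
}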

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 99bd3e4..2907828 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1611,12 +1611,12 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
 */
/* fallthrough */
case CLKS_OFF:
-   ufshcd_scsi_block_requests(hba);
hba->clk_gating.state = REQ_CLKS_ON;
trace_ufshcd_clk_gating(dev_name(hba->dev),
hba->clk_gating.state);
-   queue_work(hba->clk_gating.clk_gating_workq,
-  &hba->clk_gating.ungate_work);
+   if (queue_work(hba->clk_gating.clk_gating_workq,
+  &hba->clk_gating.ungate_work))
+   ufshcd_scsi_block_requests(hba);
/*
 * fall through to check if we should wait for this
 * work to be done or not.
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c

2020-07-19 Thread Naresh Kamboju
This kernel oops occurs while booting the mainline Linux kernel on an arm64 db410c device.

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  git commit: f8456690ba8eb18ea4714e68554e242a04f65cff
  git describe: v5.8-rc5-48-gf8456690ba8e
  make_kernelversion: 5.8.0-rc5
  kernel-config:
https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config

[5.444121] Unable to handle kernel NULL pointer dereference at
virtual address 0018
[5.456615]   ESR = 0x9604
[5.464471]   SET = 0, FnV = 0
[5.464487]   EA = 0, S1PTW = 0
[5.466521] Data abort info:
[5.469971]   ISV = 0, ISS = 0x0004
[5.472768]   CM = 0, WnR = 0
[5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000
[5.479349] [0018] pgd=, p4d=
[5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP
[5.492448] Modules linked in: crct10dif_ce adv7511(+)
qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg
drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng
videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns
rmtfs_mem fuse
[5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1
[5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
[5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8
[5.541148] lr : free_io_pgtable_ops+0x28/0x58
[5.546350] sp : 80001219b5f0
[5.550689] x29: 80001219b5f0 x28: 0013
[5.554078] x27: 0100 x26: 36add3b8
[5.559459] x25: 8915e910 x24: 3a5458c0
[5.564753] x23: 0003 x22: 36a37058
[5.570049] x21: 36a3a100 x20: 36a3a480
[5.575344] x19: 36a37158 x18: 
[5.580639] x17:  x16: 
[5.585935] x15: 0004 x14: 0368
[5.591229] x13:  x12: 39c61798
[5.596525] x11: 39c616d0 x10: 4000
[5.601820] x9 :  x8 : 39c616f8
[5.607114] x7 :  x6 : 09f699a0
[5.612410] x5 : 80001219b520 x4 : 36a3a000
[5.617705] x3 : 09f69904 x2 : 
[5.623001] x1 : 8000107e27e8 x0 : 3a545810
[5.628297] Call trace:
[5.633592]  qcom_iommu_tlb_inv_context+0x18/0xa8
[5.635764]  free_io_pgtable_ops+0x28/0x58
[5.640624]  qcom_iommu_domain_free+0x38/0x60
[5.644617]  iommu_group_release+0x4c/0x70
[5.649045]  kobject_put+0x6c/0x120
[5.653035]  kobject_del+0x64/0x90
[5.656421]  kobject_put+0xfc/0x120
[5.659893]  iommu_group_remove_device+0xdc/0xf0
[5.663281]  iommu_release_device+0x44/0x70
[5.668142]  iommu_bus_notifier+0xbc/0xd0
[5.672048]  notifier_call_chain+0x54/0x98
[5.676214]  blocking_notifier_call_chain+0x48/0x70
[5.680209]  device_del+0x26c/0x3a0
[5.684981]  platform_device_del.part.0+0x1c/0x88
[5.688453]  platform_device_unregister+0x24/0x40
[5.693316]  of_platform_device_destroy+0xe4/0xf8
[5.698002]  device_for_each_child+0x5c/0xa8
[5.702689]  of_platform_depopulate+0x3c/0x80
[5.707144]  msm_pdev_probe+0x1c4/0x308 [msm]
[5.711286]  platform_drv_probe+0x54/0xa8
[5.715624]  really_probe+0xd8/0x320
[5.719617]  driver_probe_device+0x58/0xb8
[5.723263]  device_driver_attach+0x74/0x80
[5.727168]  __driver_attach+0x58/0xe0
[5.731248]  bus_for_each_dev+0x70/0xc0
[5.735067]  driver_attach+0x24/0x30
[5.738801]  bus_add_driver+0x14c/0x1f0
[5.742619]  driver_register+0x64/0x120
[5.746178]  __platform_driver_register+0x48/0x58
[5.750099]  msm_drm_register+0x58/0x70 [msm]
[5.754861]  do_one_initcall+0x54/0x1a0
[5.759200]  do_init_module+0x54/0x200
[5.762846]  load_module+0x1d1c/0x2300
[5.74]  __do_sys_finit_module+0xd8/0xf0
[5.770398]  __arm64_sys_finit_module+0x20/0x30
[5.774826]  el0_svc_common.constprop.0+0x6c/0x168
[5.779078]  do_el0_svc+0x24/0x90
[5.783939]  el0_sync_handler+0x90/0x198
[5.787323]  el0_sync+0x158/0x180
[5.791323] Code: 910003fd f9417404 b4000484 f9401482 (b9401846)
[5.794532] ---[ end trace 3d6a53241629e560 ]---

full crash log details.
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.8-rc5-48-gf8456690ba8e/testrun/2945157/suite/linux-log-parser/test/check-kernel-oops-1573988/log

-- 
Linaro LKFT
https://lkft.linaro.org


[PATCH v4 4/8] scsi: ufs: Add some debug infos to ufshcd_print_host_state

2020-07-19 Thread Can Guo
The last interrupt status and its timestamp are very helpful when debugging
system stability issues, e.g. IRQ starvation, so add them to
ufshcd_print_host_state. Meanwhile, UFS device information like the model
name and FW version also comes in handy during debugging. In addition, this
change cleans up some prints in ufshcd_print_host_regs as similar prints are
already available in ufshcd_print_host_state.

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 31 ++-
 drivers/scsi/ufs/ufshcd.h |  5 +
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 2907828..a22414b 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -411,15 +411,6 @@ static void ufshcd_print_err_hist(struct ufs_hba *hba,
 static void ufshcd_print_host_regs(struct ufs_hba *hba)
 {
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
-   dev_err(hba->dev, "hba->ufs_version = 0x%x, hba->capabilities = 0x%x\n",
-   hba->ufs_version, hba->capabilities);
-   dev_err(hba->dev,
-   "hba->outstanding_reqs = 0x%x, hba->outstanding_tasks = 0x%x\n",
-   (u32)hba->outstanding_reqs, (u32)hba->outstanding_tasks);
-   dev_err(hba->dev,
-   "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt = %d\n",
-   ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp),
-   hba->ufs_stats.hibern8_exit_cnt);
 
ufshcd_print_err_hist(hba, &hba->ufs_stats.pa_err, "pa_err");
ufshcd_print_err_hist(hba, &hba->ufs_stats.dl_err, "dl_err");
@@ -438,8 +429,6 @@ static void ufshcd_print_host_regs(struct ufs_hba *hba)
ufshcd_print_err_hist(hba, &hba->ufs_stats.host_reset, "host_reset");
ufshcd_print_err_hist(hba, &hba->ufs_stats.task_abort, "task_abort");
 
-   ufshcd_print_clk_freqs(hba);
-
ufshcd_vops_dbg_register_dump(hba);
 }
 
@@ -499,6 +488,8 @@ static void ufshcd_print_tmrs(struct ufs_hba *hba, unsigned 
long bitmap)
 
 static void ufshcd_print_host_state(struct ufs_hba *hba)
 {
+   struct scsi_device *sdev_ufs = hba->sdev_ufs_device;
+
dev_err(hba->dev, "UFS Host state=%d\n", hba->ufshcd_state);
dev_err(hba->dev, "outstanding reqs=0x%lx tasks=0x%lx\n",
hba->outstanding_reqs, hba->outstanding_tasks);
@@ -511,12 +502,24 @@ static void ufshcd_print_host_state(struct ufs_hba *hba)
dev_err(hba->dev, "Auto BKOPS=%d, Host self-block=%d\n",
hba->auto_bkops_enabled, hba->host->host_self_blocked);
dev_err(hba->dev, "Clk gate=%d\n", hba->clk_gating.state);
+   dev_err(hba->dev,
+   "last_hibern8_exit_tstamp at %lld us, hibern8_exit_cnt=%d\n",
+   ktime_to_us(hba->ufs_stats.last_hibern8_exit_tstamp),
+   hba->ufs_stats.hibern8_exit_cnt);
+   dev_err(hba->dev, "last intr at %lld us, last intr status=0x%x\n",
+   ktime_to_us(hba->ufs_stats.last_intr_ts),
+   hba->ufs_stats.last_intr_status);
dev_err(hba->dev, "error handling flags=0x%x, req. abort count=%d\n",
hba->eh_flags, hba->req_abort_count);
-   dev_err(hba->dev, "Host capabilities=0x%x, caps=0x%x\n",
-   hba->capabilities, hba->caps);
+   dev_err(hba->dev, "hba->ufs_version=0x%x, Host capabilities=0x%x, 
caps=0x%x\n",
+   hba->ufs_version, hba->capabilities, hba->caps);
dev_err(hba->dev, "quirks=0x%x, dev. quirks=0x%x\n", hba->quirks,
hba->dev_quirks);
+   if (sdev_ufs)
+   dev_err(hba->dev, "UFS dev info: %.8s %.16s rev %.4s\n",
+   sdev_ufs->vendor, sdev_ufs->model, sdev_ufs->rev);
+
+   ufshcd_print_clk_freqs(hba);
 }
 
 /**
@@ -5951,6 +5954,8 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
 
spin_lock(hba->host->host_lock);
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
+   hba->ufs_stats.last_intr_status = intr_status;
+   hba->ufs_stats.last_intr_ts = ktime_get();
 
/*
 * There could be max of hba->nutrs reqs in flight and in worst case
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 656c069..5b2cdaf 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -406,6 +406,8 @@ struct ufs_err_reg_hist {
 
 /**
  * struct ufs_stats - keeps usage/err statistics
+ * @last_intr_status: record the last interrupt status.
+ * @last_intr_ts: record the last interrupt timestamp.
  * @hibern8_exit_cnt: Counter to keep track of number of exits,
  * reset this after link-startup.
  * @last_hibern8_exit_tstamp: Set time after the hibern8 exit.
@@ -425,6 +427,9 @@ struct ufs_err_reg_hist {
  * @tsk_abort: tracks task abort events
  */
 struct ufs_stats {
+   u32 last_intr_status;
+   ktime_t last_intr_ts;
+
u32 hibern8_exit_cnt;
ktime_t last_hibern8_exit_tstamp;
 
-- 
Qualcomm 

[PATCH v4 3/8] ufs: ufs-qcom: Fix a few BUGs in func ufs_qcom_dump_dbg_regs()

2020-07-19 Thread Can Guo
Dumping testbus registers needs to sleep a bit intermittently as there are
too many of them. Skip them for those contexts where sleep is not allowed.

Meanwhile, if ufs_qcom_dump_dbg_regs() calls ufs_qcom_testbus_config() from
ufshcd_suspend/resume and/or clk gate/ungate context, pm_runtime_get_sync()
and ufshcd_hold() will cause racing problems. Fix it by removing the
unnecessary calls of pm_runtime_get_sync() and ufshcd_hold().

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufs-qcom.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index 2e6ddb5..3743c17 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1604,9 +1604,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host)
 */
}
mask <<= offset;
-
-   pm_runtime_get_sync(host->hba->dev);
-   ufshcd_hold(host->hba, false);
ufshcd_rmwl(host->hba, TEST_BUS_SEL,
(u32)host->testbus.select_major << 19,
REG_UFS_CFG1);
@@ -1619,8 +1616,6 @@ int ufs_qcom_testbus_config(struct ufs_qcom_host *host)
 * committed before returning.
 */
mb();
-   ufshcd_release(host->hba);
-   pm_runtime_put_sync(host->hba->dev);
 
return 0;
 }
@@ -1658,11 +1653,13 @@ static void ufs_qcom_dump_dbg_regs(struct ufs_hba *hba)
 
/* sleep a bit intermittently as we are dumping too much data */
ufs_qcom_print_hw_debug_reg_all(hba, NULL, ufs_qcom_dump_regs_wrapper);
-   udelay(1000);
-   ufs_qcom_testbus_read(hba);
-   udelay(1000);
-   ufs_qcom_print_unipro_testbus(hba);
-   udelay(1000);
+   if (in_task()) {
+   udelay(1000);
+   ufs_qcom_testbus_read(hba);
+   udelay(1000);
+   ufs_qcom_print_unipro_testbus(hba);
+   udelay(1000);
+   }
 }
 
 /**
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.



Re: [PATCH 12/20] dt-bindings: dma: renesas,usb-dmac: Add binding for r8a774e1

2020-07-19 Thread Vinod Koul
On 16-07-20, 18:18, Lad Prabhakar wrote:
> Add binding for R8A774E1 SoC (RZ/G2H).

Applied, thanks

-- 
~Vinod


Re: [PATCH 09/20] dt-bindings: phy: renesas,usb3-phy: Add r8a774e1 support

2020-07-19 Thread Vinod Koul
On 16-07-20, 18:18, Lad Prabhakar wrote:
> Document RZ/G2H (R8A774E1) SoC bindings.

Applied, thanks

-- 
~Vinod


Re: [PATCH 05/20] dt-bindings: phy: renesas,usb2-phy: Add r8a774e1 support

2020-07-19 Thread Vinod Koul
On 16-07-20, 18:18, Lad Prabhakar wrote:
> Document SoC specific bindings for RZ/G2H (r8a774e1) SoC.

Applied, thanks

-- 
~Vinod


[Patch v1 4/4] arm64: tegra: Add GPCDMA node in dt

2020-07-19 Thread Rajesh Gumasta
Add a device tree node for the GPCDMA controller on the Tegra186 and
Tegra194 targets.

Signed-off-by: Rajesh Gumasta 
---
 arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi |  4 +++
 arch/arm64/boot/dts/nvidia/tegra186.dtsi   | 46 ++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 44 
 3 files changed, 94 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi
index 2fcaa2e..56ed8d8 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi
@@ -54,6 +54,10 @@
};
};
 
+   dma@260 {
+   status = "okay";
+   };
+
memory-controller@2c0 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 58100fb..91bb17e 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -70,6 +70,52 @@
snps,rxpbl = <8>;
};
 
+   gpcdma: dma@260 {
+   compatible = "nvidia,tegra186-gpcdma";
+   reg = <0x0 0x260 0x0 0x21>;
+   resets = <&bpmp TEGRA186_RESET_GPCDMA>;
+   reset-names = "gpcdma";
+   interrupts = ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ;
+   #dma-cells = <1>;
+   iommus = <&smmu TEGRA186_SID_GPCDMA_0>;
+   dma-coherent;
+   nvidia,start-dma-channel-index = <1>;
+   dma-channels = <31>;
+   status = "disabled";
+   };
+
aconnect {
compatible = "nvidia,tegra186-aconnect",
 "nvidia,tegra210-aconnect";
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 4bc187a..0bd67bd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -68,6 +68,50 @@
snps,rxpbl = <8>;
};
 
+   gpcdma: dma@260 {
+   compatible = "nvidia,tegra194-gpcdma";
+   reg = <0x0 0x260 0x0 0x21>;
+   resets = <&bpmp TEGRA194_RESET_GPCDMA>;
+   reset-names = "gpcdma";
+   interrupts = ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ ,
+ 

[Patch v1 1/4] dt-bindings: dma: Add DT binding document

2020-07-19 Thread Rajesh Gumasta
Add DT binding document for Nvidia Tegra GPCDMA controller.

Signed-off-by: Rajesh Gumasta 
---
 .../bindings/dma/nvidia,tegra-gpc-dma.yaml | 99 ++
 1 file changed, 99 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/dma/nvidia,tegra-gpc-dma.yaml

diff --git a/Documentation/devicetree/bindings/dma/nvidia,tegra-gpc-dma.yaml 
b/Documentation/devicetree/bindings/dma/nvidia,tegra-gpc-dma.yaml
new file mode 100644
index 000..39827ab
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/nvidia,tegra-gpc-dma.yaml
@@ -0,0 +1,99 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma/nvidia,tegra-gpc-dma.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Nvidia Tegra GPC DMA Controller Device Tree Bindings
+
+description: |
+  Tegra GPC DMA controller is a general purpose dma used for faster data
+  transfers between memory to memory, memory to device and device to memory.
+  Terms 'dma' and 'gpcdma' can be used interchangeably.
+
+maintainers:
+  - Jon Hunter 
+  - Rajesh Gumasta 
+
+allOf:
+  - $ref: "dma-controller.yaml#"
+
+properties:
+  "#dma-cells":
+const: 1
+
+  compatible:
+- enum:
+  - nvidia,tegra186-gpcdma
+  - nvidia,tegra194-gpcdma
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  resets:
+maxItems: 1
+
+  reset-names:
+const: gpcdma
+
+  iommus:
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - resets
+  - reset-names
+  - "#dma-cells"
+  - iommus
+
+examples:
+  - |
+gpcdma: dma@260 {
+ compatible = "nvidia,tegra186-gpcdma";
+ reg = <0x0 0x260 0x0 0x21>;
+ resets = <&bpmp TEGRA186_RESET_GPCDMA>;
+ reset-names = "gpcdma";
+ interrupts = ;
+   #dma-cells = <1>;
+  iommus = <&smmu TEGRA_SID_GPCDMA_0>;
+  dma-coherent;
+   };
+
+...
-- 
2.7.4



[Patch v1 2/4] dma: tegra: Adding Tegra GPC DMA controller driver

2020-07-19 Thread Rajesh Gumasta
v4 changes: Removed pending dma desc list and other unused
data structures

v3 changes: Removed free list for dma_desc and sg

Add the GPC DMA controller driver for Tegra186 and Tegra194. The driver
supports DMA transfers between memory and memory, IO and memory, and
memory and IO.

Signed-off-by: Pavan Kunapuli 
Signed-off-by: Rajesh Gumasta 
---
 drivers/dma/Kconfig |   12 +
 drivers/dma/Makefile|1 +
 drivers/dma/tegra-gpc-dma.c | 1512 +++
 3 files changed, 1525 insertions(+)
 create mode 100644 drivers/dma/tegra-gpc-dma.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index e9ed916..be4c395 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -639,6 +639,18 @@ config TEGRA210_ADMA
  peripheral and vice versa. It does not support memory to
  memory data transfer.
 
+config TEGRA_GPC_DMA
+   tristate "NVIDIA Tegra GPC DMA support"
+   depends on ARCH_TEGRA_186_SOC || ARCH_TEGRA_194_SOC || COMPILE_TEST
+   select DMA_ENGINE
+   help
+ Support for the NVIDIA Tegra186 and Tegra194 GPC DMA controller
+ driver. The DMA controller is having multiple DMA channel which
+ can be configured for different peripherals like UART, SPI, etc
+ which are on APB bus.
+ This DMA controller transfers data from memory to peripheral fifo
+ or vice versa. It also supports memory to memory data transfer.
+
 config TIMB_DMA
tristate "Timberdale FPGA DMA support"
depends on MFD_TIMBERDALE || COMPILE_TEST
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index e60f813..43e3a6e 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -76,6 +76,7 @@ obj-$(CONFIG_S3C24XX_DMAC) += s3c24xx-dma.o
 obj-$(CONFIG_TXX9_DMAC) += txx9dmac.o
 obj-$(CONFIG_TEGRA20_APB_DMA) += tegra20-apb-dma.o
 obj-$(CONFIG_TEGRA210_ADMA) += tegra210-adma.o
+obj-$(CONFIG_TEGRA_GPC_DMA) += tegra-gpc-dma.o
 obj-$(CONFIG_TIMB_DMA) += timb_dma.o
 obj-$(CONFIG_UNIPHIER_MDMAC) += uniphier-mdmac.o
 obj-$(CONFIG_UNIPHIER_XDMAC) += uniphier-xdmac.o
diff --git a/drivers/dma/tegra-gpc-dma.c b/drivers/dma/tegra-gpc-dma.c
new file mode 100644
index 000..92b7219
--- /dev/null
+++ b/drivers/dma/tegra-gpc-dma.c
@@ -0,0 +1,1512 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * DMA driver for Nvidia's Tegra186 and Tegra194 GPC DMA controller.
+ *
+ * Copyright (c) 2014-2020, NVIDIA CORPORATION.  All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "virt-dma.h"
+
+/* CSR register */
+#define TEGRA_GPCDMA_CHAN_CSR  0x00
+#define TEGRA_GPCDMA_CSR_ENB   BIT(31)
+#define TEGRA_GPCDMA_CSR_IE_EOCBIT(30)
+#define TEGRA_GPCDMA_CSR_ONCE  BIT(27)
+#define TEGRA_GPCDMA_CSR_FC_MODE_NO_MMIO   (0 << 24)
+#define TEGRA_GPCDMA_CSR_FC_MODE_ONE_MMIO  (1 << 24)
+#define TEGRA_GPCDMA_CSR_FC_MODE_TWO_MMIO  (2 << 24)
+#define TEGRA_GPCDMA_CSR_FC_MODE_FOUR_MMIO (3 << 24)
+#define TEGRA_GPCDMA_CSR_DMA_IO2MEM_NO_FC  (0 << 21)
+#define TEGRA_GPCDMA_CSR_DMA_IO2MEM_FC (1 << 21)
+#define TEGRA_GPCDMA_CSR_DMA_MEM2IO_NO_FC  (2 << 21)
+#define TEGRA_GPCDMA_CSR_DMA_MEM2IO_FC (3 << 21)
+#define TEGRA_GPCDMA_CSR_DMA_MEM2MEM   (4 << 21)
+#define TEGRA_GPCDMA_CSR_DMA_FIXED_PAT (6 << 21)
+#define TEGRA_GPCDMA_CSR_REQ_SEL_SHIFT 16
+#define TEGRA_GPCDMA_CSR_REQ_SEL_MASK  0x1F
+#define TEGRA_GPCDMA_CSR_REQ_SEL_UNUSED0x4
+#define TEGRA_GPCDMA_CSR_IRQ_MASK  BIT(15)
+#define TEGRA_GPCDMA_CSR_WEIGHT_SHIFT  10
+
+/* STATUS register */
+#define TEGRA_GPCDMA_CHAN_STATUS   0x004
+#define TEGRA_GPCDMA_STATUS_BUSY   BIT(31)
+#define TEGRA_GPCDMA_STATUS_ISE_EOCBIT(30)
+#define TEGRA_GPCDMA_STATUS_PING_PONG  BIT(28)
+#define TEGRA_GPCDMA_STATUS_DMA_ACTIVITY   BIT(27)
+#define TEGRA_GPCDMA_STATUS_CHANNEL_PAUSE  BIT(26)
+#define TEGRA_GPCDMA_STATUS_CHANNEL_RX BIT(25)
+#define TEGRA_GPCDMA_STATUS_CHANNEL_TX BIT(24)
+#define TEGRA_GPCDMA_STATUS_IRQ_INTR_STA   BIT(23)
+#define TEGRA_GPCDMA_STATUS_IRQ_STABIT(21)
+#define TEGRA_GPCDMA_STATUS_IRQ_TRIG_STA   BIT(20)
+
+#define TEGRA_GPCDMA_CHAN_CSRE 0x008
+#define TEGRA_GPCDMA_CHAN_CSRE_PAUSE   BIT(31)
+
+/* Source address */
+#define TEGRA_GPCDMA_CHAN_SRC_PTR  0x00C
+
+/* Destination address */
+#define TEGRA_GPCDMA_CHAN_DST_PTR  0x010
+
+/* High address pointer */
+#define TEGRA_GPCDMA_CHAN_HIGH_ADDR_PTR0x014
+#define TEGRA_GPCDMA_HIGH_ADDR_SCR_PTR_SHIFT   0
+#define TEGRA_GPCDMA_HIGH_ADDR_SCR_PTR_MASK0xFF
+#define TEGRA_GPCDMA_HIGH_ADDR_DST_PTR_SHIFT   16
+#define TEGRA_GPCDMA_HIGH_ADDR_DST_PTR_MAS

Re: [PATCH v1 07/15] scsi: hisi_sas_v3_hw: use generic power management

2020-07-19 Thread Vaibhav Gupta
On Mon, Jul 20, 2020 at 02:16:45PM +0800, chenxiang (M) wrote:
> Hi Vaibhav,
> 
> 在 2020/7/17 14:34, Vaibhav Gupta 写道:
> > With legacy PM, drivers themselves were responsible for managing the
> > device's power states and takes care of register states.
> > 
> > After upgrading to the generic structure, PCI core will take care of
> > required tasks and drivers should do only device-specific operations.
> > 
> > The driver was calling pci_save/restore_state(), pci_choose_state(),
> > pci_enable/disable_device() and pci_set_power_state() which is no more
> > needed.
> > 
> > Compile-tested only.
> > 
> > Signed-off-by: Vaibhav Gupta 
> 
> Reviewed-by: Xiang Chen 
> Just a small comment, below.
> 
> > ---
> >   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 32 --
> >   1 file changed, 10 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
> > b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > index 55e2321a65bc..45605a520bc8 100644
> > --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > @@ -3374,13 +3374,13 @@ enum {
> > hip08,
> >   };
> > -static int hisi_sas_v3_suspend(struct pci_dev *pdev, pm_message_t state)
> > +static int __maybe_unused hisi_sas_v3_suspend(struct device *dev_d)
> >   {
> > +   struct pci_dev *pdev = to_pci_dev(dev_d);
> > struct sas_ha_struct *sha = pci_get_drvdata(pdev);
> > struct hisi_hba *hisi_hba = sha->lldd_ha;
> > struct device *dev = hisi_hba->dev;
> > struct Scsi_Host *shost = hisi_hba->shost;
> > -   pci_power_t device_state;
> > int rc;
> > if (!pdev->pm_cap) {
> > @@ -3406,21 +3406,15 @@ static int hisi_sas_v3_suspend(struct pci_dev 
> > *pdev, pm_message_t state)
> > hisi_sas_init_mem(hisi_hba);
> > -   device_state = pci_choose_state(pdev, state);
> > -   dev_warn(dev, "entering operating state [D%d]\n",
> > -   device_state);
> 
> Please retain above print to keep consistence with the print in function
> hisi_sas_v3_resume().
>
Okay, thanks for the review :)
This will be fixed in the v2 patch series along with other changes.

-- Vaibhav Gupta 
> > -   pci_save_state(pdev);
> > -   pci_disable_device(pdev);
> > -   pci_set_power_state(pdev, device_state);
> > -
> > hisi_sas_release_tasks(hisi_hba);
> > sas_suspend_ha(sha);
> > return 0;
> >   }
> > -static int hisi_sas_v3_resume(struct pci_dev *pdev)
> > +static int __maybe_unused hisi_sas_v3_resume(struct device *dev_d)
> >   {
> > +   struct pci_dev *pdev = to_pci_dev(dev_d);
> > struct sas_ha_struct *sha = pci_get_drvdata(pdev);
> > struct hisi_hba *hisi_hba = sha->lldd_ha;
> > struct Scsi_Host *shost = hisi_hba->shost;
> > @@ -3430,16 +3424,8 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)
> > dev_warn(dev, "resuming from operating state [D%d]\n",
> >  device_state);
> > -   pci_set_power_state(pdev, PCI_D0);
> > -   pci_enable_wake(pdev, PCI_D0, 0);
> > -   pci_restore_state(pdev);
> > -   rc = pci_enable_device(pdev);
> > -   if (rc) {
> > -   dev_err(dev, "enable device failed during resume (%d)\n", rc);
> > -   return rc;
> > -   }
> > +   device_wakeup_disable(dev_d);
> > -   pci_set_master(pdev);
> > scsi_unblock_requests(shost);
> > clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
> > @@ -3447,7 +3433,6 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)
> > rc = hw_init_v3_hw(hisi_hba);
> > if (rc) {
> > scsi_remove_host(shost);
> > -   pci_disable_device(pdev);
> > return rc;
> > }
> > hisi_hba->hw->phys_init(hisi_hba);
> > @@ -3468,13 +3453,16 @@ static const struct pci_error_handlers 
> > hisi_sas_err_handler = {
> > .reset_done = hisi_sas_reset_done_v3_hw,
> >   };
> > +static SIMPLE_DEV_PM_OPS(hisi_sas_v3_pm_ops,
> > +hisi_sas_v3_suspend,
> > +hisi_sas_v3_resume);
> > +
> >   static struct pci_driver sas_v3_pci_driver = {
> > .name   = DRV_NAME,
> > .id_table   = sas_v3_pci_table,
> > .probe  = hisi_sas_v3_probe,
> > .remove = hisi_sas_v3_remove,
> > -   .suspend= hisi_sas_v3_suspend,
> > -   .resume = hisi_sas_v3_resume,
> > +   .driver.pm  = &hisi_sas_v3_pm_ops,
> > .err_handler= &hisi_sas_err_handler,
> >   };
> 
> 


Re: [PATCH 5/5] cpuidle-pseries: Block Extended CEDE(1) which adds no additional value.

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:39]:

> From: "Gautham R. Shenoy" 
> 
> The Extended CEDE state with latency-hint = 1 is only different from
> normal CEDE (with latency-hint = 0) in that a CPU in Extended CEDE(1)
> does not wakeup on timer events. Both CEDE and Extended CEDE(1) map to
> the same hardware idle state. Since we already get SMT folding from
> the normal CEDE, the Extended CEDE(1) doesn't provide any additional
> value. This patch blocks Extended CEDE(1).
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 

> ---
>  drivers/cpuidle/cpuidle-pseries.c | 57 
> ---
>  1 file changed, 54 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 6f893cd..be0b8b2 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -350,6 +350,43 @@ static int pseries_cpuidle_driver_init(void)
>   return 0;
>  }
> 
> +#define XCEDE1_HINT  1
> +#define ERR_NO_VALUE_ADD (-1)
> +#define ERR_NO_EE_WAKEUP (-2)
> +
> +/*
> + * Returns 0 if the Extended CEDE state with @hint is not blocked in
> + * cpuidle framework.
> + *
> + * Returns ERR_NO_EE_WAKEUP if the Extended CEDE state is blocked due
> + * to not being responsive to external interrupts.
> + *
> + * Returns ERR_NO_VALUE_ADD if the Extended CEDE state does not provide
> + * added value addition over the normal CEDE.
> + */
> +static int cpuidle_xcede_blocked(u8 hint, u64 latency_us, u8 
> responsive_to_irqs)
> +{
> +
> + /*
> +  * We will only allow extended CEDE states that are responsive
> +  * to irqs do not require an H_PROD to be woken up.
> +  */
> + if (!responsive_to_irqs)
> + return ERR_NO_EE_WAKEUP;
> +
> + /*
> +  * We already obtain SMT folding benefits from CEDE (which is
> +  * CEDE with hint 0). Furthermore, CEDE is also responsive to
> +  * timer-events, while XCEDE1 requires an external
> +  * interrupt/H_PROD to be woken up. Hence, block XCEDE1 since
> +  * it adds no further value.
> +  */
> + if (hint == XCEDE1_HINT)
> + return ERR_NO_VALUE_ADD;
> +
> + return 0;
> +}
> +
>  static int add_pseries_idle_states(void)
>  {
>   int nr_states = 2; /* By default we have snooze, CEDE */
> @@ -365,15 +402,29 @@ static int add_pseries_idle_states(void)
>   char name[CPUIDLE_NAME_LEN];
>   unsigned int latency_hint = xcede_records[i].latency_hint;
>   u64 residency_us;
> + int rc;
> +
> + if (latency_us < min_latency_us)
> + min_latency_us = latency_us;
> +
> + rc = cpuidle_xcede_blocked(latency_hint, latency_us,
> +xcede_records[i].responsive_to_irqs);
> 
> - if (!xcede_records[i].responsive_to_irqs) {
> + if (rc) {
> + switch (rc) {
> + case ERR_NO_VALUE_ADD:
> + pr_info("cpuidle : Skipping XCEDE%d. No 
> additional value-add\n",
> + latency_hint);
> + break;
> + case ERR_NO_EE_WAKEUP:
>   pr_info("cpuidle : Skipping XCEDE%d. Not responsive to 
> IRQs\n",
>   latency_hint);
> + break;
> + }
> +
>   continue;
>   }
> 
> - if (latency_us < min_latency_us)
> - min_latency_us = latency_us;
>   snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> 
>   /*


We need these heuristics to select/reject idle states exposed by
platform firmware to Linux primarily because not all states are really
useful to Linux on a given setup.

--Vaidy



Re: [PATCH v3 0/2] phy: renesas: rcar-gen3-usb2: fix an issue and minor update

2020-07-19 Thread Vinod Koul
On 17-07-20, 20:44, Yoshihiro Shimoda wrote:
> The patch 1 can fix an issue which SError happen if CONFIG_DEBUG_SHIRQ
> is enabled.
> The patch 2 is a incremental patch from patch 1. It's better to avoid
> unexpected behaviors if request_irq() failed.

Applied, thanks

-- 
~Vinod


[Patch v1 0/4] Add Nvidia Tegra GPC-DMA driver

2020-07-19 Thread Rajesh Gumasta
Add support for the Nvidia Tegra general purpose DMA driver for the
Tegra186 and Tegra194 platforms.

Patch 1: Add dt-binding document for Tegra GPCDMA driver
Patch 2: Add Tegra GPCDMA driver
Patch 3: Enable Tegra GPCDMA as module
Patch 4: Add GPCDMA DT node for Tegra186 and Tegra194

Rajesh Gumasta (4):
  dt-bindings: dma: Add DT binding document
  dma: tegra: Adding Tegra GPC DMA controller driver
  arm64: configs: enable tegra gpc dma
  arm64: tegra: Add GPCDMA node in dt

 .../bindings/dma/nvidia,tegra-gpc-dma.yaml |   99 ++
 arch/arm64/boot/dts/nvidia/tegra186-p3310.dtsi |4 +
 arch/arm64/boot/dts/nvidia/tegra186.dtsi   |   46 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |   44 +
 arch/arm64/configs/defconfig   |1 +
 drivers/dma/Kconfig|   12 +
 drivers/dma/Makefile   |1 +
 drivers/dma/tegra-gpc-dma.c| 1512 
 8 files changed, 1719 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/dma/nvidia,tegra-gpc-dma.yaml
 create mode 100644 drivers/dma/tegra-gpc-dma.c

-- 
2.7.4



[Patch v1 3/4] arm64: configs: enable tegra gpc dma

2020-07-19 Thread Rajesh Gumasta
Enable TEGRA_GPC_DMA in the defconfig for the Tegra186 and Tegra194 GPC
DMA controller driver.

Signed-off-by: Rajesh Gumasta 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 883e8ba..600f568 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -775,6 +775,7 @@ CONFIG_MV_XOR_V2=y
 CONFIG_OWL_DMA=y
 CONFIG_PL330_DMA=y
 CONFIG_TEGRA20_APB_DMA=y
+CONFIG_TEGRA_GPC_DMA=m
 CONFIG_QCOM_BAM_DMA=y
 CONFIG_QCOM_HIDMA_MGMT=y
 CONFIG_QCOM_HIDMA=y
-- 
2.7.4



Re: [PATCH 4/5] cpuidle-pseries : Include extended CEDE states in cpuidle framework

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:38]:

> From: "Gautham R. Shenoy" 
> 
> This patch exposes those extended CEDE states to the cpuidle framework
> which are responsive to external interrupts and do not need an H_PROD.
> 
> Since as per the PAPR, all the extended CEDE states are non-responsive
> to timers, we indicate this to the cpuidle subsystem via the
> CPUIDLE_FLAG_TIMER_STOP flag for all those extende CEDE states which
> can wake up on external interrupts.
> 
> With the patch, we are able to see the extended CEDE state with
> latency hint = 1 exposed via the cpuidle framework.
> 
>   $ cpupower idle-info
>   CPUidle driver: pseries_idle
>   CPUidle governor: menu
>   analyzing CPU 0:
> 
>   Number of idle states: 3
>   Available idle states: snooze CEDE XCEDE1
>   snooze:
>   Flags/Description: snooze
>   Latency: 0
>   Usage: 33429446
>   Duration: 27006062
>   CEDE:
>   Flags/Description: CEDE
>   Latency: 1
>   Usage: 10272
>   Duration: 110786770
>   XCEDE1:
>   Flags/Description: XCEDE1
>   Latency: 12
>   Usage: 26445
>   Duration: 1436433815
> 
> Benchmark results:
> TLDR: Over all we do not see any additional benefit from having XCEDE1 over
> CEDE.
> 
> ebizzy :
> 2 threads bound to a big-core. With this patch, we see a 3.39%
> regression compared to with only CEDE0 latency fixup.
> x With only CEDE0 latency fixup
> * With CEDE0 latency fixup + CEDE1
> N   Min   MaxMedian   AvgStddev
> x  10   2893813   5834474   5832448 5327281.3 1055941.4
> *  10   2907329   5834923   5831398 5146614.6 1193874.8
> 
> context_switch2:
> With the context_switch2 there are no observable regressions in the
> results.
> 
> context_switch2 CPU0 CPU1 (Same Big-core, different small-cores).
> No difference with and without patch.
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500343644348778345444 345584.02 1035.1658
> * 500344310347646345776 345877.22 802.19501
> 
> context_switch2 CPU0 CPU8 (different big-cores). Minor 0.05% improvement
> with patch
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500287562288756288162 288134.76 262.24328
> * 500287874288960288306 288274.66 187.57034
> 
> schbench:
> No regressions observed with schbench
> 
> Without Patch:
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 40
>   90.0th: 50
>   95.0th: 61
>   *99.0th: 13648
>   99.5th: 14768
>   99.9th: 15664
>   min=0, max=29812
> 
> With Patch:
> Latency percentiles (usec)
>   50.0th: 30
>   75.0th: 40
>   90.0th: 51
>   95.0th: 59
>   *99.0th: 13616
>   99.5th: 14512
>   99.9th: 15696
>   min=0, max=15996
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 

> ---
>  drivers/cpuidle/cpuidle-pseries.c | 50 
> +++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 502f906..6f893cd 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -362,9 +362,59 @@ static int add_pseries_idle_states(void)
>   for (i = 0; i < nr_xcede_records; i++) {
>   u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
>   u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> + char name[CPUIDLE_NAME_LEN];
> + unsigned int latency_hint = xcede_records[i].latency_hint;
> + u64 residency_us;
> +
> + if (!xcede_records[i].responsive_to_irqs) {
> + pr_info("cpuidle : Skipping XCEDE%d. Not responsive to 
> IRQs\n",
> + latency_hint);
> + continue;
> + }
> 
>   if (latency_us < min_latency_us)
>   min_latency_us = latency_us;
> + snprintf(name, CPUIDLE_NAME_LEN, "XCEDE%d", latency_hint);
> +
> + /*
> +  * Section 14.14.1 of PAPR version 2.8.1 says that calling
> +  * H_CEDE with the value of the cede
> +  * latency specifier set greater than zero allows the
> +  * processor timer facility to be disabled (so as not
> +  * to cause gratuitous wake-ups - the use of H_PROD,
> +  * or other external interrupt is required to wake the
> +  * processor in this case).
> +  *
> +  * So, inform the cpuidle-subsystem that the timer
> +  * will be stopped for these states.
> +  *
> +  * Also, bump up the latency by 10us, since cpu

Re: [PATCH v2] ARM: dts: zynq: Fix ethernet PHY for v5 schematics

2020-07-19 Thread Michal Simek
Hi Matwey,

On 19. 07. 20 11:57, Matwey V. Kornilov wrote:
> 
> Ping?

Are you pinging me or Anton?
I really want another person to test this patch.
If there is no one else who can test it, I think it is time to remove
support for this board.

Thanks,
Michal


Re: linux-next: manual merge of the tip tree with the crypto tree

2020-07-19 Thread Stephen Rothwell
Hi Uros,

On Mon, 20 Jul 2020 08:13:51 +0200 Uros Bizjak  wrote:
>
> On Mon, Jul 20, 2020 at 6:03 AM Stephen Rothwell  
> wrote:
> 
> > > Please find attached the incremental patch that puts back integer
> > > parts of inst.h. This resolves the conflict with the tip tree.  
> >
> > The tip tree change needs the XMM parts kept as well, sorry.  
> 
> Strange, because I did test my patch with the tip tree from
> 'origin/master' at commit a282cddefe90c4b21ef2c22a76a7c3ebd3ec6b86 and
> the compilation produced the same lonely rdpid %eax in
> .altinstr_replacement section. AFAICS, the header is included only for
> RDPID macro, where XMM registers are unused.
> 
> > So I ended up just removing the actual now unused crypto instruction
> > macros.  
> 
> To avoid any further troubles, this is also OK with me.

Sorry, I see what happened now.  Since your patch was not in the crypto
tree yet, I did a fixup to the tip tree merge based on your patch, but
did it by hand and didn't remove the XMM bits from the REG_TYPE
macro ...

So your original patch is probably all good (especially since you
actually tested it :-))

-- 
Cheers,
Stephen Rothwell


pgpDk0tJESrNQ.pgp
Description: OpenPGP digital signature


[PATCH 3/5] mm/hugetlb.c: Remove the unnecessary non_swap_entry()

2020-07-19 Thread Baoquan He
The checks is_migration_entry() and is_hwpoison_entry() are stricter
than non_swap_entry(), meaning they already cover the conditional check
that non_swap_entry() performs.

Hence remove the unnecessary non_swap_entry() in is_hugetlb_entry_migration()
and is_hugetlb_entry_hwpoisoned() to simplify code.
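
For reference, a rough sketch of the relevant helpers (paraphrased from
memory, not verbatim from include/linux/swapops.h) shows why the extra
check is redundant: the migration and hwpoison swap types are all defined
above MAX_SWAPFILES, so matching them already implies non_swap_entry().

static inline int non_swap_entry(swp_entry_t entry)
{
	return swp_type(entry) >= MAX_SWAPFILES;
}

static inline int is_migration_entry(swp_entry_t entry)
{
	/* SWP_MIGRATION_READ/WRITE sit above MAX_SWAPFILES */
	return unlikely(swp_type(entry) == SWP_MIGRATION_READ ||
			swp_type(entry) == SWP_MIGRATION_WRITE);
}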

Signed-off-by: Baoquan He 
---
 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a58f976a9dd9..467894d8332a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3748,7 +3748,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
if (huge_pte_none(pte) || pte_present(pte))
return false;
swp = pte_to_swp_entry(pte);
-   if (non_swap_entry(swp) && is_migration_entry(swp))
+   if (is_migration_entry(swp))
return true;
else
return false;
@@ -3761,7 +3761,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
if (huge_pte_none(pte) || pte_present(pte))
return false;
swp = pte_to_swp_entry(pte);
-   if (non_swap_entry(swp) && is_hwpoison_entry(swp))
+   if (is_hwpoison_entry(swp))
return true;
else
return false;
-- 
2.17.2



[PATCH 5/5] mm/hugetlb.c: warn out if expected count of huge pages adjustment is not achieved

2020-07-19 Thread Baoquan He
A customer complained that no message at all is printed out when the
kernel fails to allocate an explicitly specified number of persistent
huge pages. That number can be requested by writing into
/proc/sys/vm/nr_hugepages to increase the persistent huge page pool.

The current code makes a best-effort attempt to allocate the expected
number of huge pages. If it only succeeds in getting part of them, no
information at all is printed out.

Send out a warning message if the expected huge page adjustment is not
achieved, covering both increases and decreases of the persistent huge
page count.
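
An illustrative example (hypothetical numbers, derived from the new
pr_warn() format string below): if the pool currently holds 16 huge pages
of size 2MB and a write of 8 to nr_hugepages only manages to free 4 of
them, the kernel would now log something like:

  HugeTLB: decreasing 8 of page size 2.00 MiB failed. Only decreased 4 hugepages.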

Signed-off-by: Baoquan He 
---
 mm/hugetlb.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 467894d8332a..1dfb5d9e4e06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2661,7 +2661,7 @@ static int adjust_pool_surplus(struct hstate *h, 
nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
  nodemask_t *nodes_allowed)
 {
-   unsigned long min_count, ret;
+   unsigned long min_count, ret, old_max;
NODEMASK_ALLOC(nodemask_t, node_alloc_noretry, GFP_KERNEL);
 
/*
@@ -2723,6 +2723,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned 
long count, int nid,
 * pool might be one hugepage larger than it needs to be, but
 * within all the constraints specified by the sysctls.
 */
+   old_max = persistent_huge_pages(h);
while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
if (!adjust_pool_surplus(h, nodes_allowed, -1))
break;
@@ -2779,6 +2780,16 @@ static int set_max_huge_pages(struct hstate *h, unsigned 
long count, int nid,
}
 out:
h->max_huge_pages = persistent_huge_pages(h);
+   if (count != h->max_huge_pages) {
+   char buf[32];
+
+   string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+   pr_warn("HugeTLB: %s %lu of page size %s failed. Only %s %lu 
hugepages.\n",
+   count > old_max ? "increasing" : "decreasing",
+   abs(count - old_max), buf,
+   count > old_max ? "increased" : "decreased",
+   abs(old_max - h->max_huge_pages));
+   }
spin_unlock(&hugetlb_lock);
 
NODEMASK_FREE(node_alloc_noretry);
-- 
2.17.2



[PATCH 4/5] doc/vm: fix typo in the hugetlb admin documentation

2020-07-19 Thread Baoquan He
Change 'pecify' to 'Specify'.

Signed-off-by: Baoquan He 
---
 Documentation/admin-guide/mm/hugetlbpage.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst 
b/Documentation/admin-guide/mm/hugetlbpage.rst
index 015a5f7d7854..f7b1c7462991 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -131,7 +131,7 @@ hugepages
parameter is preceded by an invalid hugepagesz parameter, it will
be ignored.
 default_hugepagesz
-   pecify the default huge page size.  This parameter can
+   Specify the default huge page size.  This parameter can
only be specified once on the command line.  default_hugepagesz can
optionally be followed by the hugepages parameter to preallocate a
specific number of huge pages of default size.  The number of default
-- 
2.17.2



[PATCH 1/5] mm/hugetlb.c: Fix typo of glb_reserve

2020-07-19 Thread Baoquan He
The local variable is for the global reservation of the region.

Signed-off-by: Baoquan He 
---
 mm/hugetlb.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f24acb3af741..191a585bb315 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3649,7 +3649,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct 
*vma)
struct resv_map *resv = vma_resv_map(vma);
struct hugepage_subpool *spool = subpool_vma(vma);
unsigned long reserve, start, end;
-   long gbl_reserve;
+   long glb_reserve;
 
if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
return;
@@ -3664,8 +3664,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct 
*vma)
 * Decrement reserve counts.  The global reserve count may be
 * adjusted if the subpool has a minimum size.
 */
-   gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
-   hugetlb_acct_memory(h, -gbl_reserve);
+   glb_reserve = hugepage_subpool_put_pages(spool, reserve);
+   hugetlb_acct_memory(h, -glb_reserve);
}
 
kref_put(&resv->refs, resv_map_release);
@@ -5054,7 +5054,7 @@ int hugetlb_reserve_pages(struct inode *inode,
struct hugepage_subpool *spool = subpool_inode(inode);
struct resv_map *resv_map;
struct hugetlb_cgroup *h_cg = NULL;
-   long gbl_reserve, regions_needed = 0;
+   long glb_reserve, regions_needed = 0;
 
/* This should never happen */
if (from > to) {
@@ -5121,10 +5121,10 @@ int hugetlb_reserve_pages(struct inode *inode,
/*
 * There must be enough pages in the subpool for the mapping. If
 * the subpool has a minimum size, there may be some global
-* reservations already in place (gbl_reserve).
+* reservations already in place (glb_reserve).
 */
-   gbl_reserve = hugepage_subpool_get_pages(spool, chg);
-   if (gbl_reserve < 0) {
+   glb_reserve = hugepage_subpool_get_pages(spool, chg);
+   if (glb_reserve < 0) {
ret = -ENOSPC;
goto out_uncharge_cgroup;
}
@@ -5133,7 +5133,7 @@ int hugetlb_reserve_pages(struct inode *inode,
 * Check enough hugepages are available for the reservation.
 * Hand the pages back to the subpool if there are not
 */
-   ret = hugetlb_acct_memory(h, gbl_reserve);
+   ret = hugetlb_acct_memory(h, glb_reserve);
if (ret < 0) {
goto out_put_pages;
}
@@ -5153,7 +5153,7 @@ int hugetlb_reserve_pages(struct inode *inode,
add = region_add(resv_map, from, to, regions_needed, h, h_cg);
 
if (unlikely(add < 0)) {
-   hugetlb_acct_memory(h, -gbl_reserve);
+   hugetlb_acct_memory(h, -glb_reserve);
goto out_put_pages;
} else if (unlikely(chg > add)) {
/*
@@ -5200,7 +5200,7 @@ long hugetlb_unreserve_pages(struct inode *inode, long 
start, long end,
struct resv_map *resv_map = inode_resv_map(inode);
long chg = 0;
struct hugepage_subpool *spool = subpool_inode(inode);
-   long gbl_reserve;
+   long glb_reserve;
 
/*
 * Since this routine can be called in the evict inode path for all
@@ -5225,8 +5225,8 @@ long hugetlb_unreserve_pages(struct inode *inode, long 
start, long end,
 * If the subpool has a minimum size, the number of global
 * reservations to be released may be adjusted.
 */
-   gbl_reserve = hugepage_subpool_put_pages(spool, (chg - freed));
-   hugetlb_acct_memory(h, -gbl_reserve);
+   glb_reserve = hugepage_subpool_put_pages(spool, (chg - freed));
+   hugetlb_acct_memory(h, -glb_reserve);
 
return 0;
 }
-- 
2.17.2



[PATCH 0/5] mm/hugetlb: Small cleanup and improvement

2020-07-19 Thread Baoquan He
Patches 1~4 are small cleanups.

Patch 5 adds a warning message when the expected number of persistent
huge pages cannot be increased or decreased by writing into
/proc/sys/vm/nr_hugepages or
/sys/kernel/mm/hugepages/hugepages-xxx/nr_hugepages.

Baoquan He (5):
  mm/hugetlb.c: Fix typo of glb_reserve
  mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool
  mm/hugetlb.c: Remove the unnecessary non_swap_entry()
  doc/vm: fix typo in the hugetlb admin documentation
  mm/hugetlb.c: warn out if expected count of huge pages adjustment is
not achieved

 Documentation/admin-guide/mm/hugetlbpage.rst |  2 +-
 mm/hugetlb.c | 49 
 2 files changed, 31 insertions(+), 20 deletions(-)

-- 
2.17.2



[PATCH 2/5] mm/hugetlb.c: make is_hugetlb_entry_hwpoisoned return bool

2020-07-19 Thread Baoquan He
Just like its neighbour is_hugetlb_entry_migration() already does.

Signed-off-by: Baoquan He 
---
 mm/hugetlb.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 191a585bb315..a58f976a9dd9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3754,17 +3754,17 @@ bool is_hugetlb_entry_migration(pte_t pte)
return false;
 }
 
-static int is_hugetlb_entry_hwpoisoned(pte_t pte)
+static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 {
swp_entry_t swp;
 
if (huge_pte_none(pte) || pte_present(pte))
-   return 0;
+   return false;
swp = pte_to_swp_entry(pte);
if (non_swap_entry(swp) && is_hwpoison_entry(swp))
-   return 1;
+   return true;
else
-   return 0;
+   return false;
 }
 
 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
-- 
2.17.2



Re: [PATCH v10 1/2] phy: qualcomm: add qcom ipq806x dwc usb phy driver

2020-07-19 Thread Vinod Koul
On 17-07-20, 15:16, Ansuel Smith wrote:
> This has lost in the original push for the dwc3 qcom driver.
> This is needed for ipq806x SoC as without this the usb ports
> doesn't work at all.

Applied both, thanks

My script found the below errors with W=1, can you please send fixes for
these?

drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:140: warning: Function parameter or 
member 'phy_dwc3' not described in 'usb_phy_write_readback'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:140: warning: Function parameter or 
member 'offset' not described in 'usb_phy_write_readback'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:140: warning: Function parameter or 
member 'mask' not described in 'usb_phy_write_readback'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:140: warning: Function parameter or 
member 'val' not described in 'usb_phy_write_readback'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:182: warning: Function parameter or 
member 'phy_dwc3' not described in 'usb_ss_write_phycreg'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:182: warning: Function parameter or 
member 'addr' not described in 'usb_ss_write_phycreg'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:182: warning: Function parameter or 
member 'val' not described in 'usb_ss_write_phycreg'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:219: warning: Function parameter or 
member 'phy_dwc3' not described in 'usb_ss_read_phycreg'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:219: warning: Function parameter or 
member 'addr' not described in 'usb_ss_read_phycreg'
drivers/phy/qualcomm/phy-qcom-ipq806x-usb.c:219: warning: Function parameter or 
member 'val' not described in 'usb_ss_read_phycreg'
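
These are kernel-doc warnings, so the fix is just to describe each
parameter in the comment above the function. A minimal sketch for the
first one (assuming usb_phy_write_readback() keeps its current
parameters):

/**
 * usb_phy_write_readback() - update a masked register field and read it back
 * @phy_dwc3: the ipq806x DWC3 USB PHY descriptor
 * @offset: register offset to update
 * @mask: bits of the register to modify
 * @val: new value for the masked bits
 */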

-- 
~Vinod


Re: [PATCH 3/5] cpuidle-pseries : Fixup exit latency for CEDE(0)

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:37]:

> From: "Gautham R. Shenoy" 
> 
> We are currently assuming that CEDE(0) has exit latency 10us, since
> there is no way for us to query from the platform.  However, if the
> wakeup latency of an Extended CEDE state is smaller than 10us, then we
> can be sure that the exit latency of CEDE(0) cannot be more than that.
> 
> In this patch, we fix the exit latency of CEDE(0) if we discover an
> Extended CEDE state with wakeup latency smaller than 10us. The new
> value is 1us lesser than the smallest wakeup latency among the
> Extended CEDE states.
> 
> Benchmark results:
> 
> ebizzy:
> 2 ebizzy threads bound to the same big-core. 25% improvement in the
> avg records/s with patch.
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x  10   2491089   5834307   5398375   4244335 1596244.9
> *  10   2893813   5834474   5832448 5327281.3 1055941.4
> 
> context_switch2 :
> There is no major regression observed with this patch as seen from the
> context_switch2 benchmark.
> 
> context_switch2 across CPU0 CPU1 (Both belong to same big-core, but different
> small cores). We observe a minor 0.14% regression in the number of
> context-switches (higher is better).
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500348872362236354712 354745.69  2711.827
> * 500349422361452353942  354215.4 2576.9258
> 
> context_switch2 across CPU0 CPU8 (Different big-cores). We observe a 0.37%
> improvement in the number of context-switches (higher is better).
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500287956294940288896 288977.23 646.59295
> * 500288300294646289582 290064.76 1161.9992
> 
> schbench:
> No major difference could be seen until the 99.9th percentile.
> 
> Without-patch
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 39
>   90.0th: 49
>   95.0th: 59
>   *99.0th: 13104
>   99.5th: 14672
>   99.9th: 15824
>   min=0, max=17993
> 
> With-patch:
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 40
>   90.0th: 50
>   95.0th: 61
>   *99.0th: 13648
>   99.5th: 14768
>   99.9th: 15664
>   min=0, max=29812
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 


> ---
>  drivers/cpuidle/cpuidle-pseries.c | 34 --
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index c13549b..502f906 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -353,12 +353,42 @@ static int pseries_cpuidle_driver_init(void)
>  static int add_pseries_idle_states(void)
>  {
>   int nr_states = 2; /* By default we have snooze, CEDE */
> + int i;
> + u64 min_latency_us = dedicated_states[1].exit_latency; /* CEDE latency 
> */
> 
>   if (parse_cede_parameters())
>   return nr_states;
> 
> - pr_info("cpuidle : Skipping the %d Extended CEDE idle states\n",
> - nr_xcede_records);
> + for (i = 0; i < nr_xcede_records; i++) {
> + u64 latency_tb = xcede_records[i].wakeup_latency_tb_ticks;
> + u64 latency_us = tb_to_ns(latency_tb) / NSEC_PER_USEC;
> +
> + if (latency_us < min_latency_us)
> + min_latency_us = latency_us;
> + }
> +
> + /*
> +  * We are currently assuming that CEDE(0) has exit latency
> +  * 10us, since there is no way for us to query from the
> +  * platform.
> +  *
> +  * However, if the wakeup latency of an Extended CEDE state is
> +  * smaller than 10us, then we can be sure that CEDE(0)
> +  * requires no more than that.
> +  *
> +  * Perform the fix-up.
> +  */
> + if (min_latency_us < dedicated_states[1].exit_latency) {
> + u64 cede0_latency = min_latency_us - 1;
> +
> + if (cede0_latency <= 0)
> + cede0_latency = min_latency_us;
> +
> + dedicated_states[1].exit_latency = cede0_latency;
> + dedicated_states[1].target_residency = 10 * (cede0_latency);
> + pr_info("cpuidle : Fixed up CEDE exit latency to %llu us\n",
> + cede0_latency);
> + }


As per PAPR spec the CEDE hints are in increasing order of exit
latency.  Hence a given state's exit latency cannot exceed the one
following it.  The quirk is such that the first one (hint 0) is
implicit and hence we have to use the above logic to extract its
characteristics.
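
A worked example with made-up numbers: if the smallest Extended CEDE wakeup
latency reported by the platform were 8us, the code above would set CEDE(0)'s
exit_latency to 7us and its target_residency to 70us; if no Extended CEDE
state were faster than the assumed 10us, the defaults would be left untouched.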

--Vaidy





[PATCH] net: hns: use eth_broadcast_addr() to assign broadcast address

2020-07-19 Thread Xu Wang
This patch uses eth_broadcast_addr() to assign the broadcast address
instead of open-coding it with memset().
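
For reference, eth_broadcast_addr() in include/linux/etherdevice.h is (from
memory) just a thin inline wrapper around the same memset(), so this is a
pure readability change:

static inline void eth_broadcast_addr(u8 *addr)
{
	memset(addr, 0xff, ETH_ALEN);
}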

Signed-off-by: Xu Wang 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 1c5243cc1dc6..acfa86e5296f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -1724,7 +1724,7 @@ static void hns_dsaf_setup_mc_mask(struct dsaf_device 
*dsaf_dev,
   u8 port_num, u8 *mask, u8 *addr)
 {
if (MAC_IS_BROADCAST(addr))
-   memset(mask, 0xff, ETH_ALEN);
+   eth_broadcast_addr(mask);
else
memcpy(mask, dsaf_dev->mac_cb[port_num]->mc_mask, ETH_ALEN);
 }
-- 
2.17.1



Re: [PATCH 1/1] riscv: Enable compiler optimizations

2020-07-19 Thread Ard Biesheuvel
On Sun, 19 Jul 2020 at 07:45, Palmer Dabbelt  wrote:
>
> On Tue, 07 Jul 2020 22:43:54 PDT (-0700), maoche...@eswin.com wrote:
> > Enable ARCH_HAS_FAST_MULTIPLIER and ARCH_SUPPORTS_INT128 for better
> > code generation.
> > These 2 configurations works fine on GCC-9.3 and GCC-10.1
> >
> > Signed-off-by: Chenxi Mao 
> > ---
> >  arch/riscv/Kconfig | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 128192e14ff2..f21b7e5188ca 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -26,6 +26,8 @@ config RISCV
> >   select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> >   select ARCH_WANT_FRAME_POINTERS
> >   select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> > + select ARCH_HAS_FAST_MULTIPLIER
> > + select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 5 || CC_IS_CLANG
>
> IIRC int128 is only supported on 64-bit targets, and we already have this 
> under
> RV64I but with a different set of requirements.  It looks like everyone has a
> different set of requirements, but I feel like CC_HAS_INT128 is the right way
> to go.  I'm not sure why we have the version check there.  Ard: do you know 
> why
> we would need both?
>

That looks like copy/paste from the arm64 tree to me, where the names
of the libgcc routines were changed at some point, and we only support
the newer ones in the kernel.

> The fast multiplier one looks fine to me.  They're supposed to stay
> alphabetical, though.
>
> >   select CLONE_BACKWARDS
> >   select COMMON_CLK
> >   select EDAC_SUPPORT
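
For context, ARCH_SUPPORTS_INT128 mainly lets generic helpers use a real
64x64->128 multiply instead of a hand-rolled fallback. A minimal sketch of
the kind of helper that benefits (modelled on mul_u64_u64_shr(), assuming
the compiler provides __int128):

static inline u64 mul_u64_u64_shr_sketch(u64 a, u64 b, unsigned int shift)
{
	return (u64)(((unsigned __int128)a * b) >> shift);
}

ARCH_HAS_FAST_MULTIPLIER similarly just tells lib code such as hweight that
a multiply-based implementation is cheap on this core.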


Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

2020-07-19 Thread Anshuman Khandual



On 07/17/2020 10:32 PM, Mike Kravetz wrote:
> On 7/16/20 10:02 PM, Anshuman Khandual wrote:
>>
>>
>> On 07/16/2020 11:55 PM, Mike Kravetz wrote:
>>> >From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
>>> From: Mike Kravetz 
>>> Date: Tue, 14 Jul 2020 15:54:46 -0700
>>> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>>>  hstate
>>>
>>> Instead of calling hugetlb_cma_reserve() directly from arch specific
>>> code, call from hugetlb_add_hstate when adding a gigantic hstate.
>>> hugetlb_add_hstate is either called from arch specific huge page setup,
>>> or as the result of hugetlb command line processing.  In either case,
>>> this is late enough in the init process that all numa memory information
>>> should be initialized.  And, it is early enough to still use early
>>> memory allocator.
>>
>> This assumes that hugetlb_add_hstate() is called from the arch code at
>> the right point in time for the generic HugeTLB to do the required CMA
>> reservation which is not ideal. I guess it must have been a reason why
>> CMA reservation should always called by the platform code which knows
>> the boot sequence timing better.
> 
> Actually, the code does not make the assumption that hugetlb_add_hstate
> is called from arch specific huge page setup.  It can even be called later
> at the time of hugetlb command line processing.

Yes, now that hugetlb_cma_reserve() has been moved into hugetlb_add_hstate().
But then there is an explicit warning when trying to mix the two command
line options, i.e. hugepagesz= and hugetlb_cma=. The proposed code here has
not changed that behavior, and hence the following warning should have been
triggered here as well.

1) hugepagesz_setup()
hugetlb_add_hstate()
 hugetlb_cma_reserve()

2) hugepages_setup()
hugetlb_hstate_alloc_pages()when order >= MAX_ORDER

if (hstate_is_gigantic(h)) {
if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip 
boot time allocation\n");
break;
}
if (!alloc_bootmem_huge_page(h))
break;
}

Nonetheless, it does not make sense to mix both memblock and CMA based huge
page pre-allocations. But looking at this again, could this warning ever
have been triggered until now? Only if a given platform calls
hugetlb_cma_reserve() before _setup("hugepages=", hugepages_setup). Anyway,
there seem to be good reasons to keep both memblock and CMA based
pre-allocations in place. But mixing them together (as done in the proposed
code here) does not seem right.

> 
> My 'reasoning' is that gigantic pages can currently be preallocated from
> bootmem/memblock_alloc at the time of command line processing.  Therefore,
> we should be able to reserve bootmem for CMA at the same time.  Is there
> something wrong with this reasoning?  I tested this on x86 by removing the
> call to hugetlb_add_hstate from arch specific code and instead forced the
> call at command line processing time.  The ability to reserve CMA was the
> same.

There is no problem with that reasoning. A __setup()-triggered function
should be able to perform the CMA reservation. But as pointed out before, it
does not make sense to mix CMA reservation and memblock based pre-allocation.

> 
> Yes, the CMA reservation interface says it should be called from arch
> specific code.  However, if we currently depend on the ability to do
> memblock_alloc at hugetlb command line processing time for gigantic page
> preallocation, then I think we can do the CMA reservation here as well.

IIUC, CMA reservation and memblock alloc have some differences in terms of
how the memory can be used later on; I will have to dig deeper on this. But
the comment block near cma_declare_contiguous_nid() is a concern:

 * This function reserves memory from early allocator. It should be
 * called by arch specific code once the early allocator (memblock or bootmem)
 * has been activated and all other subsystems have already allocated/reserved
 * memory. This function allows to create custom reserved areas.

> 
> Thinking about it some more, I suppose there could be some arch code that
> could call hugetlb_add_hstate too early in the boot process.  But, I do
> not think we have an issue with calling it too late.
> 

Calling it too late might mean the page allocator is already fully
initialized, and then CMA reservation would not be possible any more. Also,
calling it too early would prevent other subsystems from reserving memory
in specific physical ranges.


Re: [PATCH v2] libbpf bpf_helpers: Use __builtin_offsetof for offsetof

2020-07-19 Thread Andrii Nakryiko
On Sun, Jul 19, 2020 at 11:18 PM Ian Rogers  wrote:
>
> The non-builtin route for offsetof has a dependency on size_t from
> stdlib.h/stdint.h that is undeclared and may break targets.
> The offsetof macro in bpf_helpers may disable the same macro in other
> headers that have a #ifdef offsetof guard. Rather than add additional
> dependencies improve the offsetof macro declared here to use the
> builtin that is available since llvm 3.7 (the first with a BPF backend).
>
> Signed-off-by: Ian Rogers 
> ---

Acked-by: Andrii Nakryiko 

>  tools/lib/bpf/bpf_helpers.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index a510d8ed716f..bc14db706b88 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -40,7 +40,7 @@
>   * Helper macro to manipulate data structures
>   */
>  #ifndef offsetof
> -#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
> +#define offsetof(TYPE, MEMBER)  __builtin_offsetof(TYPE, MEMBER)
>  #endif
>  #ifndef container_of
>  #define container_of(ptr, type, member)\
> --
> 2.28.0.rc0.105.gf9edc3c819-goog
>


Re: [PATCH 3/5] dma-mapping: make support for dma ops optional

2020-07-19 Thread Christoph Hellwig
On Sat, Jul 18, 2020 at 10:17:14AM -0700, Guenter Roeck wrote:
> On Wed, Jul 08, 2020 at 05:24:47PM +0200, Christoph Hellwig wrote:
> > Avoid the overhead of the dma ops support for tiny builds that only
> > use the direct mapping.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> For ppc:pmac32_defconfig and other configurations, this patch results in:

Fixed and force pushed.


[PATCH v2] libbpf bpf_helpers: Use __builtin_offsetof for offsetof

2020-07-19 Thread Ian Rogers
The non-builtin route for offsetof has a dependency on size_t from
stdlib.h/stdint.h that is undeclared and may break targets.
The offsetof macro in bpf_helpers may disable the same macro in other
headers that have a #ifdef offsetof guard. Rather than add additional
dependencies improve the offsetof macro declared here to use the
builtin that is available since llvm 3.7 (the first with a BPF backend).
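
A tiny illustration with a hypothetical struct (not from the patch): both
forms yield the same compile-time constant, but only the old macro needs a
size_t definition to be in scope.

struct flow_key {
	unsigned int saddr;
	unsigned int daddr;
};

/* old: ((size_t)&((struct flow_key *)0)->daddr)   - needs size_t      */
/* new: __builtin_offsetof(struct flow_key, daddr) - plain builtin, 4  */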

Signed-off-by: Ian Rogers 
---
 tools/lib/bpf/bpf_helpers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index a510d8ed716f..bc14db706b88 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -40,7 +40,7 @@
  * Helper macro to manipulate data structures
  */
 #ifndef offsetof
-#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
+#define offsetof(TYPE, MEMBER)  __builtin_offsetof(TYPE, MEMBER)
 #endif
 #ifndef container_of
 #define container_of(ptr, type, member)\
-- 
2.28.0.rc0.105.gf9edc3c819-goog



Re: [PATCH v1 07/15] scsi: hisi_sas_v3_hw: use generic power management

2020-07-19 Thread chenxiang (M)

Hi Vaibhav,

在 2020/7/17 14:34, Vaibhav Gupta 写道:

With legacy PM, drivers themselves were responsible for managing the
device's power states and takes care of register states.

After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.

The driver was calling pci_save/restore_state(), pci_choose_state(),
pci_enable/disable_device() and pci_set_power_state() which is no more
needed.

Compile-tested only.

Signed-off-by: Vaibhav Gupta 


Reviewed-by: Xiang Chen 
Just a small comment, below.


---
  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 32 --
  1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c 
b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 55e2321a65bc..45605a520bc8 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -3374,13 +3374,13 @@ enum {
hip08,
  };
  
-static int hisi_sas_v3_suspend(struct pci_dev *pdev, pm_message_t state)

+static int __maybe_unused hisi_sas_v3_suspend(struct device *dev_d)
  {
+   struct pci_dev *pdev = to_pci_dev(dev_d);
struct sas_ha_struct *sha = pci_get_drvdata(pdev);
struct hisi_hba *hisi_hba = sha->lldd_ha;
struct device *dev = hisi_hba->dev;
struct Scsi_Host *shost = hisi_hba->shost;
-   pci_power_t device_state;
int rc;
  
  	if (!pdev->pm_cap) {

@@ -3406,21 +3406,15 @@ static int hisi_sas_v3_suspend(struct pci_dev *pdev, 
pm_message_t state)
  
  	hisi_sas_init_mem(hisi_hba);
  
-	device_state = pci_choose_state(pdev, state);

-   dev_warn(dev, "entering operating state [D%d]\n",
-   device_state);


Please retain the above print to keep consistency with the print in
function hisi_sas_v3_resume().



-   pci_save_state(pdev);
-   pci_disable_device(pdev);
-   pci_set_power_state(pdev, device_state);
-
hisi_sas_release_tasks(hisi_hba);
  
  	sas_suspend_ha(sha);

return 0;
  }
  
-static int hisi_sas_v3_resume(struct pci_dev *pdev)

+static int __maybe_unused hisi_sas_v3_resume(struct device *dev_d)
  {
+   struct pci_dev *pdev = to_pci_dev(dev_d);
struct sas_ha_struct *sha = pci_get_drvdata(pdev);
struct hisi_hba *hisi_hba = sha->lldd_ha;
struct Scsi_Host *shost = hisi_hba->shost;
@@ -3430,16 +3424,8 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)
  
  	dev_warn(dev, "resuming from operating state [D%d]\n",

 device_state);
-   pci_set_power_state(pdev, PCI_D0);
-   pci_enable_wake(pdev, PCI_D0, 0);
-   pci_restore_state(pdev);
-   rc = pci_enable_device(pdev);
-   if (rc) {
-   dev_err(dev, "enable device failed during resume (%d)\n", rc);
-   return rc;
-   }
+   device_wakeup_disable(dev_d);
  
-	pci_set_master(pdev);

scsi_unblock_requests(shost);
clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
  
@@ -3447,7 +3433,6 @@ static int hisi_sas_v3_resume(struct pci_dev *pdev)

rc = hw_init_v3_hw(hisi_hba);
if (rc) {
scsi_remove_host(shost);
-   pci_disable_device(pdev);
return rc;
}
hisi_hba->hw->phys_init(hisi_hba);
@@ -3468,13 +3453,16 @@ static const struct pci_error_handlers 
hisi_sas_err_handler = {
.reset_done = hisi_sas_reset_done_v3_hw,
  };
  
+static SIMPLE_DEV_PM_OPS(hisi_sas_v3_pm_ops,

+hisi_sas_v3_suspend,
+hisi_sas_v3_resume);
+
  static struct pci_driver sas_v3_pci_driver = {
.name   = DRV_NAME,
.id_table   = sas_v3_pci_table,
.probe  = hisi_sas_v3_probe,
.remove = hisi_sas_v3_remove,
-   .suspend= hisi_sas_v3_suspend,
-   .resume = hisi_sas_v3_resume,
+   .driver.pm  = &hisi_sas_v3_pm_ops,
.err_handler= &hisi_sas_err_handler,
  };
  





Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

2020-07-19 Thread Anshuman Khandual



On 07/17/2020 11:07 PM, Mike Kravetz wrote:
> On 7/17/20 2:51 AM, Anshuman Khandual wrote:
>>
>>
>> On 07/17/2020 02:06 PM, Will Deacon wrote:
>>> On Fri, Jul 17, 2020 at 10:32:53AM +0530, Anshuman Khandual wrote:


 On 07/16/2020 11:55 PM, Mike Kravetz wrote:
> >From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
> From: Mike Kravetz 
> Date: Tue, 14 Jul 2020 15:54:46 -0700
> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>  hstate
>
> Instead of calling hugetlb_cma_reserve() directly from arch specific
> code, call from hugetlb_add_hstate when adding a gigantic hstate.
> hugetlb_add_hstate is either called from arch specific huge page setup,
> or as the result of hugetlb command line processing.  In either case,
> this is late enough in the init process that all numa memory information
> should be initialized.  And, it is early enough to still use early
> memory allocator.

 This assumes that hugetlb_add_hstate() is called from the arch code at
 the right point in time for the generic HugeTLB to do the required CMA
 reservation which is not ideal. I guess it must have been a reason why
 CMA reservation should always called by the platform code which knows
 the boot sequence timing better.
>>>
>>> Ha, except we've moved it around two or three times already in the last
>>> month or so, so I'd say we don't have a clue when to call it in the arch
>>> code.
>>
>> The arch dependency is not going way with this change either. Just that
>> its getting transferred to hugetlb_add_hstate() which gets called from
>> arch_initcall() in every architecture.
>>
>> The perfect timing here happens to be because of arch_initcall() instead.
>> This is probably fine, as long as
>>
>> 0. hugetlb_add_hstate() is always called at arch_initcall()
> 
> In another reply, I give reasoning why it would be safe to call even later
> at hugetlb command line processing time.

Understood, but there is a time window in which CMA reservation is available,
irrespective of whether it is called from arch or generic code. Finding the
right time window, and ensuring that the N_MEMORY nodemask is initialized by
then, is easier done in the platform code.

> 
>> 1. N_MEMORY mask is guaranteed to be initialized at arch_initcall()
> 
> This is a bit more difficult to guarantee.  I find the init sequence hard to
> understand.  Looking at the arm code, arch_initcall(hugetlbpage_init)
> happens after N_MEMORY mask is setup.  I can't imagine any arch code setting
> up huge pages before N_MEMORY.  But, I suppose it is possible and we would
> need to somehow guarantee this.

Ensuring that N_MEMORY nodemask is initialized from the generic code is even
more difficult.

> 
>> 2. CMA reservation is available to be called at arch_initcall()
> 
> Since I am pretty sure we can delay the reservation until hugetlb command
> line processing time, it would be great if it was always done there.

But moving the hugetlb CMA reservation entirely into command line processing
raises another concern: it mixes with the existing memblock based
pre-allocation.

> Unfortunately, I can not immediately think of an easy way to do this.
> 

It is rational to move the CMA reservation into generic HugeTLB, but there
are some challenges which need to be solved comprehensively. The patch here
from Barry does solve a short term problem (N_ONLINE ---> N_MEMORY) for now,
which IMHO should be considered. Moving the CMA reservation into generic
HugeTLB would require some more thought and can be attempted later.


Re: linux-next: manual merge of the tip tree with the crypto tree

2020-07-19 Thread Uros Bizjak
On Mon, Jul 20, 2020 at 6:03 AM Stephen Rothwell  wrote:

> > Please find attached the incremental patch that puts back integer
> > parts of inst.h. This resolves the conflict with the tip tree.
>
> The tip tree change needs the XMM parts kept as well, sorry.

Strange, because I did test my patch with the tip tree from
'origin/master' at commit a282cddefe90c4b21ef2c22a76a7c3ebd3ec6b86 and
the compilation produced the same lonely rdpid %eax in
.altinstr_replacement section. AFAICS, the header is included only for
RDPID macro, where XMM registers are unused.

> So I ended up just removing the actual now unused crypto instruction
> macros.

To avoid any further troubles, this is also OK with me.

Uros.


[PATCH V3 2/4] soc: qcom: smd-rpm: Add IPQ6018 compatible

2020-07-19 Thread Kathiravan T
This patch adds a compatible string for the RPM on the Qualcomm IPQ6018 platform.

Signed-off-by: Kathiravan T 
---
 drivers/soc/qcom/smd-rpm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/qcom/smd-rpm.c b/drivers/soc/qcom/smd-rpm.c
index 877b138..0ebd3ad 100644
--- a/drivers/soc/qcom/smd-rpm.c
+++ b/drivers/soc/qcom/smd-rpm.c
@@ -230,6 +230,7 @@ static void qcom_smd_rpm_remove(struct rpmsg_device *rpdev)
 
 static const struct of_device_id qcom_smd_rpm_of_match[] = {
{ .compatible = "qcom,rpm-apq8084" },
+   { .compatible = "qcom,rpm-ipq6018" },
{ .compatible = "qcom,rpm-msm8916" },
{ .compatible = "qcom,rpm-msm8936" },
{ .compatible = "qcom,rpm-msm8974" },
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH V3 4/4] dt-bindings: regulator: convert QCOM SMD-RPM regulator document to YAML schema

2020-07-19 Thread Kathiravan T
Convert qcom,smd-rpm-regulator.txt document to YAML schema

Reviewed-by: Rob Herring 
Signed-off-by: Kathiravan T 
---
 .../bindings/regulator/qcom,smd-rpm-regulator.txt  | 321 -
 .../bindings/regulator/qcom,smd-rpm-regulator.yaml | 106 +++
 2 files changed, 106 insertions(+), 321 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
 create mode 100644 
Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml

diff --git 
a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt 
b/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
deleted file mode 100644
index 728c001..000
--- a/Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
+++ /dev/null
@@ -1,321 +0,0 @@
-QCOM SMD RPM REGULATOR
-
-The Qualcomm RPM over SMD regulator is modelled as a subdevice of the RPM.
-Because SMD is used as the communication transport mechanism, the RPM resides 
as
-a subnode of the SMD.  As such, the SMD-RPM regulator requires that the SMD and
-RPM nodes be present.
-
-Please refer to Documentation/devicetree/bindings/soc/qcom/qcom,smd.txt for
-information pertaining to the SMD node.
-
-Please refer to Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt for
-information regarding the RPM node.
-
-== Regulator
-
-Regulator nodes are identified by their compatible:
-
-- compatible:
-   Usage: required
-   Value type: 
-   Definition: must be one of:
-   "qcom,rpm-mp5496-regulators"
-   "qcom,rpm-pm8841-regulators"
-   "qcom,rpm-pm8916-regulators"
-   "qcom,rpm-pm8941-regulators"
-   "qcom,rpm-pm8950-regulators"
-   "qcom,rpm-pm8994-regulators"
-   "qcom,rpm-pm8998-regulators"
-   "qcom,rpm-pma8084-regulators"
-   "qcom,rpm-pmi8994-regulators"
-   "qcom,rpm-pmi8998-regulators"
-   "qcom,rpm-pms405-regulators"
-
-- vdd_s1-supply:
-- vdd_s2-supply:
-- vdd_s3-supply:
-- vdd_s4-supply:
-- vdd_s5-supply:
-- vdd_s6-supply:
-- vdd_s7-supply:
-- vdd_s8-supply:
-   Usage: optional (pm8841 only)
-   Value type: 
-   Definition: reference to regulator supplying the input pin, as
-   described in the data sheet
-
-- vdd_s1-supply:
-- vdd_s2-supply:
-- vdd_s3-supply:
-- vdd_s4-supply:
-- vdd_l1_l2_l3-supply:
-- vdd_l4_l5_l6-supply:
-- vdd_l7-supply:
-- vdd_l8_l9_l10_l11_l12_l13_l14_l15_l16_l17_l18-supply:
-   Usage: optional (pm8916 only)
-   Value type: 
-   Definition: reference to regulator supplying the input pin, as
-   described in the data sheet
-
-- vdd_s1-supply:
-- vdd_s2-supply:
-- vdd_s3-supply:
-- vdd_s4-supply:
-- vdd_s4-supply:
-- vdd_s5-supply:
-- vdd_s6-supply:
-- vdd_l1_l19-supply:
-- vdd_l2_l23-supply:
-- vdd_l3-supply:
-- vdd_l4_l5_l6_l7_l16-supply:
-- vdd_l8_l11_l12_l17_l22-supply:
-- vdd_l9_l10_l13_l14_l15_l18-supply:
-- vdd_l20-supply:
-- vdd_l21-supply:
-   Usage: optional (pm8950 only)
-   Value type: 
-   Definition: reference to regulator supplying the input pin, as
-   described in the data sheet
-
-- vdd_s1-supply:
-- vdd_s2-supply:
-- vdd_s3-supply:
-- vdd_l1_l3-supply:
-- vdd_l2_lvs1_2_3-supply:
-- vdd_l4_l11-supply:
-- vdd_l5_l7-supply:
-- vdd_l6_l12_l14_l15-supply:
-- vdd_l8_l16_l18_l19-supply:
-- vdd_l9_l10_l17_l22-supply:
-- vdd_l13_l20_l23_l24-supply:
-- vdd_l21-supply:
-- vin_5vs-supply:
-   Usage: optional (pm8941 only)
-   Value type: 
-   Definition: reference to regulator supplying the input pin, as
-   described in the data sheet
-
-- vdd_s1-supply:
-- vdd_s2-supply:
-- vdd_s3-supply:
-- vdd_s4-supply:
-- vdd_s5-supply:
-- vdd_s6-supply:
-- vdd_s7-supply:
-- vdd_s8-supply:
-- vdd_s9-supply:
-- vdd_s10-supply:
-- vdd_s11-supply:
-- vdd_s12-supply:
-- vdd_l1-supply:
-- vdd_l2_l26_l28-supply:
-- vdd_l3_l11-supply:
-- vdd_l4_l27_l31-supply:
-- vdd_l5_l7-supply:
-- vdd_l6_l12_l32-supply:
-- vdd_l5_l7-supply:
-- vdd_l8_l16_l30-supply:
-- vdd_l9_l10_l18_l22-supply:
-- vdd_l9_l10_l18_l22-supply:
-- vdd_l3_l11-supply:
-- vdd_l6_l12_l32-supply:
-- vdd_l13_l19_l23_l24-supply:
-- vdd_l14_l15-supply:
-- vdd_l14_l15-supply:
-- vdd_l8_l16_l30-supply:
-- vdd_l17_l29-supply:
-- vdd_l9_l10_l18_l22-supply:
-- vdd_l13_l19_l23_l24-supply:
-- vdd_l20_l21-supply:
-- vdd_l20_l21-supply:
-- vdd_l9_l10_l18_l22-supply:
-- vdd_l13_l19_l23_l24-supply:
-- vdd_l13_l19_l23_l24-supply:
-- vdd_l25-supply:
-- vdd_l2_l26_l28-supply:
-- vdd_l4_l27_l31-supply:
-- vdd_l2_l26_l28-supply:
-- vdd_l17_l29-supply:
-- vdd_l8_l16_l30-supply:
-- vdd_l4_l27_l31-supply:
-- vdd_l6_l12_l32-supply:
-- vdd_lvs1_2-supply:
-   Usage: optional (pm8994 only)
-   Value type: 
-   Definition: reference to regulator supplying the input pin, as
-   described in the data sheet
-
-- vdd

[PATCH V3 3/4] dt-bindings: soc: qcom: convert the SMD-RPM document to YAML schema

2020-07-19 Thread Kathiravan T
Convert the qcom,smd-rpm.txt document to YAML schema

Signed-off-by: Kathiravan T 
---
 .../devicetree/bindings/soc/qcom/qcom,smd-rpm.txt  | 65 
 .../devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml | 87 ++
 2 files changed, 87 insertions(+), 65 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt 
b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
deleted file mode 100644
index 4c9c6fc..000
--- a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-Qualcomm Resource Power Manager (RPM) over SMD
-
-This driver is used to interface with the Resource Power Manager (RPM) found in
-various Qualcomm platforms. The RPM allows each component in the system to vote
-for state of the system resources, such as clocks, regulators and bus
-frequencies.
-
-The SMD information for the RPM edge should be filled out.  See qcom,smd.txt 
for
-the required edge properties.  All SMD related properties will reside within 
the
-RPM node itself.
-
-= SUBDEVICES
-
-The RPM exposes resources to its subnodes.  The rpm_requests node must be
-present and this subnode may contain children that designate regulator
-resources.
-
-- compatible:
-   Usage: required
-   Value type: 
-   Definition: must be one of:
-   "qcom,rpm-apq8084"
-   "qcom,rpm-ipq6018"
-   "qcom,rpm-msm8916"
-   "qcom,rpm-msm8936"
-   "qcom,rpm-msm8974"
-   "qcom,rpm-msm8976"
-   "qcom,rpm-msm8994"
-   "qcom,rpm-msm8998"
-   "qcom,rpm-sdm660"
-   "qcom,rpm-qcs404"
-
-- qcom,smd-channels:
-   Usage: required
-   Value type: 
-   Definition: must be "rpm_requests"
-
-Refer to Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
-for information on the regulator subnodes that can exist under the 
rpm_requests.
-
-Example:
-
-   soc {
-   apcs: syscon@f9011000 {
-   compatible = "syscon";
-   reg = <0xf9011000 0x1000>;
-   };
-   };
-
-   smd {
-   compatible = "qcom,smd";
-
-   rpm {
-   interrupts = <0 168 1>;
-   qcom,ipc = <&apcs 8 0>;
-   qcom,smd-edge = <15>;
-
-   rpm_requests {
-   compatible = "qcom,rpm-msm8974";
-   qcom,smd-channels = "rpm_requests";
-
-   ...
-   };
-   };
-   };
diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml 
b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml
new file mode 100644
index 000..468d658
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml
@@ -0,0 +1,87 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/soc/qcom/qcom,smd-rpm.yaml#";
+$schema: "http://devicetree.org/meta-schemas/core.yaml#";
+
+title: Qualcomm Resource Power Manager (RPM) over SMD
+
+description: |
+  This driver is used to interface with the Resource Power Manager (RPM) found
+  in various Qualcomm platforms. The RPM allows each component in the system
+  to vote for state of the system resources, such as clocks, regulators and bus
+  frequencies.
+
+  The SMD information for the RPM edge should be filled out.  See qcom,smd.txt
+  for the required edge properties.  All SMD related properties will reside
+  within the RPM node itself.
+
+  The RPM exposes resources to its subnodes.  The rpm_requests node must be
+  present and this subnode may contain children that designate regulator
+  resources.
+
+  Refer to 
Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
+  for information on the regulator subnodes that can exist under the
+  rpm_requests.
+
+maintainers:
+  - Kathiravan T 
+
+properties:
+  compatible:
+enum:
+  - qcom,rpm-apq8084
+  - qcom,rpm-ipq6018
+  - qcom,rpm-msm8916
+  - qcom,rpm-msm8974
+  - qcom,rpm-msm8976
+  - qcom,rpm-msm8996
+  - qcom,rpm-msm8998
+  - qcom,rpm-sdm660
+  - qcom,rpm-qcs404
+
+  qcom,smd-channels:
+$ref: /schemas/types.yaml#/definitions/string-array
+description: Channel name used for the RPM communication
+items:
+  - const: rpm_requests
+
+if:
+  properties:
+compatible:
+  contains:
+enum:
+  - qcom,rpm-apq8084
+  - qcom,rpm-msm8916
+  - qcom,rpm-msm8974
+then:
+  required:
+- qcom,smd-channels
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+
+

[PATCH V3 1/4] dt-bindings: soc: qcom: Add IPQ6018 compatible

2020-07-19 Thread Kathiravan T
This patch adds the dt-binding for the rpm on the Qualcomm IPQ6018
platform.

Signed-off-by: Kathiravan T 
---
 Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt 
b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
index 005940f9..4c9c6fc 100644
--- a/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
@@ -20,6 +20,7 @@ resources.
Value type: 
Definition: must be one of:
"qcom,rpm-apq8084"
+   "qcom,rpm-ipq6018"
"qcom,rpm-msm8916"
"qcom,rpm-msm8936"
"qcom,rpm-msm8974"
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH] mmc: sdhci-cadence: do not use hardware tuning for SD mode

2020-07-19 Thread Masahiro Yamada
As commit ef6b75671b5f ("mmc: sdhci-cadence: send tune request twice to
work around errata") stated, this IP has an errata. This commit applies
the second workaround for the SD mode.

Due to the erratum, it is not possible to use the hardware tuning provided
by SDHCI_HOST_CONTROL2.

Use the software-controlled tuning like the eMMC mode.

Set sdhci_host_ops::platform_execute_tuning instead of overriding
mmc_host_ops::execute_tuning.

Signed-off-by: Masahiro Yamada 
---

 drivers/mmc/host/sdhci-cadence.c | 123 ---
 1 file changed, 62 insertions(+), 61 deletions(-)

diff --git a/drivers/mmc/host/sdhci-cadence.c b/drivers/mmc/host/sdhci-cadence.c
index 4a6c9ba82538..4d9f7681817c 100644
--- a/drivers/mmc/host/sdhci-cadence.c
+++ b/drivers/mmc/host/sdhci-cadence.c
@@ -202,57 +202,6 @@ static u32 sdhci_cdns_get_emmc_mode(struct sdhci_cdns_priv 
*priv)
return FIELD_GET(SDHCI_CDNS_HRS06_MODE, tmp);
 }
 
-static void sdhci_cdns_set_uhs_signaling(struct sdhci_host *host,
-unsigned int timing)
-{
-   struct sdhci_cdns_priv *priv = sdhci_cdns_priv(host);
-   u32 mode;
-
-   switch (timing) {
-   case MMC_TIMING_MMC_HS:
-   mode = SDHCI_CDNS_HRS06_MODE_MMC_SDR;
-   break;
-   case MMC_TIMING_MMC_DDR52:
-   mode = SDHCI_CDNS_HRS06_MODE_MMC_DDR;
-   break;
-   case MMC_TIMING_MMC_HS200:
-   mode = SDHCI_CDNS_HRS06_MODE_MMC_HS200;
-   break;
-   case MMC_TIMING_MMC_HS400:
-   if (priv->enhanced_strobe)
-   mode = SDHCI_CDNS_HRS06_MODE_MMC_HS400ES;
-   else
-   mode = SDHCI_CDNS_HRS06_MODE_MMC_HS400;
-   break;
-   default:
-   mode = SDHCI_CDNS_HRS06_MODE_SD;
-   break;
-   }
-
-   sdhci_cdns_set_emmc_mode(priv, mode);
-
-   /* For SD, fall back to the default handler */
-   if (mode == SDHCI_CDNS_HRS06_MODE_SD)
-   sdhci_set_uhs_signaling(host, timing);
-}
-
-static const struct sdhci_ops sdhci_cdns_ops = {
-   .set_clock = sdhci_set_clock,
-   .get_timeout_clock = sdhci_cdns_get_timeout_clock,
-   .set_bus_width = sdhci_set_bus_width,
-   .reset = sdhci_reset,
-   .set_uhs_signaling = sdhci_cdns_set_uhs_signaling,
-};
-
-static const struct sdhci_pltfm_data sdhci_cdns_uniphier_pltfm_data = {
-   .ops = &sdhci_cdns_ops,
-   .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
-};
-
-static const struct sdhci_pltfm_data sdhci_cdns_pltfm_data = {
-   .ops = &sdhci_cdns_ops,
-};
-
 static int sdhci_cdns_set_tune_val(struct sdhci_host *host, unsigned int val)
 {
struct sdhci_cdns_priv *priv = sdhci_cdns_priv(host);
@@ -286,23 +235,24 @@ static int sdhci_cdns_set_tune_val(struct sdhci_host 
*host, unsigned int val)
return 0;
 }
 
-static int sdhci_cdns_execute_tuning(struct mmc_host *mmc, u32 opcode)
+/*
+ * In SD mode, software must not use the hardware tuning and instead perform
+ * an almost identical procedure to eMMC.
+ */
+static int sdhci_cdns_execute_tuning(struct sdhci_host *host, u32 opcode)
 {
-   struct sdhci_host *host = mmc_priv(mmc);
int cur_streak = 0;
int max_streak = 0;
int end_of_streak = 0;
int i;
 
/*
-* This handler only implements the eMMC tuning that is specific to
-* this controller.  Fall back to the standard method for SD timing.
+* Do not execute tuning for UHS_SDR50 or UHS_DDR50.
+* The delay is set by probe, based on the DT properties.
 */
-   if (host->timing != MMC_TIMING_MMC_HS200)
-   return sdhci_execute_tuning(mmc, opcode);
-
-   if (WARN_ON(opcode != MMC_SEND_TUNING_BLOCK_HS200))
-   return -EINVAL;
+   if (host->timing != MMC_TIMING_MMC_HS200 &&
+   host->timing != MMC_TIMING_UHS_SDR104)
+   return 0;
 
for (i = 0; i < SDHCI_CDNS_MAX_TUNING_LOOP; i++) {
if (sdhci_cdns_set_tune_val(host, i) ||
@@ -325,6 +275,58 @@ static int sdhci_cdns_execute_tuning(struct mmc_host *mmc, 
u32 opcode)
return sdhci_cdns_set_tune_val(host, end_of_streak - max_streak / 2);
 }
 
+static void sdhci_cdns_set_uhs_signaling(struct sdhci_host *host,
+unsigned int timing)
+{
+   struct sdhci_cdns_priv *priv = sdhci_cdns_priv(host);
+   u32 mode;
+
+   switch (timing) {
+   case MMC_TIMING_MMC_HS:
+   mode = SDHCI_CDNS_HRS06_MODE_MMC_SDR;
+   break;
+   case MMC_TIMING_MMC_DDR52:
+   mode = SDHCI_CDNS_HRS06_MODE_MMC_DDR;
+   break;
+   case MMC_TIMING_MMC_HS200:
+   mode = SDHCI_CDNS_HRS06_MODE_MMC_HS200;
+   break;
+   case MMC_TIMING_MMC_HS400:
+   if (priv->enhanced_strobe)
+   mode = SDHCI_CDNS_HRS06_MODE_MMC_HS400ES;
+   el

[PATCH V3 0/4] Add frequency / voltage scaling support for IPQ6018 SoC

2020-07-19 Thread Kathiravan T
The IPQ6018 SoC uses the PMIC MP5496. The SMPA2 and LDOA2 regulators of the
MP5496 control the APSS and SDCC voltage scaling respectively. Add support
for the same.

changes since V2:
- Rebased on top of linux-next 20200717 tag
- Addressed Rob's comment to drop the 'syscon' node in qcom,smd-rpm.yaml
- Picked up the Reviewed-by tag for qcom,smd-rpm-regulator.yaml
- Regulator patches part of V2 was picked up by Mark and it's available 
in linux-next tree

changes since V1:
- Moved YAML conversion to the last as per Mark's comments

Kathiravan T (4):
  dt-bindings: soc: qcom: Add IPQ6018 compatible
  soc: qcom: smd-rpm: Add IPQ6018 compatible
  dt-bindings: soc: qcom: convert the SMD-RPM document to YAML schema
  dt-bindings: regulator: convert QCOM SMD-RPM regulator document to
YAML schema

 .../bindings/regulator/qcom,smd-rpm-regulator.txt  | 321 -
 .../bindings/regulator/qcom,smd-rpm-regulator.yaml | 106 +++
 .../devicetree/bindings/soc/qcom/qcom,smd-rpm.txt  |  64 
 .../devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml |  87 ++
 drivers/soc/qcom/smd-rpm.c |   1 +
 5 files changed, 194 insertions(+), 385 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.txt
 create mode 100644 
Documentation/devicetree/bindings/regulator/qcom,smd-rpm-regulator.yaml
 delete mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.txt
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



io_uring vs in_compat_syscall()

2020-07-19 Thread Christoph Hellwig
Hi Jens,

I just found a (so far theoretical) issue with the io_uring submission
offloading to workqueues or threads.  We have lots of places using
in_compat_syscall() to check if a syscall needs compat treatment.
While the biggest user is ioctl(), we also have a fair amount of
places using in_compat_syscall() in read and write methods, and these
will not do the right thing when used with io_uring under certain
conditions.  I'm not sure how to best fix this, except for making sure
in_compat_syscall() returns true one way or another for these cases.
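
For illustration, here is a rough sketch of the pattern in question (the
"foo" names and record layouts are made up; only in_compat_syscall() and
the ->read() prototype are real kernel interfaces):

#include <linux/compat.h>
#include <linux/fs.h>
#include <linux/types.h>

/* Hypothetical record layouts whose ABI differs for compat tasks. */
struct foo_rec        { u64 cookie; u64 len; };
struct compat_foo_rec { u32 cookie; u32 len; };

static ssize_t foo_read(struct file *file, char __user *buf,
			size_t count, loff_t *ppos)
{
	/*
	 * When this ->read() runs from an io_uring worker thread,
	 * in_compat_syscall() describes the worker, not the 32-bit
	 * task that submitted the request, so the wrong record size
	 * gets picked.
	 */
	size_t rec = in_compat_syscall() ? sizeof(struct compat_foo_rec)
					 : sizeof(struct foo_rec);

	if (count < rec)
		return -EINVAL;
	/* a real driver would copy_to_user() a record of size 'rec' here */
	return rec;
}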


Re: [PATCH] staging: dpaa2-ethsw: fix switch/case fallthrough warning

2020-07-19 Thread Joe Perches
On Mon, 2020-07-20 at 08:30 +0300, Marian Posteuca wrote:
> Fix the fallthrough warning that is reported by checkpatch.
> 
> Signed-off-by: Marian Posteuca 
> ---
>  drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c 
> b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
> index 2fb75a7c9314..4986591097dc 100644
> --- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
> +++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
> @@ -1362,7 +1362,8 @@ static int port_switchdev_blocking_event(struct 
> notifier_block *unused,
>   return NOTIFY_DONE;
>  
>   switch (event) {
> - case SWITCHDEV_PORT_OBJ_ADD: /* fall through */
> + case SWITCHDEV_PORT_OBJ_ADD:
> + fallthrough;

This comment is better removed instead of
converted to fallthrough;
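
i.e. for this hunk just (sketch):

	case SWITCHDEV_PORT_OBJ_ADD:
	case SWITCHDEV_PORT_OBJ_DEL:
		return ethsw_switchdev_port_obj_event(event, dev, ptr);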

It's no different than a multiple case block like:

switch (val) {
case FOO:
case BAR:
foobar();
break;
default:
return baz;
}

etc...




Re: [PATCH 2/5] cpuidle-pseries: Add function to parse extended CEDE records

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:36]:

> From: "Gautham R. Shenoy" 
> 
> Currently we use CEDE with latency-hint 0 as the only other idle state
> on a dedicated LPAR apart from the polling "snooze" state.
> 
> The platform might support additional extended CEDE idle states, which
> can be discovered through the "ibm,get-system-parameter" rtas-call
> made with CEDE_LATENCY_TOKEN.
> 
> This patch adds a function to obtain information about the extended
> CEDE idle states from the platform and parse the contents to populate
> an array of extended CEDE states. These idle states thus discovered
> will be added to the cpuidle framework in the next patch.
> 
> dmesg on a POWER9 LPAR, demonstrating the output of parsing the
> extended CEDE latency parameters.
> 
> [5.913180] xcede : xcede_record_size = 10
> [5.913183] xcede : Record 0 : hint = 1, latency =0x400 tb-ticks, 
> Wake-on-irq = 1
> [5.913188] xcede : Record 1 : hint = 2, latency =0x3e8000 tb-ticks, 
> Wake-on-irq = 0
> [5.913193] cpuidle : Skipping the 2 Extended CEDE idle states
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 

>
> ---
>  drivers/cpuidle/cpuidle-pseries.c | 129 
> +-
>  1 file changed, 127 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 39d4bb6..c13549b 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  struct cpuidle_driver pseries_idle_driver = {
>   .name = "pseries_idle",
> @@ -105,9 +106,120 @@ static void check_and_cede_processor(void)
>   }
>  }
> 
> -#define NR_CEDE_STATES   1  /* CEDE with latency-hint 0 */
> +struct xcede_latency_records {
> + u8  latency_hint;
> + u64 wakeup_latency_tb_ticks;
> + u8  responsive_to_irqs;
> +};
> +
> +/*
> + * XCEDE : Extended CEDE states discovered through the
> + * "ibm,get-systems-parameter" rtas-call with the token
> + * CEDE_LATENCY_TOKEN
> + */
> +#define MAX_XCEDE_STATES 4
> +#define  XCEDE_LATENCY_RECORD_SIZE   10
> +#define XCEDE_LATENCY_PARAM_MAX_LENGTH   (2 + 2 + \
> + (MAX_XCEDE_STATES * 
> XCEDE_LATENCY_RECORD_SIZE))
> +
> +#define CEDE_LATENCY_TOKEN   45
> +
> +#define NR_CEDE_STATES   (MAX_XCEDE_STATES + 1) /* CEDE with 
> latency-hint 0 */
>  #define NR_DEDICATED_STATES  (NR_CEDE_STATES + 1) /* Includes snooze */
> 
> +struct xcede_latency_records xcede_records[MAX_XCEDE_STATES];
> +unsigned int nr_xcede_records;
> +char xcede_parameters[XCEDE_LATENCY_PARAM_MAX_LENGTH];
> +
> +static int parse_cede_parameters(void)
> +{
> + int ret = -1, i;
> + u16 payload_length;
> + u8 xcede_record_size;
> + u32 total_xcede_records_size;
> + char *payload;
> +
> + memset(xcede_parameters, 0, XCEDE_LATENCY_PARAM_MAX_LENGTH);
> +
> + ret = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> + NULL, CEDE_LATENCY_TOKEN, __pa(xcede_parameters),
> + XCEDE_LATENCY_PARAM_MAX_LENGTH);
> +
> + if (ret) {
> + pr_err("xcede: Error parsing CEDE_LATENCY_TOKEN\n");
> + return ret;
> + }
> +
> + payload_length = be16_to_cpu(*(__be16 *)(&xcede_parameters[0]));
> + payload = &xcede_parameters[2];
> +
> + /*
> +  * If the platform supports the cede latency settings
> +  * information system parameter it must provide the following
> +  * information in the NULL terminated parameter string:
> +  *
> +  * a. The first byte is the length "N" of each cede
> +  *latency setting record minus one (zero indicates a length
> +  *of 1 byte).
> +  *
> +  * b. For each supported cede latency setting a cede latency
> +  *setting record consisting of the first "N" bytes as per
> +  *the following table.
> +  *
> +  *  -
> +  *  | Field   | Field  |
> +  *  | Name| Length |
> +  *  -
> +  *  | Cede Latency| 1 Byte |
> +  *  | Specifier Value ||
> +  *  -
> +  *  | Maximum wakeup  ||
> +  *  | latency in  | 8 Bytes|
> +  *  | tb-ticks||
> +  *  -
> +  *  | Responsive to   ||
> +  *  | external| 1 Byte |
> +  *  | interrupts  ||
> +  *  -
> +  *
> +  * This version has cede latency record size = 10.
> +  */
> + xcede_record_size = (u8)payload[0] + 1;

This is a standard PAPR interface that was defined a long time ago.
However, new H_CEDE hints that map to new platform 

Re: [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail)

2020-07-19 Thread Li, Aubrey
On 2020/7/20 12:06, benbjiang(蒋彪) wrote:
> Hi,
> 
>> On Jul 1, 2020, at 5:32 AM, Vineeth Remanan Pillai 
>>  wrote:
>>
>> From: Peter Zijlstra 
>>
>> When a sibling is forced-idle to match the core-cookie; search for
>> matching tasks to fill the core.
>>
>> rcu_read_unlock() can incur an infrequent deadlock in
>> sched_core_balance(). Fix this by using the RCU-sched flavor instead.
>>
>> Signed-off-by: Peter Zijlstra (Intel) 
>> Signed-off-by: Joel Fernandes (Google) 
>> Acked-by: Paul E. McKenney 
>> ---
>> include/linux/sched.h |   1 +
>> kernel/sched/core.c   | 131 +-
>> kernel/sched/idle.c   |   1 +
>> kernel/sched/sched.h  |   6 ++
>> 4 files changed, 138 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 3c8dcc5ff039..4f9edf013df3 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -688,6 +688,7 @@ struct task_struct {
>> #ifdef CONFIG_SCHED_CORE
>>  struct rb_node  core_node;
>>  unsigned long   core_cookie;
>> +unsigned intcore_occupation;
>> #endif
>>
>> #ifdef CONFIG_CGROUP_SCHED
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 4d6d6a678013..fb9edb09ead7 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -201,6 +201,21 @@ static struct task_struct *sched_core_find(struct rq 
>> *rq, unsigned long cookie)
>>  return match;
>> }
>>
>> +static struct task_struct *sched_core_next(struct task_struct *p, unsigned 
>> long cookie)
>> +{
>> +struct rb_node *node = &p->core_node;
>> +
>> +node = rb_next(node);
>> +if (!node)
>> +return NULL;
>> +
>> +p = container_of(node, struct task_struct, core_node);
>> +if (p->core_cookie != cookie)
>> +return NULL;
>> +
>> +return p;
>> +}
>> +
>> /*
>>  * The static-key + stop-machine variable are needed such that:
>>  *
>> @@ -4233,7 +4248,7 @@ pick_next_task(struct rq *rq, struct task_struct 
>> *prev, struct rq_flags *rf)
>>  struct task_struct *next, *max = NULL;
>>  const struct sched_class *class;
>>  const struct cpumask *smt_mask;
>> -int i, j, cpu;
>> +int i, j, cpu, occ = 0;
>>  bool need_sync;
>>
>>  if (!sched_core_enabled(rq))
>> @@ -4332,6 +4347,9 @@ pick_next_task(struct rq *rq, struct task_struct 
>> *prev, struct rq_flags *rf)
>>  goto done;
>>  }
>>
>> +if (!is_idle_task(p))
>> +occ++;
>> +
>>  rq_i->core_pick = p;
>>
>>  /*
>> @@ -4357,6 +4375,7 @@ pick_next_task(struct rq *rq, struct task_struct 
>> *prev, struct rq_flags *rf)
>>
>>  cpu_rq(j)->core_pick = NULL;
>>  }
>> +occ = 1;
>>  goto again;
>>  } else {
>>  /*
>> @@ -4393,6 +4412,8 @@ next_class:;
>>  if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
>>  rq_i->core_forceidle = true;
>>
>> +rq_i->core_pick->core_occupation = occ;
>> +
>>  if (i == cpu)
>>  continue;
>>
>> @@ -4408,6 +4429,114 @@ next_class:;
>>  return next;
>> }
>>
>> +static bool try_steal_cookie(int this, int that)
>> +{
>> +struct rq *dst = cpu_rq(this), *src = cpu_rq(that);
>> +struct task_struct *p;
>> +unsigned long cookie;
>> +bool success = false;
>> +
>> +local_irq_disable();
>> +double_rq_lock(dst, src);
>> +
>> +cookie = dst->core->core_cookie;
>> +if (!cookie)
>> +goto unlock;
>> +
>> +if (dst->curr != dst->idle)
>> +goto unlock;
>> +
>> +p = sched_core_find(src, cookie);
>> +if (p == src->idle)
>> +goto unlock;
>> +
>> +do {
>> +if (p == src->core_pick || p == src->curr)
>> +goto next;
>> +
>> +if (!cpumask_test_cpu(this, &p->cpus_mask))
>> +goto next;
>> +
>> +if (p->core_occupation > dst->idle->core_occupation)
>> +goto next;
>> +
>> +p->on_rq = TASK_ON_RQ_MIGRATING;
>> +deactivate_task(src, p, 0);
>> +set_task_cpu(p, this);
>> +activate_task(dst, p, 0);
>> +p->on_rq = TASK_ON_RQ_QUEUED;
>> +
>> +resched_curr(dst);
>> +
>> +success = true;
>> +break;
>> +
>> +next:
>> +p = sched_core_next(p, cookie);
>> +} while (p);
>> +
>> +unlock:
>> +double_rq_unlock(dst, src);
>> +local_irq_enable();
>> +
>> +return success;
>> +}
>> +
>> +static bool steal_cookie_task(int cpu, struct sched_domain *sd)
>> +{
>> +int i;
>> +
>> +for_each_cpu_wrap(i, sched_domain_span(sd), cpu) {
> Since (i == c

Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Jiri Slaby
On 18. 07. 20, 19:14, Oleg Nesterov wrote:
> On 07/18, Jiri Slaby wrote:
>>
>> On 17. 07. 20, 14:40, Oleg Nesterov wrote:
>>>
>>> please see the updated patch below, lets check ptrace_unfreeze() too.
>>
>> Sure, dmesg attached.
> 
> Thanks a lot!
> 
> But I am totally confused...
> 
>> [   94.513944] [ cut here ]
>> [   94.513985] do not call blocking ops when !TASK_RUNNING; state=8 set at 
>> [<2fe279e9>] ptrace_check_attach+0xbf/0x110
> 
> OK, so the ptracer has already did the TASK_TRACED -> __TASK_TRACED change in
> ptrace_freeze_traced(),
> 
>> [   94.514019] WARNING: CPU: 16 PID: 34171 at kernel/sched/core.c:6881 
>> __might_sleep+0x6c/0x70
>> [   94.514020] Modules linked in: ata_generic(E) pata_acpi(E) 
>> crc32_pclmul(E) qemu_fw_cfg(E) ata_piix(E) e1000(E) nls_iso8859_1(E) 
>> nls_cp437(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) xfs(E) btrfs(E) 
>> blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) 
>> reiserfs(E) squashfs(E) fuse(E) dm_snapshot(E) dm_bufio(E) dm_crypt(E) 
>> dm_mod(E) binfmt_misc(E) loop(E) sg(E) virtio_rng(E)
>> [   94.514082] CPU: 16 PID: 34171 Comm: strace Tainted: GE 
>> 5.8.0-rc5-100.g55927f9-default #1 openSUSE Tumbleweed (unreleased)
>> [   94.514084] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
>> [   94.514087] RIP: 0010:__might_sleep+0x6c/0x70
>> [   94.514090] Code: 41 5c 41 5d e9 25 ff ff ff 48 8b 90 68 1a 00 00 48 8b 
>> 70 10 48 c7 c7 10 45 70 8f c6 05 4f a9 68 01 01 48 89 d1 e8 7a bb fc ff <0f> 
>> 0b eb c8 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 fd
>> [   94.514092] RSP: 0018:9ffa4ba1be00 EFLAGS: 00010286
>> [   94.514093] RAX:  RBX: 8dc82b503e00 RCX: 
>> 0489
>> [   94.514094] RDX: 0001 RSI: 0096 RDI: 
>> 0247
>> [   94.514095] RBP: 8f6ffa6b R08: 0004 R09: 
>> 0489
>> [   94.514095] R10:  R11: 0001 R12: 
>> 0039
>> [   94.514096] R13:  R14: 0001 R15: 
>> 8dc82b5045e4
>> [   94.514098] FS:  7fa00f1f9240() GS:8dcb0c00() 
>> knlGS:
>> [   94.514099] CS:  0010 DS:  ES:  CR0: 80050033
>> [   94.514100] CR2: 557b53d25877 CR3: 0004ca490005 CR4: 
>> 00360ee0
>> [   94.514103] DR0:  DR1:  DR2: 
>> 
>> [   94.514104] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [   94.514105] Call Trace:
>> [   94.514821]  ptrace_stop+0x1a9/0x300
> 
> This is already wrong. But
> 
>   Where does this __might_sleep() come from ??? I ses no blocking calls
>   in ptrace_stop(). Not to mention it is called with ->siglock held and
>   right after this lock is dropped we take tasklist_lock.

Decoded stacktrace:

> ptrace_stop (include/linux/freezer.h:57 include/linux/freezer.h:67 
> include/linux/freezer.h:128 include/linux/freezer.h:173 kernel/signal.c:2217) 
>  
> ptrace_do_notify (kernel/signal.c:2272)  
> ptrace_notify (arch/x86/include/asm/paravirt.h:656 
> arch/x86/include/asm/qspinlock.h:55 include/linux/spinlock.h:211 
> include/linux/spinlock_api_smp.h:167 include/linux/spinlock.h:403 
> kernel/signal.c:2282)  
> syscall_trace_enter (include/linux/tracehook.h:73 
> include/linux/tracehook.h:104 arch/x86/entry/common.c:159)
> do_syscall_64 (arch/x86/entry/common.c:380)  
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:124)  

It is try_to_freeze_unsafe in try_to_freeze in freezable_schedule in
ptrace_stop.

> 
>   How this connects to the debugging patch I sent? Did you see this 
> warning
>   without that patch?

I suppose this made it appear:
+CONFIG_PREEMPT_COUNT=y
-# CONFIG_DEBUG_ATOMIC_SLEEP is not set
+CONFIG_DEBUG_ATOMIC_SLEEP=y

>> [   94.514888] WARNING: CPU: 16 PID: 34171 at kernel/signal.c:2219 
>> ptrace_stop+0x1d8/0x300
> ...
>> [   94.514899] RIP: 0010:ptrace_stop+0x1d8/0x300
> 
> This is WARN_ON(current->state) added to ptrace_stop(), this can explain
> BUG_ON() in do_notify_parent() you reported.
> 
> So, the tracee returns from schedule() with ->state != TASK_RUNNING ???
> This must not be possible.
> 
> OK, perhaps task->state was changed by ptrace_unfreeze_traced()? this can
> only happen if it races with ttwu(__TASK_TRACED) without ->siglock held,
> nobody should do this.
> 
> Strange.

Please see my other e-mail, all this is with dbfb089d360b applied. Maybe
it makes more sense now?

thanks,
-- 
js


linux-next: manual merge of the seccomp tree with the kselftest tree

2020-07-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the seccomp tree got a conflict in:

  tools/testing/selftests/seccomp/seccomp_bpf.c

between commit:

  4c6614dc86ad ("selftests/seccomp: Check ENOSYS under tracing")

from the kselftest tree and commit:

  11eb004ef7ea ("selftests/seccomp: Check ENOSYS under tracing")

from the seccomp tree.

I fixed it up (I just used the latter version) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpM47brwIoYG.pgp
Description: OpenPGP digital signature


Re: [PATCH 1/5] cpuidle-pseries: Set the latency-hint before entering CEDE

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:35]:

> From: "Gautham R. Shenoy" 
> 
> As per the PAPR, each H_CEDE call is associated with a latency-hint to
> be passed in the VPA field "cede_latency_hint". The CEDE states that
> we were implicitly entering so far is CEDE with latency-hint = 0.
> 
> This patch explicitly sets the latency hint corresponding to the CEDE
> state that we are currently entering. While at it, we save the
> previous hint, to be restored once we wakeup from CEDE. This will be
> required in the future when we expose extended-cede states through the
> cpuidle framework, where each of them will have a different
> cede-latency hint.
> 
> Signed-off-by: Gautham R. Shenoy 

Reviewed-by: Vaidyanathan Srinivasan 


> ---
>  drivers/cpuidle/cpuidle-pseries.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 4a37252..39d4bb6 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -105,19 +105,27 @@ static void check_and_cede_processor(void)
>   }
>  }
> 
> +#define NR_CEDE_STATES   1  /* CEDE with latency-hint 0 */
> +#define NR_DEDICATED_STATES  (NR_CEDE_STATES + 1) /* Includes snooze */
> +
> +u8 cede_latency_hint[NR_DEDICATED_STATES];
>  static int dedicated_cede_loop(struct cpuidle_device *dev,
>   struct cpuidle_driver *drv,
>   int index)
>  {
> + u8 old_latency_hint;
> 
>   pseries_idle_prolog();
>   get_lppaca()->donate_dedicated_cpu = 1;
> + old_latency_hint = get_lppaca()->cede_latency_hint;
> + get_lppaca()->cede_latency_hint = cede_latency_hint[index];
> 
>   HMT_medium();
>   check_and_cede_processor();
> 
>   local_irq_disable();
>   get_lppaca()->donate_dedicated_cpu = 0;
> + get_lppaca()->cede_latency_hint = old_latency_hint;
> 
>   pseries_idle_epilog();
> 
> @@ -149,7 +157,7 @@ static int shared_cede_loop(struct cpuidle_device *dev,
>  /*
>   * States for dedicated partition case.
>   */
> -static struct cpuidle_state dedicated_states[] = {
> +static struct cpuidle_state dedicated_states[NR_DEDICATED_STATES] = {
>   { /* Snooze */
>   .name = "snooze",
>   .desc = "snooze",


Saving and restoring the current cede hint value helps in maintaining
compatibility with other parts of the kernel.  Over the long term we can
make the cpuidle driver deterministically set the CEDE hint at each
invocation of the H_CEDE call so that we do not have to do multiple
redundant save-restores.

This is a reasonable start to cleaning up the cpuidle subsystem on PAPR
guests. 

--Vaidy



Re: [PATCH v2 2/2] selftest/cpuidle: Add support for cpuidle latency measurement

2020-07-19 Thread Gautham R Shenoy
Hi Pratik,


On Fri, Jul 17, 2020 at 02:48:01PM +0530, Pratik Rajesh Sampat wrote:
> This patch adds support to trace IPI based and timer based wakeup
> latency from idle states
> 
> Latches onto the test-cpuidle_latency kernel module using the debugfs
> interface to send IPIs or schedule a timer based event, which in-turn
> populates the debugfs with the latency measurements.
> 
> Currently for the IPI and timer tests; first disable all idle states
> and then test for latency measurements incrementally enabling each state
> 
> Signed-off-by: Pratik Rajesh Sampat 

A few comments below.

> ---
>  tools/testing/selftests/Makefile   |   1 +
>  tools/testing/selftests/cpuidle/Makefile   |   6 +
>  tools/testing/selftests/cpuidle/cpuidle.sh | 257 +
>  tools/testing/selftests/cpuidle/settings   |   1 +
>  4 files changed, 265 insertions(+)
>  create mode 100644 tools/testing/selftests/cpuidle/Makefile
>  create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
>  create mode 100644 tools/testing/selftests/cpuidle/settings
> 

[..skip..]

> +
> +ins_mod()
> +{
> + if [ ! -f "$MODULE" ]; then
> + printf "$MODULE module does not exist. Exitting\n"

If the module has been compiled into the kernel (due to a
localyesconfig, for instance), then it is unlikely that we will find
it in /lib/modules. Perhaps you want to check if the debugfs
directories created by the module exist, and if so, print a message
saying that the module is already loaded or some such?
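
Something along these lines perhaps (untested sketch; the debugfs
directory below is only a placeholder, use whatever path the module
actually creates):

	# Skip insmod if the module is already loaded or built-in
	if [ -d "/sys/kernel/debug/latency_test" ]; then
		printf "Module already loaded or built-in, skipping insmod\n"
		return 0
	fi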

> + exit $ksft_skip
> + fi
> + printf "Inserting $MODULE module\n\n"
> + insmod $MODULE
> + if [ $? != 0 ]; then
> + printf "Insmod $MODULE failed\n"
> + exit $ksft_skip
> + fi
> +}
> +
> +compute_average()
> +{
> + arr=("$@")
> + sum=0
> + size=${#arr[@]}
> + for i in "${arr[@]}"
> + do
> + sum=$((sum + i))
> + done
> + avg=$((sum/size))

It would be good to assert that "size" isn't 0 here.
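
For instance (untested):

	compute_average()
	{
		arr=("$@")
		sum=0
		size=${#arr[@]}
		# Guard against an empty array to avoid a divide-by-zero
		if [ "$size" -eq 0 ]; then
			printf "compute_average: no samples\n"
			return 1
		fi
		for i in "${arr[@]}"
		do
			sum=$((sum + i))
		done
		avg=$((sum/size))
	}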

> +}
> +
> +# Disable all stop states
> +disable_idle()
> +{
> + for ((cpu=0; cpu + do
> + for ((state=0; state + do
> + echo 1 > 
> /sys/devices/system/cpu/cpu$cpu/cpuidle/state$state/disable

So, on offlined CPUs, we won't see
/sys/devices/system/cpu/cpu$cpu/cpuidle/state$state directory. You
should probably perform this operation only on online CPUs.
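
Maybe with a small helper like this (untested; note that cpu0 usually
has no "online" file, so a missing file is treated as online):

	cpu_is_online()
	{
		local f="/sys/devices/system/cpu/cpu$1/online"
		[ ! -f "$f" ] || [ "$(cat "$f")" -eq 1 ]
	}

and then in each loop body:

	cpu_is_online $cpu || continue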


> + done
> + done
> +}
> +
> +# Perform operation on each CPU for the given state
> +# $1 - Operation: enable (0) / disable (1)
> +# $2 - State to enable
> +op_state()
> +{
> + for ((cpu=0; cpu + do
> + echo $1 > 
> /sys/devices/system/cpu/cpu$cpu/cpuidle/state$2/disable


Ditto

> + done
> +}

This is a helper function. For better readability of the main code you
can define the following wrappers and use them.


cpuidle_enable_state()
{
state=$1
op_state 1 $state
}

cpuidle_disable_state()
{
state=$1
op_state 0 $state

}


> +

[..snip..]

> +run_ipi_tests()
> +{
> +extract_latency
> +disable_idle
> +declare -a avg_arr
> +echo -e "--IPI Latency Test---" >> $LOG
> +
> + echo -e "--Baseline IPI Latency measurement: CPU Busy--" >> $LOG
> + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" "IPI_Latency(ns)" 
> >> $LOG
> + for ((cpu=0; cpu + do
> + ipi_test_once "baseline" $cpu
> + printf "%-3s %10s %12s\n" $src_cpu $cpu $ipi_latency >> 
> $LOG
> + avg_arr+=($ipi_latency)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Baseline Average IPI latency(ns): $avg" >> $LOG
> +
> +for ((state=0; state +do
> + unset avg_arr
> + echo -e "---Enabling state: $state---" >> $LOG
> + op_state 0 $state
> + printf "%s %10s %12s\n" "SRC_CPU" "DEST_CPU" 
> "IPI_Latency(ns)" >> $LOG
> + for ((cpu=0; cpu + do

If a CPU is offline, then we should skip it here.

> + # Running IPI test and logging results
> + sleep 1
> + ipi_test_once "test" $cpu
> + printf "%-3s %10s %12s\n" $src_cpu $cpu 
> $ipi_latency >> $LOG
> + avg_arr+=($ipi_latency)
> + done
> + compute_average "${avg_arr[@]}"
> + echo -e "Expected IPI latency(ns): 
> ${latency_arr[$state]}" >> $LOG
> + echo -e "Observed Average IPI latency(ns): $avg" >> $LOG
> + op_state 1 $state
> +done
> +}
> +
> +# Extract the residency in microseconds and convert to nanoseconds.
> +# Add 100 ns so that the timer stays for a little longer than the residency
> +extract_residency()
> +{
> +

[PATCH] staging: dpaa2-ethsw: fix switch/case fallthrough warning

2020-07-19 Thread Marian Posteuca
Fix the fallthrough warning that is reported by checkpatch.

Signed-off-by: Marian Posteuca 
---
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c 
b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index 2fb75a7c9314..4986591097dc 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -1362,7 +1362,8 @@ static int port_switchdev_blocking_event(struct 
notifier_block *unused,
return NOTIFY_DONE;
 
switch (event) {
-   case SWITCHDEV_PORT_OBJ_ADD: /* fall through */
+   case SWITCHDEV_PORT_OBJ_ADD:
+   fallthrough;
case SWITCHDEV_PORT_OBJ_DEL:
return ethsw_switchdev_port_obj_event(event, dev, ptr);
case SWITCHDEV_PORT_ATTR_SET:
-- 
2.26.2



[PATCH v4 4/4] power: supply: core: property to control reverse charge

2020-07-19 Thread Qiwu Huang
From: Qiwu Huang 

Interface to control wireless reverse charge.
Provides the reverse charge function when enabled.

Signed-off-by: Qiwu Huang 
---
 Documentation/ABI/testing/sysfs-class-power | 12 
 drivers/power/supply/power_supply_sysfs.c   |  1 +
 include/linux/power_supply.h|  1 +
 3 files changed, 14 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-power 
b/Documentation/ABI/testing/sysfs-class-power
index 75ec7de2fe78..54647d6995d3 100644
--- a/Documentation/ABI/testing/sysfs-class-power
+++ b/Documentation/ABI/testing/sysfs-class-power
@@ -767,3 +767,15 @@ Description:
 
Access: Read-Only
Valid values: 0 - 100
+
+What:  /sys/class/power_supply//reverse_chg_mode
+Date:  Jul 2020
+Contact:   Fei Jiang 
+Description:
+   Some phones support wireless reverse charge function which 
charge other phones.
+   The property supply interface to control wireless reverse 
charge.
+   If enabled, start TX mode and detect RX. Disabled when timeout 
or manual setting.
+
+   Valid values:
+   - 1: enabled
+   - 0: disabled
diff --git a/drivers/power/supply/power_supply_sysfs.c 
b/drivers/power/supply/power_supply_sysfs.c
index e420a453095e..81916b6b6ccf 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -209,6 +209,7 @@ static struct power_supply_attr power_supply_attrs[] = {
POWER_SUPPLY_ATTR(QUICK_CHARGE_TYPE),
POWER_SUPPLY_ATTR(TX_ADAPTER),
POWER_SUPPLY_ATTR(SIGNAL_STRENGTH),
+   POWER_SUPPLY_ATTR(REVERSE_CHG_MODE),
 };
 
 static struct attribute *
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index 346c6c9c0737..a87ae1fff8d1 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -170,6 +170,7 @@ enum power_supply_property {
POWER_SUPPLY_PROP_QUICK_CHARGE_TYPE,
POWER_SUPPLY_PROP_TX_ADAPTER,
POWER_SUPPLY_PROP_SIGNAL_STRENGTH,
+   POWER_SUPPLY_PROP_REVERSE_CHG_MODE,
 };
 
 enum power_supply_type {
-- 
2.27.0



[PATCH v4 1/4] power: supply: core: add quick charge type property

2020-07-19 Thread Qiwu Huang
From: Qiwu Huang 

Reports the kind of quick charge type based on
the adapter power.

Signed-off-by: Qiwu Huang 
---
 Documentation/ABI/testing/sysfs-class-power | 21 +
 drivers/power/supply/power_supply_sysfs.c   |  1 +
 include/linux/power_supply.h| 10 ++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-power 
b/Documentation/ABI/testing/sysfs-class-power
index 216d61a22f1e..dd3773dcf16a 100644
--- a/Documentation/ABI/testing/sysfs-class-power
+++ b/Documentation/ABI/testing/sysfs-class-power
@@ -708,3 +708,24 @@ Description:
 
Access: Read
Valid values: 1-31
+
+What:  /sys/class/power_supply//quick_charge_type
+Date:  Jul 2020
+Contact:   Fei Jiang 
+   Description:
+   Reports the kind of quick charge type based on different 
adapter power.
+   Different quick charge type represent different charging power.
+   QUICK_CHARGE_NORMAL : Charging Power <= 10W
+   QUICK_CHARGE_FAST : 10W < Charging Power <= 20W
+   QUICK_CHARGE_FLASH : 20W < Charging Power <= 30W
+   QUICK_CHARGE_TURBE : 30W < Charging Power <= 50W
+   QUICK_CHARGE_SUPER : Charging Power > 50W
+
+   Access: Read-Only
+   Valid values:
+   0: QUICK_CHARGE_NORMAL,
+   1: QUICK_CHARGE_FAST,
+   2: QUICK_CHARGE_FLASH,
+   3: QUICK_CHARGE_TURBE,
+   4: QUICK_CHARGE_SUPER.
+
diff --git a/drivers/power/supply/power_supply_sysfs.c 
b/drivers/power/supply/power_supply_sysfs.c
index bc79560229b5..9554d7907373 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -206,6 +206,7 @@ static struct power_supply_attr power_supply_attrs[] = {
POWER_SUPPLY_ATTR(MODEL_NAME),
POWER_SUPPLY_ATTR(MANUFACTURER),
POWER_SUPPLY_ATTR(SERIAL_NUMBER),
+   POWER_SUPPLY_ATTR(QUICK_CHARGE_TYPE),
 };
 
 static struct attribute *
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index ac1345a48ad0..f35c661a2544 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -167,6 +167,7 @@ enum power_supply_property {
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER,
POWER_SUPPLY_PROP_SERIAL_NUMBER,
+   POWER_SUPPLY_PROP_QUICK_CHARGE_TYPE,
 };
 
 enum power_supply_type {
@@ -197,6 +198,15 @@ enum power_supply_usb_type {
POWER_SUPPLY_USB_TYPE_APPLE_BRICK_ID,   /* Apple Charging Method */
 };
 
+enum power_supply_quick_charge_type {
+   QUICK_CHARGE_NORMAL = 0,/* Charging Power <= 10W */
+   QUICK_CHARGE_FAST,  /* 10W < Charging Power <= 20W 
*/
+   QUICK_CHARGE_FLASH, /* 20W < Charging Power <= 30W 
*/
+   QUICK_CHARGE_TURBE, /* 30W < Charging Power <= 50W 
*/
+   QUICK_CHARGE_SUPER, /* Charging Power > 50W */
+   QUICK_CHARGE_MAX,
+};
+
 enum power_supply_notifier_events {
PSY_EVENT_PROP_CHANGED,
 };
-- 
2.27.0



Re: [PATCH 0/5] cpuidle-pseries: Parse extended CEDE information for idle.

2020-07-19 Thread Vaidyanathan Srinivasan
* Gautham R Shenoy  [2020-07-07 16:41:34]:

> From: "Gautham R. Shenoy" 
> 
> Hi,
> 
> On pseries Dedicated Linux LPARs, apart from the polling snooze idle
> state, we currently have the CEDE idle state which cedes the CPU to
> the hypervisor with latency-hint = 0.
> 
> However, the PowerVM hypervisor supports additional extended CEDE
> states, which can be queried through the "ibm,get-systems-parameter"
> rtas-call with the CEDE_LATENCY_TOKEN. The hypervisor maps these
> extended CEDE states to appropriate platform idle-states in order to
> provide energy-savings as well as shifting power to the active
> units. On existing pseries LPARs today we have extended CEDE with
> latency-hints {1,2} supported.
> 
> In Patches 1-3 of this patchset, we add the code to parse the CEDE
> latency records provided by the hypervisor. We use this information to
> determine the wakeup latency of the regular CEDE (which we have been
> so far hardcoding to 10us while experimentally it is much lesser ~
> 1us), by looking at the wakeup latency provided by the hypervisor for
> Extended CEDE states. Since the platform currently advertises Extended
> CEDE 1 to have wakeup latency of 2us, we can be sure that the wakeup
> latency of the regular CEDE is no more than this.
> 
> Patch 4 (currently marked as RFC), expose the extended CEDE states
> parsed above to the cpuidle framework, provided that they can wakeup
> on an interrupt. On current platforms only Extended CEDE 1 fits the
> bill, but this is going to change in future platforms where even
> Extended CEDE 2 may be responsive to external interrupts.
> 
> Patch 5 (currently marked as RFC), filters out Extended CEDE 1 since
> it offers no added advantage over the normal CEDE.
> 
> With Patches 1-3, we see an improvement in the single-threaded
> performance on ebizzy.
> 
> 2 ebizzy threads bound to the same big-core. 25% improvement in the
> avg records/s (higher the better) with patches 1-3.
> x without_patches
> * with_patches
> N   Min   MaxMedian   AvgStddev
> x  10   2491089   5834307   5398375   4244335 1596244.9
> *  10   2893813   5834474   5832448 5327281.3 1055941.4
> 
> We do not observe any major regression in either the context_switch2
> benchmark or the schbench benchmark
> 
> context_switch2 across CPU0 CPU1 (Both belong to same big-core, but different
> small cores). We observe a minor 0.14% regression in the number of
> context-switches (higher is better).
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500348872362236354712 354745.69  2711.827
> * 500349422361452353942  354215.4 2576.9258
> 
> context_switch2 across CPU0 CPU8 (Different big-cores). We observe a 0.37%
> improvement in the number of context-switches (higher is better).
> x without_patch
> * with_patch
> N   Min   MaxMedian   AvgStddev
> x 500287956294940288896 288977.23 646.59295
> * 500288300294646289582 290064.76 1161.9992
> 
> schbench:
> No major difference could be seen until the 99.9th percentile.
> 
> Without-patch
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 39
>   90.0th: 49
>   95.0th: 59
>   *99.0th: 13104
>   99.5th: 14672
>   99.9th: 15824
>   min=0, max=17993
> 
> With-patch:
> Latency percentiles (usec)
>   50.0th: 29
>   75.0th: 40
>   90.0th: 50
>   95.0th: 61
>   *99.0th: 13648
>   99.5th: 14768
>   99.9th: 15664
>   min=0, max=29812

This patch series mainly cleans up the CEDE latency discovery and
prepares to add different cpuidle states in a virtualised environment.
This helps in improving SMT folding speeds and also power savings and
power shifting with newer platform firmware.

The current benefit is primarily from faster SMT folding and the resulting
single-thread performance achieved by updating the platform firmware provided
heuristics in the cpuidle states.

--Vaidy





[PATCH v4 2/4] power: supply: core: add wireless charger adapter type property

2020-07-19 Thread Qiwu Huang
From: Qiwu Huang 

Reports what type of wireless adapter connection is
currently active for the supply.
For example, it can show if an ADAPTER_PD capable source is attached.

Signed-off-by: Qiwu Huang 
---
 Documentation/ABI/testing/sysfs-class-power | 28 +
 drivers/power/supply/power_supply_sysfs.c   |  1 +
 include/linux/power_supply.h| 19 ++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-power 
b/Documentation/ABI/testing/sysfs-class-power
index dd3773dcf16a..03ab449fae8a 100644
--- a/Documentation/ABI/testing/sysfs-class-power
+++ b/Documentation/ABI/testing/sysfs-class-power
@@ -729,3 +729,31 @@ Contact:   Fei Jiang 
3: QUICK_CHARGE_TURBE,
4: QUICK_CHARGE_SUPER.
 
+= Wireless Charger Properties =
+What:  /sys/class/power_supply//tx_adapter
+Date:  Jul 2020
+Contact:   Fei Jiang 
+Description:
+   Reports the type of wireless adapter connection is currently 
active for
+   the supply, for example it can show if ADAPTER_PD capable source
+   is attached. Expect common wireless adapter type, also increase 
by
+   some vendor private adapter type(ex. ADAPTER_PD_40W).
+
+   Access: Read-Only
+   Valid values:
+   0: ADAPTER_NONE,
+   1: ADAPTER_SDP,
+   2: ADAPTER_DCP,
+   3: ADAPTER_CDP,
+   4: ADAPTER_OCP,
+   5: ADAPTER_QC2,
+   6: ADAPTER_QC3,
+   7: ADAPTER_PD,
+   8: ADAPTER_AUTH_FAILED,
+   9: ADAPTER_PRIVATE_QC3,
+   10: ADAPTER_PRIVATE_PD,
+   11: ADAPTER_CAR_POWER,
+   12: ADAPTER_PRIVATE_PD_40W,
+   13: ADAPTER_VOICE_BOX,
+   14: ADAPTER_PRIVATE_PD_50W.
+
diff --git a/drivers/power/supply/power_supply_sysfs.c 
b/drivers/power/supply/power_supply_sysfs.c
index 9554d7907373..f2458e21d02b 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -207,6 +207,7 @@ static struct power_supply_attr power_supply_attrs[] = {
POWER_SUPPLY_ATTR(MANUFACTURER),
POWER_SUPPLY_ATTR(SERIAL_NUMBER),
POWER_SUPPLY_ATTR(QUICK_CHARGE_TYPE),
+   POWER_SUPPLY_ATTR(TX_ADAPTER),
 };
 
 static struct attribute *
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index f35c661a2544..0bbdec1630a4 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -168,6 +168,7 @@ enum power_supply_property {
POWER_SUPPLY_PROP_MANUFACTURER,
POWER_SUPPLY_PROP_SERIAL_NUMBER,
POWER_SUPPLY_PROP_QUICK_CHARGE_TYPE,
+   POWER_SUPPLY_PROP_TX_ADAPTER,
 };
 
 enum power_supply_type {
@@ -207,6 +208,24 @@ enum power_supply_quick_charge_type {
QUICK_CHARGE_MAX,
 };
 
+enum power_supply_tx_adapter_type {
+   ADAPTER_NONE = 0,   /* Nothing Attached */
+   ADAPTER_SDP,/* Standard Downstream Port */
+   ADAPTER_CDP,/* Charging Downstream Port */
+   ADAPTER_DCP,/* Dedicated Charging Port */
+   ADAPTER_OCP,/* Other Charging Port */
+   ADAPTER_QC2,/* Qualcomm Charge 2.0 */
+   ADAPTER_QC3,/* Qualcomm Charge 3.0 */
+   ADAPTER_PD, /* Power Delivery Port */
+   ADAPTER_AUTH_FAILED,/* Authenticated Failed Adapter 
*/
+   ADAPTER_PRIVATE_QC3,/* Qualcomm Charge 3.0 with 
Private Protocol */
+   ADAPTER_PRIVATE_PD, /* PD Adapter with Private 
Protocol */
+   ADAPTER_CAR_POWER,  /* Wireless Car Charger */
+   ADAPTER_PRIVATE_PD_40W, /* 40W PD Adapter with Private 
Protocol */
+   ADAPTER_VOICE_BOX,  /* Voice Box which Support 
Wireless Charger */
+   ADAPTER_PRIVATE_PD_50W, /* 50W PD Adapter with Private 
Protocol */
+};
+
 enum power_supply_notifier_events {
PSY_EVENT_PROP_CHANGED,
 };
-- 
2.27.0



[PATCH v4 0/4] add some power supply properties about wireless/wired charging

2020-07-19 Thread Qiwu Huang
From: Qiwu Huang 

quick_charge_type reports quick charge type based on charging power.
tx_adapter shows wireless charging adapter type.
signal_strength shows degree of coupling between tx and rx when wireless 
charging.
reverse_chg_mode supply interface to enable/disable wireless reverse charging.
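
For example, reading the new attributes from userspace (the supply name
"wireless" below is just a placeholder):

	$ cat /sys/class/power_supply/wireless/quick_charge_type
	2
	$ cat /sys/class/power_supply/wireless/tx_adapter
	7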

Qiwu Huang (4):
  power: supply: core: add quick charge type property
  power: supply: core: add wireless charger adapter type property
  power: supply: core: add wireless signal strength property
  power: supply: core: property to control reverse charge

 Documentation/ABI/testing/sysfs-class-power | 71 +
 drivers/power/supply/power_supply_sysfs.c   |  4 ++
 include/linux/power_supply.h| 31 +
 3 files changed, 106 insertions(+)

-- 
2.27.0



[PATCH v4 3/4] power: supply: core: add wireless signal strength property

2020-07-19 Thread Qiwu Huang
From: Qiwu Huang 

Reports the wireless signal strength.
The value shows the degree of coupling between tx and rx.

Signed-off-by: Qiwu Huang 
---
 Documentation/ABI/testing/sysfs-class-power | 10 ++
 drivers/power/supply/power_supply_sysfs.c   |  1 +
 include/linux/power_supply.h|  1 +
 3 files changed, 12 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-power 
b/Documentation/ABI/testing/sysfs-class-power
index 03ab449fae8a..75ec7de2fe78 100644
--- a/Documentation/ABI/testing/sysfs-class-power
+++ b/Documentation/ABI/testing/sysfs-class-power
@@ -757,3 +757,13 @@ Description:
13: ADAPTER_VOICE_BOX,
14: ADAPTER_PRIVATE_PD_50W.
 
+What:  /sys/class/power_supply//signal_strength
+Date:  Jul 2020
+Contact:   Fei Jiang 
+Description:
+   In PING phase, RX transmits a signal strength packet as the 
first
+   communication packet to instruct the base to keep power signal 
on.
+   The value reports wireless signal strength and show degree of 
coupling.
+
+   Access: Read-Only
+   Valid values: 0 - 100
diff --git a/drivers/power/supply/power_supply_sysfs.c 
b/drivers/power/supply/power_supply_sysfs.c
index f2458e21d02b..e420a453095e 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -208,6 +208,7 @@ static struct power_supply_attr power_supply_attrs[] = {
POWER_SUPPLY_ATTR(SERIAL_NUMBER),
POWER_SUPPLY_ATTR(QUICK_CHARGE_TYPE),
POWER_SUPPLY_ATTR(TX_ADAPTER),
+   POWER_SUPPLY_ATTR(SIGNAL_STRENGTH),
 };
 
 static struct attribute *
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index 0bbdec1630a4..346c6c9c0737 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -169,6 +169,7 @@ enum power_supply_property {
POWER_SUPPLY_PROP_SERIAL_NUMBER,
POWER_SUPPLY_PROP_QUICK_CHARGE_TYPE,
POWER_SUPPLY_PROP_TX_ADAPTER,
+   POWER_SUPPLY_PROP_SIGNAL_STRENGTH,
 };
 
 enum power_supply_type {
-- 
2.27.0



Re: [PATCH v16 18/22] mm/lru: replace pgdat lru_lock with lruvec lock

2020-07-19 Thread Alex Shi



On 2020/7/19 at 11:14 PM, Alexander Duyck wrote:
>> Compare to move to tail, how about to move it to head of struct, which is
>> close to lru list? Did you have some data of the place change?
> I don't have specific data, just anecdotal evidence from the past that
> usually you want to keep locks away from read-mostly items since they
> cause obvious cache thrash. My concern was more with the other fields
> in the structure such as pgdat since it should be a static value and
> having it evicted would likely be more expensive than just leaving the
> cacheline as it is.
> 

Thanks for the comments, Alex.

So it sounds like moving the lru_lock to the head of struct lruvec could be better.

>> -   __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
>> +   if (delta_munlocked)
>> +   __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
>> if (lruvec)
>> unlock_page_lruvec_irq(lruvec);
> Why not just wrap the entire thing in a check for "lruvec"? Yes you
> could theoretically be modding with a value of 0, but it avoids a
> secondary unnecessary check and branching.

Right, and the delta_munlocked value could be checked inside the accounting
func

Thanks!


Re: 5.8-rc*: kernel BUG at kernel/signal.c:1917

2020-07-19 Thread Jiri Slaby
On 18. 07. 20, 19:44, Christian Brauner wrote:
> On Sat, Jul 18, 2020 at 07:14:07PM +0200, Oleg Nesterov wrote:
>> On 07/18, Jiri Slaby wrote:
>>>
>>> On 17. 07. 20, 14:40, Oleg Nesterov wrote:

 please see the updated patch below, lets check ptrace_unfreeze() too.
>>>
>>> Sure, dmesg attached.
>>
>> Thanks a lot!
>>
>> But I am totally confused...
>>
>>> [   94.513944] [ cut here ]
>>> [   94.513985] do not call blocking ops when !TASK_RUNNING; state=8 set at 
>>> [<2fe279e9>] ptrace_check_attach+0xbf/0x110
>>
>> OK, so the ptracer has already did the TASK_TRACED -> __TASK_TRACED change in
>> ptrace_freeze_traced(),
>>
>>> [   94.514019] WARNING: CPU: 16 PID: 34171 at kernel/sched/core.c:6881 
>>> __might_sleep+0x6c/0x70
>>> [   94.514020] Modules linked in: ata_generic(E) pata_acpi(E) 
>>> crc32_pclmul(E) qemu_fw_cfg(E) ata_piix(E) e1000(E) nls_iso8859_1(E) 
>>> nls_cp437(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) xfs(E) btrfs(E) 
>>> blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) 
>>> reiserfs(E) squashfs(E) fuse(E) dm_snapshot(E) dm_bufio(E) dm_crypt(E) 
>>> dm_mod(E) binfmt_misc(E) loop(E) sg(E) virtio_rng(E)
>>> [   94.514082] CPU: 16 PID: 34171 Comm: strace Tainted: GE 
>>> 5.8.0-rc5-100.g55927f9-default #1 openSUSE Tumbleweed (unreleased)
>>> [   94.514084] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>> rel-1.12.0-0-ga698c89-rebuilt.suse.com 04/01/2014
>>> [   94.514087] RIP: 0010:__might_sleep+0x6c/0x70
>>> [   94.514090] Code: 41 5c 41 5d e9 25 ff ff ff 48 8b 90 68 1a 00 00 48 8b 
>>> 70 10 48 c7 c7 10 45 70 8f c6 05 4f a9 68 01 01 48 89 d1 e8 7a bb fc ff 
>>> <0f> 0b eb c8 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 fd
>>> [   94.514092] RSP: 0018:9ffa4ba1be00 EFLAGS: 00010286
>>> [   94.514093] RAX:  RBX: 8dc82b503e00 RCX: 
>>> 0489
>>> [   94.514094] RDX: 0001 RSI: 0096 RDI: 
>>> 0247
>>> [   94.514095] RBP: 8f6ffa6b R08: 0004 R09: 
>>> 0489
>>> [   94.514095] R10:  R11: 0001 R12: 
>>> 0039
>>> [   94.514096] R13:  R14: 0001 R15: 
>>> 8dc82b5045e4
>>> [   94.514098] FS:  7fa00f1f9240() GS:8dcb0c00() 
>>> knlGS:
>>> [   94.514099] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   94.514100] CR2: 557b53d25877 CR3: 0004ca490005 CR4: 
>>> 00360ee0
>>> [   94.514103] DR0:  DR1:  DR2: 
>>> 
>>> [   94.514104] DR3:  DR6: fffe0ff0 DR7: 
>>> 0400
>>> [   94.514105] Call Trace:
>>> [   94.514821]  ptrace_stop+0x1a9/0x300
>>
>> This is already wrong. But
>>
>>  Where does this __might_sleep() come from ??? I ses no blocking calls
>>  in ptrace_stop(). Not to mention it is called with ->siglock held and
>>  right after this lock is dropped we take tasklist_lock.
>>
>>  How this connects to the debugging patch I sent? Did you see this 
>> warning
>>  without that patch?
>>  
>>
>>> [   94.514888] WARNING: CPU: 16 PID: 34171 at kernel/signal.c:2219 
>>> ptrace_stop+0x1d8/0x300
>> ...
>>> [   94.514899] RIP: 0010:ptrace_stop+0x1d8/0x300
>>
>> This is WARN_ON(current->state) added to ptrace_stop(), this can explain
>> BUG_ON() in do_notify_parent() you reported.
>>
>> So, the tracee returns from schedule() with ->state != TASK_RUNNING ???
>> This must not be possible.
>>
>> OK, perhaps task->state was changed by ptrace_unfreeze_traced()? this can
>> only happen if it races with ttwu(__TASK_TRACED) without ->siglock held,
>> nobody should do this.
>>
>> Strange.
> 
> I have tried to reproduce this with an vanilla upstream 5.8-rc4 and the
> strace test-suite with
> 
> make check -j4
> 
> and I wasn't able to after multiple retries. Jiri, just to make sure
> this is upstream 5.8-rc4 without any additional patches?

You tackled it, we cherry-picked dbfb089d360 to our kernels. Ccing more
people.

So if you try with pure vanilla 5.8-rc6 (it contains the commit), you
should be able to reproduce. I am.

OTOH 5.8-rc6 with that commit reverted -- I cannot reproduce there. So
it must be it.

> Anything special required to reproduce this in the way you run strace
> and so on?

Nothing special there. Except the HW: when I run a VM with 16+
processors, it's much more likely to be hit (usually, on the first run
of make check).

thanks,
-- 
js


Re: [PATCH] platform_data: cros_ec_commands.h: drop a duplicated word

2020-07-19 Thread Randy Dunlap
On 7/19/20 9:42 PM, Tzung-Bi Shih wrote:
> On Sun, Jul 19, 2020 at 8:30 AM Randy Dunlap  wrote:
>>
>> Drop the repeated word "using" in a comment.
>>
>> Signed-off-by: Randy Dunlap 
>> Cc: Tzung-Bi Shih 
>> Cc: Mark Brown 
>> ---
> 
> I guess you didn't include the maintainers:
> Benson Leung  (maintainer:CHROMEOS EC SUBDRIVERS)
> Enric Balletbo i Serra 
> (maintainer:CHROMEOS EC SUBDRIVERS)
> Guenter Roeck  (reviewer:CHROMEOS EC SUBDRIVERS)
> 

Thanks. I'll add them and resend the patch.

-- 
~Randy



Re: [PATCH 1/2] thermal: netlink: Improve the initcall ordering

2020-07-19 Thread Amit Kucheria
On Fri, Jul 17, 2020 at 10:12 PM Daniel Lezcano
 wrote:
>
> The initcalls like to play joke. In our case, the thermal-netlink
> initcall is called after the thermal-core initcall but this one sends
> a notification before the former is initialzed. No issue was spotted,

typo: initialized

> but it could lead to a memory corruption, so instead of relying on the
> core_initcall for the thermal-netlink, let's initialize directly from
> the thermal-core init routine, so we have full control of the init
> ordering.

> Reported-by: Marek Szyprowski 
> Tested-by: Marek Szyprowski 
> Signed-off-by: Daniel Lezcano 

Reviewed-by: Amit Kucheria 

> ---
>  drivers/thermal/thermal_core.c| 4 
>  drivers/thermal/thermal_netlink.c | 3 +--
>  drivers/thermal/thermal_netlink.h | 6 ++
>  3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 25ef29123f72..c2e7d7aaa354 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1581,6 +1581,10 @@ static int __init thermal_init(void)
>  {
> int result;
>
> +   result = thermal_netlink_init();
> +   if (result)
> +   goto error;
> +
> mutex_init(&poweroff_lock);
> result = thermal_register_governors();
> if (result)
> diff --git a/drivers/thermal/thermal_netlink.c 
> b/drivers/thermal/thermal_netlink.c
> index dd0a3b889674..42eace7401da 100644
> --- a/drivers/thermal/thermal_netlink.c
> +++ b/drivers/thermal/thermal_netlink.c
> @@ -641,8 +641,7 @@ static struct genl_family thermal_gnl_family 
> __ro_after_init = {
> .n_mcgrps   = ARRAY_SIZE(thermal_genl_mcgrps),
>  };
>
> -static int __init thermal_netlink_init(void)
> +int __init thermal_netlink_init(void)
>  {
> return genl_register_family(&thermal_gnl_family);
>  }
> -core_initcall(thermal_netlink_init);
> diff --git a/drivers/thermal/thermal_netlink.h 
> b/drivers/thermal/thermal_netlink.h
> index 0ec28d105da5..828d1dddfa98 100644
> --- a/drivers/thermal/thermal_netlink.h
> +++ b/drivers/thermal/thermal_netlink.h
> @@ -6,6 +6,7 @@
>
>  /* Netlink notification function */
>  #ifdef CONFIG_THERMAL_NETLINK
> +int __init thermal_netlink_init(void);
>  int thermal_notify_tz_create(int tz_id, const char *name);
>  int thermal_notify_tz_delete(int tz_id);
>  int thermal_notify_tz_enable(int tz_id);
> @@ -23,6 +24,11 @@ int thermal_notify_cdev_delete(int cdev_id);
>  int thermal_notify_tz_gov_change(int tz_id, const char *name);
>  int thermal_genl_sampling_temp(int id, int temp);
>  #else
> +static inline int thermal_netlink_init(void)
> +{
> +   return 0;
> +}
> +
>  static inline int thermal_notify_tz_create(int tz_id, const char *name)
>  {
> return 0;
> --
> 2.17.1
>


Re: [PATCH] mm/vmstat: don't do count if no needs

2020-07-19 Thread Alex Shi


From 4cb977d34227e4bafa95b8da5e47dbd8b6141d26 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Mon, 20 Jul 2020 11:50:12 +0800
Subject: [PATCH v2] mm/vmstat: don't do count if no needs

For a couple of the vmstat accounting functions, the caller usually doesn't
check the delta value. If delta == 0, the irq disabling or atomic operation
is a waste, so it is better to skip it. Also add unlikely() since !delta
rarely happens.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmstat.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3fb23a21f6dd..b02d99dff443 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -321,6 +321,9 @@ void __mod_zone_page_state(struct zone *zone, enum 
zone_stat_item item,
long x;
long t;
 
+   if (unlikely(!delta))
+   return;
+
x = delta + __this_cpu_read(*p);
 
t = __this_cpu_read(pcp->stat_threshold);
@@ -341,6 +344,9 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
long x;
long t;
 
+   if (unlikely(!delta))
+   return;
+
x = delta + __this_cpu_read(*p);
 
t = __this_cpu_read(pcp->stat_threshold);
@@ -484,6 +490,9 @@ static inline void mod_zone_state(struct zone *zone,
s8 __percpu *p = pcp->vm_stat_diff + item;
long o, n, t, z;
 
+   if (unlikely(!delta))
+   return;
+
do {
z = 0;  /* overflow to zone counters */
 
@@ -518,6 +527,9 @@ static inline void mod_zone_state(struct zone *zone,
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 long delta)
 {
+   if (unlikely(!delta))
+   return;
+
mod_zone_state(zone, item, delta, 0);
 }
 EXPORT_SYMBOL(mod_zone_page_state);
@@ -541,6 +553,9 @@ static inline void mod_node_state(struct pglist_data *pgdat,
s8 __percpu *p = pcp->vm_node_stat_diff + item;
long o, n, t, z;
 
+   if (unlikely(!delta))
+   return;
+
do {
z = 0;  /* overflow to node counters */
 
@@ -604,6 +619,9 @@ void mod_zone_page_state(struct zone *zone, enum 
zone_stat_item item,
 {
unsigned long flags;
 
+   if (unlikely(!delta))
+   return;
+
local_irq_save(flags);
__mod_zone_page_state(zone, item, delta);
local_irq_restore(flags);
@@ -647,6 +665,9 @@ void mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
 {
unsigned long flags;
 
+   if (unlikely(!delta))
+   return;
+
local_irq_save(flags);
__mod_node_page_state(pgdat, item, delta);
local_irq_restore(flags);
-- 
2.18.4



Re: [PATCH 2/2] thermal: core: Move initialization after core initcall

2020-07-19 Thread Amit Kucheria
On Fri, Jul 17, 2020 at 10:12 PM Daniel Lezcano
 wrote:
>
> The generic netlink is initialized at subsys_initcall, so far after
> the thermal init routine and the thermal generic netlink family
> initialization.
>
> On some platforms, that leads to a memory corruption.
>
> The fix was sent to netdev@ to move the genetlink framework
> initialization at core_initcall.
>
> Move the thermal core initialization to postcore level which is very
> close to core level.
>
> Reported-by: Marek Szyprowski 
> Tested-by: Marek Szyprowski 
> Signed-off-by: Daniel Lezcano 
> ---
>  drivers/thermal/thermal_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index c2e7d7aaa354..79551bb6cd4c 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1617,4 +1617,4 @@ static int __init thermal_init(void)
> mutex_destroy(&poweroff_lock);
> return result;
>  }
> -core_initcall(thermal_init);
> +postcore_initcall(thermal_init);

For posterity, we moved to core_initcall from fs_initcall since we had
removed netlink support from the thermal framework and we wanted to
initialise it as quickly as possible after cpufreq to allow early
mitigation possibility.

Given the addition of the new netlink-based API to thermal framework,
I think moving to postcore_initcall is an acceptable compromise.

Reviewed-by: Amit Kucheria 

> --
> 2.17.1
>


Re: linux-next: build failure after merge of the tip tree

2020-07-19 Thread Stephen Rothwell
Hi all,

On Tue, 14 Jul 2020 13:50:12 +1000 Stephen Rothwell  
wrote:
>
> On Tue, 7 Jul 2020 12:33:01 +1000 Stephen Rothwell  
> wrote:
> >
> > On Tue, 30 Jun 2020 12:54:59 +1000 Stephen Rothwell  
> > wrote:  
> > >
> > > On Mon, 22 Jun 2020 11:37:00 +1000 Stephen Rothwell 
> > >  wrote:
> > > >
> > > > After merging the tip tree, today's linux-next build (x86_64 
> > > > allmodconfig)
> > > > failed like this:
> > > > 
> > > > ERROR: modpost: "sched_setscheduler" 
> > > > [kernel/trace/ring_buffer_benchmark.ko] undefined!
> > > > 
> > > > Caused by commit
> > > > 
> > > >   616d91b68cd5 ("sched: Remove sched_setscheduler*() EXPORTs")
> > > > 
> > > > Missed one :-)
> > > > 
> > > > I have reverted that commit for today.  
> > > 
> > > I am still reverting that commit.
> > 
> > Ditto ...  
> 
> Ping?

OK, it has now been three weeks ... I must admit I was a little
surprised that initial testing of the tip tree (or even the branch that
this commit is on) did not pick this up.

Unless there is something really weird going on here, Linus will notice
if you try to get him to pull this as he tests with an x86_64
allmodconfig build.

-- 
Cheers,
Stephen Rothwell


pgpGW7Zgy2vol.pgp
Description: OpenPGP digital signature


Re: [PATCH v7 10/10] arm64: dts: actions: Add uSD support for Cubieboard7

2020-07-19 Thread Manivannan Sadhasivam
On Sun, Jul 19, 2020 at 11:12:07PM +0530, Amit Singh Tomar wrote:
> This commit adds uSD support for Cubieboard7 board based on Actions Semi
> S700 SoC. SD0 is connected to uSD slot. Since there is no PMIC support
> added yet, fixed regulator has been used as a regulator node.
> 
> Signed-off-by: Amit Singh Tomar 

Reviewed-by: Manivannan Sadhasivam 

Thanks,
Mani

> ---
> Changes since v6:
>   * Brought back the uSD fixed regulator.
> Changes since v5:
> * Removed the Fixed regulators as these are
>   not needed.
> Changes since v4:
> * No change.
> Changes since v3:
> * No change.
> Changes since v2:
> * No change.
> Changes since v1:
> * No change.
> Changes since RFC:
> * No change.
> ---
>  arch/arm64/boot/dts/actions/s700-cubieboard7.dts | 33 
> 
>  arch/arm64/boot/dts/actions/s700.dtsi|  1 +
>  2 files changed, 34 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/actions/s700-cubieboard7.dts 
> b/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> index 63e375cd9eb4..f81d63261ba2 100644
> --- a/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> +++ b/arch/arm64/boot/dts/actions/s700-cubieboard7.dts
> @@ -13,6 +13,7 @@
>  
>   aliases {
>   serial3 = &uart3;
> + mmc0 = &mmc0;
>   };
>  
>   chosen {
> @@ -28,6 +29,15 @@
>   device_type = "memory";
>   reg = <0x1 0xe000 0x0 0x0>;
>   };
> +
> + /* Fixed regulator used in the absence of PMIC */
> + sd_vcc: sd-vcc {
> + compatible = "regulator-fixed";
> + regulator-name = "fixed-3.1V";
> + regulator-min-microvolt = <310>;
> + regulator-max-microvolt = <310>;
> + regulator-always-on;
> + };
>  };
>  
>  &i2c0 {
> @@ -81,6 +91,14 @@
>   bias-pull-up;
>   };
>   };
> +
> + mmc0_default: mmc0_default {
> + pinmux {
> + groups = "sd0_d0_mfp", "sd0_d1_mfp", "sd0_d2_d3_mfp",
> +  "sd0_cmd_mfp", "sd0_clk_mfp";
> + function = "sd0";
> + };
> + };
>  };
>  
>  &timer {
> @@ -90,3 +108,18 @@
>  &uart3 {
>   status = "okay";
>  };
> +
> +/* uSD */
> +&mmc0 {
> + status = "okay";
> + pinctrl-names = "default";
> + pinctrl-0 = <&mmc0_default>;
> + cd-gpios = <&pinctrl 120 GPIO_ACTIVE_LOW>;
> + no-sdio;
> + no-mmc;
> + no-1-8-v;
> + bus-width = <4>;
> + vmmc-supply = <&sd_vcc>;
> + vqmmc-supply = <&sd_vcc>;
> +};
> +
> diff --git a/arch/arm64/boot/dts/actions/s700.dtsi 
> b/arch/arm64/boot/dts/actions/s700.dtsi
> index 9ed88aafc2da..ba498cf9217d 100644
> --- a/arch/arm64/boot/dts/actions/s700.dtsi
> +++ b/arch/arm64/boot/dts/actions/s700.dtsi
> @@ -4,6 +4,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> 2.7.4
> 


Re: [PATCH v2 13/16] scripts/kallsyms: move ignored symbol types to is_ignored_symbol()

2020-07-19 Thread Masahiro Yamada
On Mon, Jul 20, 2020 at 10:46 AM Finn Thain  wrote:
>
> On Sun, 24 Nov 2019, Masahiro Yamada wrote:
>
> > Collect the ignored patterns to is_ignored_symbol().
> >
> > Signed-off-by: Masahiro Yamada 
>
> This commit (887df76de67f5) caused a regression in my powerpc builds as it
> causes symbol names to disappear from backtraces:
>
> [ cut here ]
> WARNING: CPU: 0 PID: 0 at kernel/smp.c:433 _einittext+0x3f9e5120/0x3feb71b8
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7-pmac-00055-g887df76de67f5 
> #18
> NIP:  c00aef68 LR: c00af114 CTR: c001272c
> REGS: c0705c40 TRAP: 0700   Not tainted  (5.4.0-rc7-pmac-00055-g887df76de67f5)
> MSR:  00029032   CR: 4244  XER: 
>
> GPR00: 001f0100 c0705cf8 c06dc300 c070af1c c001258c   ef7fb5bc
> GPR08: 0880 0100 0001 0100 4244  c0709040 0004
> GPR16: 0001 c06022b4 c058297c 0022 8cb9  c06d84a0 c071
> GPR24: c071   c070af1c c070af1c  c001258c 
> NIP [c00aef68] _einittext+0x3f9e5120/0x3feb71b8
> LR [c00af114] _einittext+0x3f9e52cc/0x3feb71b8
> Call Trace:
> [c0705cf8] [ef006320] 0xef006320 (unreliable)
> [c0705d38] [c00af114] _einittext+0x3f9e52cc/0x3feb71b8
> [c0705d48] [c00af158] _einittext+0x3f9e5310/0x3feb71b8
> [c0705d68] [c0012768] _einittext+0x3f948920/0x3feb71b8
> [c0705d78] [c0092c04] _einittext+0x3f9c8dbc/0x3feb71b8
> [c0705d88] [c0092d18] _einittext+0x3f9c8ed0/0x3feb71b8
> [c0705da8] [c0093a2c] _einittext+0x3f9c9be4/0x3feb71b8
> [c0705de8] [c0580224] _einittext+0x3feb63dc/0x3feb71b8
> [c0705e48] [c00382ec] _einittext+0x3f96e4a4/0x3feb71b8
> [c0705e58] [c000d2a0] _einittext+0x3f943458/0x3feb71b8
> [c0705e88] [c001353c] _einittext+0x3f9496f4/0x3feb71b8
> --- interrupt: 901 at _einittext+0x3f941058/0x3feb71b8
> LR = _einittext+0x3f941058/0x3feb71b8
> [c0705f50] [c06cc214] 0xc06cc214 (unreliable)
> [c0705f60] [c057fa20] _einittext+0x3feb5bd8/0x3feb71b8
> [c0705f70] [c005de48] _einittext+0x3f994000/0x3feb71b8
> [c0705f90] [c005e050] _einittext+0x3f994208/0x3feb71b8
> [c0705fa0] [c0004cc8] _einittext+0x3f93ae80/0x3feb71b8
> [c0705fb0] [c069a36c] _einittext+0x3ffd0524/0x4000
> [c0705ff0] [3500] 0x3500
> Instruction dump:
> 7c0803a6 7fa5eb78 7d808120 7ea6ab78 baa10014 38210040 4bfffbb0 7f64db78
> 7f85e378 484b31b1 7c601b78 4bfffdf4 <0fe0> 4bfffd60 9421ffe0 7c0802a6
> ---[ end trace a06fef4788747c72 ]---
>
>
> Prior to that (e.g. 97261e1e2240f), I get backtraces like this:
>
> [ cut here ]
> WARNING: CPU: 0 PID: 0 at kernel/smp.c:433 smp_call_function_many+0x318/0x320
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7-pmac-00054-g97261e1e2240f 
> #20
> NIP:  c00aef68 LR: c00af114 CTR: c001272c
> REGS: c075dc40 TRAP: 0700   Not tainted  (5.4.0-rc7-pmac-00054-g97261e1e2240f)
> MSR:  00029032   CR: 4244  XER: 
>
> GPR00: 001f0100 c075dcf8 c0733300 c0762f1c c001258c   ef7fb5bc
> GPR08: 0480 0100 0001 0100 4244  c0761040 0004
> GPR16: 0001 c0658e58 c058297c 0022 8cb9  c072f4a0 c076
> GPR24: c076   c0762f1c c0762f1c  c001258c 
> NIP [c00aef68] smp_call_function_many+0x318/0x320
> LR [c00af114] smp_call_function+0x34/0x44
> Call Trace:
> [c075dcf8] [ef006320] 0xef006320 (unreliable)
> [c075dd38] [c00af114] smp_call_function+0x34/0x44
> [c075dd48] [c00af158] on_each_cpu+0x1c/0x4c
> [c075dd68] [c0012768] tau_timeout_smp+0x3c/0x4c
> [c075dd78] [c0092c04] call_timer_fn.isra.26+0x20/0x84
> [c075dd88] [c0092d18] expire_timers+0xb0/0xc0
> [c075dda8] [c0093a2c] run_timer_softirq+0xa4/0x1a4
> [c075dde8] [c0580224] __do_softirq+0x11c/0x280
> [c075de48] [c00382ec] irq_exit+0xc0/0xd4
> [c075de58] [c000d2a0] timer_interrupt+0x154/0x260
> [c075de88] [c001353c] ret_from_except+0x0/0x14
> --- interrupt: 901 at arch_cpu_idle+0x24/0x78
> LR = arch_cpu_idle+0x24/0x78
> [c075df50] [c0723214] 0xc0723214 (unreliable)
> [c075df60] [c057fa20] default_idle_call+0x38/0x58
> [c075df70] [c005de48] do_idle+0xd4/0x17c
> [c075df90] [c005e054] cpu_startup_entry+0x24/0x28
> [c075dfa0] [c0004cc8] rest_init+0xa8/0xbc
> [c075dfb0] [c06f136c] start_kernel+0x40c/0x420
> [c075dff0] [3500] 0x3500
> Instruction dump:
> 7c0803a6 7fa5eb78 7d808120 7ea6ab78 baa10014 38210040 4bfffbb0 7f64db78
> 7f85e378 484b31b1 7c601b78 4bfffdf4 <0fe0> 4bfffd60 9421ffe0 7c0802a6
> ---[ end trace 784c7f15ecd23941 ]---
>
> Has anyone else observed these problems (either the WARNING from
> smp_call_function_many() or the missing symbol names)?
>
> What is the best way to fix this? Should I upgrade binutils?



I got a similar report before.

I'd like to know whether or not
this is the same issue as fixed by
7883a14339299773b2ce08dcfd97c63c199a9289


Does your problem happen on the latest kernel?
Which version of binutils are you using?


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] net: genetlink: Move initialization to core_initcall

2020-07-19 Thread Amit Kucheria
On Wed, Jul 15, 2020 at 1:11 PM Daniel Lezcano
 wrote:
>
> The generic netlink is initialized far after the netlink protocol
> itself at subsys_initcall. The devlink is initialized at the same
> level, but after, as shown by a disassembly of the vmlinux:
>
> [ ... ]
> 374 8000115f22c0 <__initcall_devlink_init4>:
> 375 8000115f22c4 <__initcall_genl_init4>:
> [ ... ]
>
> The function devlink_init() calls genl_register_family() before the
> generic netlink subsystem is initialized.
>
> As the generic netlink initcall level is set since 2005, it seems that
> was not a problem, but now we have the thermal framework initialized
> at the core_initcall level which creates the generic netlink family
> and sends a notification which leads to a subtle memory corruption
> only detectable when the CONFIG_INIT_ON_ALLOC_DEFAULT_ON option is set
> with the earlycon at init time.
>
> The thermal framework needs to be initialized early in order to begin
> the mitigation as soon as possible. Moving it to postcore_initcall is
> acceptable.
>
> This patch changes the initialization level for the generic netlink
> family to the core_initcall and comes after the netlink protocol
> initialization.
>
> Signed-off-by: Daniel Lezcano 

Reviewed-by: Amit Kucheria 


> ---
>  net/netlink/genetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
> index 55ee680e9db1..36b8a1909826 100644
> --- a/net/netlink/genetlink.c
> +++ b/net/netlink/genetlink.c
> @@ -1263,7 +1263,7 @@ static int __init genl_init(void)
> panic("GENL: Cannot register controller: %d\n", err);
>  }
>
> -subsys_initcall(genl_init);
> +core_initcall(genl_init);
>
>  static int genlmsg_mcast(struct sk_buff *skb, u32 portid, unsigned long 
> group,
>  gfp_t flags)
> --
> 2.17.1
>


Re: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

2020-07-19 Thread Andrii Nakryiko
On Wed, Jul 15, 2020 at 2:46 PM Hao Luo  wrote:
>
> Extend ksyms.c selftest to make sure BTF enables direct loads of ksyms.
>
> Note that test is done against the kernel btf extended with kernel VARs.
>
> Signed-off-by: Hao Luo 
> ---
>  tools/testing/selftests/bpf/prog_tests/ksyms.c |  2 ++
>  tools/testing/selftests/bpf/progs/test_ksyms.c | 14 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c 
> b/tools/testing/selftests/bpf/prog_tests/ksyms.c
> index e3d6777226a8..0e7f3bc3b0ae 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
> @@ -65,6 +65,8 @@ void test_ksyms(void)
>   "got %llu, exp %llu\n", data->out__btf_size, btf_size);
> CHECK(data->out__per_cpu_start != 0, "__per_cpu_start",
>   "got %llu, exp %llu\n", data->out__per_cpu_start, (__u64)0);
> +   CHECK(data->out__rq_cpu != 0, "rq_cpu",
> + "got %llu, exp %llu\n", data->out__rq_cpu, (__u64)0);
>
>  cleanup:
> test_ksyms__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c 
> b/tools/testing/selftests/bpf/progs/test_ksyms.c
> index 6c9cbb5a3bdf..e777603757e5 100644
> --- a/tools/testing/selftests/bpf/progs/test_ksyms.c
> +++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /* Copyright (c) 2019 Facebook */
>
> +#include "vmlinux.h"
>  #include 
>  #include 
>  #include 
> @@ -9,11 +10,13 @@ __u64 out__bpf_link_fops = -1;
>  __u64 out__bpf_link_fops1 = -1;
>  __u64 out__btf_size = -1;
>  __u64 out__per_cpu_start = -1;
> +__u64 out__rq_cpu = -1;
>
>  extern const void bpf_link_fops __ksym;
>  extern const void __start_BTF __ksym;
>  extern const void __stop_BTF __ksym;
>  extern const void __per_cpu_start __ksym;
> +extern const void runqueues __ksym;

This should ideally look like a real global variable extern:

extern const struct rq runqueues __ksym;


But that's the case for non-per-cpu variables. You didn't seem to
address per-CPU variables in this patch set. How did you intend to
handle that? We should look at a possible BPF helper to access such
variables as well and how the verifier will prevent direct memory
accesses for such variables.

We should have some BPF helper that accepts a per-CPU PTR_TO_BTF_ID and
returns a PTR_TO_BTF_ID adjusted to the desired CPU. And the verifier
ideally would allow direct memory access on that resulting
PTR_TO_BTF_ID, but not on the per-CPU one. Not sure yet what this should
look like, but the verifier probably needs to know that the variable
itself is per-cpu, no?
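
Just to make the discussion concrete, a minimal sketch of how this could
look from the BPF program side. The helper name bpf_per_cpu_ptr() and its
signature are assumptions here (nothing like it exists in this series);
the point is only that the per-CPU PTR_TO_BTF_ID and the CPU-adjusted
pointer returned by the helper would be tracked differently:

/* Sketch only -- bpf_per_cpu_ptr() is an assumed helper, not part of this
 * series, and would need a declaration in bpf_helper_defs.h once added.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

extern const struct rq runqueues __ksym;        /* per-CPU kernel variable */

__u64 out__rq_cpu = -1;

SEC("tp_btf/sys_enter")
int handler(const void *ctx)
{
        /* A direct load through &runqueues itself would be rejected ... */
        const struct rq *rq = bpf_per_cpu_ptr(&runqueues, 0); /* assumed helper */

        if (!rq)
                return 0;

        /* ... but the CPU-adjusted pointer acts as a normal PTR_TO_BTF_ID. */
        out__rq_cpu = rq->cpu;
        return 0;
}

char _license[] SEC("license") = "GPL";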

>  /* non-existing symbol, weak, default to zero */
>  extern const void bpf_link_fops1 __ksym __weak;
>
> @@ -29,4 +32,15 @@ int handler(const void *ctx)
> return 0;
>  }
>
> +SEC("tp_btf/sys_enter")
> +int handler_tp_btf(const void *ctx)
> +{
> +   const struct rq *rq = &runqueues;
> +
> +   /* rq now points to the runqueue of cpu 0. */
> +   out__rq_cpu = rq->cpu;
> +
> +   return 0;
> +}
> +
>  char _license[] SEC("license") = "GPL";
> --
> 2.27.0.389.gc38d7665816-goog
>


[PATCH] usb: appledisplay: remove needless check before usb_free_coherent()

2020-07-19 Thread Xu Wang
usb_free_coherent() is safe with NULL addr and this check is
not required.

Signed-off-by: Xu Wang 
---
 drivers/usb/misc/appledisplay.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/misc/appledisplay.c b/drivers/usb/misc/appledisplay.c
index ba1eaabc7796..eab3ea7e21cc 100644
--- a/drivers/usb/misc/appledisplay.c
+++ b/drivers/usb/misc/appledisplay.c
@@ -305,8 +305,7 @@ static int appledisplay_probe(struct usb_interface *iface,
if (pdata->urb) {
usb_kill_urb(pdata->urb);
cancel_delayed_work_sync(&pdata->work);
-   if (pdata->urbdata)
-   usb_free_coherent(pdata->udev, 
ACD_URB_BUFFER_LEN,
+   usb_free_coherent(pdata->udev, ACD_URB_BUFFER_LEN,
pdata->urbdata, 
pdata->urb->transfer_dma);
usb_free_urb(pdata->urb);
}
-- 
2.17.1



linux-next: manual merge of the staging tree with Linus' tree

2020-07-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the staging tree got a conflict in:

  drivers/iio/magnetometer/ak8974.c

between commit:

  0187294d227d ("iio: magnetometer: ak8974: Fix runtime PM imbalance on error")

from Linus' tree and commit:

  d3be83244c7d ("iio: remove explicit IIO device parent assignment")

from the staging tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/iio/magnetometer/ak8974.c
index 91c39352fba2,c2260c84f7f1..
--- a/drivers/iio/magnetometer/ak8974.c
+++ b/drivers/iio/magnetometer/ak8974.c
@@@ -890,10 -884,14 +890,9 @@@ static int ak8974_probe(struct i2c_clie
ret = ak8974_reset(ak8974);
if (ret) {
dev_err(&i2c->dev, "AK8974 reset failed\n");
 -  goto power_off;
 +  goto disable_pm;
}
  
-   indio_dev->dev.parent = &i2c->dev;
 -  pm_runtime_set_autosuspend_delay(&i2c->dev,
 -   AK8974_AUTOSUSPEND_DELAY);
 -  pm_runtime_use_autosuspend(&i2c->dev);
 -  pm_runtime_put(&i2c->dev);
 -
switch (ak8974->variant) {
case AK8974_WHOAMI_VALUE_AMI306:
case AK8974_WHOAMI_VALUE_AMI305:


pgp0GC1BrwhTB.pgp
Description: OpenPGP digital signature


[PATCH] mm/vmstat: don't do count if no needs

2020-07-19 Thread Alex Shi
For a couple of the vmstat accounting functions, the caller usually doesn't
check the delta value. If delta == 0, the irq disabling or atomic operation
is a waste, so it is better to skip it.

Signed-off-by: Alex Shi 
Cc: Andrew Morton  
Cc: linux...@kvack.org 
Cc: linux-kernel@vger.kernel.org 
---
 mm/vmstat.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3fb23a21f6dd..91f28146daa7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -321,6 +321,9 @@ void __mod_zone_page_state(struct zone *zone, enum 
zone_stat_item item,
long x;
long t;
 
+   if (!delta)
+   return;
+
x = delta + __this_cpu_read(*p);
 
t = __this_cpu_read(pcp->stat_threshold);
@@ -341,6 +344,9 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
long x;
long t;
 
+   if (!delta)
+   return;
+
x = delta + __this_cpu_read(*p);
 
t = __this_cpu_read(pcp->stat_threshold);
@@ -484,6 +490,9 @@ static inline void mod_zone_state(struct zone *zone,
s8 __percpu *p = pcp->vm_stat_diff + item;
long o, n, t, z;
 
+   if (!delta)
+   return;
+
do {
z = 0;  /* overflow to zone counters */
 
@@ -518,6 +527,9 @@ static inline void mod_zone_state(struct zone *zone,
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 long delta)
 {
+   if (!delta)
+   return;
+
mod_zone_state(zone, item, delta, 0);
 }
 EXPORT_SYMBOL(mod_zone_page_state);
@@ -541,6 +553,9 @@ static inline void mod_node_state(struct pglist_data *pgdat,
s8 __percpu *p = pcp->vm_node_stat_diff + item;
long o, n, t, z;
 
+   if (!delta)
+   return;
+
do {
z = 0;  /* overflow to node counters */
 
@@ -581,6 +596,9 @@ EXPORT_SYMBOL(mod_node_page_state);
 
 void inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 {
+   if (!delta)
+   return;
+
mod_node_state(pgdat, item, 1, 1);
 }
 
@@ -604,6 +622,9 @@ void mod_zone_page_state(struct zone *zone, enum 
zone_stat_item item,
 {
unsigned long flags;
 
+   if (!delta)
+   return;
+
local_irq_save(flags);
__mod_zone_page_state(zone, item, delta);
local_irq_restore(flags);
@@ -647,6 +668,9 @@ void mod_node_page_state(struct pglist_data *pgdat, enum 
node_stat_item item,
 {
unsigned long flags;
 
+   if (!delta)
+   return;
+
local_irq_save(flags);
__mod_node_page_state(pgdat, item, delta);
local_irq_restore(flags);
-- 
2.18.4



Re: [RFC PATCH bpf-next 1/2] bpf: BTF support for __ksym externs

2020-07-19 Thread Andrii Nakryiko
On Wed, Jul 15, 2020 at 2:45 PM Hao Luo  wrote:
>
> Previous commits:
>
>  commit 1c0c7074fefd ("libbpf: Add support for extracting kernel symbol 
> addresses")
>  commit 2e33efe32e01 ("libbpf: Generalize libbpf externs support")
>
> have introduced a new type of extern variable ksyms to access kernel
> global variables. This patch extends that work by adding btf info
> for ksyms. In more details, in addition to the existing type btf_types,
> pahole is going to encode a certain global variables in kernel btf
> (percpu variables at this moment). With the extended kernel btf, we
> can associate btf id to the ksyms to improve the performance of
> accessing those vars by using direct load instructions.

This is a step in the right direction, thanks for working on this. See
below for a few problems, though.

Also, in the next version, please split kernel part and libbpf part
into separate patches.

>
> More specifically, libbpf can scan the kernel btf to find the btf id
> of a ksym at extern resolution. During relocation, it will replace
> "ld_imm64 rX, foo" with BPF_PSEUDO_BTF_ID. From the verifier point of
> view "ld_imm64 rX, foo // pseudo_btf_id" will be similar to ld_imm64
> with pseudo_map_fd and pseudo_map_value. The verifier will check btf_id
> and replace that with actual kernel address at program load time. It
> will also know that exact type of 'rX' from there on.
>
> Note that since only a subset of kernel symbols are encoded in btf right
> now, finding btf_id for ksyms is only best effort. If a ksym does not
> have a btf id, we do not rewrite its ld_imm64 to pseudo_btf_id. In that
> case, it is treated as loading from a scalar value, which is the current
> default behavior for ksyms.

I don't think that's the right approach. It can't be the best effort.
It's actually pretty clear when a user wants a BTF-based variable with
ability to do direct memory access vs __ksym address that we have
right now: variable type info. In your patch you are only looking up
variable by name, but it needs to be more elaborate logic:

1. if variable type is `extern void` -- do what we do today (no BTF required)
2. if the variable type is anything but `extern void`, then find that
variable in BTF. If no BTF or variable is not found -- hard error with
detailed enough message about what we expected to find in kernel BTF.
3. If such a variable is found in the kernel, then might be a good
idea to additionally check type compatibility (e.g., struct/union
should match struct/union, int should match int, typedefs should get
resolved to underlying type, etc). I don't think deep comparison of
structs is right, though, due to CO-RE, so just high-level
compatibility checks to prevent the most obvious mistakes.
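
To make that split concrete, roughly (illustration only, reusing the
selftest's symbols; not code from this series):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* Case 1: `extern void` ksym -- only the symbol address is usable, no
 * kernel BTF required; today's behavior is kept.
 */
extern const void __start_BTF __ksym;

/* Case 2: typed ksym -- libbpf must find a matching VAR in kernel BTF
 * (hard, descriptive error otherwise) and should sanity-check that the
 * kinds match (struct vs struct, int vs int, typedefs resolved) before
 * BTF-typed access is allowed.
 */
extern const struct rq runqueues __ksym;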

>
> Also note since we need to carry the ksym's address (64bits) as well as
> its btf_id (32bits), pseudo_btf_id uses ld_imm64's both imm and off
> fields.

For BTF-enabled ksyms, libbpf doesn't need to provide symbol address,
kernel will find it and substitute it, so BTF ID is the only
parameter. Thus it can just go into the imm field (and simplify
ldimm64 validation logic a bit).


>
> Signed-off-by: Hao Luo 
> ---
>  include/uapi/linux/bpf.h   | 37 +++--
>  kernel/bpf/verifier.c  | 26 +++---
>  tools/include/uapi/linux/bpf.h | 37 +++--
>  tools/lib/bpf/libbpf.c | 50 +-
>  4 files changed, 127 insertions(+), 23 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 5e386389913a..7490005acdba 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -334,18 +334,37 @@ enum bpf_link_type {
>  #define BPF_F_TEST_STATE_FREQ  (1U << 3)
>
>  /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> - * two extensions:
> - *
> - * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
> - * insn[0].imm:  map fd  map fd
> - * insn[1].imm:  0   offset into value
> - * insn[0].off:  0   0
> - * insn[1].off:  0   0
> - * ldimm64 rewrite:  address of map  address of map[0]+offset
> - * verifier type:CONST_PTR_TO_MAPPTR_TO_MAP_VALUE
> + * three extensions:
> + *
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_FD
> + * insn[0].imm:  map fd
> + * insn[1].imm:  0
> + * insn[0].off:  0
> + * insn[1].off:  0
> + * ldimm64 rewrite:  address of map
> + * verifier type:CONST_PTR_TO_MAP
>   */
>  #define BPF_PSEUDO_MAP_FD  1
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_VALUE
> + * insn[0].imm:  map fd
> + * insn[1].imm:  offset into value
> + * insn[0].off:  0
> + * insn[1].off:  0
> + * ldimm64 rewrite:  address of map[0]+offset
> + * verifier type:PTR_TO_MAP_VALUE
> + */
>  #define BPF_PSEUDO_MAP_VALUE   2
> +/*
> + * insn[0].src_reg:  BPF_PSEUDO_BTF_ID
> + * insn[0].imm:  lower 32 bits of address
> + * insn[1].imm:  higher 32 bits of address
> + 

Re: [PATCH vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices

2020-07-19 Thread Eli Cohen
On Mon, Jul 20, 2020 at 12:12:30PM +0800, Jason Wang wrote:
> 
> On 2020/7/19 3:49 AM, Eli Cohen wrote:
> >On Fri, Jul 17, 2020 at 04:57:29PM +0800, Jason Wang wrote:
> >>>Looks like checking intialized is enough. Will fix this.
> >+
> >+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, 
> >bool ready)
> >+{
> >+struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >+struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> >+int err;
> >+
> >+if (!mvq->ready && ready && mvq->fw_state != 
> >MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> >+err = modify_virtqueue(ndev, mvq, 
> >MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> >+if (err) {
> >+mlx5_vdpa_warn(mvdev, "failed to modify 
> >virtqueue(%d)\n", err);
> >+return;
> >+}
> >+}
> I wonder what's the reason of changing vq state on the hardware
> here. I think we can defer it to set_status().
> 
> >>>I can defer this to set status.
> >>>
> >>>I just wonder if it is possible that the core vdpa driver may call this
> >>>function with ready equals false and after some time call it with ready
> >>>equals true.
> >>
> >>Good point, so I think we can keep the logic. But looks like the
> >>code can not work if ready equals false since it only tries to
> >>modify vq state to RDY.
> >>
> >The point is that you cannot modify the virtqueue to "not ready".
> 
> 
> Is this a hardware limitation or a software one?

Sorry, but I was not accurate in my statement above. You can suspend the
hardware VQ but you cannot move out of suspend back to ready.
> 
> I'm asking since we need support live migration. But a questions is
> how to stop the device but not reset, since we need get e.g
> last_avail_idx from the device.
> 
> It could be either:
> 
> 1) set_status(0)
> 2) get_vq_state()
> 
> or
> 
> 1) set_queue_ready(0)
> 2) get_vq_state()
> 

This can work.

> Set_status(0) means reset the virtio device but last_avail_idx is
> something out of virtio spec. I guess using set_queue_ready() is
> better.
> 
> What's you opinion?

So if the intention is to set ready(0) as a preliminary state for live
migration then we're OK. We just need to keep in mind that there's no
way out of suspend other than destroying the hardware vq.

> 
> Thanks
> 
> 
> >  The
> >only option is to destroy it and create a new one. This means that if I
> >get ready equals false after the virtqueue has been created I need to
> >teardown the driver and set it up again.
> >
> >Given that, I think your original suggestion to defer this logic is
> >reasonable.
> >
> 


Re: [PATCH for v5.9] dt-bindings: aspeed-lpc: Replace HTTP links with HTTPS ones

2020-07-19 Thread Joel Stanley
On Sun, 19 Jul 2020 at 17:34, Alexander A. Klimov
 wrote:

> diff --git a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt 
> b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
> index 86446074e206..4f3cf91b3b7f 100644
> --- a/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
> +++ b/Documentation/devicetree/bindings/mfd/aspeed-lpc.txt
> @@ -35,9 +35,9 @@ Additionally the state of the LPC controller influences the 
> pinmux
>  configuration, therefore the host portion of the controller is exposed as a
>  syscon as a means to arbitrate access.
>
> -[0] http://www.intel.com/design/chipsets/industry/25128901.pdf
> +[0] https://www.intel.com/design/chipsets/industry/25128901.pdf

This URL redirects to a landing page. We should update it to this:

https://www.intel.com/content/dam/www/program/design/us/en/documents/low-pin-count-interface-specification.pdf'

or this:

https://www.intel.com/content/www/us/en/design/technologies-and-topics/low-pin-count-interface-specification.html


>  [1] 
> https://www.renesas.com/en-sg/doc/products/mpumcu/001/rej09b0078_h8s2168.pdf?key=7c88837454702128622bee53acbda8f4
> -[2] 
> http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf
> +[2] 
> https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf

For this change:

Acked-by: Joel Stanley 

>  [3] https://en.wikipedia.org/wiki/Super_I/O
>
>  Required properties
> --
> 2.27.0
>


Re: [PATCH 03/10] ASoC: audio-graph: Support Codec with multiple endpoints

2020-07-19 Thread Kuninori Morimoto


Hi Sameer

> >> diff --git a/sound/soc/generic/audio-graph-card.c 
> >> b/sound/soc/generic/audio-graph-card.c
> >> index 1e20562..b1903f9 100644
> >> --- a/sound/soc/generic/audio-graph-card.c
> >> +++ b/sound/soc/generic/audio-graph-card.c
> >> @@ -201,8 +201,7 @@ static void graph_parse_mclk_fs(struct device_node 
> >> *top,
> >>   static int graph_dai_link_of_dpcm(struct asoc_simple_priv *priv,
> >>  struct device_node *cpu_ep,
> >>  struct device_node *codec_ep,
> >> -   struct link_info *li,
> >> -   int dup_codec)
> >> +   struct link_info *li)
> > This patch breaks DPCM connection which is used for MIXer
> 
> Could you please elaborate a bit more as to what is broken with this?
> The problem I am trying to solve here is to have multiple endpoints
> for Codec port.

For example, MIXer is like below.
If you remove "dup_codec", it breaks MIXer and/or TDM split mode.

CPU0  ---+ DAI
 |
CPU1  ---+

cpu {
ports {
port@0 {
cpu_0: endpoint {
remote-endpoint = <&codec_0>;
};
};
port@1 {
cpu_1: endpoint {
remote-endpoint = <&codec_1>;
};
};
};
};

codec {
port {
codec_0: endpoint {
remote-endpoint = <&cpu_0>;
};
codec_1: endpoint {
remote-endpoint = <&cpu_1>;
};
}
};

Thank you for your help !!

Best regards
---
Kuninori Morimoto


[PATCH 3/4] ACPI: APD: Add a fmw property is_raven

2020-07-19 Thread Akshu Agrawal
There is a slight difference in the misc clk architecture of AMD RV
based SoCs. The fmw property will help in differentiating the SoCs.

Signed-off-by: Akshu Agrawal 
---
 drivers/acpi/acpi_apd.c   | 4 
 include/linux/platform_data/clk-fch.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/acpi/acpi_apd.c b/drivers/acpi/acpi_apd.c
index 2d99e46add1a..d879ba28826c 100644
--- a/drivers/acpi/acpi_apd.c
+++ b/drivers/acpi/acpi_apd.c
@@ -82,6 +82,7 @@ static int misc_check_res(struct acpi_resource *ares, void 
*data)
 static int fch_misc_setup(struct apd_private_data *pdata)
 {
struct acpi_device *adev = pdata->adev;
+   const union acpi_object *obj;
struct platform_device *clkdev;
struct fch_clk_data *clk_data;
struct resource_entry *rentry;
@@ -98,6 +99,9 @@ static int fch_misc_setup(struct apd_private_data *pdata)
if (ret < 0)
return -ENOENT;
 
+   acpi_dev_get_property(adev, "is-rv", ACPI_TYPE_INTEGER, &obj);
+   clk_data->is_rv = obj->integer.value;
+
list_for_each_entry(rentry, &resource_list, node) {
clk_data->base = devm_ioremap(&adev->dev, rentry->res->start,
  resource_size(rentry->res));
diff --git a/include/linux/platform_data/clk-fch.h 
b/include/linux/platform_data/clk-fch.h
index 850ca776156d..b9f682459f08 100644
--- a/include/linux/platform_data/clk-fch.h
+++ b/include/linux/platform_data/clk-fch.h
@@ -12,6 +12,7 @@
 
 struct fch_clk_data {
void __iomem *base;
+   u32 is_rv;
 };
 
 #endif /* __CLK_FCH_H */
-- 
2.20.1



[PATCH 1/4] ACPI: APD: Change name from ST to FCH

2020-07-19 Thread Akshu Agrawal
AMD SoC general purpose clk is present in new platforms with the
same MMIO mappings. We can reuse the same clk handler support for
other platforms. Hence, changing the name from ST (SoC) to FCH (IP).

Signed-off-by: Akshu Agrawal 
---
 drivers/acpi/acpi_apd.c| 14 +++---
 .../linux/platform_data/{clk-st.h => clk-fch.h}| 10 +-
 2 files changed, 12 insertions(+), 12 deletions(-)
 rename include/linux/platform_data/{clk-st.h => clk-fch.h} (53%)

diff --git a/drivers/acpi/acpi_apd.c b/drivers/acpi/acpi_apd.c
index ba2612e9a0eb..2d99e46add1a 100644
--- a/drivers/acpi/acpi_apd.c
+++ b/drivers/acpi/acpi_apd.c
@@ -8,7 +8,7 @@
  */
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -79,11 +79,11 @@ static int misc_check_res(struct acpi_resource *ares, void 
*data)
return !acpi_dev_resource_memory(ares, &res);
 }
 
-static int st_misc_setup(struct apd_private_data *pdata)
+static int fch_misc_setup(struct apd_private_data *pdata)
 {
struct acpi_device *adev = pdata->adev;
struct platform_device *clkdev;
-   struct st_clk_data *clk_data;
+   struct fch_clk_data *clk_data;
struct resource_entry *rentry;
struct list_head resource_list;
int ret;
@@ -106,7 +106,7 @@ static int st_misc_setup(struct apd_private_data *pdata)
 
acpi_dev_free_resource_list(&resource_list);
 
-   clkdev = platform_device_register_data(&adev->dev, "clk-st",
+   clkdev = platform_device_register_data(&adev->dev, "clk-fch",
   PLATFORM_DEVID_NONE, clk_data,
   sizeof(*clk_data));
return PTR_ERR_OR_ZERO(clkdev);
@@ -135,8 +135,8 @@ static const struct apd_device_desc cz_uart_desc = {
.properties = uart_properties,
 };
 
-static const struct apd_device_desc st_misc_desc = {
-   .setup = st_misc_setup,
+static const struct apd_device_desc fch_misc_desc = {
+   .setup = fch_misc_setup,
 };
 #endif
 
@@ -239,7 +239,7 @@ static const struct acpi_device_id acpi_apd_device_ids[] = {
{ "AMD0020", APD_ADDR(cz_uart_desc) },
{ "AMDI0020", APD_ADDR(cz_uart_desc) },
{ "AMD0030", },
-   { "AMD0040", APD_ADDR(st_misc_desc)},
+   { "AMD0040", APD_ADDR(fch_misc_desc)},
 #endif
 #ifdef CONFIG_ARM64
{ "APMC0D0F", APD_ADDR(xgene_i2c_desc) },
diff --git a/include/linux/platform_data/clk-st.h 
b/include/linux/platform_data/clk-fch.h
similarity index 53%
rename from include/linux/platform_data/clk-st.h
rename to include/linux/platform_data/clk-fch.h
index 7cdb6a402b35..850ca776156d 100644
--- a/include/linux/platform_data/clk-st.h
+++ b/include/linux/platform_data/clk-fch.h
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: MIT */
 /*
- * clock framework for AMD Stoney based clock
+ * clock framework for AMD misc clocks
  *
  * Copyright 2018 Advanced Micro Devices, Inc.
  */
 
-#ifndef __CLK_ST_H
-#define __CLK_ST_H
+#ifndef __CLK_FCH_H
+#define __CLK_FCH_H
 
 #include 
 
-struct st_clk_data {
+struct fch_clk_data {
void __iomem *base;
 };
 
-#endif /* __CLK_ST_H */
+#endif /* __CLK_FCH_H */
-- 
2.20.1



[v2 4/4] clk: x86: Support RV architecture

2020-07-19 Thread Akshu Agrawal
There is a minor difference between the previous family of SoCs and
the current one: the current one has only a 48MHz fixed clk. There is
no mux and no option to select another frequency as there was in the
previous one.

Signed-off-by: Akshu Agrawal 
---
v2: Consolidated the loops in remove.

 drivers/clk/x86/clk-fch.c | 53 ---
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/x86/clk-fch.c b/drivers/clk/x86/clk-fch.c
index b252f0cf0628..d68bca7b213f 100644
--- a/drivers/clk/x86/clk-fch.c
+++ b/drivers/clk/x86/clk-fch.c
@@ -26,6 +26,10 @@
 #define ST_CLK_GATE3
 #define ST_MAX_CLKS4
 
+#define RV_CLK_48M 0
+#define RV_CLK_GATE1
+#define RV_MAX_CLKS2
+
 static const char * const clk_oscout1_parents[] = { "clk48MHz", "clk25MHz" };
 static struct clk_hw *hws[ST_MAX_CLKS];
 
@@ -37,33 +41,52 @@ static int fch_clk_probe(struct platform_device *pdev)
if (!fch_data || !fch_data->base)
return -EINVAL;
 
-   hws[ST_CLK_48M] = clk_hw_register_fixed_rate(NULL, "clk48MHz", NULL, 0,
-4800);
-   hws[ST_CLK_25M] = clk_hw_register_fixed_rate(NULL, "clk25MHz", NULL, 0,
-2500);
+   if (!fch_data->is_rv) {
+   hws[ST_CLK_48M] = clk_hw_register_fixed_rate(NULL, "clk48MHz",
+   NULL, 0, 4800);
+   hws[ST_CLK_25M] = clk_hw_register_fixed_rate(NULL, "clk25MHz",
+   NULL, 0, 2500);
+
+   hws[ST_CLK_MUX] = clk_hw_register_mux(NULL, "oscout1_mux",
+   clk_oscout1_parents, ARRAY_SIZE(clk_oscout1_parents),
+   0, fch_data->base + CLKDRVSTR2, OSCOUT1CLK25MHZ, 3, 0,
+   NULL);
 
-   hws[ST_CLK_MUX] = clk_hw_register_mux(NULL, "oscout1_mux",
-   clk_oscout1_parents, ARRAY_SIZE(clk_oscout1_parents),
-   0, fch_data->base + CLKDRVSTR2, OSCOUT1CLK25MHZ, 3, 0, NULL);
+   clk_set_parent(hws[ST_CLK_MUX]->clk, hws[ST_CLK_48M]->clk);
 
-   clk_set_parent(hws[ST_CLK_MUX]->clk, hws[ST_CLK_48M]->clk);
+   hws[ST_CLK_GATE] = clk_hw_register_gate(NULL, "oscout1",
+   "oscout1_mux", 0, fch_data->base + MISCCLKCNTL1,
+   OSCCLKENB, CLK_GATE_SET_TO_DISABLE, NULL);
 
-   hws[ST_CLK_GATE] = clk_hw_register_gate(NULL, "oscout1", "oscout1_mux",
-   0, fch_data->base + MISCCLKCNTL1, OSCCLKENB,
-   CLK_GATE_SET_TO_DISABLE, NULL);
+   devm_clk_hw_register_clkdev(&pdev->dev, hws[ST_CLK_GATE],
+   "oscout1", NULL);
+   } else {
+   hws[RV_CLK_48M] = clk_hw_register_fixed_rate(NULL, "clk48MHz",
+   NULL, 0, 4800);
 
-   devm_clk_hw_register_clkdev(&pdev->dev, hws[ST_CLK_GATE], "oscout1",
-   NULL);
+   hws[RV_CLK_GATE] = clk_hw_register_gate(NULL, "oscout1",
+   "clk48MHz", 0, fch_data->base + MISCCLKCNTL1,
+   OSCCLKENB, CLK_GATE_SET_TO_DISABLE, NULL);
+
+   devm_clk_hw_register_clkdev(&pdev->dev, hws[RV_CLK_GATE],
+   "oscout1", NULL);
+   }
 
return 0;
 }
 
 static int fch_clk_remove(struct platform_device *pdev)
 {
-   int i;
+   int i, clks;
+   struct fch_clk_data *fch_data;
 
-   for (i = 0; i < ST_MAX_CLKS; i++)
+   fch_data = dev_get_platdata(&pdev->dev);
+
+   clks = (!fch_data->is_rv) ? ST_MAX_CLKS : RV_MAX_CLKS;
+
+   for (i = 0; i < clks; i++)
clk_hw_unregister(hws[i]);
+
return 0;
 }
 
-- 
2.20.1



[PATCH 2/4] clk: x86: Change name from ST to FCH

2020-07-19 Thread Akshu Agrawal
AMD SoC general purpose clk is present in new platforms with
minor differences. We can reuse the same clk driver for other
platforms. Hence, changing the name from ST (SoC) to FCH (IP).

Signed-off-by: Akshu Agrawal 
---
 drivers/clk/x86/Makefile|  2 +-
 drivers/clk/x86/{clk-st.c => clk-fch.c} | 26 -
 2 files changed, 14 insertions(+), 14 deletions(-)
 rename drivers/clk/x86/{clk-st.c => clk-fch.c} (73%)

diff --git a/drivers/clk/x86/Makefile b/drivers/clk/x86/Makefile
index 7c774ea7ddeb..18564efdc651 100644
--- a/drivers/clk/x86/Makefile
+++ b/drivers/clk/x86/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_PMC_ATOM) += clk-pmc-atom.o
-obj-$(CONFIG_X86_AMD_PLATFORM_DEVICE)  += clk-st.o
+obj-$(CONFIG_X86_AMD_PLATFORM_DEVICE)  += clk-fch.o
 clk-x86-lpss-objs  := clk-lpt.o
 obj-$(CONFIG_X86_INTEL_LPSS)   += clk-x86-lpss.o
 obj-$(CONFIG_CLK_LGM_CGU)  += clk-cgu.o clk-cgu-pll.o clk-lgm.o
diff --git a/drivers/clk/x86/clk-st.c b/drivers/clk/x86/clk-fch.c
similarity index 73%
rename from drivers/clk/x86/clk-st.c
rename to drivers/clk/x86/clk-fch.c
index 25d4b97aff9b..b252f0cf0628 100644
--- a/drivers/clk/x86/clk-st.c
+++ b/drivers/clk/x86/clk-fch.c
@@ -8,7 +8,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 /* Clock Driving Strength 2 register */
@@ -29,12 +29,12 @@
 static const char * const clk_oscout1_parents[] = { "clk48MHz", "clk25MHz" };
 static struct clk_hw *hws[ST_MAX_CLKS];
 
-static int st_clk_probe(struct platform_device *pdev)
+static int fch_clk_probe(struct platform_device *pdev)
 {
-   struct st_clk_data *st_data;
+   struct fch_clk_data *fch_data;
 
-   st_data = dev_get_platdata(&pdev->dev);
-   if (!st_data || !st_data->base)
+   fch_data = dev_get_platdata(&pdev->dev);
+   if (!fch_data || !fch_data->base)
return -EINVAL;
 
hws[ST_CLK_48M] = clk_hw_register_fixed_rate(NULL, "clk48MHz", NULL, 0,
@@ -44,12 +44,12 @@ static int st_clk_probe(struct platform_device *pdev)
 
hws[ST_CLK_MUX] = clk_hw_register_mux(NULL, "oscout1_mux",
clk_oscout1_parents, ARRAY_SIZE(clk_oscout1_parents),
-   0, st_data->base + CLKDRVSTR2, OSCOUT1CLK25MHZ, 3, 0, NULL);
+   0, fch_data->base + CLKDRVSTR2, OSCOUT1CLK25MHZ, 3, 0, NULL);
 
clk_set_parent(hws[ST_CLK_MUX]->clk, hws[ST_CLK_48M]->clk);
 
hws[ST_CLK_GATE] = clk_hw_register_gate(NULL, "oscout1", "oscout1_mux",
-   0, st_data->base + MISCCLKCNTL1, OSCCLKENB,
+   0, fch_data->base + MISCCLKCNTL1, OSCCLKENB,
CLK_GATE_SET_TO_DISABLE, NULL);
 
devm_clk_hw_register_clkdev(&pdev->dev, hws[ST_CLK_GATE], "oscout1",
@@ -58,7 +58,7 @@ static int st_clk_probe(struct platform_device *pdev)
return 0;
 }
 
-static int st_clk_remove(struct platform_device *pdev)
+static int fch_clk_remove(struct platform_device *pdev)
 {
int i;
 
@@ -67,12 +67,12 @@ static int st_clk_remove(struct platform_device *pdev)
return 0;
 }
 
-static struct platform_driver st_clk_driver = {
+static struct platform_driver fch_clk_driver = {
.driver = {
-   .name = "clk-st",
+   .name = "clk-fch",
.suppress_bind_attrs = true,
},
-   .probe = st_clk_probe,
-   .remove = st_clk_remove,
+   .probe = fch_clk_probe,
+   .remove = fch_clk_remove,
 };
-builtin_platform_driver(st_clk_driver);
+builtin_platform_driver(fch_clk_driver);
-- 
2.20.1



Re: [PATCH v2 1/2] cpuidle: Trace IPI based and timer based wakeup latency from idle states

2020-07-19 Thread Gautham R Shenoy
On Fri, Jul 17, 2020 at 02:48:00PM +0530, Pratik Rajesh Sampat wrote:
> Fire directed smp_call_function_single IPIs from a specified source
> CPU to the specified target CPU to reduce the noise we have to wade
> through in the trace log.
> The module is based on the idea written by Srivatsa Bhat and maintained
> by Vaidyanathan Srinivasan internally.
> 
> Queue HR timer and measure jitter. Wakeup latency measurement for idle
> states using hrtimer.  Echo a value in ns to timer_test_function and
> watch trace. A HRtimer will be queued and when it fires the expected
> wakeup vs actual wakeup is computed and the delay printed in ns.
> 
> Implemented as a module which utilizes debugfs so that it can be
> integrated with selftests.
> 
> To include the module, check option and include as module
> kernel hacking -> Cpuidle latency selftests
> 
> [srivatsa.b...@linux.vnet.ibm.com: Initial implementation in
>  cpidle/sysfs]
> 
> [sva...@linux.vnet.ibm.com: wakeup latency measurements using hrtimer
>  and fix some of the time calculation]
> 
> [e...@linux.vnet.ibm.com: Fix some whitespace and tab errors and
>  increase the resolution of IPI wakeup]
> 
> Signed-off-by: Pratik Rajesh Sampat 


The debugfs module looks good to me.

Reviewed-by: Gautham R. Shenoy 


> ---
>  drivers/cpuidle/Makefile   |   1 +
>  drivers/cpuidle/test-cpuidle_latency.c | 150 +
>  lib/Kconfig.debug  |  10 ++
>  3 files changed, 161 insertions(+)
>  create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
> 
> diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
> index f07800cbb43f..2ae05968078c 100644
> --- a/drivers/cpuidle/Makefile
> +++ b/drivers/cpuidle/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
>  obj-$(CONFIG_DT_IDLE_STATES)   += dt_idle_states.o
>  obj-$(CONFIG_ARCH_HAS_CPU_RELAX)   += poll_state.o
>  obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o
> +obj-$(CONFIG_IDLE_LATENCY_SELFTEST)+= test-cpuidle_latency.o
> 
>  
> ##
>  # ARM SoC drivers
> diff --git a/drivers/cpuidle/test-cpuidle_latency.c 
> b/drivers/cpuidle/test-cpuidle_latency.c
> new file mode 100644
> index ..61574665e972
> --- /dev/null
> +++ b/drivers/cpuidle/test-cpuidle_latency.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Module-based API test facility for cpuidle latency using IPIs and timers
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/* IPI based wakeup latencies */
> +struct latency {
> + unsigned int src_cpu;
> + unsigned int dest_cpu;
> + ktime_t time_start;
> + ktime_t time_end;
> + u64 latency_ns;
> +} ipi_wakeup;
> +
> +static void measure_latency(void *info)
> +{
> + struct latency *v;
> + ktime_t time_diff;
> +
> + v = (struct latency *)info;
> + v->time_end = ktime_get();
> + time_diff = ktime_sub(v->time_end, v->time_start);
> + v->latency_ns = ktime_to_ns(time_diff);
> +}
> +
> +void run_smp_call_function_test(unsigned int cpu)
> +{
> + ipi_wakeup.src_cpu = smp_processor_id();
> + ipi_wakeup.dest_cpu = cpu;
> + ipi_wakeup.time_start = ktime_get();
> + smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
> +}
> +
> +/* Timer based wakeup latencies */
> +struct timer_data {
> + unsigned int src_cpu;
> + u64 timeout;
> + ktime_t time_start;
> + ktime_t time_end;
> + struct hrtimer timer;
> + u64 timeout_diff_ns;
> +} timer_wakeup;
> +
> +static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
> +{
> + struct timer_data *w;
> + ktime_t time_diff;
> +
> + w = container_of(hrtimer, struct timer_data, timer);
> + w->time_end = ktime_get();
> +
> + time_diff = ktime_sub(w->time_end, w->time_start);
> + time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
> + w->timeout_diff_ns = ktime_to_ns(time_diff);
> + return HRTIMER_NORESTART;
> +}
> +
> +static void run_timer_test(unsigned int ns)
> +{
> + hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
> +  HRTIMER_MODE_REL);
> + timer_wakeup.timer.function = timer_called;
> + timer_wakeup.time_start = ktime_get();
> + timer_wakeup.src_cpu = smp_processor_id();
> + timer_wakeup.timeout = ns;
> +
> + hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
> +   HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static struct dentry *dir;
> +
> +static int cpu_read_op(void *data, u64 *value)
> +{
> + *value = ipi_wakeup.dest_cpu;
> + return 0;
> +}
> +
> +static int cpu_write_op(void *data, u64 value)
> +{
> + run_smp_call_function_test(value);
> + return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
> +
> +static int timeout_read_op(void *data, u64 *value)
> +{
> + *value = timer_wakeup.timeout;
> + re

Re: [PATCH 2/5] clk: x86: Change name from ST to FCH

2020-07-19 Thread Agrawal, Akshu



On 7/16/2020 6:12 AM, Stephen Boyd wrote:

Quoting Akshu Agrawal (2020-07-12 17:59:50)

diff --git a/drivers/clk/x86/clk-st.c b/drivers/clk/x86/clk-fch.c
similarity index 73%
rename from drivers/clk/x86/clk-st.c
rename to drivers/clk/x86/clk-fch.c
index 25d4b97aff9b..b252f0cf0628 100644
--- a/drivers/clk/x86/clk-st.c
+++ b/drivers/clk/x86/clk-fch.c
@@ -8,7 +8,7 @@
  #include 
  #include 
  #include 
-#include 
+#include 

Is this file moved somewhere?


Renaming it from ST to FCH in the previous patch of the series.

Thanks,

Akshu



Re: [PATCH 4/5] clk: x86: Support RV architecture

2020-07-19 Thread Agrawal, Akshu



On 7/16/2020 6:33 AM, Stephen Boyd wrote:

Quoting Akshu Agrawal (2020-07-12 17:59:52)

There is a minor difference between the previous family of SoCs and
the current one: the current one has only a 48MHz fixed clk. There is
no mux and no option to select another frequency as there was in the
previous one.

Signed-off-by: Akshu Agrawal 
---

I only see four out of five patches and there isn't a cover letter. I
have no idea if I can apply this change or if you're expecting me to ack
it. Please help make my life a little easier!


Numbering went wrong due to another unrelated patch.

Will send another with cover letter explaining the series.


  drivers/clk/x86/clk-fch.c | 55 ---
  1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/x86/clk-fch.c b/drivers/clk/x86/clk-fch.c
index b252f0cf0628..a8aac71a3b65 100644
--- a/drivers/clk/x86/clk-fch.c
+++ b/drivers/clk/x86/clk-fch.c
@@ -61,9 +78,17 @@ static int fch_clk_probe(struct platform_device *pdev)
  static int fch_clk_remove(struct platform_device *pdev)
  {
 int i;
+   struct fch_clk_data *fch_data;
+
+   fch_data = dev_get_platdata(&pdev->dev);
  
-   for (i = 0; i < ST_MAX_CLKS; i++)

-   clk_hw_unregister(hws[i]);
+   if (!fch_data->is_rv) {
+   for (i = 0; i < ST_MAX_CLKS; i++)
+   clk_hw_unregister(hws[i]);
+   } else {
+   for (i = 0; i < RV_MAX_CLKS; i++)
+   clk_hw_unregister(hws[i]);

Can ST_MAX_CLKS or RV_MAX_CLKS be a local variable and then the loop
consolidated.


Yes, making the change in next series.

Thanks,

Akshu



[PATCH v2 4/4] mm/gup: use a standard migration target allocation callback

2020-07-19 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback. Use it.

Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 54 ++
 1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4ba822a..628ca4c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,52 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, 
long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non CMA
-* allocation memory.
-*
-* Note that CMA region is prohibited by allocation scope.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
-
-   return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1668,6 +1622,10 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+   };
 
 check_again:
for (i = 0; i < nr_pages;) {
@@ -1713,8 +1671,8 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4
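
For context, a sketch of the contract behind passing (unsigned long)&mtc as
the 'private' argument above: migrate_pages() hands 'private' through to the
callback, which interprets it as a struct migration_target_control. This is
an outline of the pattern only, not the verbatim mm/migrate.c code; the
hugetlb and THP branches of the real callback are omitted.

struct page *alloc_migration_target(struct page *page, unsigned long private)
{
	struct migration_target_control *mtc = (void *)private;
	gfp_t gfp_mask = mtc->gfp_mask;
	int nid = mtc->nid;

	if (nid == NUMA_NO_NODE)
		nid = page_to_nid(page);	/* keep the source node */

	/* hugetlb and THP handling omitted in this sketch */
	return __alloc_pages_node(nid, gfp_mask, 0);
}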



[PATCH v2 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

2020-07-19 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c needs to allocate a new page that is not in
the CMA area. It implements this by using the allocation scope APIs.

However, there is a work-around for hugetlb. The normal hugetlb page
allocation API for migration is alloc_huge_page_nodemask(). It consists
of two steps: first, dequeue a page from the pool; second, if no page is
available on the queue, allocate one with the page allocator.

new_non_cma_page() can't use this API since the first step (dequeue)
isn't aware of the scope API that excludes the CMA area. So,
new_non_cma_page() exports the hugetlb-internal function for the second
step, alloc_migrate_huge_page(), to global scope and uses it directly.
This is suboptimal since hugetlb pages on the queue cannot be utilized.

This patch fixes the situation by making the hugetlb dequeue function
CMA aware: in the dequeue function, CMA memory is skipped if the
PF_MEMALLOC_NOCMA flag is set.

Acked-by: Mike Kravetz 
Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  6 +-
 mm/hugetlb.c| 11 +--
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2660b04..fb2b5aa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int 
preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
diff --git a/mm/gup.c b/mm/gup.c
index bbd36a1..4ba822a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1634,11 +1634,7 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct hstate *h = page_hstate(page);
 
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..d9eb923 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
struct page *page;
+   bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
+
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (nocma && is_migrate_cma_page(page))
+   continue;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 int nid, nodemask_t *nmask)
 {
struct page *page;
-- 
2.7.4
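
A sketch of how the PF_MEMALLOC_NOCMA flag reaches the dequeue check above:
the caller opens a nocma scope with the scope API, and any hugetlb dequeue
inside that scope now skips CMA pool pages. The caller shown here is
illustrative only, not taken from the patch.

static struct page *alloc_non_cma_hugepage(struct hstate *h, int nid,
					   gfp_t gfp_mask)
{
	struct page *page;
	unsigned int flags;

	flags = memalloc_nocma_save();	/* sets PF_MEMALLOC_NOCMA on current */
	page = alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
	memalloc_nocma_restore(flags);	/* clears the scope */

	return page;
}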



[PATCH v2 1/4] mm/page_alloc: fix non cma alloc context

2020-07-19 Thread js1304
From: Joonsoo Kim 

Currently, excluding the CMA area from page allocation is implemented by
using current_gfp_context(). However, this implementation has two
problems.

First, it doesn't work for the allocation fastpath. In the fastpath, the
original gfp_mask is used, since current_gfp_context() was introduced to
control reclaim and is only applied on the slowpath.
Second, clearing __GFP_MOVABLE has the side effect of also excluding
ZONE_MOVABLE memory from the allocation target.

To fix these problems, this patch changes how the CMA area is excluded
from page allocation. The main point of this change is to use
alloc_flags: alloc_flags is mainly used to control allocation, so it is
a good fit for excluding the CMA area.

Fixes: d7fefcc8de91 ("mm/cma: add PF flag to force non cma alloc")
Cc: 
Signed-off-by: Joonsoo Kim 
---
 include/linux/sched/mm.h |  8 +---
 mm/page_alloc.c  | 37 -
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 44ad5b7..6c652ec 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -175,14 +175,12 @@ static inline bool in_vfork(struct task_struct *tsk)
  * Applies per-task gfp context to the given allocation flags.
  * PF_MEMALLOC_NOIO implies GFP_NOIO
  * PF_MEMALLOC_NOFS implies GFP_NOFS
- * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
  */
 static inline gfp_t current_gfp_context(gfp_t flags)
 {
unsigned int pflags = READ_ONCE(current->flags);
 
-   if (unlikely(pflags &
-(PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | 
PF_MEMALLOC_NOCMA))) {
+   if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) {
/*
 * NOIO implies both NOIO and NOFS and it is a weaker context
 * so always make sure it makes precedence
@@ -191,10 +189,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (pflags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
-   if (pflags & PF_MEMALLOC_NOCMA)
-   flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6416d08..b529220 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 * allocating from CMA when over half of the zone's free memory
 * is in the CMA area.
 */
-   if (migratetype == MIGRATE_MOVABLE &&
+   if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
-   if (migratetype == MIGRATE_MOVABLE)
+   if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);
 
if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct 
zone *z,
if (likely(!alloc_harder))
unusable_free += z->nr_reserved_highatomic;
 
-#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
-   if (!(alloc_flags & ALLOC_CMA))
+   if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
 
return unusable_free;
 }
@@ -3693,6 +3691,20 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
 }
 
+static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
+   unsigned int alloc_flags)
+{
+#ifdef CONFIG_CMA
+   unsigned int pflags = current->flags;
+
+   if (!(pflags & PF_MEMALLOC_NOCMA) &&
+   gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+   alloc_flags |= ALLOC_CMA;
+
+#endif
+   return alloc_flags;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -4339,10 +4351,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-   if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+   alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+
return alloc_flags;
 }
 
@@ -4642,8 +4652,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
wake_all_kswapds(order, gfp_mask, ac);
 
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
-   if (reserve_flag

[PATCH v2 2/4] mm/gup: restrict CMA region by using allocation scope API

2020-07-19 Thread js1304
From: Joonsoo Kim 

We have a well-defined scope API to exclude the CMA region.
Use it rather than manipulating gfp_mask manually. With this change,
we can now restore __GFP_MOVABLE in gfp_mask, as for a usual migration
target allocation, so ZONE_MOVABLE is also searched by the page
allocator. For hugetlb, gfp_mask is redefined since it has a regular
allocation mask filter for migration targets.
__GFP_NOWARN is added to the hugetlb gfp_mask filter since the new user
of the filter, gup, wants to be silent when allocation fails.

Note that this can be considered a fix for commit 9a4e9f3b2d73
("mm: update get_user_pages_longterm to migrate pages allocated from
CMA region"). However, a "Fixes" tag isn't added here since the old
behaviour is merely suboptimal and doesn't cause any problem.

Suggested-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 ++
 mm/gup.c| 17 -
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b9508d..2660b04 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate 
*h, gfp_t gfp_mask)
	/* Some callers might want to enforce node */
modified_mask |= (gfp_mask & __GFP_THISNODE);
 
+   modified_mask |= (gfp_mask & __GFP_NOWARN);
+
return modified_mask;
 }
 
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..bbd36a1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1619,10 +1619,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 * Trying to allocate a page for migration. Ignore allocation
 * failure warnings. We don't force __GFP_THISNODE here because
 * this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
+* in some case these nodes will have really less non CMA
 * allocation memory.
+*
+* Note that CMA region is prohibited by allocation scope.
 */
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
 
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1630,6 +1632,8 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
+   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
@@ -1644,11 +1648,6 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 */
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
 
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
if (!thp)
return NULL;
@@ -1794,7 +1793,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
 vmas_tmp, NULL, gup_flags);
 
if (gup_flags & FOLL_LONGTERM) {
-   memalloc_nocma_restore(flags);
if (rc < 0)
goto out;
 
@@ -1807,9 +1805,10 @@ static long __gup_longterm_locked(struct task_struct 
*tsk,
 
rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
 vmas_tmp, gup_flags);
+out:
+   memalloc_nocma_restore(flags);
}
 
-out:
if (vmas_tmp != vmas)
kfree(vmas_tmp);
return rc;
-- 
2.7.4
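
A simplified control-flow sketch of __gup_longterm_locked() after this
patch (error handling and the vmas_tmp bookkeeping are omitted). The point
is that check_and_migrate_cma_pages() now also runs inside the nocma scope,
so its GFP_USER | __GFP_MOVABLE allocations may use ZONE_MOVABLE but are
still kept out of the CMA area by the scope API.

	if (gup_flags & FOLL_LONGTERM)
		flags = memalloc_nocma_save();

	rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
				     vmas_tmp, NULL, gup_flags);

	if (gup_flags & FOLL_LONGTERM) {
		if (rc > 0)
			rc = check_and_migrate_cma_pages(tsk, mm, start, rc,
							 pages, vmas_tmp,
							 gup_flags);
		memalloc_nocma_restore(flags);
	}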


