Re: [PATCH v3 1/5] leds: lm3532: Fix brightness control for i2c mode

2019-08-26 Thread Tony Lindgren
* Pavel Machek  [190826 22:14]:
> On Mon 2019-08-26 14:58:22, Tony Lindgren wrote:
> > Hi,
> > 
> > * Dan Murphy  [190820 19:53]:
> > > Fix the brightness control for I2C mode.  Instead of
> > > changing the full scale current register update the ALS target
> > > register for the appropriate banks.
> > > 
> > > In addition clean up some code errors and random misspellings found
> > > during coding.
> > > 
> > > Tested on Droid4 as well as LM3532 EVM connected to a BeagleBoneBlack
> > > 
> > > Fixes: e37a7f8d77e1 ("leds: lm3532: Introduce the lm3532 LED driver")
> > > Reported-by: Pavel Machek 
> > > Signed-off-by: Dan Murphy 
> > > ---
> > > 
> > > v3 - Removed register define updates - 
> > > https://lore.kernel.org/patchwork/patch/1114542/
> > 
> > Looks like starting with this patch in Linux next the LCD on droid4
> > is so dim it's unreadable even with brightness set to 255. Setting
> > brightness to 0 does blank it completely though.
> > 
> > Did something maybe break with the various patch revisions or are
> > we now missing some dts patch?
> 
> Maybe missing dts patch. We should provide maximum current the LED can
> handle... 

Or i2c control is somehow broken and only als control now works?

Regards,

Tony



Re: [PATCH 00/12] libperf: Add events to perf/event.h

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Mon, Aug 26, 2019 at 07:14:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Aug 26, 2019 at 06:58:52PM +0200, Jiri Olsa escreveu:
> > On Mon, Aug 26, 2019 at 01:18:49PM -0300, Arnaldo Carvalho de Melo wrote:
> > 
> > SNIP
> > 
> > > [perfbuilder@490c2c7bdaab ~]$ grep 'printf("lost' 
> > > /tmp/build/perf/builtin-sched.i
> > >  printf("lost %" "l" "ll""u" " events on cpu %d\n", event->lost.lost, 
> > > sample->cpu);
> > > [perfbuilder@490c2c7bdaab ~]$
> > > 
> > > And if we do this on a fedora:30 x86_64:
> > > 
> > > $ make -C tools/perf O=/tmp/build/perf /tmp/build/perf/builtin-sched.i
> > > [acme@quaco perf]$ grep -A4 'printf("lost' /tmp/build/perf/builtin-sched.i
> > >  printf("lost %" "l" 
> > > # 2646 "builtin-sched.c" 3 4
> > > "l" "u" 
> > > # 2646 "builtin-sched.c"
> > >  " events on cpu %d\n", event->lost.lost, 
> > > sample->cpu);
> > > [acme@quaco perf]$
> > > 
> > > I.e. on 32-bit arches we shouldn't add that extra "l", right?
> > 
> > hum, I guess we could #ifdef it 64/32 bits
> 
> I tried to figure out how to fix this better, but the int-ll64.h versus
> int-l64.h versus how __u64 is defined got me confused and I ended up
> with:
> 
> #if __WORDSIZE == 64

Make that:

#ifdef __LP64__ to build on Alpine/musl libc.

- Arnaldo

> /*
>  * /usr/include/inttypes.h uses just 'lu' for PRIu64, but we end up defining
>  * __u64 as long long unsigned int, and then -Werror=format= kicks in and
>  * complains of the mismatched types, so use these two special extra PRI
>  * macros to overcome that.
>  */
> #define PRI_lu64 "l" PRIu64
> #define PRI_lx64 "l" PRIx64
> #else
> #define PRI_lu64 PRIu64
> #define PRI_lx64 PRIx64
> #endif
> 
> Builds in all the containers I have, 32-bit, 64-bit, old gccs/clangs,
> new ones, uclibc, musl libc, glibc, etc
>  
> > > 
> > > I bet the build for the mips/mipsel will fail too, lemme see... Yeah,
> > > both failed:
> > > 
> > > 
> > > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > > dm.log/debian\:experimental-x-mips
> > > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > > format [-Werror=format=]
> > >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > > sample->cpu);
> > >  ^~~~
> > > In file included from builtin-sched.c:31:
> > > /usr/mips-linux-gnu/include/inttypes.h:47:28: note: format string is 
> > > defined here
> > >  #  define __PRI64_PREFIX "ll"
> > > ^
> > > [root@quaco ~]#
> > > 
> > > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > > dm.log/debian\:experimental-x-mipsel
> > > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > > format [-Werror=format=]
> > >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > > sample->cpu);
> > >  ^~~~
> > > In file included from builtin-sched.c:31:
> > > /usr/mipsel-linux-gnu/include/inttypes.h:47:28: note: format string is 
> > > defined here
> > >  #  define __PRI64_PREFIX "ll"
> > > ^
> > > [root@quaco ~]#
> > > 
> > > And also on a uclibc ARC arch container:
> > > 
> > > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > > dm.log/fedora\:24-x-ARC-uClibc
> > > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > > format [-Werror=format=]
> > >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > > sample->cpu);
> > >  ^~~~
> > > In file included from builtin-sched.c:31:0:
> > > /arc_gnu_2017.09-rc2_prebuilt_uclibc_le_arc700_linux_install/arc-snps-linux-uclibc/sysroot/usr/include/inttypes.h:47:28:
> > >  note: format string is defined here
> > >  #  define __PRI64_PREFIX "ll"
> > > ^
> > > [root@quaco ~]#
> > > 
> > > The _fix_ will come after lunch :)
> > 
> > thanks ;-)
> > 
> > jirka
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo
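
The PRI_lu64 approach discussed in this thread can be sketched in user space as
follows. This is only an illustration, not the committed perf code: demo_u64
and format_lost() are made-up names, and the sketch assumes a glibc- or
musl-style libc where PRIu64 expands to "lu" on LP64 targets and to "llu" on
32-bit targets.

```c
#include <inttypes.h>
#include <stdio.h>

#ifdef __LP64__
/* Assumption: PRIu64 is "lu" here, so the extra "l" yields "llu". */
#define PRI_lu64 "l" PRIu64
#else
/* Assumption: PRIu64 is already "llu" on 32-bit targets. */
#define PRI_lu64 PRIu64
#endif

/* Hypothetical stand-in for __u64 when it is unsigned long long. */
typedef unsigned long long demo_u64;

/* Formats the "lost events" message into buf and returns snprintf()'s result. */
static int format_lost(char *buf, size_t len, demo_u64 lost, int cpu)
{
	return snprintf(buf, len, "lost %" PRI_lu64 " events on cpu %d\n",
			lost, cpu);
}
```

With this, the same format string should match an unsigned long long argument
on both 32-bit and 64-bit builds, which is what -Werror=format= checks.
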


Re: [PATCH v2 2/5] soc: amlogic: Add support for Everything-Else power domains controller

2019-08-26 Thread Martin Blumenstingl
Hi Neil,

On Mon, Aug 26, 2019 at 10:10 AM Neil Armstrong  wrote:
>
> On 25/08/2019 23:10, Martin Blumenstingl wrote:
> > Hi Neil,
> >
> > thank you for this update
> > I haven't tried this on the 32-bit SoCs yet, but I am confident that I
> > can make it work by "just" adding the SoC specific bits!
> >
> > On Fri, Aug 23, 2019 at 11:06 AM Neil Armstrong  
> > wrote:
> > [...]
> >> +/* AO Offsets */
> >> +
> > >> +#define AO_RTI_GEN_PWR_SLEEP0  (0x3a << 2)
> > >> +#define AO_RTI_GEN_PWR_ISO0    (0x3b << 2)
> > >> +
> > >> +/* HHI Offsets */
> > >> +
> > >> +#define HHI_MEM_PD_REG0        (0x40 << 2)
> > >> +#define HHI_VPU_MEM_PD_REG0    (0x41 << 2)
> > >> +#define HHI_VPU_MEM_PD_REG1    (0x42 << 2)
> > >> +#define HHI_VPU_MEM_PD_REG3    (0x43 << 2)
> > >> +#define HHI_VPU_MEM_PD_REG4    (0x44 << 2)
> > >> +#define HHI_AUDIO_MEM_PD_REG0  (0x45 << 2)
> > >> +#define HHI_NANOQ_MEM_PD_REG0  (0x46 << 2)
> > >> +#define HHI_NANOQ_MEM_PD_REG1  (0x47 << 2)
> > >> +#define HHI_VPU_MEM_PD_REG2    (0x4d << 2)
> > should we switch to the actual register offsets like we did in the
> > clock drivers?
>
> I find it simpler to refer to the numbers in the documentation...
OK, I have no strong preference here
for the 32-bit SoCs I will need to use the offsets based on the
"amlogic,meson8b-pmu", "syscon" [0], so these will be magic anyways

[...]
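
For readers comparing the two conventions above: the (N << 2) form keeps the
32-bit word index used in the documentation and turns it into a byte offset.
A minimal sketch, where the *_IDX names and reg_byte_offset() are hypothetical
and not part of the patch:

```c
#include <stdint.h>

/* Word indices as quoted in the patch (documentation numbering). */
#define AO_RTI_GEN_PWR_SLEEP0_IDX 0x3a
#define HHI_MEM_PD_REG0_IDX       0x40

/*
 * Hypothetical helper: the registers are 32 bits wide, so the byte
 * offset is the word index multiplied by 4, i.e. shifted left by 2.
 */
static inline uint32_t reg_byte_offset(uint32_t word_index)
{
	return word_index << 2;
}
```

So 0x3a corresponds to byte offset 0xe8 and 0x40 to 0x100, which is what
"actual register offsets" would look like in the defines.
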
> >> +#define VPU_HHI_MEMPD(__reg)   \
> >> +   { __reg, BIT(8) },  \
> >> +   { __reg, BIT(9) },  \
> >> +   { __reg, BIT(10) }, \
> >> +   { __reg, BIT(11) }, \
> >> +   { __reg, BIT(12) }, \
> >> +   { __reg, BIT(13) }, \
> >> +   { __reg, BIT(14) }, \
> >> +   { __reg, BIT(15) }
> > the Amlogic implementation from buildroot-openlinux-A113-201901 (the
> > latest one I have)
> > kernel/aml-4.9/drivers/amlogic/media/vout/hdmitx/hdmi_tx_20/hw/hdmi_tx_hw.c
> > uses:
> > hd_set_reg_bits(P_HHI_MEM_PD_REG0, 0, 8, 8)
> > that basically translates to: GENMASK(15, 8) (which means we could
> > drop this macro)
> >
> > the datasheet also states: 15~8 [...] HDMI memory PD (as a single
> > 8-bit wide register)
>
> Yep, but the actual code setting the VPU power domain is in u-boot :
>
> drivers/vpu/aml_vpu_power_init.c:
> 108 for (i = 8; i < 16; i++) {
> 109 vpu_hiu_setb(HHI_MEM_PD_REG0, 0, i, 1);
> 110 udelay(5);
> 111 }
>
> the linux code is like never used here, my preference goes to the u-boot code
> implementation.
I see, let's keep your implementation then
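
As a side note on the point above, the eight single-bit entries and
GENMASK(15, 8) describe the same bits. A small user-space sketch, assuming
local re-definitions of BIT() and GENMASK() that mimic the kernel macros
(hdmi_mempd_per_bit_mask() is a made-up name):

```c
#include <limits.h>

/* User-space stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BIT(n) (1UL << (n))
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* ORs together bits 8..15, as the per-bit table in the patch lists them. */
static unsigned long hdmi_mempd_per_bit_mask(void)
{
	unsigned long mask = 0;
	int i;

	for (i = 8; i < 16; i++)
		mask |= BIT(i);

	return mask;
}
```

The resulting mask is identical. What differs is the access pattern: the u-boot
loop quoted above clears one bit at a time with a 5 us delay in between, which
a single masked write would not reproduce.
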

> >
> > [...]
> >> +static struct meson_ee_pwrc_domain_desc g12a_pwrc_domains[] = {
> >> +   [PWRC_G12A_VPU_ID]  = VPU_PD("VPU", _pwrc_vpu, 
> >> g12a_pwrc_mem_vpu,
> >> +pwrc_ee_get_power, 11, 2),
> >> +   [PWRC_G12A_ETH_ID] = MEM_PD("ETH", g12a_pwrc_mem_eth),
> >> +};
> >> +
> >> +static struct meson_ee_pwrc_domain_desc sm1_pwrc_domains[] = {
> >> +   [PWRC_SM1_VPU_ID]  = VPU_PD("VPU", _pwrc_vpu, sm1_pwrc_mem_vpu,
> >> +   pwrc_ee_get_power, 11, 2),
> >> +   [PWRC_SM1_NNA_ID]  = TOP_PD("NNA", _pwrc_nna, sm1_pwrc_mem_nna,
> >> +   pwrc_ee_get_power),
> >> +   [PWRC_SM1_USB_ID]  = TOP_PD("USB", _pwrc_usb, sm1_pwrc_mem_usb,
> >> +   pwrc_ee_get_power),
> >> +   [PWRC_SM1_PCIE_ID] = TOP_PD("PCI", _pwrc_pci, 
> >> sm1_pwrc_mem_pcie,
> >> +   pwrc_ee_get_power),
> >> +   [PWRC_SM1_GE2D_ID] = TOP_PD("GE2D", _pwrc_ge2d, 
> >> sm1_pwrc_mem_ge2d,
> >> +   pwrc_ee_get_power),
> >> +   [PWRC_SM1_AUDIO_ID] = MEM_PD("AUDIO", sm1_pwrc_mem_audio),
> >> +   [PWRC_SM1_ETH_ID] = MEM_PD("ETH", g12a_pwrc_mem_eth),
> >> +};
> > my impression: I find this hard to read as it merges the TOP and
> > Memory PD domains from above, adding some seemingly random "11, 2" for
> > the VPU PD as well as pwrc_ee_get_power for some of the power domains
> > personally I like the way we describe clk_regmap because it's easy to
> > read (even though it adds a bit of boilerplate). I'm not sure if we
> > can make it work here, but this (not compile tested) is what I have in
> > mind (I chose two random power domains):
> >   [PWRC_SM1_VPU_ID]  = {
> > .name = "VPU",
> > .top_pd = SM1_EE_PD(8),
> > .mem_pds = {
> > VPU_MEMPD(HHI_VPU_MEM_PD_REG0),
> > VPU_MEMPD(HHI_VPU_MEM_PD_REG1),
> > VPU_MEMPD(HHI_VPU_MEM_PD_REG2),
> > VPU_MEMPD(HHI_VPU_MEM_PD_REG3),
> > { HHI_VPU_MEM_PD_REG4, GENMASK(1, 0) },
> > { HHI_VPU_MEM_PD_REG4, GENMASK(3, 2) },
> > { 

[GIT PULL] timers drivers v5.4

2019-08-26 Thread Daniel Lezcano


Hi Thomas,

The following changes since commit 3e2d94535adb2df15f3907e4b4c7cd8a5a4c2b5a:

  clocksource/drivers/hyperv: Enable TSC page clocksource on 32bit
(2019-08-23 16:59:54 +0200)

are available in the Git repository at:

  https://git.linaro.org/people/daniel.lezcano/linux.git tags/timers-v5.4

for you to fetch changes up to 19d608458f4f3bb3a1f89bd7e4814c3fd30dbec7:

  clocksource/drivers/sh_cmt: Document "cmt-48" as deprecated
(2019-08-27 00:31:39 +0200)


- Remove dev_err() when used with platform_get_irq (Stephen Boyd)

- Add DT binding and new compatible for Allwinner sun4i (Maxime Ripard)

- Register the Atmel tcb clocksource for delays (Alexandre Belloni)

- Add a clock divider for the Freescale imx platforms and new timer node
  in the DT (Anson Huang)

- Use DIV_ROUND_CLOSEST macro for the Renesas OSTM (Geert Uytterhoeven)

- Fix GENMASK and timer operation for the npcm timer (Avi Fishman)

- Fix timer-of showing an error message when EPROBE_DEFER is
  returned (Jon Hunter)

- Add new SoC DT binding and match for Renesas timers (Magnus Damm)


Alexandre Belloni (1):
  clocksource/drivers/tcb_clksrc: Register delay timer

Anson Huang (3):
  clocksource/drivers/imx-sysctr: Add internal clock divider handle
  arm64: dts: imx8mm: Add system counter node
  arm64: dts: imx8mq: Add system counter node

Avi Fishman (1):
  clocksource/drivers/npcm: Fix GENMASK and timer operation

Geert Uytterhoeven (1):
  clocksource/drivers/renesas-ostm: Use DIV_ROUND_CLOSEST() helper

Jon Hunter (2):
  clocksource/drivers/timer-of: Do not warn on deferred probe
  clocksource/drivers: Do not warn on probe defer

Magnus Damm (7):
  dt-bindings: timer: renesas, cmt: Add CMT0234 to sh73a0 and r8a7740
  dt-bindings: timer: renesas, cmt: Update CMT1 on sh73a0 and r8a7740
  dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a7792
  dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a77995
  dt-bindings: timer: renesas, cmt: Update R-Car Gen3 CMT1 usage
  clocksource/drivers/sh_cmt: r8a7740 and sh73a0 SoC-specific match
  clocksource/drivers/sh_cmt: Document "cmt-48" as deprecated

Maxime Ripard (4):
  dt-bindings: timer: Convert Allwinner A10 Timer to a schema
  dt-bindings: timer: Add missing compatibles
  clocksource: sun4i: Add missing compatibles
  dt-bindings: timer: Convert Allwinner A13 HSTimer to a schema

Stephen Boyd (1):
  clocksource: Remove dev_err() usage after platform_get_irq()

 .../bindings/timer/allwinner,sun4i-a10-timer.yaml  | 102
+
 .../bindings/timer/allwinner,sun4i-timer.txt   |  19 
 .../bindings/timer/allwinner,sun5i-a13-hstimer.txt |  26 --
 .../timer/allwinner,sun5i-a13-hstimer.yaml |  79 
 .../devicetree/bindings/timer/renesas,cmt.txt  |  40 
 arch/arm64/boot/dts/freescale/imx8mm.dtsi  |   8 ++
 arch/arm64/boot/dts/freescale/imx8mq.dtsi  |   8 ++
 drivers/clocksource/Kconfig|   2 +-
 drivers/clocksource/em_sti.c   |   4 +-
 drivers/clocksource/renesas-ostm.c |   2 +-
 drivers/clocksource/sh_cmt.c   |  19 +++-
 drivers/clocksource/sh_tmu.c   |   5 +-
 drivers/clocksource/timer-atmel-tcb.c  |  18 
 drivers/clocksource/timer-imx-sysctr.c |   5 +
 drivers/clocksource/timer-npcm7xx.c|   9 +-
 drivers/clocksource/timer-of.c |   6 +-
 drivers/clocksource/timer-probe.c  |   4 +-
 drivers/clocksource/timer-sun4i.c  |   4 +
 18 files changed, 275 insertions(+), 85 deletions(-)
 create mode 100644
Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
 delete mode 100644
Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
 delete mode 100644
Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt
 create mode 100644
Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml

-- 
  Linaro.org │ Open source software for ARM SoCs




Re: [PATCH RESEND 0/8] Fix mmap base in bottom-up mmap

2019-08-26 Thread Helge Deller

On 26.08.19 09:34, Alexandre Ghiti wrote:

On 6/20/19 7:03 AM, Alexandre Ghiti wrote:

This series fixes the fallback of the top-down mmap: in case of
failure, a bottom-up scheme can be tried as a last resort between
the top-down mmap base and the stack, hoping for a large unused stack
limit.

Lots of architectures and even mm code start this fallback
at TASK_UNMAPPED_BASE, which is useless since the top-down scheme
already failed on the whole address space: instead, simply use
mmap_base.

Along the way, it allows getting rid of mmap_legacy_base and
mmap_compat_legacy_base from mm_struct.

Note that arm and mips already implement this behaviour.
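
The fallback described above can be modelled with a toy allocator. This is a
simplified sketch and not the kernel's vm_unmapped_area() logic: struct gap,
the search helpers and toy_get_unmapped_area() are all hypothetical, free
space is a plain array of gaps, and address 0 is used to mean "no fit".

```c
#include <stddef.h>

/* A free gap [start, end) in a toy address space. */
struct gap {
	unsigned long start;
	unsigned long end;
};

/* Highest address where len fits in a gap clipped to [low, high); 0 if none. */
static unsigned long search_top_down(const struct gap *gaps, size_t n,
				     unsigned long len,
				     unsigned long low, unsigned long high)
{
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = gaps[i].start > low ? gaps[i].start : low;
		unsigned long e = gaps[i].end < high ? gaps[i].end : high;

		if (e > s && e - s >= len && e - len > best)
			best = e - len;
	}
	return best;
}

/* Lowest address where len fits in a gap clipped to [low, high); 0 if none. */
static unsigned long search_bottom_up(const struct gap *gaps, size_t n,
				      unsigned long len,
				      unsigned long low, unsigned long high)
{
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = gaps[i].start > low ? gaps[i].start : low;
		unsigned long e = gaps[i].end < high ? gaps[i].end : high;

		if (e > s && e - s >= len && (best == 0 || s < best))
			best = s;
	}
	return best;
}

static unsigned long toy_get_unmapped_area(const struct gap *gaps, size_t n,
					   unsigned long len,
					   unsigned long low_limit,
					   unsigned long mmap_base,
					   unsigned long task_size)
{
	unsigned long addr;

	addr = search_top_down(gaps, n, len, low_limit, mmap_base);
	if (addr)
		return addr;

	/*
	 * Everything below mmap_base was just searched and failed, so the
	 * bottom-up fallback starts at mmap_base instead of a lower base.
	 */
	return search_bottom_up(gaps, n, len, mmap_base, task_size);
}
```

Starting the fallback at a lower base would only rescan gaps the top-down pass
already rejected; the only new candidates lie between mmap_base and the end of
the address space.
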

Alexandre Ghiti (8):
   s390: Start fallback of top-down mmap at mm->mmap_base
   sh: Start fallback of top-down mmap at mm->mmap_base
   sparc: Start fallback of top-down mmap at mm->mmap_base
   x86, hugetlbpage: Start fallback of top-down mmap at mm->mmap_base
   mm: Start fallback top-down mmap at mm->mmap_base
   parisc: Use mmap_base, not mmap_legacy_base, as low_limit for
 bottom-up mmap
   x86: Use mmap_*base, not mmap_*legacy_base, as low_limit for bottom-up
 mmap
   mm: Remove mmap_legacy_base and mmap_compat_legacy_code fields from
 mm_struct

  arch/parisc/kernel/sys_parisc.c  |  8 +++-
  arch/s390/mm/mmap.c  |  2 +-
  arch/sh/mm/mmap.c    |  2 +-
  arch/sparc/kernel/sys_sparc_64.c |  2 +-
  arch/sparc/mm/hugetlbpage.c  |  2 +-
  arch/x86/include/asm/elf.h   |  2 +-
  arch/x86/kernel/sys_x86_64.c |  4 ++--
  arch/x86/mm/hugetlbpage.c    |  7 ---
  arch/x86/mm/mmap.c   | 20 +---
  include/linux/mm_types.h |  2 --
  mm/debug.c   |  4 ++--
  mm/mmap.c    |  2 +-
  12 files changed, 26 insertions(+), 31 deletions(-)



Any thoughts about that series? As said before, this is just a preparatory
patchset in order to merge x86 mmap top down code with the generic version.


I just tested your patch series successfully on the parisc
architecture. You may add:

Tested-by: Helge Deller  # parisc

Thanks!
Helge


Re: numlist_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation

2019-08-26 Thread John Ogness
Hi Petr,

AndreaP responded with some explanation (and great links!) on the topic
of READ_ONCE. But I feel like your comments about the WRITE_ONCE were
not addressed. I address that (and your other comments) below...

On 2019-08-23, Petr Mladek  wrote:
>> --- /dev/null
>> +++ b/kernel/printk/numlist.c
>> +/**
>> + * numlist_push() - Add a node to the list and assign it a sequence number.
>> + *
>> + * @nl: The numbered list to push to.
>> + *
>> + * @n:  A node to push to the numbered list.
>> + *  The node must not already be part of a list.
>> + *
>> + * @id: The ID of the node.
>> + *
>> + * A node is added in two steps: The first step is to make this node the
>> + * head, which causes a following push to add to this node. The second step is
>> + * to update @next_id of the former head node to point to this one, which
>> + * makes this node visible to any task that sees the former head node.
>> + */
>> +void numlist_push(struct numlist *nl, struct nl_node *n, unsigned long id)
>> +{
>> +unsigned long head_id;
>> +unsigned long seq;
>> +unsigned long r;
>> +
>> +/*
>> + * bA:
>> + *
>> + * Setup the node to be a list terminator: next_id == id.
>> + */
>> +WRITE_ONCE(n->next_id, id);
>
> Do we need WRITE_ONCE() here?
> Both "n" and "id" are given as parameters and do not change.
> The assignment must be done before "id" is set as nl->head_id.
> The ordering is enforced by cmpxchg_release().

The cmpxchg_release() ensures that if the node is visible to writers,
then the finalized assignment is also visible. And the store_release()
ensures that if the previous node is visible to any readers, then the
finalized assignment is also visible. In the reader case, if any readers
happen to be sitting on the node, numlist_read() will fail because the
ID was updated when the node was popped. So for all these cases any
compiler optimizations leading to that assignment (tearing, speculation,
etc) should be irrelevant. Therefore, IMO the WRITE_ONCE() is not
needed.

Since all of this is lockless, I used WRITE_ONCE() whenever touching
shared variables. I must admit the decision may be motivated primarily
by fear of compiler optimizations. Although "documenting lockless shared
variable access" did play a role as well.

I will replace the WRITE_ONCE with an assignment.
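
The reasoning above (plain stores to a node that is still private, followed by
a release operation that publishes it) can be sketched with C11 atomics. This
is not the numlist code: it is a simplified push-only stack, the demo_* names
are made up, and C11 memory orders only approximate the kernel's
cmpxchg_release() and address-dependency semantics.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	unsigned long seq;
	struct demo_node *prev;
};

struct demo_list {
	_Atomic(struct demo_node *) head;
};

static void demo_push(struct demo_list *l, struct demo_node *n)
{
	struct demo_node *old;

	/* Acquire so that old->seq below is read after old was published. */
	old = atomic_load_explicit(&l->head, memory_order_acquire);
	do {
		/* Plain stores: n is not reachable by other threads yet. */
		n->prev = old;
		n->seq = old ? old->seq + 1 : 0;
		/*
		 * The release half of a successful exchange orders the
		 * stores above before n becomes visible through head.
		 * On failure, old is reloaded with acquire ordering.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&l->head, &old, n,
							memory_order_acq_rel,
							memory_order_acquire));
}
```

Because nothing can observe n before the exchange succeeds, the compiler is
free to tear or reorder the two plain stores among themselves without any
reader noticing, which is the argument for dropping WRITE_ONCE() there.
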

>> +
>> +/* bB: #1 */
>> +head_id = atomic_long_read(&nl->head_id);
>> +
>> +for (;;) {
>> +/* bC: */
>> +while (!numlist_read(nl, head_id, &seq, NULL)) {
>> +/*
>> + * @head_id is invalid. Try again with an
>> + * updated value.
>> + */
>> +
>> +cpu_relax();
>
> I have got very confused by this. cpu_relax() suggests that this
> cycle is busy waiting until a particular node becomes valid.
> My first thought was that it must cause deadlock in NMI when
> the interrupted code is supposed to make the node valid.
>
> But it is the other way. The head is always valid when it is
> added to the list. It might become invalid when another CPU
> moves the head and the old one gets reused.
>
> Anyway, I do not see any reason for cpu_relax() here.

You are correct. The cpu_relax() should not be there. But there is still
an issue that this could spin hard if the head was recycled and this CPU
does not yet see the new head value.

To handle that, and in preparation for my next version, I'm now using a
read_acquire() to load the ID in the node() callback (matching the
set_release() in assign_desc()). This ensures that if numlist_read()
fails, the new head will be visible.

> Also the entire cycle would deserve a comment to avoid this mistake.
> For example:
>
>   /*
>* bC: Read seq from current head. Repeat with new
>* head when it has changed and the old one got reused.
>*/

Agreed.

>> +
>> +/* bB: #2 */
>> +head_id = atomic_long_read(&nl->head_id);
>> +}
>> +
>> +/*
>> + * bD:
>> + *
>> + * Set @seq to +1 of @seq from the previous head.
>> + *
>> + * Memory barrier involvement:
>> + *
>> + * If bB reads from bE, then bC->aA reads from bD.
>> + *
>> + * Relies on:
>> + *
>> + * RELEASE from bD to bE
>> + *matching
>> + * ADDRESS DEP. from bB to bC->aA
>> + */
>> +WRITE_ONCE(n->seq, seq + 1);
>
> Do we really need WRITE_ONCE() here? 
> It is the same problem as with setting n->next_id above.

For the same reasons as the other WRITE_ONCE, I will replace the
WRITE_ONCE with an assignment.

>> +
>> +/*
>> + * bE:
>> + *
>> + * This store_release() guarantees that @seq and @next are
>> + * stored before the node with @id is visible to any popping
>> +  

Re: Kernel 5.2.8 - au0828 - Tuner Is Busy

2019-08-26 Thread shuah

On 8/20/19 7:56 AM, shuah wrote:

On 8/20/19 12:58 AM, Nathan Royce wrote:

While your mention of quirks-table.h certainly had possibilities, I'm
afraid adding the "AU0828_DEVICE(0x05e1, 0x0400, "Hauppauge",
"Woodbury")," entry for my tuner did not make any difference regarding
the "Tuner is busy. Error -19" message.

I don't know if this means anything, but I see
https://patchwork.kernel.org/patch/97726/ from 2010 which contains
changes for the 0x0400 model. I guess it never got pulled in.

Really, it's fine for me just to hang back at v5.1 for a year or two
until ATSC 3.0 USB tuners come out at a reasonable price.



Hi Nathan,

The tuner busy error code is ENODEV. It appears some devices aren't
created on your system. Would it be possible for you to send me your
config and a complete dmesg?

I am curious whether /dev/media0 or /dev/media1 is present on your system.
Not having this could explain the ENODEV you are seeing.



Thanks for sending the dmesg and config. The difference between the
two config is you have CONFIG_MEDIA_CONTROLLER_DVB set in the second
one. This is expected because this is enabled in 5.2 with the changes
to share resources.

grep MEDIA_CONTROLLER config5115.txt
CONFIG_MEDIA_CONTROLLER=y
# CONFIG_MEDIA_CONTROLLER_DVB is not set
# CONFIG_MEDIA_CONTROLLER_REQUEST_API is not set

grep MEDIA_CONTROLLER config529.txt
CONFIG_MEDIA_CONTROLLER=y
CONFIG_MEDIA_CONTROLLER_DVB=y
# CONFIG_MEDIA_CONTROLLER_REQUEST_API is not set
CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER=y

A new code path in DVB is enabled in 5.2 compared to 5.1. What we are
seeing is somehow the DVB media graph is incomplete. When the enable
source tries to find the media device that corresponds to the fe entity
it can't find it and hence the -ENODEV you are seeing.

I would be curious to see what happens if you disable
CONFIG_MEDIA_CONTROLLER.

I think the problem is in dvb media graph creation on your device and
unfortunately, I don't have the device to debug the problem.

Will you be able to run media-ctl --print-dot on your system and send
me the media graph? You can find the media-ctl tool at

https://git.linuxtv.org/v4l-utils.git/

thanks,
-- Shuah


Re: [GIT PULL] timers drivers v5.5

2019-08-26 Thread Daniel Lezcano
On 26/08/2019 22:59, Thomas Gleixner wrote:
> On Mon, 26 Aug 2019, Daniel Lezcano wrote:
> 
>> The following changes since commit 08a3c192c93f4359a94bf47971e55b0324b72b8b:
>>
>>   posix-timers: Prepare for PREEMPT_RT (2019-08-01 20:51:25 +0200)
>>
>> are available in the Git repository at:
>>
>>   https://git.linaro.org/people/daniel.lezcano/linux.git tags/timers-v5.5
> 
> 5.5 - that's for next year's first kernel so I'll put it into the fridge for 
> now.

Let me cook another tag :)


-- 
  Linaro.org │ Open source software for ARM SoCs




Re: [PATCH v1 net-next] net: stmmac: Add support for MDIO interrupts

2019-08-26 Thread Florian Fainelli
On 8/26/19 11:47 AM, Andrew Lunn wrote:
> On Tue, Aug 27, 2019 at 09:45:20AM +0800, Voon Weifeng wrote:
>> From: "Chuah, Kim Tatt" 
>>
>> DW EQoS v5.xx controllers added capability for interrupt generation
>> when MDIO interface is done (GMII Busy bit is cleared).
>> This patch adds support for this interrupt on supported HW to avoid
>> polling on GMII Busy bit.
>>
>> stmmac_mdio_read() & stmmac_mdio_write() will sleep until wake_up() is
>> called by the interrupt handler.
> 
> Hi Voon
> 
> I _think_ there are some order of operation issues here. The mdiobus
> is registered in the probe function. As soon as of_mdiobus_register()
> is called, the MDIO bus must work. At that point MDIO read/writes can
> start to happen.
> 
> As far as i can see, the interrupt handler is only requested in
> stmmac_open(). So it seems like any MDIO operations after probe, but
> before open are going to fail?

AFAIR, wait_event_timeout() will continue to busy loop and wait until
the timeout, but not return an error because the polled condition was
true, at least that is my recollection from having the same issue with
the bcmgenet driver before it was moved to connecting to the PHY in the
ndo_open() function.
-- 
Florian


[PATCH v2] ACPI / CPPC: do not require the _PSD method when using CPPC

2019-08-26 Thread Al Stone
According to the ACPI 6.3 specification, the _PSD method is optional
when using CPPC.  The underlying assumption is that each CPU can change
frequency independently from all other CPUs; _PSD is provided to tell
the OS that some processors can NOT do that.

However, the acpi_get_psd() function returns ENODEV if there is no _PSD
method present, or an ACPI error status if an error occurs when evaluating
_PSD, if present.  This makes _PSD mandatory when using CPPC, in violation
of the specification, and only on Linux.

This has forced some firmware writers to provide a dummy _PSD, even though
it is irrelevant, but only because Linux requires it; other OSPMs follow
the spec.  We really do not want to have OS specific ACPI tables, though.

So, correct acpi_get_psd() so that it does not return an error if there
is no _PSD method present, but does return a failure when the method can
not be executed properly.  This allows _PSD to be optional as it should
be.

v2:
   -- verified simple check for AE_NOT_FOUND was sufficient
   -- simplified return status check per Rafael's suggestion

Signed-off-by: Al Stone 
Cc: Rafael J. Wysocki 
Cc: Len Brown 
---
 drivers/acpi/cppc_acpi.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 15f103d7532b..7a946f1944ab 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -365,10 +365,12 @@ static int acpi_get_psd(struct cpc_desc *cpc_ptr, 
acpi_handle handle)
union acpi_object  *psd = NULL;
struct acpi_psd_package *pdomain;
 
-   status = acpi_evaluate_object_typed(handle, "_PSD", NULL, &buffer,
-   ACPI_TYPE_PACKAGE);
-   if (ACPI_FAILURE(status))
-   return -ENODEV;
+   if (acpi_has_method(handle, "_PSD")) {
+   status = acpi_evaluate_object_typed(handle, "_PSD", NULL,
+   &buffer, ACPI_TYPE_PACKAGE);
+   if (status == AE_NOT_FOUND) /* _PSD is optional */
+   return 0;
+   }
 
psd = buffer.pointer;
if (!psd || psd->package.count != 1) {
-- 
2.21.0
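
The behaviour the commit message asks for can be summarised as a three-way
outcome. The sketch below is a user-space model only: enum demo_status and
demo_get_psd() are hypothetical stand-ins and do not use the ACPICA types or
status values.

```c
#include <errno.h>

/* Hypothetical outcomes of looking up and evaluating _PSD. */
enum demo_status {
	DEMO_OK,        /* _PSD present and evaluated successfully */
	DEMO_NOT_FOUND, /* no _PSD method on this processor */
	DEMO_ERROR      /* _PSD present but evaluation failed */
};

/* Returns 0 when the caller may continue, -ENODEV on a real failure. */
static int demo_get_psd(enum demo_status eval_status)
{
	switch (eval_status) {
	case DEMO_NOT_FOUND:
		return 0;	/* _PSD is optional when using CPPC */
	case DEMO_OK:
		return 0;	/* caller goes on to parse the package */
	default:
		return -ENODEV;	/* method exists but could not be run */
	}
}
```

Only the last case is an error under this reading; a missing _PSD simply means
each CPU is assumed to change frequency independently.
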



Re: [patch V2 33/38] posix-cpu-timers: Consolidate timer expiry further

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:20PM +0200, Thomas Gleixner wrote:
> With the array based samples and expiry cache, the expiry function can use
> a loop to collect timers from the clock specific lists.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 


Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code

2019-08-26 Thread Thomas Gleixner
On Mon, 26 Aug 2019, Cristian Marussi wrote:
> On 8/26/19 4:32 PM, Christoph Hellwig wrote:
> > > +config ARCH_USE_COMMON_SMP_STOP
> > > + def_bool y if SMP
> > 
> > > The option belongs into common code and the arch code should only
> > select it.
> > 
> 
> In fact that was my first approach, but then I noticed that in kernel/ topdir
> there was no generic Kconfig but only subsystem specific ones:
> 
> Kconfig.freezer  Kconfig.hz   Kconfig.locks  Kconfig.preempt

arch/Kconfig

Thanks,

tglx


Re: [PATCH v2 3/3] remoteproc: ingenic: Added remoteproc driver

2019-08-26 Thread Bjorn Andersson
On Mon 29 Jul 11:31 PDT 2019, Paul Cercueil wrote:

> This driver is used to boot, communicate with and load firmwares to the
> MIPS co-processor found in the VPU hardware of the JZ47xx SoCs from
> Ingenic.
> 
> Signed-off-by: Paul Cercueil 
> ---
> 
> Notes:
> v2: Remove exception for always-mapped memories
> 
>  drivers/remoteproc/Kconfig |   8 +
>  drivers/remoteproc/Makefile|   1 +
>  drivers/remoteproc/ingenic_rproc.c | 285 +
>  3 files changed, 294 insertions(+)
>  create mode 100644 drivers/remoteproc/ingenic_rproc.c
> 
> diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
> index 28ed306982f7..a0be40e2098d 100644
> --- a/drivers/remoteproc/Kconfig
> +++ b/drivers/remoteproc/Kconfig
> @@ -214,6 +214,14 @@ config STM32_RPROC
>  
> This can be either built-in or a loadable module.
>  
> +config INGENIC_RPROC
> + tristate "Ingenic JZ47xx VPU remoteproc support"
> + depends on MIPS || COMPILE_TEST
> + help
> +   Say y or m here to support the VPU in the JZ47xx SoCs from Ingenic.
> +   This can be either built-in or a loadable module.
> +   If unsure say N.
> +
>  endif # REMOTEPROC
>  
>  endmenu
> diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
> index 00f09e658cb3..6eb0137abbc7 100644
> --- a/drivers/remoteproc/Makefile
> +++ b/drivers/remoteproc/Makefile
> @@ -10,6 +10,7 @@ remoteproc-y += remoteproc_sysfs.o
>  remoteproc-y += remoteproc_virtio.o
>  remoteproc-y += remoteproc_elf_loader.o
>  obj-$(CONFIG_IMX_REMOTEPROC) += imx_rproc.o
> +obj-$(CONFIG_INGENIC_RPROC)  += ingenic_rproc.o
>  obj-$(CONFIG_OMAP_REMOTEPROC) += omap_remoteproc.o
>  obj-$(CONFIG_WKUP_M3_RPROC)  += wkup_m3_rproc.o
>  obj-$(CONFIG_DA8XX_REMOTEPROC)   += da8xx_remoteproc.o
> diff --git a/drivers/remoteproc/ingenic_rproc.c 
> b/drivers/remoteproc/ingenic_rproc.c
> new file mode 100644
> index ..6fe0530c83a6
> --- /dev/null
> +++ b/drivers/remoteproc/ingenic_rproc.c
> @@ -0,0 +1,285 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Ingenic JZ47xx remoteproc driver
> + * Copyright 2019, Paul Cercueil 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "remoteproc_internal.h"
> +
> +#define REG_AUX_CTRL 0x0
> +#define REG_AUX_MSG_ACK  0x10
> +#define REG_AUX_MSG  0x14
> +#define REG_CORE_MSG_ACK 0x18
> +#define REG_CORE_MSG 0x1C
> +
> +#define AUX_CTRL_SLEEP   BIT(31)
> +#define AUX_CTRL_MSG_IRQ_EN  BIT(3)
> +#define AUX_CTRL_NMI_RESETS  BIT(2)
> +#define AUX_CTRL_NMI BIT(1)
> +#define AUX_CTRL_SW_RESET    BIT(0)
> +
> +struct vpu_mem_map {
> + const char *name;
> + unsigned int da;
> +};
> +
> +struct vpu_mem_info {
> + const struct vpu_mem_map *map;
> + unsigned long len;
> + void __iomem *base;
> +};
> +
> +static const struct vpu_mem_map vpu_mem_map[] = {
> + { "tcsm0", 0x132b },
> + { "tcsm1", 0xf400 },
> + { "sram",  0x132f },
> +};
> +
> +/* Device data */
> +struct vpu {
> + int irq;
> + struct clk *vpu_clk;
> + struct clk *aux_clk;
> + void __iomem *aux_base;
> + struct vpu_mem_info mem_info[ARRAY_SIZE(vpu_mem_map)];
> + struct device *dev;
> +};
> +
> +static int ingenic_rproc_prepare(struct rproc *rproc)

So I presume aux_clk and vpu_clk are required by the load callback?

> +{
> + struct vpu *vpu = rproc->priv;
> + int ret;
> +
> + ret = clk_prepare_enable(vpu->vpu_clk);

Please use the clk_bulk API instead.

> + if (ret) {
> + dev_err(vpu->dev, "Unable to start VPU clock: %d\n", ret);
> + return ret;
> + }
> +
> + ret = clk_prepare_enable(vpu->aux_clk);
> + if (ret) {
> + dev_err(vpu->dev, "Unable to start AUX clock: %d\n", ret);
> + goto err_disable_vpu_clk;
> + }
> +
> + return 0;
> +
> +err_disable_vpu_clk:
> + clk_disable_unprepare(vpu->vpu_clk);
> + return ret;
> +}
> +
> +static void ingenic_rproc_unprepare(struct rproc *rproc)
> +{
> + struct vpu *vpu = rproc->priv;
> +
> + clk_disable_unprepare(vpu->aux_clk);
> + clk_disable_unprepare(vpu->vpu_clk);
> +}
> +
> +static int ingenic_rproc_start(struct rproc *rproc)
> +{
> + struct vpu *vpu = rproc->priv;
> + u32 ctrl;
> +
> + enable_irq(vpu->irq);
> +
> + /* Reset the AUX and enable message IRQ */
> + ctrl = AUX_CTRL_NMI_RESETS | AUX_CTRL_NMI | AUX_CTRL_MSG_IRQ_EN;
> + writel(ctrl, vpu->aux_base + REG_AUX_CTRL);
> +
> + return 0;
> +}
> +
> +static int ingenic_rproc_stop(struct rproc *rproc)
> +{
> + struct vpu *vpu = rproc->priv;
> +
> + /* Keep AUX in reset mode */
> + writel(AUX_CTRL_SW_RESET, vpu->aux_base + REG_AUX_CTRL);
> +
> + 

Re: [PATCH 03/11] asm-generic: add generic dwarf definition

2019-08-26 Thread Changbin Du
Hi, Peter,

On Mon, Aug 26, 2019 at 09:42:15AM +0200, Peter Zijlstra wrote:
> On Sun, Aug 25, 2019 at 09:23:22PM +0800, Changbin Du wrote:
> > Add generic DWARF constant definitions. We will use it later.
> > 
> > Signed-off-by: Changbin Du 
> > ---
> >  include/asm-generic/dwarf.h | 199 
> >  1 file changed, 199 insertions(+)
> >  create mode 100644 include/asm-generic/dwarf.h
> > 
> > diff --git a/include/asm-generic/dwarf.h b/include/asm-generic/dwarf.h
> > new file mode 100644
> > index ..c705633c2a8f
> > --- /dev/null
> > +++ b/include/asm-generic/dwarf.h
> > @@ -0,0 +1,199 @@
> > +/* SPDX-License-Identifier: GPL-2.0
> > + *
> > + * Architecture independent definitions of DWARF.
> > + *
> > + * Copyright (C) 2019 Changbin Du 
> 
> You're claiming copyright on dwarf definitions? ;-)
> 
> I'm thinking only Oracle was daft enough to think stuff like that was
> copyrightable.
> 
OK, let me remove the copyright line. I think the SPDX line is okay, right?

> Also; I think it would be very good to not use/depend on DWARF for this.
>
It only includes the DWARF expression opcodes, not all of the DWARF definitions.

> You really don't need all of DWARF; I'm thinking you only need a few
> types; for location we already have regs_get_kernel_argument() which
> has all the logic to find the n-th argument.
> 
regs_get_kernel_argument() can handle most cases, but if the size of one
parameter exceeds 64 bits (which is rare in the kernel), we must recalculate
the locations. So I think the DWARF location descriptor is the most accurate
one.

-- 
Cheers,
Changbin Du


Re: [PATCH v3 1/5] leds: lm3532: Fix brightness control for i2c mode

2019-08-26 Thread Pavel Machek
On Mon 2019-08-26 14:58:22, Tony Lindgren wrote:
> Hi,
> 
> * Dan Murphy  [190820 19:53]:
> > Fix the brightness control for I2C mode.  Instead of
> > changing the full scale current register update the ALS target
> > register for the appropriate banks.
> > 
> > In addition clean up some code errors and random misspellings found
> > during coding.
> > 
> > Tested on Droid4 as well as LM3532 EVM connected to a BeagleBoneBlack
> > 
> > Fixes: e37a7f8d77e1 ("leds: lm3532: Introduce the lm3532 LED driver")
> > Reported-by: Pavel Machek 
> > Signed-off-by: Dan Murphy 
> > ---
> > 
> > v3 - Removed register define updates - 
> > https://lore.kernel.org/patchwork/patch/1114542/
> 
> Looks like starting with this patch in Linux next the LCD on droid4
> is so dim it's unreadable even with brightness set to 255. Setting
> brightness to 0 does blank it completely though.
> 
> Did something maybe break with the various patch revisions or are
> we now missing some dts patch?

Maybe missing dts patch. We should provide maximum current the LED can
handle... 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH 00/12] libperf: Add events to perf/event.h

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Mon, Aug 26, 2019 at 06:58:52PM +0200, Jiri Olsa escreveu:
> On Mon, Aug 26, 2019 at 01:18:49PM -0300, Arnaldo Carvalho de Melo wrote:
> 
> SNIP
> 
> > [perfbuilder@490c2c7bdaab ~]$ grep 'printf("lost' 
> > /tmp/build/perf/builtin-sched.i
> >  printf("lost %" "l" "ll""u" " events on cpu %d\n", event->lost.lost, 
> > sample->cpu);
> > [perfbuilder@490c2c7bdaab ~]$
> > 
> > And if we do this on a fedora:30 x86_64:
> > 
> > $ make -C tools/perf O=/tmp/build/perf /tmp/build/perf/builtin-sched.i
> > [acme@quaco perf]$ grep -A4 'printf("lost' /tmp/build/perf/builtin-sched.i
> >  printf("lost %" "l" 
> > # 2646 "builtin-sched.c" 3 4
> > "l" "u" 
> > # 2646 "builtin-sched.c"
> >  " events on cpu %d\n", event->lost.lost, 
> > sample->cpu);
> > [acme@quaco perf]$
> > 
> > I.e. on 32-bit arches we shouldn't add that extra "l", right?
> 
> hum, I guess we could #ifdef it 64/32 bits

I tried to figure out how to fix this better, but the int-ll64.h versus
int-l64.h versus how __u64 is defined got me confused and I ended up
with:

#if __WORDSIZE == 64
/*
 * /usr/include/inttypes.h uses just 'lu' for PRIu64, but we end up defining
 * __u64 as long long unsigned int, and then -Werror=format= kicks in and
 * complains of the mismatched types, so use these two special extra PRI
 * macros to overcome that.
 */
#define PRI_lu64 "l" PRIu64
#define PRI_lx64 "l" PRIx64
#else
#define PRI_lu64 PRIu64
#define PRI_lx64 PRIx64
#endif

Builds in all the containers I have, 32-bit, 64-bit, old gccs/clangs,
new ones, uclibc, musl libc, glibc, etc
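
For illustration, a minimal userspace sketch of how those two macros compose a format string. It assumes a glibc-style <inttypes.h> where the 64-bit PRIu64 is "lu" and where __WORDSIZE is visible after including it; demo_u64 and format_lost() are hypothetical names that stand in for __u64 and the printf() call in builtin-sched.c.

```c
#include <inttypes.h>
#include <stdio.h>

/* Stand-in for __u64 when it is defined as unsigned long long (assumption). */
typedef unsigned long long demo_u64;

#if __WORDSIZE == 64
/* 64-bit glibc: PRIu64 is "lu", so one extra "l" yields "llu". */
#define PRI_lu64 "l" PRIu64
#define PRI_lx64 "l" PRIx64
#else
/* 32-bit: PRIu64 is already "llu", so nothing is prepended. */
#define PRI_lu64 PRIu64
#define PRI_lx64 PRIx64
#endif

/* Format a lost-events message into buf; returns what snprintf() returns. */
int format_lost(char *buf, size_t len, demo_u64 lost, int cpu)
{
	return snprintf(buf, len, "lost %" PRI_lu64 " events on cpu %d",
			lost, cpu);
}
```

On a libc whose 64-bit PRIu64 is not "lu" the extra "l" would be wrong, which is why this is only a sketch of the idea and not a portable recipe.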
 
> > 
> > I bet the build for the mips/mipsel will fail too, lemme see... Yeah,
> > both failed:
> > 
> > 
> > > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > > dm.log/debian\:experimental-x-mips
> > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > format [-Werror=format=]
> >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > sample->cpu);
> >  ^~~~
> > In file included from builtin-sched.c:31:
> > /usr/mips-linux-gnu/include/inttypes.h:47:28: note: format string is 
> > defined here
> >  #  define __PRI64_PREFIX "ll"
> > ^
> > [root@quaco ~]#
> > 
> > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > dm.log/debian\:experimental-x-mipsel
> > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > format [-Werror=format=]
> >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > sample->cpu);
> >  ^~~~
> > In file included from builtin-sched.c:31:
> > /usr/mipsel-linux-gnu/include/inttypes.h:47:28: note: format string is 
> > defined here
> >  #  define __PRI64_PREFIX "ll"
> > ^
> > [root@quaco ~]#
> > 
> > And also on a uclibc ARC arch container:
> > 
> > [root@quaco ~]# grep -m1 -A6 -- -Werror=format=  
> > dm.log/fedora\:24-x-ARC-uClibc
> > builtin-sched.c:2646:9: error: unknown conversion type character 'l' in 
> > format [-Werror=format=]
> >   printf("lost %" PRI_lu64 " events on cpu %d\n", event->lost.lost, 
> > sample->cpu);
> >  ^~~~
> > In file included from builtin-sched.c:31:0:
> > /arc_gnu_2017.09-rc2_prebuilt_uclibc_le_arc700_linux_install/arc-snps-linux-uclibc/sysroot/usr/include/inttypes.h:47:28:
> >  note: format string is defined here
> >  #  define __PRI64_PREFIX "ll"
> > ^
> > [root@quaco ~]#
> > 
> > The _fix_ will come after lunch :)
> 
> thanks ;-)
> 
> jirka

-- 

- Arnaldo


Re: BoF on LPC 2019 : Linux Perf advancements for compute intensive and server systems

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Mon, Aug 26, 2019 at 10:57:58AM -0700, Andi Kleen escreveu:
> > 
> > > 
> > > All those are already merged, after long reviewing phases and lots of
> > > testing, right?
> > 
> > Right. These changes now constitute parts of the Linux kernel source tree.
> 
> Might be better to focus on future areas that haven't been merged yet.

Agreed, we can have an initial, short report on what has been done to
address these issues, and I think Alexey could take care of that, but
then we should try to list here what else is needed in addition to what
Ian et al. listed in their talk.

And perhaps even things that ameliorate the problems they list there,
i.e. Ian, Stephane, were the things that Alexey listed already
tested/considered by you guys?

- Arnaldo


Re: [PATCH 00/12] libperf: Add events to perf/event.h

2019-08-26 Thread Arnaldo Carvalho de Melo
Em Mon, Aug 26, 2019 at 06:47:34PM +0200, Jiri Olsa escreveu:
> On Mon, Aug 26, 2019 at 12:41:38PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Sun, Aug 25, 2019 at 08:17:40PM +0200, Jiri Olsa escreveu:
> > > hi,
> > > as a preparation for sampling libperf interface, moving event
> > > definitions into the library header. Moving just the kernel 
> > > non-AUX events now.
> > > 
> > > In order to keep libperf simple, we switch 'u64/u32/u16/u8'
> > > types used events to their generic '__u*' versions.
> > > 
> > > Perf added 'u*' types mainly to ease up printing __u64 values
> > > as stated in the linux/types.h comment:
> > > 
> > >   /*
> > >* We define u64 as uint64_t for every architecture
> > >* so that we can print it with "%"PRIx64 without getting warnings.
> > >*
> > >* typedef __u64 u64;
> > >* typedef __s64 s64;
> > >*/
> > > 
> > > Adding and using new PRI_lu64 and PRI_lx64 macros to be used for
> > > that.  Using extra '_' to ease up the reading and differentiate
> > > them from standard PRI*64 macros.
> > 
> > I think we should take advantage of this moment to rename those structs
> > to have the 'perf_record_' prefix on them, I guess we could even remove
> > the _event from them, i.e.:
> > 
> > 'struct mmap_event' becomes 'perf_record_mmap', as it is the description
> > for the PERF_RECORD_MMAP meta-data event, are you ok with that?
> 
> hum, not sure about losing the '_event' here, but we are
> not public yet, so we can always change back ;-) I do like
> it'd follow the enum name

So I'm renaming the structs already exported to libperf to have the
same name as the PERF_RECORD_ enum they map to, just all lowercase.

It looks nice and also gives everything exported by libperf a perf_
namespace prefix.

BTW: you forgot to move PERF_RECORD_CONTEXT_SWITCH :-)

> > I can go ahead and do it myself, updating each patch on this series to
> > do that.
> 
> sure, I thought we'd do it later, but feel free to do it,
> maybe in separate changes?

I did it as a separate patch, one patch for all the PERF_RECORD_ already
moved to libperf.

Also some other minor stuff, like having
perf_event.{bpf,ksymbol}_event renamed to plain perf_event.{bpf,ksymbol},
like the other records, to make this idiom more compact and less
redundant:

event->bpf_event

becomes:

event->bpf

ditto for ksymbol_event.

- Arnaldo


Re: [PATCH] PCI: Add missing link delays required by the PCIe spec

2019-08-26 Thread Bjorn Helgaas
On Mon, Aug 26, 2019 at 05:42:42PM +0300, Mika Westerberg wrote:
> On Mon, Aug 26, 2019 at 09:07:12AM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 26, 2019 at 01:17:26PM +0300, Mika Westerberg wrote:
> > > On Fri, Aug 23, 2019 at 09:12:54PM -0500, Bjorn Helgaas wrote:

> > > > But the "wait downstream" part seems a little too specific to be at
> > > > the .resume_noirq and .runtime_resume level.
> > > > 
> > > > Do we descend the hierarchy and call .resume_noirq and .runtime_resume
> > > > for the children of the bridge, too?
> > > 
> > > We do but at that time it is too late as we have already resumed pciehp
> > > of the parent downstream port and it may have already started tearing
> > > down the device stack below.
> > > 
> > > I'm open to any better ideas where this delay could be added, though :)
> > 
> > So we resume pciehp *before* we're finished resuming the Downstream
> > Port?  That sounds wrong.
> 
> I mean once we resume the downstream port (the bridge) we also resume
> "PCIe port services" including pciehp and only then descend to whatever
> device is physically connected to that port.

That order sounds right.  I guess I'd have to see more details about
what's happening with pciehp to understand this.  Do you happen to
have a trace (dmesg, backtrace, etc) of pciehp tearing things down?

> > > > > +static int pcie_get_downstream_delay(struct pci_bus *bus)
> > > > > +{
> > > > > + struct pci_dev *pdev;
> > > > > + int min_delay = 100;
> > > > > + int max_delay = 0;
> > > > > +
> > > > > + list_for_each_entry(pdev, &bus->devices, bus_list) {
> > > > > + if (pdev->imm_ready)
> > > > > + min_delay = 0;
> > > > > + else if (pdev->d3cold_delay < min_delay)
> > > > > + min_delay = pdev->d3cold_delay;
> > > > > + if (pdev->d3cold_delay > max_delay)
> > > > > + max_delay = pdev->d3cold_delay;
> > > > > + }
> > > > > +
> > > > > + return max(min_delay, max_delay);
> > > > > +}
> > 
> > > > > + */
> > > > > +void pcie_wait_downstream_accessible(struct pci_dev *pdev)
> > 
> > > > Do we need to observe the Trhfa requirements for Conventional PCI and
> > > > PCI-X devices here?  If we don't do it here, where is it observed?
> > > 
> > > We probably should but I intended this to be PCIe specific since there
> > > we have issues. For conventional PCI/PCI-X things "seem" to work and we
> > > don't power manage those bridges anyway.
> > > 
> > > I'm not aware if Trhfa is handled anywhere in the PCI stack
> > > currently.
> > 
> > I think we should make this agnostic of the Conventional/PCIe
> > difference if possible.  I assume we can tell if we're dealing with a
> > D3->D0 transition and we only add delays in that case.  If we don't
> > power manage Conventional PCI devices, I assume we won't see D3->D0
> > transitions for runtime resume so there won't be any harm.
> >
> > Making it PCIe-specific seems like it adds extra code ("dev-is-PCIe
> > checks") with no obvious reason for existence and an implicit
> > dependency on the fact that we only power manage PCIe devices.  If we
> > ever *did* support runtime power-management for Conventional PCI, we'd
> > trip over that implicit dependency and probably debug this issue
> > again.
> > 
> > But I guess it might slow down system resume for Conventional PCI
> > devices.  If we rely on delays in firmware, I wonder if there's
> > any point during resume where we could grab an early timestamp, then
> > take another timestamp here and deduce that we've already delayed the
> > difference?
> 
> That sounds rather complex, to be honest ;-)

Maybe so, I was just trying to brainstorm possibilities for making
sure we observe the delay requirements without slowing down resume.

For example, if we have several devices on the same bus, we shouldn't
have to do the delays serially; we should be able to take advantage of
the fact that the Trhfa period starts at the same time for all of
them.
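
A rough sketch of that accounting, written as plain userspace C. The function names and the millisecond values are hypothetical and not taken from the PCI core; the only point is that the longest requirement on a bus is waited for once, minus whatever time has already elapsed since the common start.

```c
/* Time still to wait, given the required delay and the time already spent. */
unsigned int demo_remaining_delay_ms(unsigned int required_ms,
				     unsigned int elapsed_ms)
{
	return elapsed_ms >= required_ms ? 0 : required_ms - elapsed_ms;
}

/*
 * All devices on one bus share the same start time, so a single wait for
 * the longest requirement covers every one of them.
 */
unsigned int demo_bus_delay_ms(const unsigned int *required_ms, int n,
			       unsigned int elapsed_ms)
{
	unsigned int longest = 0;

	for (int i = 0; i < n; i++)
		if (required_ms[i] > longest)
			longest = required_ms[i];

	return demo_remaining_delay_ms(longest, elapsed_ms);
}
```

With three devices needing 100, 10 and 50 ms and 30 ms already elapsed, the bus waits 70 ms once, where a serial wait would take 160 ms.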

> > > > > + delay = pcie_get_downstream_delay(bus);
> > > > > + if (!delay)
> > > > > + return;
> > > > 
> > > > I'm not sold on the idea that this delay depends on what's *below* the
> > > > bridge.  We're using sec 6.6.1 to justify the delay, and that section
> > > > doesn't say anything about downstream devices.
> > > 
> > > 6.6.1 indeed talks about Downstream Ports and devices immediately below
> > > them.
> > 
> > Wait, I don't think we're reading this the same way.  6.6.1 mentions
> > Downstream Ports: "With a Downstream Port that does not support Link
> > speeds greater than 5.0 GT/s, software must wait a minimum of 100 ms
> > before sending a Configuration Request to the device immediately below
> > that Port."
> > 
> > This says we have to delay before sending a config request to a device
> > below a Downstream Port, but it doesn't say anything about the
> > characteristics of that device.  In particular, I don't think it says
> > the delay can be 

Re: [PATCH v3 1/5] leds: lm3532: Fix brightness control for i2c mode

2019-08-26 Thread Tony Lindgren
Hi,

* Dan Murphy  [190820 19:53]:
> Fix the brightness control for I2C mode.  Instead of
> changing the full scale current register update the ALS target
> register for the appropriate banks.
> 
> In addition clean up some code errors and random misspellings found
> during coding.
> 
> Tested on Droid4 as well as LM3532 EVM connected to a BeagleBoneBlack
> 
> Fixes: e37a7f8d77e1 ("leds: lm3532: Introduce the lm3532 LED driver")
> Reported-by: Pavel Machek 
> Signed-off-by: Dan Murphy 
> ---
> 
> v3 - Removed register define updates - 
> https://lore.kernel.org/patchwork/patch/1114542/

Looks like starting with this patch in Linux next the LCD on droid4
is so dim it's unreadable even with brightness set to 255. Setting
brightness to 0 does blank it completely though.

Did something maybe break with the various patch revisions or are
we now missing some dts patch?

Regards,

Tony


Re: [patch V2 32/38] posix-cpu-timers: Get rid of zero checks

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:19PM +0200, Thomas Gleixner wrote:
> Deactivation of the expiry cache is done by setting all clock caches to
> 0. That requires to have a check for zero in all places which update the
> expiry cache:
> 
>   if (cache == 0 || new < cache)
>   cache = new;
> 
> Use U64_MAX as the deactivated value, which allows to remove the zero
> checks when updating the cache and reduces it to the obvious check:
> 
>   if (new < cache)
>   cache = new;
> 
> This also removes the weird workaround in do_prlimit() which was required
> to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing
> in 0 to the posix CPU timer code would have effectively disarmed it.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 
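
To make the difference between the two sentinels concrete, here is a small userspace sketch. The helper names are made up for the example and UINT64_MAX stands in for the kernel's U64_MAX; it is not the actual posix-cpu-timers code.

```c
#include <stdint.h>

/* Old scheme: 0 means "cache inactive", so each update needs a zero check. */
void demo_cache_update_zero(uint64_t *cache, uint64_t new_exp)
{
	if (*cache == 0 || new_exp < *cache)
		*cache = new_exp;
}

/* New scheme: UINT64_MAX means "inactive", so a plain minimum is enough. */
void demo_cache_update_max(uint64_t *cache, uint64_t new_exp)
{
	if (new_exp < *cache)
		*cache = new_exp;
}
```

With the second variant an expiry value of 0 is an ordinary value meaning "expire immediately", whereas in the first variant storing 0 is indistinguishable from deactivating the cache, which matches the do_prlimit() workaround described in the changelog.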


Re: [PATCH v2 2/2] reset: Reset controller driver for Intel LGM SoC

2019-08-26 Thread Martin Blumenstingl
Hi,

On Mon, Aug 26, 2019 at 6:01 AM Chuan Hua, Lei
 wrote:
>
> Hi Martin,
>
> Thanks for your comment.
thank you for the quick reply

> On 8/25/2019 5:11 AM, Martin Blumenstingl wrote:
> > Hi Dilip,
> >
> >> Add driver for the reset controller present on Intel
> >> Lightening Mountain (LGM) SoC for performing reset
> >> management of the devices present on the SoC. Driver also
> >> registers a reset handler to perform the entire device reset.
> > [...]
> >> +static const struct of_device_id intel_reset_match[] = {
> >> +{ .compatible = "intel,rcu-lgm" },
> >> +{}
> >> +};
> > how is this IP block different from the one used in many Lantiq SoCs?
> > there is already an upstream driver for the RCU IP block on the Lantiq
> > SoCs: drivers/reset/reset-lantiq.c
> >
> > some background:
> > Lantiq was started as a spinoff from Infineon in 2009. Intel then
> > acquired Lantiq in 2015. source: [0]
> > Intel is re-using some of the IP blocks from the MIPS Lantiq SoCs
> > (Intel even has some own MIPS SoCs as part of the Lantiq acquisition,
> > typically used for PON/GPON/ADSL/VDSL capable network devices).
> > Thus I think it is likely that the new "Lightening Mountain" SoCs use
> > an updated version of the Lantiq RCU IP.
>
> I would not say there is a fundamental difference since reset is a
> really simple
> stuff from all reset drivers.  However, it did have some difference
> from existing reset-lantiq.c since SoC becomes more and more complex.
OK, let me go through your detailed list

> 1. reset-lantiq.c use index instead of register offset + bit position.
> index reset is good for a small system (< 64). However, it will become very
> difficult to use if you have  > 100 reset. So we use register offset +
> bit position
reset-lantiq uses bit positions for specifying the reset line.
for example this is from OpenWrt's vr9.dtsi:
  reset0: reset-controller@10 {
...
reg = <0x10 4>, <0x14 4>;
#reset-cells = <2>;
  };

  gphy0: gphy@20 {
...
resets = < 31 30>, < 7 7>;
reset-names = "gphy", "gphy2";
  };

in my own words this means:
- all reset0 reset bits are at offset 0x10 (parent is RCU)
- all reset0 status bits are at offset 0x14 (parent is RCU)
- the first reset line uses reset bit 31 and status bit 30
- the second reset line uses reset bit 7 and status bit 7
- there can be multiple reset-controller instances, each taking the
reset and status offsets (OpenWrt's vr9.dtsi specifies the second RCU
reset controller "reset1" with reset offset 0x48 and status offset
0x24)
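
As a sketch of how such a two-cell specifier maps to register masks (plain C, with a hypothetical struct and helper names that are not taken from reset-lantiq.c):

```c
#include <stdint.h>

/* Hypothetical view of one reset line described by a two-cell specifier. */
struct demo_reset_line {
	uint32_t reset_offset;	/* register holding the reset bits, e.g. 0x10 */
	uint32_t status_offset;	/* register holding the status bits, e.g. 0x14 */
	uint32_t reset_bit;	/* first specifier cell */
	uint32_t status_bit;	/* second specifier cell */
};

/* Mask to write into the reset register for this line. */
uint32_t demo_reset_mask(const struct demo_reset_line *line)
{
	return UINT32_C(1) << line->reset_bit;
}

/* Mask to test in the status register for this line. */
uint32_t demo_status_mask(const struct demo_reset_line *line)
{
	return UINT32_C(1) << line->status_bit;
}
```

For the first line in the example above (reset bit 31, status bit 30) this gives a reset mask of 0x80000000 and a status mask of 0x40000000.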

> 2. reset-lantiq.c does not support device restart which is part of the
> reset in
> old lantiq SoC. It moved this part into arch/mips/lantiq directory.
it was moved to the .dts instead of the arch code. again from
OpenWrt's vr9.dtsi [0]:
  reboot {
compatible = "syscon-reboot";
regmap = <>;
offset = <0x10>;
mask = <0xe000>;
  };

this sets the reset0 reset bits 31, 30 and 29 at reboot

> 3. reset-lantiq.c reset callback doesn't implement what hardware implemented
> function. In old SoCs, some bits in the same register can be hardware
> reset clear.
>
> It just call assert + assert. For these SoCs, we should only call
> assert, hardware will auto deassert.
I didn't know that. so to confirm I understand this correctly:
- some reset lines must be asserted and de-asserted manually (setting
and clearing the bit manually). the _assert and _deassert functions
should be used in this case
- other reset lines only support reset pulses. the _reset function
should be used in this case
- the _reset function should only assert the reset line, then wait
until the hardware automatically de-asserts it (without any further
write)
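
To illustrate the pulse-only case from the list above, a small simulation in userspace C. Everything here is hypothetical: struct demo_rcu models the registers in memory, and the assumption that the status bit follows the assertion and then clears on its own is my reading of the description, not something confirmed by a datasheet.

```c
#include <stdint.h>

/* Simulated register pair standing in for MMIO; purely illustrative. */
struct demo_rcu {
	uint32_t reset_reg;
	uint32_t status_reg;
	int polls_until_clear;	/* models how long hardware holds the line */
};

/* Model of hardware that drops the status bit by itself after a few reads. */
static uint32_t demo_read_status(struct demo_rcu *rcu, uint32_t status_mask)
{
	if (rcu->polls_until_clear > 0 && --rcu->polls_until_clear == 0)
		rcu->status_reg &= ~status_mask;
	return rcu->status_reg;
}

/*
 * Pulse-style reset: assert once, then only poll the status bit.
 * No de-assert write is issued. Returns 0 on success, -1 on timeout.
 */
int demo_reset_pulse(struct demo_rcu *rcu, uint32_t reset_mask,
		     uint32_t status_mask, int max_polls)
{
	rcu->reset_reg |= reset_mask;
	rcu->status_reg |= status_mask;	/* simulated: status follows assert */

	while (max_polls-- > 0) {
		if (!(demo_read_status(rcu, status_mask) & status_mask))
			return 0;
	}
	return -1;
}
```

A line that needs manual control would instead use separate assert and de-assert writes, which is what the two mainline users listed below do.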

is this the same for all, old and new SoCs?

only two mainline Lantiq drivers are using reset lines - both are
using the _assert and _deassert variants:
- drivers/net/dsa/lantiq_gswip.c
- drivers/phy/lantiq/phy-lantiq-rcu-usb2.c

> 4. Code not optimized and intel internal review not assessed.
insights from you (like the issue with the reset callback) are very
valuable - this shows that we should focus on having one driver.

> Based on the above findings, I would suggest reset-lantiq.c to move to
> reset-intel-syscon.c
my concern with having two separate drivers is that it will be hard to
migrate from reset-lantiq to the "optimized" reset-intel-syscon
driver.
I don't have access to the datasheets for any Lantiq/Intel SoC
(VRX200 and even older).
so debugging issues after switching from one driver to another is
tedious because I cannot tell which part of the driver is causing a
problem (it's either "all code from driver A" vs "all code from driver
B", meaning it's hard to narrow it down).
with separate commits/patches that are improving the reset-lantiq
driver I can do git bisect to find the cause of a problem on the older
SoCs (VRX200 for example)

> What is your opinion?
I explained why I would like to avoid having two separate drivers
(even just for a limited amount of time)



Re: [patch V2 28/38] posix-cpu-timers: Restructure expiry array

2019-08-26 Thread Thomas Gleixner
On Mon, 26 Aug 2019, Frederic Weisbecker wrote:
> On Wed, Aug 21, 2019 at 09:09:15PM +0200, Thomas Gleixner wrote:
> > @@ -884,7 +888,7 @@ static void check_process_timers(struct
> >  struct list_head *firing)
> >  {
> > struct signal_struct *const sig = tsk->signal;
> > -   struct list_head *timers = sig->posix_cputimers.cpu_timers;
> > +   struct posix_cputimer_base *base = sig->posix_cputimers.bases;
> > u64 utime, ptime, virt_expires, prof_expires;
> > u64 sum_sched_runtime, sched_expires;
> > struct task_cputime cputime;
> > @@ -912,9 +916,12 @@ static void check_process_timers(struct
> > ptime = utime + cputime.stime;
> > sum_sched_runtime = cputime.sum_exec_runtime;
> >  
> > -   prof_expires = check_timers_list(timers, firing, ptime);
> > -   virt_expires = check_timers_list(++timers, firing, utime);
> > -   sched_expires = check_timers_list(++timers, firing, sum_sched_runtime);
> > > +   prof_expires = check_timers_list(&base[CPUCLOCK_PROF].cpu_timers,
> > > +firing, ptime);
> > > +   virt_expires = check_timers_list(&base[CPUCLOCK_VIRT].cpu_timers,
> > > +firing, utime);
> > > +   sched_expires = check_timers_list(&base[CLPCLOCK_SCHED].cpu_timers,
> 
> ^^
> 0-day bot should have warned by now.

It didn't but my own testing found it and I fixed it locally already



Re: [patch V2 31/38] rlimit: Rewrite non-sensical RLIMIT_CPU comment

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:18PM +0200, Thomas Gleixner wrote:
> The comment above the function which arms RLIMIT_CPU in the posix CPU timer
> code makes no sense at all. It claims that the kernel does not return an
> error code when it rejected the attempt to set RLIMIT_CPU. That's clearly
> bogus as the code does an error check and the rlimit is only set and
> activated when the permission checks are ok. In case of a rejection an
> appropriate error code is returned.
> 
> This is a historical and outdated comment which got dragged along even when
> the rlimit handling code was rewritten.
> 
> Replace it with an explanation why the setup function is not called when
> the rlimit value is RLIM_INFINITY and how the 'disarming' is handled.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Andrew Morton 
> Cc: Jiri Slaby 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH] bpf: handle 32-bit zext during constant blinding

2019-08-26 Thread Daniel Borkmann

On 8/21/19 9:23 PM, Naveen N. Rao wrote:

Since BPF constant blinding is performed after the verifier pass, the
ALU32 instructions inserted for doubleword immediate loads don't have a
corresponding zext instruction. This is causing a kernel oops on powerpc
and can be reproduced by running 'test_cgroup_storage' with
bpf_jit_harden=2.

Fix this by emitting BPF_ZEXT during constant blinding if
prog->aux->verifier_zext is set.

Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
Reported-by: Michael Ellerman 
Signed-off-by: Naveen N. Rao 


LGTM, applied to bpf, thanks!


Re: [PATCH 02/19] dax: Pass dax_dev to dax_writeback_mapping_range()

2019-08-26 Thread Dan Williams
[ add Jan ]

On Mon, Aug 26, 2019 at 1:58 PM Vivek Goyal  wrote:
>
> On Mon, Aug 26, 2019 at 04:33:26PM -0400, Vivek Goyal wrote:
> > On Mon, Aug 26, 2019 at 04:53:16AM -0700, Christoph Hellwig wrote:
> > > On Wed, Aug 21, 2019 at 01:57:03PM -0400, Vivek Goyal wrote:
> > > > Right now dax_writeback_mapping_range() is passed a bdev and dax_dev
> > > > is searched from that bdev name.
> > > >
> > > > virtio-fs does not have a bdev. So pass in dax_dev also to
> > > > dax_writeback_mapping_range(). If dax_dev is passed in, bdev is not
> > > > used otherwise dax_dev is searched using bdev.
> > >
> > > Please just pass in only the dax_device and get rid of the block device.
> > > The callers should have one at hand easily, e.g. for XFS just call
> > > xfs_find_daxdev_for_inode instead of xfs_find_bdev_for_inode.
> >
> > Sure. Here is the updated patch.
> >
> > This patch can probably go upstream independently. If you are fine with
> > the patch, I can post it separately for inclusion.
>
> Forgot to update function declaration in case of !CONFIG_FS_DAX. Here is
> the updated patch.
>
> Subject: dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()
>
> As of now dax_writeback_mapping_range() takes "struct block_device" as a
> parameter and dax_dev is searched from bdev name. This also involves taking
> a fresh reference on dax_dev and putting that reference at the end of
> function.
>
> We are developing a new filesystem virtio-fs and using dax to access host
> page cache directly. But there is no block device. IOW, we want to make
> use of dax but want to get rid of this assumption that there is always
> a block device associated with dax_dev.
>
> So pass in "struct dax_device" as parameter instead of bdev.
>
> ext2/ext4/xfs are current users and they already have a reference on
> dax_device. So there is no need to take reference and drop reference to
> dax_device on each call of this function.
>
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/dax.c|8 +---
>  fs/ext2/inode.c |5 +++--
>  fs/ext4/inode.c |2 +-
>  fs/xfs/xfs_aops.c   |2 +-
>  include/linux/dax.h |4 ++--

Looks good to me. Would be nice to get some ext4 and xfs acks then
I'll take it through the dax tree for v5.4.

>  5 files changed, 8 insertions(+), 13 deletions(-)
>
> Index: rhvgoyal-linux-fuse/fs/dax.c
> ===
> --- rhvgoyal-linux-fuse.orig/fs/dax.c   2019-08-26 16:45:26.093710196 -0400
> +++ rhvgoyal-linux-fuse/fs/dax.c2019-08-26 16:45:29.462710196 -0400
> @@ -936,12 +936,11 @@ static int dax_writeback_one(struct xa_s
>   * on persistent storage prior to completion of the operation.
>   */
>  int dax_writeback_mapping_range(struct address_space *mapping,
> -   struct block_device *bdev, struct writeback_control *wbc)
> +   struct dax_device *dax_dev, struct writeback_control *wbc)
>  {
> XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT);
> struct inode *inode = mapping->host;
> pgoff_t end_index = wbc->range_end >> PAGE_SHIFT;
> -   struct dax_device *dax_dev;
> void *entry;
> int ret = 0;
> unsigned int scanned = 0;
> @@ -952,10 +951,6 @@ int dax_writeback_mapping_range(struct a
> if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
> return 0;
>
> -   dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
> -   if (!dax_dev)
> -   return -EIO;
> -
> trace_dax_writeback_range(inode, xas.xa_index, end_index);
>
> tag_pages_for_writeback(mapping, xas.xa_index, end_index);
> @@ -976,7 +971,6 @@ int dax_writeback_mapping_range(struct a
> xas_lock_irq(&xas);
> }
> xas_unlock_irq(&xas);
> -   put_dax(dax_dev);
> trace_dax_writeback_range_done(inode, xas.xa_index, end_index);
> return ret;
>  }
> Index: rhvgoyal-linux-fuse/include/linux/dax.h
> ===
> --- rhvgoyal-linux-fuse.orig/include/linux/dax.h2019-08-26 
> 16:45:26.094710196 -0400
> +++ rhvgoyal-linux-fuse/include/linux/dax.h 2019-08-26 16:46:08.101710196 
> -0400
> @@ -141,7 +141,7 @@ static inline void fs_put_dax(struct dax
>
>  struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
>  int dax_writeback_mapping_range(struct address_space *mapping,
> -   struct block_device *bdev, struct writeback_control *wbc);
> +   struct dax_device *dax_dev, struct writeback_control *wbc);
>
>  struct page *dax_layout_busy_page(struct address_space *mapping);
>  dax_entry_t dax_lock_page(struct page *page);
> @@ -180,7 +180,7 @@ static inline struct page *dax_layout_bu
>  }
>
>  static inline int dax_writeback_mapping_range(struct address_space *mapping,
> -   struct block_device *bdev, struct writeback_control *wbc)
> +   struct 

Re: [PATCH 4.14 38/53] IB/hfi1: Fix Spectre v1 vulnerability

2019-08-26 Thread Gustavo A. R. Silva



On 8/26/19 4:06 AM, Greg Kroah-Hartman wrote:

> 
> Can you provide backports that work if they really are needed?
> 

It seems they aren't needed.

Sorry about the noise.

--
Gustavo



Re: [PATCH] RISC-V: Fix FIXMAP area corruption on RV32 systems

2019-08-26 Thread Alistair Francis
On Mon, 2019-08-26 at 14:17 -0700, Palmer Dabbelt wrote:
> On Sun, 18 Aug 2019 21:49:01 PDT (-0700), a...@brainfault.org wrote:
> > On Sun, Aug 18, 2019 at 11:49 PM Christoph Hellwig <
> > h...@infradead.org> wrote:
> > > > +#define FIXADDR_TOP  (VMALLOC_START)
> > > 
> > > Nit: no need for the braces, the definitions below don't use it
> > > either.
> > 
> > Sure, I will update and send v2 soon.
> > 
> > > > +#ifdef CONFIG_64BIT
> > > > +#define FIXADDR_SIZE PMD_SIZE
> > > > +#else
> > > > +#define FIXADDR_SIZE PGDIR_SIZE
> > > > +#endif
> > > > +#define FIXADDR_START(FIXADDR_TOP - FIXADDR_SIZE)
> > > > +
> > > >  /*
> > > > - * Task size is 0x40 for RV64 or 0xb80 for RV32.
> > > > + * Task size is 0x40 for RV64 or 0x9fc0 for RV32.
> > > >   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
> > > >   */
> > > >  #ifdef CONFIG_64BIT
> > > >  #define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > > >  #else
> > > > -#define TASK_SIZE VMALLOC_START
> > > > +#define TASK_SIZE FIXADDR_START
> > > >  #endif
> > > 
> > > Mentioning the addresses is a little weird.  IMHO this would be
> > > a much nicer place to explain the high-level memory layout,
> > > including
> > > maybe a little ASCII art.  Also we could have one #ifdef
> > > CONFIG_64BIT
> > > for both related values.  Last but not least instead of saying
> > > that
> > > something should be dividable it would be nice to have a
> > > BUILD_BUG_ON
> > > to enforce it.
> > > 
> > > Either way we are late in the cycle, so I guess this is ok for
> > > now:
> > > 
> > > Reviewed-by: Christoph Hellwig 
> > > 
> > > But I'd love to see this area improved a little further as it is
> > > full
> > > of mine fields.
> > 
> > I agree with you. We also have Sparsemem and KASAN patches which
> > touch virtual memory layout so it's important to have virtual
> > memory layout
> > documented clearly. I can add the required documentation as
> > separate patch.
> 
> Documentation is great, but if we document something that is broken
> then it's 
> still broken :)

I'm confused here. What is broken?

Right now RV32 does not work with the 5.3 kernel and this patch fixes
the regression.

> 
> > I think this needs to just be redone -- we keep running into issues here and
> > fixing them, but there are probably more issues and it'll probably be faster to
> > just think through the memory map than to keep fixing bugs as they crop up.
> > This was one of the areas of the port I didn't rewrite as part of the upstream
> > submission process, and as a result it's pretty crusty.

I can't speak for rewriting the code, but that seems like something
that should happen in the 5.4 merge window right? With RC6 already out 
this patch seems like the only option for 5.3. Unless we are just going
to drop RV32 support from Linux in the 5.3 release?

Alistair

> 
> > > I think the best place to add ASCII art would be asm/pgtable.h where all
> > virtual memory related defines are placed. Suggestions??
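
The BUILD_BUG_ON idea from the review can be sketched in plain userspace C.
This is only an illustration: every MODEL_* name and value is an assumption
made up for the example (the real definitions live in the RISC-V headers),
and _Static_assert stands in for BUILD_BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

/* All values are illustrative assumptions, not copied from the patch. */
#define MODEL_PGDIR_SHIFT	22
#define MODEL_PGDIR_SIZE	(UINT32_C(1) << MODEL_PGDIR_SHIFT)
#define MODEL_VMALLOC_START	UINT32_C(0xa0000000)

/* Same shape as the quoted hunk: the fixmap sits directly below vmalloc. */
#define MODEL_FIXADDR_TOP	MODEL_VMALLOC_START
#define MODEL_FIXADDR_SIZE	MODEL_PGDIR_SIZE
#define MODEL_FIXADDR_START	(MODEL_FIXADDR_TOP - MODEL_FIXADDR_SIZE)
#define MODEL_TASK_SIZE		MODEL_FIXADDR_START

/* Userspace stand-in for BUILD_BUG_ON(): the build fails if this is false. */
_Static_assert(MODEL_TASK_SIZE % MODEL_PGDIR_SIZE == 0,
	       "PGDIR_SIZE must evenly divide TASK_SIZE");

/* Runtime view of the same check, handy when experimenting with values. */
static int model_task_size_is_aligned(void)
{
	return (MODEL_TASK_SIZE % MODEL_PGDIR_SIZE) == 0;
}
```

With a check like this the "must evenly divide" rule stops being a comment
that can silently go stale and becomes something the compiler rejects.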


Re: [PATCH v1 net-next] net: phy: mdio_bus: make mdiobus_scan also cover PHY that only talks C45

2019-08-26 Thread David Miller


There is something wrong with the clock on the computer you are
posting these patches from, the dates in these postings are in the
future by several hours.

This messes up the ordering of changes in patchwork and makes my life
miserable to a certain degree, so please fix this.

Thank you.


Re: [RFC] clk: Remove cached cores in parent map during unregister

2019-08-26 Thread Stephen Boyd
Quoting Stephen Boyd (2019-08-21 11:10:08)
> Quoting Stephen Boyd (2019-07-29 15:46:51)
> > Quoting Bjorn Andersson (2019-07-22 22:14:46)
> > > As clocks are registered their parents are resolved and the parent_map
> > > is updated to cache the clk_core objects of each existing parent.
> > > But in the event of a clock being unregistered this cache will carry
> > > dangling pointers if not invalidated, so do this for all children of the
> > > clock being unregistered.
> > > 
> > > Signed-off-by: Bjorn Andersson 
> > > ---
> > > 
> > > > This resolves the issue seen where the DSI PLL (and its provided clocks) is
> > > > being registered and unregistered multiple times due to probe deferral.
> > > 
> > > Marking it RFC because I don't fully understand the life of the clock yet.
> > 
> > The concept sounds sane but the implementation is going to be not much
> > fun. The problem is that a clk can be in many different parent caches,
> > even ones for clks that aren't currently parented to it. We would need
> > to walk the entire tree(s) and find anywhere that we've cached the
> > clk_core pointer and invalidate it. Maybe we can speed that up a little
> > bit by keeping a reference to the entry of each parent cache that is for
> > the parent we're removing, essentially holding an inverse cache, but I'm
> > not sure it will provide any benefit besides wasting space for this one
> > operation that we shouldn't be doing very often if at all.
> > 
> > It certainly sounds easier to iterate through the whole tree and just
> > invalidate entries in all the caches under the prepare lock. We can
> > optimize it later.
> 
> Here's an attempt at the simple approach. There's another problem where
> the cached 'hw' member of the parent data is held around when we don't
> know when the caller has destroyed it. Not much else we can do for that
> though.
> 
> ---8<---
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index c0990703ce54..f42a803fb11a 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -3737,6 +3737,37 @@ static const struct clk_ops clk_nodrv_ops = {
> .set_parent = clk_nodrv_set_parent,
>  };
>  
> +static void clk_core_evict_parent_cache_subtree(struct clk_core *root,
> +   struct clk_core *target)
> +{
> +   int i;
> +   struct clk_core *child;
> +
> +   if (!root)
> +   return;

I don't think we need this part. Child is always a valid pointer.
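
For readers following along, here is a rough userspace sketch of the "walk
the tree and invalidate" approach discussed in this thread. The struct and
function names are hypothetical stand-ins, not the clk framework's types,
and locking is left out entirely. It recurses without a NULL check on the
root, matching the remark that a child is always a valid pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PARENTS	2
#define MODEL_MAX_CHILDREN	2

/* Hypothetical stand-in for a clock node with a parent cache. */
struct model_clk {
	struct model_clk *cached_parent[MODEL_MAX_PARENTS];
	struct model_clk *child[MODEL_MAX_CHILDREN];
	int num_children;
};

/* Drop every cached pointer to 'target' in 'root' and all its descendants. */
static void model_evict_parent_cache_subtree(struct model_clk *root,
					     const struct model_clk *target)
{
	int i;

	for (i = 0; i < MODEL_MAX_PARENTS; i++) {
		if (root->cached_parent[i] == target)
			root->cached_parent[i] = NULL;
	}

	/* Every entry below num_children is assumed to be a valid pointer. */
	for (i = 0; i < root->num_children; i++)
		model_evict_parent_cache_subtree(root->child[i], target);
}
```

A caller would run this once per root of the tree(s) while holding whatever
lock protects the topology.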



Re: [PATCH v2] arm64: dts: sdm845: Add parent clock for rpmhcc

2019-08-26 Thread Stephen Boyd
Quoting Vinod Koul (2019-08-26 10:42:33)
> RPM clock controller has parent as xo, so specify that in DT node for
> rpmhcc
> 
> Signed-off-by: Vinod Koul 
> ---

Reviewed-by: Stephen Boyd 



a bug in genksyms/CONFIG_MODVERSIONS w/ __attribute__((foo))?

2019-08-26 Thread Nick Desaulniers
I'm looking into a linkage failure for one of our device kernels, and
it seems that genksyms isn't producing a hash value correctly for
aggregate definitions that contain __attribute__s like
__attribute__((packed)).

Example:
$ echo 'struct foo { int bar; };' | ./scripts/genksyms/genksyms -d
Defn for struct foo == 
Hash table occupancy 1/4096 = 0.000244141
$ echo 'struct __attribute__((packed)) foo { int bar; };' |
./scripts/genksyms/genksyms -d
Hash table occupancy 0/4096 = 0

I assume the __attribute__ part isn't being parsed correctly (looks
like genksyms is a lex/yacc based C parser).

The issue we have in our out of tree driver (*sadface*) is basically a
EXPORT_SYMBOL'd function whose signature contains a packed struct.

Theoretically, there should be nothing wrong with exporting a function
that requires packed structs, and this is just a bug in the lex/yacc
based parser, right?  I assume that not having CONFIG_MODVERSIONS
coverage of packed structs in particular could lead to potentially
not-fun bugs?  Or is using packed structs in exported function symbols
with CONFIG_MODVERSIONS forbidden in some documentation somewhere I
missed?
-- 
Thanks,
~Nick Desaulniers
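
To illustrate why missing coverage for packed structs could matter, here is
a small sketch showing that the attribute changes the layout a caller and
callee must agree on. It assumes GCC or Clang (for __attribute__((packed)));
the struct names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Same members, different layout: only the attribute differs. */
struct model_plain {
	char tag;
	int bar;
};

struct __attribute__((packed)) model_packed {
	char tag;
	int bar;
};

/* Offset of 'bar' with natural alignment (ABI dependent, typically 4). */
static size_t model_plain_bar_offset(void)
{
	return offsetof(struct model_plain, bar);
}

/* Offset of 'bar' when packed: directly after the one-byte 'tag'. */
static size_t model_packed_bar_offset(void)
{
	return offsetof(struct model_packed, bar);
}
```

If a symbol version hash ignores the attribute, two builds that disagree on
it would still appear compatible even though the field offsets differ.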


[PATCH v2] clk: Document of_parse_clkspec() some more

2019-08-26 Thread Stephen Boyd
The return value of of_parse_clkspec() is peculiar. If the function is
called with a NULL argument for 'name' it will return -ENOENT, but if
it's called with a non-NULL argument for 'name' it will return -EINVAL.
This peculiarity is documented by commit 5c56dfe63b6e ("clk: Add comment
about __of_clk_get_by_name() error values").

Let's further document this function so that it's clear what the return
value is and how to use the arguments to parse clk specifiers.

Cc: Phil Edworthy 
Signed-off-by: Stephen Boyd 
---
 drivers/clk/clk.c | 43 +--
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index c0990703ce54..5c6585eb35d4 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4316,12 +4316,43 @@ void devm_of_clk_del_provider(struct device *dev)
 }
 EXPORT_SYMBOL(devm_of_clk_del_provider);
 
-/*
- * Beware the return values when np is valid, but no clock provider is found.
- * If name == NULL, the function returns -ENOENT.
- * If name != NULL, the function returns -EINVAL. This is because
- * of_parse_phandle_with_args() is called even if of_property_match_string()
- * returns an error.
+/**
+ * of_parse_clkspec() - Parse a DT clock specifier for a given device node
+ * @np: device node to parse clock specifier from
+ * @index: index of phandle to parse clock out of. If index < 0, @name is used
+ * @name: clock name to find and parse. If name is NULL, the index is used
+ * @out_args: Result of parsing the clock specifier
+ *
+ * Parses a device node's "clocks" and "clock-names" properties to find the
+ * phandle and cells for the index or name that is desired. The resulting clock
+ * specifier is placed into @out_args, or an errno is returned when there's a
+ * parsing error. The @index argument is ignored if @name is non-NULL.
+ *
+ * Example:
+ *
+ * phandle1: clock-controller@1 {
+ * #clock-cells = <2>;
+ * }
+ *
+ * phandle2: clock-controller@2 {
+ * #clock-cells = <1>;
+ * }
+ *
+ * clock-consumer@3 {
+ * clocks = <&phandle1 1 2 &phandle2 3>;
+ * clock-names = "name1", "name2";
+ * }
+ *
+ * To get a device_node for `clock-controller@2' node you may call this
+ * function a few different ways:
+ *
+ *   of_parse_clkspec(clock-consumer@3, -1, "name2", &args);
+ *   of_parse_clkspec(clock-consumer@3, 1, NULL, &args);
+ *   of_parse_clkspec(clock-consumer@3, 1, "name2", &args);
+ *
+ * Return: 0 upon successfully parsing the clock specifier. Otherwise, -ENOENT
+ * if @name is NULL or -EINVAL if @name is non-NULL and it can't be found in
+ * the "clock-names" property of @np.
  */
 static int of_parse_clkspec(const struct device_node *np, int index,
const char *name, struct of_phandle_args *out_args)
-- 
Sent by a computer through tubes
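
The argument and return-value rules documented in this patch can be modelled
in userspace roughly as follows. This is a simplified sketch and not the
kernel implementation: the model_* types are hypothetical, the consumer
mirrors the kernel-doc example, and the choice to return -EINVAL for a
negative index is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed clock specifier. */
struct model_clkspec {
	const char *provider;
	int args[2];
	int nargs;
};

/* Hypothetical stand-in for a node's "clocks" and "clock-names". */
struct model_consumer {
	const char *const *names;
	const struct model_clkspec *specs;
	int count;
};

static const char *const example_names[] = { "name1", "name2" };
static const struct model_clkspec example_specs[] = {
	{ "clock-controller@1", { 1, 2 }, 2 },
	{ "clock-controller@2", { 3, 0 }, 1 },
};
static const struct model_consumer example_consumer = {
	example_names, example_specs, 2
};

static int model_parse_clkspec(const struct model_consumer *c, int index,
			       const char *name, struct model_clkspec *out)
{
	int i;

	/* A non-NULL name overrides whatever index was passed in. */
	if (name) {
		index = -1;
		for (i = 0; i < c->count; i++) {
			if (strcmp(c->names[i], name) == 0) {
				index = i;
				break;
			}
		}
	}

	if (index < 0)
		return -EINVAL;	/* name given but not found */
	if (index >= c->count)
		return -ENOENT;	/* no entry at that index */

	*out = c->specs[index];
	return 0;
}
```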



Re: [patch V2 30/38] posix-cpu-timers: Respect INFINITY for hard RTTIME limit

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:17PM +0200, Thomas Gleixner wrote:
> The RTTIME limit expiry code does not check the hard RTTIME limit for
> INFINITY, i.e. being disabled.  Add it.
> 
> This could be considered an ABI breakage if something depended on this
> behaviour, though it's highly unlikely to have an effect because
> RLIM_INFINITY is at minimum INT_MAX and the RTTIME limit is in seconds, so
> the timer would fire after ~68 years.
> 
> Adding this obviously correct limit check also allows further consolidation
> of that code and is a prerequisite for cleaning up the 0 based checks and
> the rlimit setter code.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 
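
As a sketch of what the changelog describes, the check and the "~68 years"
arithmetic look roughly like this in userspace. The function names are
hypothetical, both arguments are assumed to be in the same unit, and the
year count assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <sys/resource.h>

/* A hard limit of RLIM_INFINITY means "disabled" and can never expire. */
static bool model_hard_limit_expired(rlim_t used, rlim_t hard)
{
	return hard != RLIM_INFINITY && used >= hard;
}

/* How many whole years INT_MAX seconds amount to (365.25-day years). */
static long model_int_max_seconds_in_years(void)
{
	const long seconds_per_year = 365L * 24 * 60 * 60 + 6L * 60 * 60;

	return INT_MAX / seconds_per_year;
}
```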


Re: [PATCH] RISC-V: Fix FIXMAP area corruption on RV32 systems

2019-08-26 Thread Palmer Dabbelt

On Sun, 18 Aug 2019 21:49:01 PDT (-0700), a...@brainfault.org wrote:
> On Sun, Aug 18, 2019 at 11:49 PM Christoph Hellwig  wrote:
> >
> > > +#define FIXADDR_TOP  (VMALLOC_START)
> >
> > Nit: no need for the braces, the definitions below don't use it
> > either.
>
> Sure, I will update and send v2 soon.
>
> > > +#ifdef CONFIG_64BIT
> > > +#define FIXADDR_SIZE PMD_SIZE
> > > +#else
> > > +#define FIXADDR_SIZE PGDIR_SIZE
> > > +#endif
> > > +#define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
> > > +
> > >  /*
> > > - * Task size is 0x40 for RV64 or 0xb80 for RV32.
> > > + * Task size is 0x40 for RV64 or 0x9fc0 for RV32.
> > >   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
> > >   */
> > >  #ifdef CONFIG_64BIT
> > >  #define TASK_SIZE (PGDIR_SIZE * PTRS_PER_PGD / 2)
> > >  #else
> > > -#define TASK_SIZE VMALLOC_START
> > > +#define TASK_SIZE FIXADDR_START
> > >  #endif
> >
> > Mentioning the addresses is a little weird.  IMHO this would be
> > a much nicer place to explain the high-level memory layout, including
> > maybe a little ASCII art.  Also we could have one #ifdef CONFIG_64BIT
> > for both related values.  Last but not least instead of saying that
> > something should be dividable it would be nice to have a BUILD_BUG_ON
> > to enforce it.
> >
> > Either way we are late in the cycle, so I guess this is ok for now:
> >
> > Reviewed-by: Christoph Hellwig 
> >
> > But I'd love to see this area improved a little further as it is full
> > of mine fields.
>
> I agree with you. We also have Sparsemem and KASAN patches which
> touch virtual memory layout so it's important to have virtual memory layout
> documented clearly. I can add the required documentation as separate patch.

Documentation is great, but if we document something that is broken then it's
still broken :)

I think this needs to just be redone -- we keep running into issues here and
fixing them, but there are probably more issues and it'll probably be faster to
just think through the memory map than to keep fixing bugs as they crop up.
This was one of the areas of the port I didn't rewrite as part of the upstream
submission process, and as a result it's pretty crusty.

> I think the best place to add ASCII art would be asm/pgtable.h where all
> virtual memory related defines are placed. Suggestions??


Re: [PATCH v2 3/3] dwc: PCI: intel: Intel PCIe RC controller driver

2019-08-26 Thread Martin Blumenstingl
Hello,

On Mon, Aug 26, 2019 at 5:31 AM Chuan Hua, Lei
 wrote:
>
> Hi Martin,
>
> Thanks for your valuable comments. I reply some of them as below.
you're welcome

[...]
> >> +config PCIE_INTEL_AXI
> >> +bool "Intel AHB/AXI PCIe host controller support"
> > I believe that this is mostly the same IP block as it's used in Lantiq
> > (xDSL) VRX200 SoCs (with MIPS cores) which was introduced in 2010
> > (before Intel acquired Lantiq).
> > This is why I would have personally called the driver PCIE_LANTIQ
>
> VRX200 SoC(internally called VR9) was the first PCIe SoC product which
> was using synopsys
>
> controller v3.30a. It only supports PCIe Gen1.1/1.0. The phy is internal
> phy from infineon.
thank you for these details
I wasn't aware that the PCIe PHY on these SoCs was developed by
Infineon nor is the DWC version documented anywhere

[...]
> >> +#define PCIE_CCRID                          0x8
> >> +
> >> +#define PCIE_LCAP                           0x7C
> >> +#define PCIE_LCAP_MAX_LINK_SPEED            GENMASK(3, 0)
> >> +#define PCIE_LCAP_MAX_LENGTH_WIDTH          GENMASK(9, 4)
> >> +
> >> +/* Link Control and Status Register */
> >> +#define PCIE_LCTLSTS                        0x80
> >> +#define PCIE_LCTLSTS_ASPM_ENABLE            GENMASK(1, 0)
> >> +#define PCIE_LCTLSTS_RCB128                 BIT(3)
> >> +#define PCIE_LCTLSTS_LINK_DISABLE           BIT(4)
> >> +#define PCIE_LCTLSTS_COM_CLK_CFG            BIT(6)
> >> +#define PCIE_LCTLSTS_HW_AW_DIS              BIT(9)
> >> +#define PCIE_LCTLSTS_LINK_SPEED             GENMASK(19, 16)
> >> +#define PCIE_LCTLSTS_NEGOTIATED_LINK_WIDTH  GENMASK(25, 20)
> >> +#define PCIE_LCTLSTS_SLOT_CLK_CFG           BIT(28)
> >> +
> >> +#define PCIE_LCTLSTS2                       0xA0
> >> +#define PCIE_LCTLSTS2_TGT_LINK_SPEED        GENMASK(3, 0)
> >> +#define PCIE_LCTLSTS2_TGT_LINK_SPEED_25GT   0x1
> >> +#define PCIE_LCTLSTS2_TGT_LINK_SPEED_5GT    0x2
> >> +#define PCIE_LCTLSTS2_TGT_LINK_SPEED_8GT    0x3
> >> +#define PCIE_LCTLSTS2_TGT_LINK_SPEED_16GT   0x4
> >> +#define PCIE_LCTLSTS2_HW_AUTO_DIS           BIT(5)
> >> +
> >> +/* Ack Frequency Register */
> >> +#define PCIE_AFR                            0x70C
> >> +#define PCIE_AFR_FTS_NUM                    GENMASK(15, 8)
> >> +#define PCIE_AFR_COM_FTS_NUM                GENMASK(23, 16)
> >> +#define PCIE_AFR_GEN12_FTS_NUM_DFT          (SZ_128 - 1)
> >> +#define PCIE_AFR_GEN3_FTS_NUM_DFT           180
> >> +#define PCIE_AFR_GEN4_FTS_NUM_DFT           196
> >> +
> >> +#define PCIE_PLCR_DLL_LINK_EN               BIT(5)
> >> +#define PCIE_PORT_LOGIC_FTS                 GENMASK(7, 0)
> >> +#define PCIE_PORT_LOGIC_DFT_FTS_NUM         (SZ_128 - 1)
> >> +
> >> +#define PCIE_MISC_CTRL                      0x8BC
> >> +#define PCIE_MISC_CTRL_DBI_RO_WR_EN         BIT(0)
> >> +
> >> +#define PCIE_MULTI_LANE_CTRL                0x8C0
> >> +#define PCIE_UPCONFIG_SUPPORT               BIT(7)
> >> +#define PCIE_DIRECT_LINK_WIDTH_CHANGE       BIT(6)
> >> +#define PCIE_TARGET_LINK_WIDTH              GENMASK(5, 0)
> >> +
> >> +#define PCIE_IOP_CTRL                       0x8C4
> >> +#define PCIE_IOP_RX_STANDBY_CTRL            GENMASK(6, 0)
> no need for IOP
with "are you sure that you need any of the registers above?" I really
meant all registers above (including, but not limited to IOP)
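
For anyone not used to the GENMASK()/BIT() style of the quoted defines, here
is a userspace sketch of how such masks are used to read and update a field.
The MODEL_* macros and helpers are hypothetical 32-bit stand-ins written for
this example (the kernel has its own helpers); only the bit positions
(19:16 for link speed, 25:20 for negotiated width) come from the quoted
defines.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit userspace stand-ins for the kernel's BIT() and GENMASK(). */
#define MODEL_BIT(n)		(UINT32_C(1) << (n))
#define MODEL_GENMASK(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Bit positions taken from the quoted PCIE_LCTLSTS_* defines. */
#define MODEL_LINK_SPEED	MODEL_GENMASK(19, 16)
#define MODEL_LINK_WIDTH	MODEL_GENMASK(25, 20)

/* Position of the lowest set bit; 'mask' must be non-zero. */
static unsigned int model_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1))
		shift++;
	return shift;
}

/* Extract the field selected by 'mask' from register value 'reg'. */
static uint32_t model_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> model_mask_shift(mask);
}

/* Return 'reg' with the field selected by 'mask' replaced by 'val'. */
static uint32_t model_field_set(uint32_t mask, uint32_t reg, uint32_t val)
{
	return (reg & ~mask) | ((val << model_mask_shift(mask)) & mask);
}
```

Because the helpers need only a mask, the same code works for any
controller that uses the same register layout.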

[...]
> As I mentioned, VRX200 was a very old PCIe Gen1.1 product. In our latest
> SoC Lightning
>
> Mountain, we are using synopsys controller 5.20/5.50a. We support
> Gen2(XRX350/550),
>
> Gen3(PRX300) and GEN4(X86 based SoC). We also supported dual lane and
> single lane.
>
> Some of the above registers are needed to control FTS, link width and
> link speed.
only now I noticed that I didn't explain why I was asking whether all
these registers are needed
my understanding of the DWC PCIe controller driver "library" is that:
- all functionality which is provided by the DesignWare PCIe core
should be added to drivers/pci/controller/dwc/pcie-designware*
- functionality which is built on top/around the DWC PCIe core should
be added to 

the link width and link speed settings (I don't know about "FTS")
don't seem Intel/Lantiq controller specific to me
so the register setup for these bits should be moved to
drivers/pci/controller/dwc/pcie-designware*

> > this also makes me wonder if various functions below like
> > intel_pcie_link_setup() and intel_pcie_max_speed_setup() (and probably
> > more) are really needed or if it's just cargo cult / copy paste from
> > an out-of-tree driver).
>
> intel_pcie_link_setup is to record maximum speed and link width. We need these
> to change speed and link width on the fly which is not supported by dwc driver
> common source.
> There are two major purposes.
> 1. For cable applications, they have battery support mode. In this case, it 
> has to
> switch from x2 and gen4 to x1 and gen1 on the fly
> 2. Some 

Re: [Xen-devel] [PATCH V3 2/2] Xen/PCIback: Implement PCI flr/slot/bus reset with 'reset' SysFS attribute

2019-08-26 Thread Pasi Kärkkäinen
Hi,

On Mon, Oct 08, 2018 at 10:32:45AM -0400, Boris Ostrovsky wrote:
> On 10/3/18 11:51 AM, Pasi Kärkkäinen wrote:
> > On Wed, Sep 19, 2018 at 11:05:26AM +0200, Roger Pau Monné wrote:
> >> On Tue, Sep 18, 2018 at 02:09:53PM -0400, Boris Ostrovsky wrote:
> >>> On 9/18/18 5:32 AM, George Dunlap wrote:
> > On Sep 18, 2018, at 8:15 AM, Pasi Kärkkäinen  wrote:
> >
> > Hi,
> >
> > On Mon, Sep 17, 2018 at 02:06:02PM -0400, Boris Ostrovsky wrote:
> >> What about the toolstack changes? Have they been accepted? I vaguely
> >> recall there was a discussion about those changes but don't remember 
> >> how
> >> it ended.
> >>
> > I don't think toolstack/libxl patch has been applied yet either.
> >
> >
> > "[PATCH V1 0/1] Xen/Tools: PCI reset using 'reset' SysFS attribute":
> > https://lists.xen.org/archives/html/xen-devel/2017-12/msg00664.html
> >
> > "[PATCH V1 1/1] Xen/libxl: Perform PCI reset using 'reset' SysFS 
> > attribute":
> > https://lists.xen.org/archives/html/xen-devel/2017-12/msg00663.html
> >>>
> >>> Will this patch work for *BSD? Roger?
> >> At least FreeBSD don't support pci-passthrough, so none of this works
> >> ATM. There's no sysfs on BSD, so much of what's in libxl_pci.c will
> >> have to be moved to libxl_linux.c when BSD support is added.
> >>
> > Ok. That sounds like it's OK for the initial pci 'reset' implementation in 
> > xl/libxl to be linux-only.. 
> >
> 
> Are these two patches still needed? ISTR they were  written originally
> to deal with guest trying to use device that was previously assigned to
> another guest. But pcistub_put_pci_dev() calls
> __pci_reset_function_locked() which first tries FLR, and it looks like
> it was added relatively recently.
> 

Replying to an old thread.. I only now realized I forgot to reply to this 
message earlier.

afaik these patches are still needed. Håkon (CC'd) wrote to me in private that
he gets a (dom0) Linux kernel crash if he doesn't have these patches applied.


Here are the links to both the linux kernel and libxl patches:


"[Xen-devel] [PATCH V3 0/2] Xen/PCIback: PCI reset using 'reset' SysFS 
attribute":
https://lists.xen.org/archives/html/xen-devel/2017-12/msg00659.html

[Note that PATCH V3 1/2 "Drivers/PCI: Export pcie_has_flr() interface" is 
already applied in upstream linux kernel, so it's not needed anymore]

"[Xen-devel] [PATCH V3 2/2] Xen/PCIback: Implement PCI flr/slot/bus reset with 
'reset' SysFS attribute":
https://lists.xen.org/archives/html/xen-devel/2017-12/msg00661.html


"[Xen-devel] [PATCH V1 0/1] Xen/Tools: PCI reset using 'reset' SysFS attribute":
https://lists.xen.org/archives/html/xen-devel/2017-12/msg00664.html

"[Xen-devel] [PATCH V1 1/1] Xen/libxl: Perform PCI reset using 'reset' SysFS 
attribute":
https://lists.xen.org/archives/html/xen-devel/2017-12/msg00663.html


> 
> -boris


Thanks,

-- Pasi



Re: [patch V2 29/38] posix-cpu-timers: Switch thread group sampling to array

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:16PM +0200, Thomas Gleixner wrote:
> That allows more simplifications in various places.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH] net/mlx5: fix a -Wstringop-truncation warning

2019-08-26 Thread Saeed Mahameed
On Fri, 2019-08-23 at 15:56 -0400, Qian Cai wrote:
> In file included from ./arch/powerpc/include/asm/paca.h:15,
>  from ./arch/powerpc/include/asm/current.h:13,
>  from ./include/linux/thread_info.h:21,
>  from ./include/asm-generic/preempt.h:5,
>  from
> ./arch/powerpc/include/generated/asm/preempt.h:1,
>  from ./include/linux/preempt.h:78,
>  from ./include/linux/spinlock.h:51,
>  from ./include/linux/wait.h:9,
>  from ./include/linux/completion.h:12,
>  from ./include/linux/mlx5/driver.h:37,
>  from
> drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h:6,
>  from
> drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:33:
> In function 'strncpy',
> inlined from 'mlx5_fw_tracer_save_trace' at
> drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:549:2,
> inlined from 'mlx5_tracer_print_trace' at
> drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:574:2:
> ./include/linux/string.h:305:9: warning: '__builtin_strncpy' output
> may
> be truncated copying 256 bytes from a string of length 511
> [-Wstringop-truncation]
>   return __builtin_strncpy(p, q, size);
>  ^
> 
> Fix it by using strscpy_pad(), added by commit 458a3bf82df4
> ("lib/string: Add strscpy_pad() function"), which always
> NUL-terminates the string and avoids possibly leaking data through the
> ring buffer, where a non-admin account might enable these events through
> perf.
> 
> Fixes: fd1483fe1f9f ("net/mlx5: Add support for FW reporter dump")
> Signed-off-by: Qian Cai 


Hi Qian and thanks for your patch,

We already have a patch that handles this issue, please check it out:
https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=net-next-mlx5
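
For context on the warning itself, here is a userspace sketch of a bounded
copy that always NUL-terminates and zero-pads the tail, which is the
behaviour the quoted changelog attributes to strscpy_pad(). The function
below is a hypothetical model written for this example, not the kernel
implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 bytes of 'src' into 'dst', always NUL-terminate,
 * and zero the rest of 'dst' so no stale bytes remain after the string.
 * Returns the copied length, or -E2BIG if 'src' had to be truncated.
 */
static long model_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -E2BIG;

	while (len < size - 1 && src[len] != '\0') {
		dst[len] = src[len];
		len++;
	}

	/* Terminator plus padding in one go. */
	memset(dst + len, 0, size - len);

	return src[len] == '\0' ? (long)len : -E2BIG;
}
```

Unlike strncpy(), a truncated copy here is still a valid C string, and the
caller gets an explicit error to act on.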





Re: [PATCH] clk: Make of_parse_clkspec() return -ENOENT on errors

2019-08-26 Thread Stephen Boyd
Hrmm.. the subject is misleading. Let me reword it and resend.

Quoting Stephen Boyd (2019-08-19 15:29:15)
> The return value of of_parse_clkspec() is peculiar. If the function is
> called with a NULL argument for 'name' it will return -ENOENT, but if
> it's called with a non-NULL argument for 'name' it will return -EINVAL.
> This peculiarity is documented by commit 5c56dfe63b6e ("clk: Add comment
> about __of_clk_get_by_name() error values").
> 
> Let's further document this function so that it's clear what the return
> value is and how to use the arguments to parse clk specifiers.
> 
> Cc: Phil Edworthy 
> Signed-off-by: Stephen Boyd 


Re: [patch V2 28/38] posix-cpu-timers: Restructure expiry array

2019-08-26 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:15PM +0200, Thomas Gleixner wrote:
> @@ -884,7 +888,7 @@ static void check_process_timers(struct
>struct list_head *firing)
>  {
>   struct signal_struct *const sig = tsk->signal;
> - struct list_head *timers = sig->posix_cputimers.cpu_timers;
> + struct posix_cputimer_base *base = sig->posix_cputimers.bases;
>   u64 utime, ptime, virt_expires, prof_expires;
>   u64 sum_sched_runtime, sched_expires;
>   struct task_cputime cputime;
> @@ -912,9 +916,12 @@ static void check_process_timers(struct
>   ptime = utime + cputime.stime;
>   sum_sched_runtime = cputime.sum_exec_runtime;
>  
> - prof_expires = check_timers_list(timers, firing, ptime);
> - virt_expires = check_timers_list(++timers, firing, utime);
> - sched_expires = check_timers_list(++timers, firing, sum_sched_runtime);
> + prof_expires = check_timers_list(&base[CPUCLOCK_PROF].cpu_timers,
> +  firing, ptime);
> + virt_expires = check_timers_list(&base[CPUCLOCK_VIRT].cpu_timers,
> +  firing, utime);
> + sched_expires = check_timers_list(&base[CLPCLOCK_SCHED].cpu_timers,

^^
0-day bot should have warned by now.
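
A small userspace sketch of the pattern in the hunk, for illustration only:
indexing an array of per-clock bases by a named enumerator instead of
walking a pointer with ++. All model_* names are hypothetical. One side
effect of named indices is that a misspelled enumerator cannot compile,
which is what the remark about the 0-day bot relies on.

```c
#include <assert.h>

/* Hypothetical clock ids, in the same order the hunk walks them. */
enum model_cpuclock {
	MODEL_CPUCLOCK_PROF,
	MODEL_CPUCLOCK_VIRT,
	MODEL_CPUCLOCK_SCHED,
	MODEL_CPUCLOCK_MAX
};

/* Hypothetical per-clock base; 'cpu_timers' stands in for a list head. */
struct model_cputimer_base {
	unsigned long long nextevt;
	int cpu_timers;
};

struct model_cputimers {
	struct model_cputimer_base bases[MODEL_CPUCLOCK_MAX];
};

/* Look up the timer list of one clock by name rather than by position. */
static int *model_timer_list(struct model_cputimers *pct,
			     enum model_cpuclock clk)
{
	struct model_cputimer_base *base = pct->bases;

	return &base[clk].cpu_timers;
}
```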


[PATCH] leds: Replace {devm_}led_classdev_register() macros with inlines

2019-08-26 Thread Jacek Anaszewski
Replace preprocessor macro aliases for legacy LED registration helpers
with inline functions. This avoids misleading compiler error messages
about a missing symbol that wasn't explicitly used in the code. Such
errors occurred when CONFIG_LEDS_CLASS was undefined and a legacy
(non-ext) function was used in the code.

Signed-off-by: Jacek Anaszewski 
Cc: Pavel Machek 
Cc: Dan Murphy 
---
 include/linux/leds.h | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index d101fd13e18e..b8df71193329 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -157,18 +157,39 @@ struct led_classdev {
  * @led_cdev: the led_classdev structure for this device
  * @init_data: the LED class device initialization data
  *
+ * Register a new object of LED class, with name derived from init_data.
+ *
  * Returns: 0 on success or negative error value on failure
  */
 extern int led_classdev_register_ext(struct device *parent,
 struct led_classdev *led_cdev,
 struct led_init_data *init_data);
-#define led_classdev_register(parent, led_cdev)\
-   led_classdev_register_ext(parent, led_cdev, NULL)
+
+/**
+ * led_classdev_register - register a new object of LED class
+ * @parent: LED controller device this LED is driven by
+ * @led_cdev: the led_classdev structure for this device
+ *
+ * Register a new object of LED class, with name derived from the name property
+ * of passed led_cdev argument.
+ *
+ * Returns: 0 on success or negative error value on failure
+ */
+static inline int led_classdev_register(struct device *parent,
+   struct led_classdev *led_cdev)
+{
+   return led_classdev_register_ext(parent, led_cdev, NULL);
+}
+
 extern int devm_led_classdev_register_ext(struct device *parent,
  struct led_classdev *led_cdev,
  struct led_init_data *init_data);
-#define devm_led_classdev_register(parent, led_cdev)   \
-   devm_led_classdev_register_ext(parent, led_cdev, NULL)
+
+static inline int devm_led_classdev_register(struct device *parent,
+struct led_classdev *led_cdev)
+{
+   return devm_led_classdev_register_ext(parent, led_cdev, NULL);
+}
 extern void led_classdev_unregister(struct led_classdev *led_cdev);
 extern void devm_led_classdev_unregister(struct device *parent,
 struct led_classdev *led_cdev);
-- 
2.11.0
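
The macro-to-inline change can be shown in miniature with a userspace
sketch. Everything here is a hypothetical model (the types, names and the
validation rules are invented for the example); the point is only that the
wrapper is a real, type-checked function that forwards NULL init data, so
diagnostics refer to the name the caller actually wrote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device, LED and init-data structures. */
struct model_device { int id; };
struct model_led { const char *name; };
struct model_init_data { const char *devicename; };

/* Extended variant: the name comes from init_data or from the LED itself. */
static int model_led_register_ext(struct model_device *parent,
				  struct model_led *led,
				  struct model_init_data *init_data)
{
	if (!parent || !led)
		return -1;
	if (!init_data && !led->name)
		return -1;	/* model rule: some name source is required */
	return 0;
}

/* Legacy variant as an inline function instead of a macro alias. */
static inline int model_led_register(struct model_device *parent,
				     struct model_led *led)
{
	return model_led_register_ext(parent, led, NULL);
}
```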



Re: [GIT PULL] timers drivers v5.5

2019-08-26 Thread Thomas Gleixner
On Mon, 26 Aug 2019, Daniel Lezcano wrote:

> The following changes since commit 08a3c192c93f4359a94bf47971e55b0324b72b8b:
> 
>   posix-timers: Prepare for PREEMPT_RT (2019-08-01 20:51:25 +0200)
> 
> are available in the Git repository at:
> 
>   https://git.linaro.org/people/daniel.lezcano/linux.git tags/timers-v5.5

5.5 - that's for next year's first kernel so I'll put it into the fridge
for now.




Re: [PATCH 02/19] dax: Pass dax_dev to dax_writeback_mapping_range()

2019-08-26 Thread Vivek Goyal
On Mon, Aug 26, 2019 at 04:33:26PM -0400, Vivek Goyal wrote:
> On Mon, Aug 26, 2019 at 04:53:16AM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 21, 2019 at 01:57:03PM -0400, Vivek Goyal wrote:
> > > Right now dax_writeback_mapping_range() is passed a bdev and dax_dev
> > > is searched from that bdev name.
> > > 
> > > virtio-fs does not have a bdev. So pass in dax_dev also to
> > > dax_writeback_mapping_range(). If dax_dev is passed in, bdev is not
> > > used otherwise dax_dev is searched using bdev.
> > 
> > Please just pass in only the dax_device and get rid of the block device.
> > The callers should have one at hand easily, e.g. for XFS just call
> > xfs_find_daxdev_for_inode instead of xfs_find_bdev_for_inode.
> 
> Sure. Here is the updated patch.
> 
> This patch can probably go upstream independently. If you are fine with
> the patch, I can post it separately for inclusion.

Forgot to update function declaration in case of !CONFIG_FS_DAX. Here is
the updated patch.

Subject: dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()

As of now dax_writeback_mapping_range() takes "struct block_device" as a
parameter and dax_dev is searched from bdev name. This also involves taking
a fresh reference on dax_dev and putting that reference at the end of
function.

We are developing a new filesystem virtio-fs and using dax to access host
page cache directly. But there is no block device. IOW, we want to make
use of dax but want to get rid of this assumption that there is always
a block device associated with dax_dev.

So pass in "struct dax_device" as parameter instead of bdev.

ext2/ext4/xfs are current users and they already have a reference on
dax_device. So there is no need to take reference and drop reference to
dax_device on each call of this function.

Suggested-by: Christoph Hellwig 
Signed-off-by: Vivek Goyal 
---
 fs/dax.c|8 +---
 fs/ext2/inode.c |5 +++--
 fs/ext4/inode.c |2 +-
 fs/xfs/xfs_aops.c   |2 +-
 include/linux/dax.h |4 ++--
 5 files changed, 8 insertions(+), 13 deletions(-)

Index: rhvgoyal-linux-fuse/fs/dax.c
===
--- rhvgoyal-linux-fuse.orig/fs/dax.c   2019-08-26 16:45:26.093710196 -0400
+++ rhvgoyal-linux-fuse/fs/dax.c   2019-08-26 16:45:29.462710196 -0400
@@ -936,12 +936,11 @@ static int dax_writeback_one(struct xa_s
  * on persistent storage prior to completion of the operation.
  */
 int dax_writeback_mapping_range(struct address_space *mapping,
-   struct block_device *bdev, struct writeback_control *wbc)
+   struct dax_device *dax_dev, struct writeback_control *wbc)
 {
XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT);
struct inode *inode = mapping->host;
pgoff_t end_index = wbc->range_end >> PAGE_SHIFT;
-   struct dax_device *dax_dev;
void *entry;
int ret = 0;
unsigned int scanned = 0;
@@ -952,10 +951,6 @@ int dax_writeback_mapping_range(struct a
if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
return 0;
 
-   dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
-   if (!dax_dev)
-   return -EIO;
-
trace_dax_writeback_range(inode, xas.xa_index, end_index);
 
tag_pages_for_writeback(mapping, xas.xa_index, end_index);
@@ -976,7 +971,6 @@ int dax_writeback_mapping_range(struct a
xas_lock_irq(&xas);
}
xas_unlock_irq(&xas);
-   put_dax(dax_dev);
trace_dax_writeback_range_done(inode, xas.xa_index, end_index);
return ret;
 }
Index: rhvgoyal-linux-fuse/include/linux/dax.h
===
--- rhvgoyal-linux-fuse.orig/include/linux/dax.h   2019-08-26 16:45:26.094710196 -0400
+++ rhvgoyal-linux-fuse/include/linux/dax.h   2019-08-26 16:46:08.101710196 -0400
@@ -141,7 +141,7 @@ static inline void fs_put_dax(struct dax
 
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
 int dax_writeback_mapping_range(struct address_space *mapping,
-   struct block_device *bdev, struct writeback_control *wbc);
+   struct dax_device *dax_dev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
 dax_entry_t dax_lock_page(struct page *page);
@@ -180,7 +180,7 @@ static inline struct page *dax_layout_bu
 }
 
 static inline int dax_writeback_mapping_range(struct address_space *mapping,
-   struct block_device *bdev, struct writeback_control *wbc)
+   struct dax_device *dax_dev, struct writeback_control *wbc)
 {
return -EOPNOTSUPP;
 }
Index: rhvgoyal-linux-fuse/fs/xfs/xfs_aops.c
===
--- rhvgoyal-linux-fuse.orig/fs/xfs/xfs_aops.c   2019-08-26 16:45:26.094710196 -0400
+++ rhvgoyal-linux-fuse/fs/xfs/xfs_aops.c   2019-08-26 16:45:29.471710196 -0400
@@ 

Re: [RFC PATCH v3 11/16] sched: Basic tracking of matching tasks

2019-08-26 Thread mark gross
On Wed, May 29, 2019 at 08:36:47PM +, Vineeth Remanan Pillai wrote:
> From: Peter Zijlstra 
> 
> Introduce task_struct::core_cookie as an opaque identifier for core
> scheduling. When enabled; core scheduling will only allow matching
> task to be on the core; where idle matches everything.
> 
> When task_struct::core_cookie is set (and core scheduling is enabled)
> these tasks are indexed in a second RB-tree, first on cookie value
> then on scheduling function, such that matching task selection always
> finds the most eligible match.
> 
> NOTE: *shudder* at the overhead...
> 
> NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
> is per class tracking of cookies and that just duplicates a lot of
> stuff for no raisin (the 2nd copy lives in the rt-mutex PI code).
 s/raisin/reason

> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Signed-off-by: Vineeth Remanan Pillai 
> Signed-off-by: Julien Desfossez 
> ---
> 
> Changes in v3
> -
> - Refactored priority comparison code
> - Fixed a comparison logic issue in sched_core_find
>   - Aaron Lu
> 
> Changes in v2
> -
> - Improves the priority comparison logic between processes in
>   different cpus.
>   - Peter Zijlstra
>   - Aaron Lu
> 
> ---
>  include/linux/sched.h |   8 ++-
>  kernel/sched/core.c   | 146 ++
>  kernel/sched/fair.c   |  46 -
>  kernel/sched/sched.h  |  55 
>  4 files changed, 208 insertions(+), 47 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1549584a1538..a4b39a28236f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -636,10 +636,16 @@ struct task_struct {
>   const struct sched_class*sched_class;
>   struct sched_entity se;
>   struct sched_rt_entity  rt;
> + struct sched_dl_entity  dl;
> +
> +#ifdef CONFIG_SCHED_CORE
> + struct rb_node  core_node;
> + unsigned long   core_cookie;
> +#endif
> +
>  #ifdef CONFIG_CGROUP_SCHED
>   struct task_group   *sched_task_group;
>  #endif
> - struct sched_dl_entity  dl;
>  
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>   /* List of struct preempt_notifier: */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b1ce33f9b106..112d70f2b1e5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -64,6 +64,141 @@ int sysctl_sched_rt_runtime = 95;
>  
>  DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
>  
> +/* kernel prio, less is more */
> +static inline int __task_prio(struct task_struct *p)
> +{
> + if (p->sched_class == &stop_sched_class) /* trumps deadline */
> + return -2;
> +
> + if (rt_prio(p->prio)) /* includes deadline */
> + return p->prio; /* [-1, 99] */
> +
> + if (p->sched_class == &idle_sched_class)
> + return MAX_RT_PRIO + NICE_WIDTH; /* 140 */
> +
> + return MAX_RT_PRIO + MAX_NICE; /* 120, squash fair */
> +}
> +
> +/*
> + * l(a,b)
> + * le(a,b) := !l(b,a)
> + * g(a,b)  := l(b,a)
> + * ge(a,b) := !l(a,b)
why does this truth table comment exist?
maybe inline comments at the confusing inequalities would be better.
--mark
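
For readers puzzling over the quoted truth-table comment, here is a minimal sketch of what it seems to express: once a strict "less than" l(a,b) exists, the other three relations follow from it by swapping arguments and negating. Plain ints stand in for tasks, and the function bodies are illustrative assumptions, not code from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the patch's l(a,b): strict "a orders before b". */
static bool l(int a, int b)
{
	return a < b;
}

/* le(a,b) := !l(b,a) */
static bool le(int a, int b)
{
	return !l(b, a);
}

/* g(a,b) := l(b,a) */
static bool g(int a, int b)
{
	return l(b, a);
}

/* ge(a,b) := !l(a,b) */
static bool ge(int a, int b)
{
	return !l(a, b);
}
```

Only l() needs a real implementation; the rest are one-liners, which is presumably why the comment lists them as definitions rather than spelling out each inequality.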




Re: [PATCH net v2 1/2] Revert "r8152: napi hangup fix after disconnect"

2019-08-26 Thread David Miller
From: Hayes Wang 
Date: Mon, 26 Aug 2019 09:43:32 +

> Jiri Slaby [mailto:jsl...@suse.cz]
>> Sent: Monday, August 26, 2019 4:55 PM
> [...]
>> Could you clarify *why* it conflicts? And how is the problem fixed by
>> 0ee1f473496 avoided now?
> 
> In rtl8152_disconnect(), the flow would be as following.
> 
> static void rtl8152_disconnect(struct usb_interface *intf)
> {
>   ...
>   - netif_napi_del(&tp->napi);
>   - unregister_netdev(tp->netdev);
>  - rtl8152_close
> - napi_disable
> 
> Therefore you add a checking of RTL8152_UNPLUG to avoid
> calling napi_disable() after netif_napi_del(). However,
> after commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG
> only for real disconnection"), RTL8152_UNPLUG is not
> always set when calling rtl8152_disconnect(). That is,
> napi_disable() would be called after netif_napi_del(),
> if RTL8152_UNPLUG is not set.
> 
> The best way is to avoid calling netif_napi_del() before
> calling unregister_netdev(). And I have submitted such a
> patch following this one.

These details belong in the commit message, always.


Re: [PATCH 04/15] drivers: thermal: tsens: Add debugfs support

2019-08-26 Thread Stephen Boyd
Quoting Amit Kucheria (2019-08-21 05:55:39)
> On Mon, Aug 19, 2019 at 7:57 PM Stephen Boyd  wrote:
> >
> > Quoting Amit Kucheria (2019-08-19 00:58:23)
> > > On Sat, Aug 17, 2019 at 9:37 AM Stephen Boyd  wrote:
> > > > > +
> > > > > +static void tsens_debug_init(struct platform_device *pdev)
> > > > > +{
> > > > > +   struct tsens_priv *priv = platform_get_drvdata(pdev);
> > > > > +   struct dentry *root, *file;
> > > > > +
> > > > > +   root = debugfs_lookup("tsens", NULL);
> > > >
> > > > Does this get created many times? Why doesn't tsens have a pointer to
> > > > the root saved away somewhere globally?
> > > >
> > >
> > > I guess we could call the statement below to create the root dir and
> > > save away the pointer. I was trying to avoid #ifdef CONFIG_DEBUG_FS in
> > > init_common() and instead have all of it in a single function that
> > > gets called once per instance of the tsens controller.
> >
> > Or call this code many times and try to create the tsens node if
> > !tsens_root exists where the variable is some global.
> 
> So I didn't quite understand this statement. The change you're
> requesting is that the 'root' variable below should be a global?
> 
> tsens_probe() will get called twice on platforms with two instances of
> the controller. So I will need to check some place if the 'tsens' root
> dir already exists in debugfs, no? That is what I'm doing below.
> 

Yeah. I was suggesting making a global instead of doing the lookup, but
I guess the lookup is fine and avoids a global variable. It's all
debugfs so it doesn't really matter. Sorry! Do whatever then.



Re: [PATCH] x86/mm: Do not split_large_page() for set_kernel_text_rw()

2019-08-26 Thread Song Liu



> On Aug 26, 2019, at 8:08 AM, Song Liu  wrote:
> 
> 
> 
>> On Aug 26, 2019, at 2:23 AM, Peter Zijlstra  wrote:
>> 
>> So only the high mapping is ever executable; the identity map should not
>> be. Both should be RO.
>> 
>>> kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
>>> mapping. 
>> 
>> Please provide more information; kprobes shouldn't be touching either
>> mapping. That is, afaict kprobes uses text_poke() which uses a temporary
>> mapping (in 'userspace' even) to alias the high text mapping.
> 
> kprobe without CONFIG_KPROBES_ON_FTRACE uses text_poke(). But kprobe with
> CONFIG_KPROBES_ON_FTRACE uses another path. The split happens with
> set_kernel_text_rw() -> ... -> __change_page_attr() -> split_large_page().
> The split is introduced by commit 585948f4f695. do_split in 
> __change_page_attr() becomes true after commit 585948f4f695. This patch 
> tries to fix/workaround this part. 
> 
>> 
>> I'm also not sure how it would then result in any 4k text maps. Yes the
>> alias is 4k, but it should not affect the actual high text map in any
>> way.
> 
> I am confused by the alias logic. set_kernel_text_rw() makes the high map
> rw, and split the PMD in the high map. 
> 
>> 
>> kprobes also allocates executable slots, but it does that in the module
>> range (afaict), so that, again, should not affect the high text mapping.
>> 
>>> We found with 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ 
>>> CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
>>> kernel text mapping into pte-mapped pages. This increases iTLB 
>>> miss rate from about 300 per million instructions to about 700 per
>>> million instructions (for the application I test with). 
>>> 
>>> Per bisect, we found this behavior happens after commit 585948f4f695 
>>> ("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I 
>>> proposed this PATCH to fix/workaround this issue. However, per
>>> Peter's comment and my study of the code, this doesn't seem the 
>>> real problem or the only here. 
>>> 
>>> I also tested that the PMD split issue doesn't happen w/o 
>>> CONFIG_KPROBES_ON_FTRACE. 
>> 
>> Right, because then ftrace doesn't flip the whole kernel map writable;
>> which it _really_ should stop doing anyway.
>> 
>> But I'm still wondering what causes that first 4k split...
> 
> Please see above. 

Another data point: we can repro the issue on Linus's master with just
ftrace:

# start with PMD mapped
root@virt-test:~# grep 8100- /sys/kernel/debug/page_tables/kernel
0x8100-0x81c0  12M ro PSE x 
 pmd

# enable single ftrace
root@virt-test:~# echo consume_skb > /sys/kernel/debug/tracing/set_ftrace_filter
root@virt-test:~# echo function > /sys/kernel/debug/tracing/current_tracer

# now the text is PTE mapped
root@virt-test:~# grep 8100- /sys/kernel/debug/page_tables/kernel
0x8100-0x81c0  12M ro x 
 pte

Song



Re: [PATCH v2 12/15] kvm: i8254: Check LAPIC EOI pending when injecting irq on SVM AVIC

2019-08-26 Thread Suthikulpanit, Suravee
Alex,

On 8/19/2019 5:42 AM, Alexander Graf wrote:
> 
> 
> On 15.08.19 18:25, Suthikulpanit, Suravee wrote:
>> ACK notifiers don't work with AMD SVM w/ AVIC when the PIT interrupt
>> is delivered as edge-triggered fixed interrupt since AMD processors
>> cannot exit on EOI for these interrupts.
>>
>> Add code to check LAPIC pending EOI before injecting any pending PIT
>> interrupt on AMD SVM when AVIC is activated.
>>
>> Signed-off-by: Suravee Suthikulpanit 
>> ---
>>   arch/x86/kvm/i8254.c | 31 +--
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>> index 4a6dc54..31c4a9b 100644
>> --- a/arch/x86/kvm/i8254.c
>> +++ b/arch/x86/kvm/i8254.c
>> @@ -34,10 +34,12 @@
>>   #include 
>>   #include 
>> +#include 
>>   #include "ioapic.h"
>>   #include "irq.h"
>>   #include "i8254.h"
>> +#include "lapic.h"
>>   #include "x86.h"
>>   #ifndef CONFIG_X86_64
>> @@ -236,6 +238,12 @@ static void destroy_pit_timer(struct kvm_pit *pit)
>>   kthread_flush_work(&pit->expired);
>>   }
>> +static inline void kvm_pit_reset_reinject(struct kvm_pit *pit)
>> +{
>> +atomic_set(&pit->pit_state.pending, 0);
>> +atomic_set(&pit->pit_state.irq_ack, 1);
>> +}
>> +
>>   static void pit_do_work(struct kthread_work *work)
>>   {
>>   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
>> @@ -244,6 +252,23 @@ static void pit_do_work(struct kthread_work *work)
>>   int i;
>>   struct kvm_kpit_state *ps = &pit->pit_state;
>> +/*
>> + * Since, AMD SVM AVIC accelerates write access to APIC EOI
>> + * register for edge-trigger interrupts. PIT will not be able
>> + * to receive the IRQ ACK notifier and will always be zero.
>> + * Therefore, we check if any LAPIC EOI pending for vector 0
>> + * and reset irq_ack if no pending.
>> + */
>> +if (cpu_has_svm(NULL) && kvm->arch.apicv_state == APICV_ACTIVATED) {
>> +int eoi = 0;
>> +
>> +kvm_for_each_vcpu(i, vcpu, kvm)
>> +if (kvm_apic_pending_eoi(vcpu, 0))
>> +eoi++;
>> +if (!eoi)
>> +kvm_pit_reset_reinject(pit);
> 
> In which case would eoi be != 0 when APIC-V is active?

That would be the case when the guest has not processed, or is still
processing, the interrupt. Once the guest writes to the APIC EOI register
for the edge-triggered interrupt for vector 0, and the AVIC hardware
accelerates the access by clearing the highest-priority ISR bit, eoi
should be zero.

Suravee
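
The check discussed above can be sketched in plain C, outside the kernel, as follows. The structures and the function name are hypothetical simplifications of the KVM types in the patch: the only logic carried over is "count vCPUs that still have an EOI pending for the vector, and reset the reinject state if there are none".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the KVM structures in the patch. */
struct vcpu_sim {
	bool eoi_pending;	/* models kvm_apic_pending_eoi(vcpu, 0) */
};

struct pit_sim {
	int pending;		/* models pit_state.pending */
	int irq_ack;		/* models pit_state.irq_ack */
};

/* Reset the reinject state only when no vCPU still has an EOI pending. */
static void maybe_reset_reinject(struct pit_sim *pit,
				 const struct vcpu_sim *vcpus, size_t n)
{
	size_t i;
	int eoi = 0;

	for (i = 0; i < n; i++)
		if (vcpus[i].eoi_pending)
			eoi++;

	if (!eoi) {
		pit->pending = 0;
		pit->irq_ack = 1;
	}
}
```

While any vCPU is still handling the interrupt, the state is left alone; after the guest's EOI write clears the pending bit, the next pass resets it.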


[PATCH 12/20] clocksource/drivers/timer-of: Do not warn on deferred probe

2019-08-26 Thread Daniel Lezcano
From: Jon Hunter 

Deferred probe is an expected return value for clk_get() on many
platforms. The driver deals with it properly, so there's no need
to output a warning that may potentially confuse users.

Signed-off-by: Jon Hunter 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/timer-of.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/timer-of.c b/drivers/clocksource/timer-of.c
index 80542289fae7..d8c2bd4391d0 100644
--- a/drivers/clocksource/timer-of.c
+++ b/drivers/clocksource/timer-of.c
@@ -113,8 +113,10 @@ static __init int timer_of_clk_init(struct device_node *np,
of_clk->clk = of_clk->name ? of_clk_get_by_name(np, of_clk->name) :
of_clk_get(np, of_clk->index);
if (IS_ERR(of_clk->clk)) {
-   pr_err("Failed to get clock for %pOF\n", np);
-   return PTR_ERR(of_clk->clk);
+   ret = PTR_ERR(of_clk->clk);
+   if (ret != -EPROBE_DEFER)
+   pr_err("Failed to get clock for %pOF\n", np);
+   goto out;
}
 
ret = clk_prepare_enable(of_clk->clk);
-- 
2.17.1
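
The pattern in this patch, logging an error unless it is a probe deferral, can be sketched in userspace as below. EPROBE_DEFER is kernel-internal, so it is defined locally with the kernel's value; report_clk_error() and the node name passed to it are hypothetical and do not exist in the driver.

```c
#include <stdio.h>

/* Kernel-internal errno value, defined here only so the sketch builds. */
#define EPROBE_DEFER 517

/*
 * Hypothetical helper: print a failure message unless the error is a
 * deferral. Returns 1 if a message was printed, 0 otherwise.
 */
static int report_clk_error(int ret, const char *node)
{
	if (ret == 0 || ret == -EPROBE_DEFER)
		return 0;

	fprintf(stderr, "Failed to get clock for %s: %d\n", node, ret);
	return 1;
}
```

A deferral still propagates to the caller as an error code; only the log message is suppressed, since the probe is expected to be retried later.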



[PATCH 10/20] clocksource/drivers/renesas-ostm: Use DIV_ROUND_CLOSEST() helper

2019-08-26 Thread Daniel Lezcano
From: Geert Uytterhoeven 

Use the DIV_ROUND_CLOSEST() helper instead of open-coding the same
operation.

Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Simon Horman 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/renesas-ostm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/renesas-ostm.c b/drivers/clocksource/renesas-ostm.c
index 61d5f3b539ce..37c39b901bb1 100644
--- a/drivers/clocksource/renesas-ostm.c
+++ b/drivers/clocksource/renesas-ostm.c
@@ -221,7 +221,7 @@ static int __init ostm_init(struct device_node *np)
}
 
rate = clk_get_rate(ostm_clk);
-   ostm->ticks_per_jiffy = (rate + HZ / 2) / HZ;
+   ostm->ticks_per_jiffy = DIV_ROUND_CLOSEST(rate, HZ);
 
/*
 * First probed device will be used as system clocksource. Any
-- 
2.17.1
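
As a quick check that the helper matches the open-coded expression it replaces, here is a simplified, unsigned-only version of the rounding division. The kernel's DIV_ROUND_CLOSEST() also handles signed operands; the macro name DIV_ROUND_CLOSEST_U and the rates used when exercising it are assumptions made for this sketch.

```c
/* Simplified, unsigned-only form of the kernel's DIV_ROUND_CLOSEST(). */
#define DIV_ROUND_CLOSEST_U(x, d) (((x) + ((d) / 2)) / (d))

/* Timer ticks per jiffy, rounded to the nearest integer. */
static unsigned long ticks_per_jiffy(unsigned long rate, unsigned long hz)
{
	return DIV_ROUND_CLOSEST_U(rate, hz);
}
```

Adding half the divisor before dividing turns truncation into round-to-nearest: with a 32768 Hz clock and HZ=100, plain division gives 327 while the rounded result is 328.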



[PATCH 09/20] arm64: dts: imx8mq: Add system counter node

2019-08-26 Thread Daniel Lezcano
From: Anson Huang 

Add i.MX8MQ system counter node to enable timer-imx-sysctr
broadcast timer driver.

Signed-off-by: Anson Huang 
Signed-off-by: Daniel Lezcano 
---
 arch/arm64/boot/dts/freescale/imx8mq.dtsi | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
index d09b808eff87..b4529773af51 100644
--- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
@@ -635,6 +635,14 @@
#pwm-cells = <2>;
status = "disabled";
};
+
+   system_counter: timer@306a {
+   compatible = "nxp,sysctr-timer";
+   reg = <0x306a 0x2>;
+   interrupts = ;
+   clocks = <_25m>;
+   clock-names = "per";
+   };
};
 
bus@3080 { /* AIPS3 */
-- 
2.17.1



[PATCH 04/20] clocksource: sun4i: Add missing compatibles

2019-08-26 Thread Daniel Lezcano
From: Maxime Ripard 

Newer Allwinner SoCs have different number of interrupts, let's add
different compatibles for all of them to deal with this properly.

Signed-off-by: Maxime Ripard 
Acked-by: Daniel Lezcano 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/timer-sun4i.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/clocksource/timer-sun4i.c b/drivers/clocksource/timer-sun4i.c
index 65f38f6ca714..0ba8155b8287 100644
--- a/drivers/clocksource/timer-sun4i.c
+++ b/drivers/clocksource/timer-sun4i.c
@@ -219,5 +219,9 @@ static int __init sun4i_timer_init(struct device_node *node)
 }
 TIMER_OF_DECLARE(sun4i, "allwinner,sun4i-a10-timer",
   sun4i_timer_init);
+TIMER_OF_DECLARE(sun8i_a23, "allwinner,sun8i-a23-timer",
+sun4i_timer_init);
+TIMER_OF_DECLARE(sun8i_v3s, "allwinner,sun8i-v3s-timer",
+sun4i_timer_init);
 TIMER_OF_DECLARE(suniv, "allwinner,suniv-f1c100s-timer",
   sun4i_timer_init);
-- 
2.17.1



[PATCH 11/20] clocksource/drivers/npcm: Fix GENMASK and timer operation

2019-08-26 Thread Daniel Lezcano
From: Avi Fishman 

NPCM7XX_Tx_OPER GENMASK bits are wrong, fix them.

Hopefully the NPCM7XX_REG_TICR0 register reset value of those bits was 0,
so it did not cause an issue.

The function npcm7xx_timer_oneshot() reads the register
NPCM7XX_REG_TCSR0, modifies it and then reads it again overwriting the
previous changes. Remove the extra read which is pointless.

The function npcm7xx_timer_periodic() is correct but the code writes
to the NPCM7XX_REG_TICR0 register while it is dealing with the
NPCM7XX_REG_TCSR0 register, that is confusing. Separate the write to
the registers in the code for the sake of clarity.

Fixes: 1c00289ecd12 ("clocksource/drivers/npcm: Add NPCM7xx timer driver")
Signed-off-by: Avi Fishman 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/timer-npcm7xx.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/clocksource/timer-npcm7xx.c b/drivers/clocksource/timer-npcm7xx.c
index 8a30da7f083b..9780ffd8010e 100644
--- a/drivers/clocksource/timer-npcm7xx.c
+++ b/drivers/clocksource/timer-npcm7xx.c
@@ -32,7 +32,7 @@
 #define NPCM7XX_Tx_INTEN   BIT(29)
 #define NPCM7XX_Tx_COUNTEN BIT(30)
 #define NPCM7XX_Tx_ONESHOT 0x0
-#define NPCM7XX_Tx_OPER    GENMASK(27, 3)
+#define NPCM7XX_Tx_OPER    GENMASK(28, 27)
 #define NPCM7XX_Tx_MIN_PRESCALE    0x1
 #define NPCM7XX_Tx_TDR_MASK_BITS   24
 #define NPCM7XX_Tx_MAX_CNT 0xFF
@@ -84,8 +84,6 @@ static int npcm7xx_timer_oneshot(struct clock_event_device *evt)
 
val = readl(timer_of_base(to) + NPCM7XX_REG_TCSR0);
val &= ~NPCM7XX_Tx_OPER;
-
-   val = readl(timer_of_base(to) + NPCM7XX_REG_TCSR0);
val |= NPCM7XX_START_ONESHOT_Tx;
writel(val, timer_of_base(to) + NPCM7XX_REG_TCSR0);
 
@@ -97,12 +95,11 @@ static int npcm7xx_timer_periodic(struct clock_event_device *evt)
struct timer_of *to = to_timer_of(evt);
u32 val;
 
+   writel(timer_of_period(to), timer_of_base(to) + NPCM7XX_REG_TICR0);
+
val = readl(timer_of_base(to) + NPCM7XX_REG_TCSR0);
val &= ~NPCM7XX_Tx_OPER;
-
-   writel(timer_of_period(to), timer_of_base(to) + NPCM7XX_REG_TICR0);
val |= NPCM7XX_START_PERIODIC_Tx;
-
writel(val, timer_of_base(to) + NPCM7XX_REG_TCSR0);
 
return 0;
-- 
2.17.1
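
To make the mask change concrete, the sketch below computes both masks with a 32-bit equivalent of GENMASK(h, l), which sets bits l through h inclusive. The arguments are taken from the diff as it appears in this archive; GENMASK32, TX_OPER_OLD, TX_OPER_NEW and set_oper_mode() are names invented for the sketch, and the mode bits passed in are arbitrary rather than the driver's START_* values.

```c
#include <stdint.h>

/* 32-bit equivalent of the kernel's GENMASK(h, l): bits l..h set. */
#define GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

#define TX_OPER_OLD GENMASK32(27, 3)	/* arguments from the removed line */
#define TX_OPER_NEW GENMASK32(28, 27)	/* arguments from the added line */

/*
 * Read-modify-write of the operating-mode field: clear the two mode
 * bits, then OR in the requested mode. Re-reading the register between
 * the two steps, as the removed code did, would discard the clear.
 */
static uint32_t set_oper_mode(uint32_t tcsr, uint32_t mode_bits)
{
	tcsr &= ~TX_OPER_NEW;
	return tcsr | mode_bits;
}
```

With these arguments the old mask covers bits 3 to 27 (0x0FFFFFF8), far more than a two-bit mode field, while the new one covers only bits 27 and 28 (0x18000000).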



[PATCH 14/20] dt-bindings: timer: renesas, cmt: Add CMT0234 to sh73a0 and r8a7740

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

Document the on-chip CMT devices included in r8a7740 and sh73a0.

Included in this patch is DT binding documentation for 32-bit CMTs
CMT0, CMT2, CMT3 and CMT4. They all contain a single channel and are
quite similar however some minor differences still exist:
 - "Counter input clock" (clock input and on-device divider)
One example is that RCLK 1/1 is supported by CMT2, CMT3 and CMT4.
 - "Wakeup request" (supported by CMT0 and CMT2)

Because of this one unique compat string per CMT device is selected.

Signed-off-by: Magnus Damm 
Reviewed-by: Rob Herring 
Reviewed-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
Signed-off-by: Daniel Lezcano 
---
 Documentation/devicetree/bindings/timer/renesas,cmt.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index c5220bcd852b..45840d475050 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -22,6 +22,10 @@ Required Properties:
 
 - "renesas,r8a73a4-cmt0" for the 32-bit CMT0 device included in r8a73a4.
 - "renesas,r8a73a4-cmt1" for the 48-bit CMT1 device included in r8a73a4.
+- "renesas,r8a7740-cmt0" for the 32-bit CMT0 device included in r8a7740.
+- "renesas,r8a7740-cmt2" for the 32-bit CMT2 device included in r8a7740.
+- "renesas,r8a7740-cmt3" for the 32-bit CMT3 device included in r8a7740.
+- "renesas,r8a7740-cmt4" for the 32-bit CMT4 device included in r8a7740.
 - "renesas,r8a7743-cmt0" for the 32-bit CMT0 device included in r8a7743.
 - "renesas,r8a7743-cmt1" for the 48-bit CMT1 device included in r8a7743.
 - "renesas,r8a7744-cmt0" for the 32-bit CMT0 device included in r8a7744.
@@ -54,6 +58,10 @@ Required Properties:
 - "renesas,r8a77980-cmt1" for the 48-bit CMT1 device included in r8a77980.
 - "renesas,r8a77990-cmt0" for the 32-bit CMT0 device included in r8a77990.
 - "renesas,r8a77990-cmt1" for the 48-bit CMT1 device included in r8a77990.
+- "renesas,sh73a0-cmt0" for the 32-bit CMT0 device included in sh73a0.
+- "renesas,sh73a0-cmt2" for the 32-bit CMT2 device included in sh73a0.
+- "renesas,sh73a0-cmt3" for the 32-bit CMT3 device included in sh73a0.
+- "renesas,sh73a0-cmt4" for the 32-bit CMT4 device included in sh73a0.
 
 - "renesas,rcar-gen2-cmt0" for 32-bit CMT0 devices included in R-Car Gen2
and RZ/G1.
-- 
2.17.1



[PATCH 19/20] clocksource/drivers/sh_cmt: r8a7740 and sh73a0 SoC-specific match

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

Add SoC-specific matching for CMT1 on r8a7740 and sh73a0.

This allows us to move away from the old DT bindings such as
 - "renesas,cmt-48-sh73a0"
 - "renesas,cmt-48-r8a7740"
 - "renesas,cmt-48"
in favour of the now commonly used format "renesas,<soc>-<device>"

Signed-off-by: Magnus Damm 
Reviewed-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/sh_cmt.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index f6424b61e212..abf5e7873a18 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -924,6 +924,14 @@ static const struct of_device_id sh_cmt_of_table[] __maybe_unused = {
.compatible = "renesas,cmt-48-gen2",
.data = &sh_cmt_info[SH_CMT0_RCAR_GEN2]
},
+   {
+   .compatible = "renesas,r8a7740-cmt1",
+   .data = &sh_cmt_info[SH_CMT_48BIT]
+   },
+   {
+   .compatible = "renesas,sh73a0-cmt1",
+   .data = &sh_cmt_info[SH_CMT_48BIT]
+   },
{
.compatible = "renesas,rcar-gen2-cmt0",
.data = &sh_cmt_info[SH_CMT0_RCAR_GEN2]
-- 
2.17.1



[PATCH 20/20] clocksource/drivers/sh_cmt: Document "cmt-48" as deprecated

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

Update the CMT driver to mark "renesas,cmt-48" as deprecated.

Instead of documenting a theoretical hardware device based on current software
support level, define DT bindings top-down based on available data sheet
information and make use of part numbers in the DT compat string.

In case of the only in-tree users r8a7740 and sh73a0 the compat strings
"renesas,r8a7740-cmt1" and "renesas,sh73a0-cmt1" may be used instead.

Signed-off-by: Magnus Damm 
Reviewed-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/sh_cmt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index abf5e7873a18..ef773db080e9 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -918,7 +918,11 @@ static const struct platform_device_id sh_cmt_id_table[] = {
 MODULE_DEVICE_TABLE(platform, sh_cmt_id_table);
 
 static const struct of_device_id sh_cmt_of_table[] __maybe_unused = {
-   { .compatible = "renesas,cmt-48", .data = &sh_cmt_info[SH_CMT_48BIT] },
+   {
+   /* deprecated, preserved for backward compatibility */
+   .compatible = "renesas,cmt-48",
+   .data = &sh_cmt_info[SH_CMT_48BIT]
+   },
{
/* deprecated, preserved for backward compatibility */
.compatible = "renesas,cmt-48-gen2",
-- 
2.17.1



[PATCH 18/20] dt-bindings: timer: renesas, cmt: Update R-Car Gen3 CMT1 usage

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

The R-Car Gen3 SoCs so far come with a total of 4 on-chip CMT devices:
 - CMT0
 - CMT1
 - CMT2
 - CMT3

CMT0 includes two rather basic 32-bit timer channels. The rest of the on-chip
CMT devices support 48-bit counters and have 8 channels each.

Based on the data sheet information "CMT2/3 are exactly same as CMT1"
it seems that CMT2 and CMT3 now use the CMT1 compat string in the DTSI.

Clarify this in the DT binding documentation by describing R-Car Gen3 and
RZ/G2 CMT1 as "48-bit CMT devices".

Signed-off-by: Magnus Damm 
Reviewed-by: Geert Uytterhoeven 
Reviewed-by: Rob Herring 
Reviewed-by: Simon Horman 
Signed-off-by: Daniel Lezcano 
---
 .../devicetree/bindings/timer/renesas,cmt.txt | 20 +--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index c7fdcb02e083..a444cfc5852a 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -28,9 +28,9 @@ Required Properties:
 - "renesas,r8a77470-cmt0" for the 32-bit CMT0 device included in r8a77470.
 - "renesas,r8a77470-cmt1" for the 48-bit CMT1 device included in r8a77470.
 - "renesas,r8a774a1-cmt0" for the 32-bit CMT0 device included in r8a774a1.
-- "renesas,r8a774a1-cmt1" for the 48-bit CMT1 device included in r8a774a1.
+- "renesas,r8a774a1-cmt1" for the 48-bit CMT devices included in r8a774a1.
 - "renesas,r8a774c0-cmt0" for the 32-bit CMT0 device included in r8a774c0.
-- "renesas,r8a774c0-cmt1" for the 48-bit CMT1 device included in r8a774c0.
+- "renesas,r8a774c0-cmt1" for the 48-bit CMT devices included in r8a774c0.
 - "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790.
 - "renesas,r8a7790-cmt1" for the 48-bit CMT1 device included in r8a7790.
 - "renesas,r8a7791-cmt0" for the 32-bit CMT0 device included in r8a7791.
@@ -42,19 +42,19 @@ Required Properties:
 - "renesas,r8a7794-cmt0" for the 32-bit CMT0 device included in r8a7794.
 - "renesas,r8a7794-cmt1" for the 48-bit CMT1 device included in r8a7794.
 - "renesas,r8a7795-cmt0" for the 32-bit CMT0 device included in r8a7795.
-- "renesas,r8a7795-cmt1" for the 48-bit CMT1 device included in r8a7795.
+- "renesas,r8a7795-cmt1" for the 48-bit CMT devices included in r8a7795.
 - "renesas,r8a7796-cmt0" for the 32-bit CMT0 device included in r8a7796.
-- "renesas,r8a7796-cmt1" for the 48-bit CMT1 device included in r8a7796.
+- "renesas,r8a7796-cmt1" for the 48-bit CMT devices included in r8a7796.
 - "renesas,r8a77965-cmt0" for the 32-bit CMT0 device included in r8a77965.
-- "renesas,r8a77965-cmt1" for the 48-bit CMT1 device included in r8a77965.
+- "renesas,r8a77965-cmt1" for the 48-bit CMT devices included in r8a77965.
 - "renesas,r8a77970-cmt0" for the 32-bit CMT0 device included in r8a77970.
-- "renesas,r8a77970-cmt1" for the 48-bit CMT1 device included in r8a77970.
+- "renesas,r8a77970-cmt1" for the 48-bit CMT devices included in r8a77970.
 - "renesas,r8a77980-cmt0" for the 32-bit CMT0 device included in r8a77980.
-- "renesas,r8a77980-cmt1" for the 48-bit CMT1 device included in r8a77980.
+- "renesas,r8a77980-cmt1" for the 48-bit CMT devices included in r8a77980.
 - "renesas,r8a77990-cmt0" for the 32-bit CMT0 device included in r8a77990.
-- "renesas,r8a77990-cmt1" for the 48-bit CMT1 device included in r8a77990.
+- "renesas,r8a77990-cmt1" for the 48-bit CMT devices included in r8a77990.
 - "renesas,r8a77995-cmt0" for the 32-bit CMT0 device included in r8a77995.
-- "renesas,r8a77995-cmt1" for the 48-bit CMT1 device included in r8a77995.
+- "renesas,r8a77995-cmt1" for the 48-bit CMT devices included in r8a77995.
 - "renesas,sh73a0-cmt0" for the 32-bit CMT0 device included in sh73a0.
 - "renesas,sh73a0-cmt1" for the 48-bit CMT1 device included in sh73a0.
 - "renesas,sh73a0-cmt2" for the 32-bit CMT2 device included in sh73a0.
@@ -69,7 +69,7 @@ Required Properties:
listed above.
 - "renesas,rcar-gen3-cmt0" for 32-bit CMT0 devices included in R-Car Gen3
and RZ/G2.
-- "renesas,rcar-gen3-cmt1" for 48-bit CMT1 devices included in R-Car Gen3
+- "renesas,rcar-gen3-cmt1" for 48-bit CMT devices included in R-Car Gen3
and RZ/G2.
These are fallbacks for R-Car Gen3 and RZ/G2 entries listed
above.
-- 
2.17.1



[PATCH 16/20] dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a7792

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

This patch adds DT binding documentation for the CMT devices on
the R-Car Gen2 V2H (r8a7792) SoC.

Signed-off-by: Magnus Damm 
Reviewed-by: Geert Uytterhoeven 
Reviewed-by: Rob Herring 
Reviewed-by: Simon Horman 
Signed-off-by: Daniel Lezcano 
---
 Documentation/devicetree/bindings/timer/renesas,cmt.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index a297fca5b61e..5b7690ae8b9d 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -35,6 +35,8 @@ Required Properties:
 - "renesas,r8a7790-cmt1" for the 48-bit CMT1 device included in r8a7790.
 - "renesas,r8a7791-cmt0" for the 32-bit CMT0 device included in r8a7791.
 - "renesas,r8a7791-cmt1" for the 48-bit CMT1 device included in r8a7791.
+- "renesas,r8a7792-cmt0" for the 32-bit CMT0 device included in r8a7792.
+- "renesas,r8a7792-cmt1" for the 48-bit CMT1 device included in r8a7792.
 - "renesas,r8a7793-cmt0" for the 32-bit CMT0 device included in r8a7793.
 - "renesas,r8a7793-cmt1" for the 48-bit CMT1 device included in r8a7793.
 - "renesas,r8a7794-cmt0" for the 32-bit CMT0 device included in r8a7794.
-- 
2.17.1



[PATCH 17/20] dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a77995

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

This patch adds DT binding documentation for the CMT devices on
the R-Car Gen3 D3 (r8a77995) SoC.

Signed-off-by: Magnus Damm 
Reviewed-by: Geert Uytterhoeven 
Reviewed-by: Rob Herring 
Reviewed-by: Simon Horman 
Signed-off-by: Daniel Lezcano 
---
 Documentation/devicetree/bindings/timer/renesas,cmt.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index 5b7690ae8b9d..c7fdcb02e083 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -53,6 +53,8 @@ Required Properties:
 - "renesas,r8a77980-cmt1" for the 48-bit CMT1 device included in r8a77980.
 - "renesas,r8a77990-cmt0" for the 32-bit CMT0 device included in r8a77990.
 - "renesas,r8a77990-cmt1" for the 48-bit CMT1 device included in r8a77990.
+- "renesas,r8a77995-cmt0" for the 32-bit CMT0 device included in r8a77995.
+- "renesas,r8a77995-cmt1" for the 48-bit CMT1 device included in r8a77995.
 - "renesas,sh73a0-cmt0" for the 32-bit CMT0 device included in sh73a0.
 - "renesas,sh73a0-cmt1" for the 48-bit CMT1 device included in sh73a0.
 - "renesas,sh73a0-cmt2" for the 32-bit CMT2 device included in sh73a0.
-- 
2.17.1



[PATCH 13/20] clocksource/drivers: Do not warn on probe defer

2019-08-26 Thread Daniel Lezcano
From: Jon Hunter 

Deferred probe is an expected return value on many platforms and so
there's no need to output a warning that may potentially confuse users.

Signed-off-by: Jon Hunter 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/timer-probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-probe.c b/drivers/clocksource/timer-probe.c
index dda1946e84dd..ee9574da53c0 100644
--- a/drivers/clocksource/timer-probe.c
+++ b/drivers/clocksource/timer-probe.c
@@ -29,7 +29,9 @@ void __init timer_probe(void)
 
ret = init_func_ret(np);
if (ret) {
-   pr_err("Failed to initialize '%pOF': %d\n", np, ret);
+   if (ret != -EPROBE_DEFER)
+   pr_err("Failed to initialize '%pOF': %d\n", np,
+  ret);
continue;
}
 
-- 
2.17.1



[PATCH 15/20] dt-bindings: timer: renesas, cmt: Update CMT1 on sh73a0 and r8a7740

2019-08-26 Thread Daniel Lezcano
From: Magnus Damm 

This patch reworks the DT binding documentation for the 6-channel
48-bit CMTs known as CMT1 on r8a7740 and sh73a0.

After the update the same style of DT binding as the rest of the upstream
SoCs will now also be used by r8a7740 and sh73a0. The DT binding "cmt-48"
is removed from the DT binding documentation, however software support for
this deprecated binding will still remain in the CMT driver for some time.

Signed-off-by: Magnus Damm 
Reviewed-by: Rob Herring 
Reviewed-by: Simon Horman 
Reviewed-by: Geert Uytterhoeven 
Signed-off-by: Daniel Lezcano 
---
 .../devicetree/bindings/timer/renesas,cmt.txt  | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
index 45840d475050..a297fca5b61e 100644
--- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt
+++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt
@@ -12,17 +12,10 @@ datasheets.
 Required Properties:
 
   - compatible: must contain one or more of the following:
-- "renesas,cmt-48-sh73a0" for the sh73A0 48-bit CMT
-   (CMT1)
-- "renesas,cmt-48-r8a7740" for the r8a7740 48-bit CMT
-   (CMT1)
-- "renesas,cmt-48" for all non-second generation 48-bit CMT
-   (CMT1 on sh73a0 and r8a7740)
-   This is a fallback for the above renesas,cmt-48-* entries.
-
 - "renesas,r8a73a4-cmt0" for the 32-bit CMT0 device included in r8a73a4.
 - "renesas,r8a73a4-cmt1" for the 48-bit CMT1 device included in r8a73a4.
 - "renesas,r8a7740-cmt0" for the 32-bit CMT0 device included in r8a7740.
+- "renesas,r8a7740-cmt1" for the 48-bit CMT1 device included in r8a7740.
 - "renesas,r8a7740-cmt2" for the 32-bit CMT2 device included in r8a7740.
 - "renesas,r8a7740-cmt3" for the 32-bit CMT3 device included in r8a7740.
 - "renesas,r8a7740-cmt4" for the 32-bit CMT4 device included in r8a7740.
@@ -59,6 +52,7 @@ Required Properties:
 - "renesas,r8a77990-cmt0" for the 32-bit CMT0 device included in r8a77990.
 - "renesas,r8a77990-cmt1" for the 48-bit CMT1 device included in r8a77990.
 - "renesas,sh73a0-cmt0" for the 32-bit CMT0 device included in sh73a0.
+- "renesas,sh73a0-cmt1" for the 48-bit CMT1 device included in sh73a0.
 - "renesas,sh73a0-cmt2" for the 32-bit CMT2 device included in sh73a0.
 - "renesas,sh73a0-cmt3" for the 32-bit CMT3 device included in sh73a0.
 - "renesas,sh73a0-cmt4" for the 32-bit CMT4 device included in sh73a0.
-- 
2.17.1



[PATCH 06/20] clocksource/drivers/tcb_clksrc: Register delay timer

2019-08-26 Thread Daniel Lezcano
From: Alexandre Belloni 

Implement and register delay timer to allow get_cycles() to work properly.

Signed-off-by: Alexandre Belloni 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/Kconfig   |  2 +-
 drivers/clocksource/timer-atmel-tcb.c | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 5e9317dc3d39..a642c23b2fba 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -429,7 +429,7 @@ config ATMEL_ST
 
 config ATMEL_TCB_CLKSRC
bool "Atmel TC Block timer driver" if COMPILE_TEST
-   depends on HAS_IOMEM
+   depends on ARM && HAS_IOMEM
select TIMER_OF if OF
help
  Support for Timer Counter Blocks on Atmel SoCs.
diff --git a/drivers/clocksource/timer-atmel-tcb.c 
b/drivers/clocksource/timer-atmel-tcb.c
index 6ed31f9def7e..7427b07495a8 100644
--- a/drivers/clocksource/timer-atmel-tcb.c
+++ b/drivers/clocksource/timer-atmel-tcb.c
@@ -6,6 +6,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -125,6 +126,18 @@ static u64 notrace tc_sched_clock_read32(void)
return tc_get_cycles32();
 }
 
+static struct delay_timer tc_delay_timer;
+
+static unsigned long tc_delay_timer_read(void)
+{
+   return tc_get_cycles();
+}
+
+static unsigned long notrace tc_delay_timer_read32(void)
+{
+   return tc_get_cycles32();
+}
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
 struct tc_clkevt_device {
@@ -432,6 +445,7 @@ static int __init tcb_clksrc_init(struct device_node *node)
/* setup ony channel 0 */
tcb_setup_single_chan(&tc, best_divisor_idx);
tc_sched_clock = tc_sched_clock_read32;
+   tc_delay_timer.read_current_timer = tc_delay_timer_read32;
} else {
/* we have three clocks no matter what the
 * underlying platform supports.
@@ -444,6 +458,7 @@ static int __init tcb_clksrc_init(struct device_node *node)
/* setup both channel 0 & 1 */
tcb_setup_dual_chan(&tc, best_divisor_idx);
tc_sched_clock = tc_sched_clock_read;
+   tc_delay_timer.read_current_timer = tc_delay_timer_read;
}
 
/* and away we go! */
@@ -458,6 +473,9 @@ static int __init tcb_clksrc_init(struct device_node *node)
 
sched_clock_register(tc_sched_clock, 32, divided_rate);
 
+   tc_delay_timer.freq = divided_rate;
+   register_current_timer_delay(&tc_delay_timer);
+
return 0;
 
 err_unregister_clksrc:
-- 
2.17.1



[PATCH 05/20] dt-bindings: timer: Convert Allwinner A13 HSTimer to a schema

2019-08-26 Thread Daniel Lezcano
From: Maxime Ripard 

The newer Allwinner SoCs have a High Speed Timer supported in Linux, with a
matching Device Tree binding.

Now that we have the DT validation in place, let's convert the device tree
bindings for that controller over to a YAML schema.

Signed-off-by: Maxime Ripard 
Reviewed-by: Rob Herring 
Signed-off-by: Daniel Lezcano 
---
 .../timer/allwinner,sun5i-a13-hstimer.txt | 26 --
 .../timer/allwinner,sun5i-a13-hstimer.yaml| 79 +++
 2 files changed, 79 insertions(+), 26 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt
 create mode 100644 
Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml

diff --git 
a/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt 
b/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt
deleted file mode 100644
index 2c5c1be78360..
--- a/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-Allwinner SoCs High Speed Timer Controller
-
-Required properties:
-
-- compatible : should be "allwinner,sun5i-a13-hstimer" or
-   "allwinner,sun7i-a20-hstimer"
-- reg : Specifies base physical address and size of the registers.
-- interrupts : The interrupts of these timers (2 for the sun5i IP, 4 for the 
sun7i
-   one)
-- clocks: phandle to the source clock (usually the AHB clock)
-
-Optional properties:
-- resets: phandle to a reset controller asserting the timer
-
-Example:
-
-timer@1c6 {
-   compatible = "allwinner,sun7i-a20-hstimer";
-   reg = <0x01c6 0x1000>;
-   interrupts = <0 51 1>,
-<0 52 1>,
-<0 53 1>,
-<0 54 1>;
-   clocks = <_gates 19>;
-   resets = < 19>;
-};
diff --git 
a/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml 
b/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml
new file mode 100644
index ..dfa0c41fd261
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml
@@ -0,0 +1,79 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/allwinner,sun5i-a13-hstimer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Allwinner A13 High-Speed Timer Device Tree Bindings
+
+maintainers:
+  - Chen-Yu Tsai 
+  - Maxime Ripard 
+
+properties:
+  compatible:
+    oneOf:
+      - const: allwinner,sun5i-a13-hstimer
+      - const: allwinner,sun7i-a20-hstimer
+      - items:
+          - const: allwinner,sun6i-a31-hstimer
+          - const: allwinner,sun7i-a20-hstimer
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    minItems: 2
+    maxItems: 4
+    items:
+      - description: Timer 0 Interrupt
+      - description: Timer 1 Interrupt
+      - description: Timer 2 Interrupt
+      - description: Timer 3 Interrupt
+
+  clocks:
+    maxItems: 1
+
+  resets:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+
+if:
+  properties:
+    compatible:
+      items:
+        const: allwinner,sun5i-a13-hstimer
+
+then:
+  properties:
+    interrupts:
+      minItems: 2
+      maxItems: 2
+
+else:
+  properties:
+    interrupts:
+      minItems: 4
+      maxItems: 4
+
+additionalProperties: false
+
+examples:
+  - |
+timer@1c6 {
+compatible = "allwinner,sun7i-a20-hstimer";
+reg = <0x01c6 0x1000>;
+interrupts = <0 51 1>,
+ <0 52 1>,
+ <0 53 1>,
+ <0 54 1>;
+clocks = <_gates 19>;
+resets = < 19>;
+};
+
+...
-- 
2.17.1



[PATCH 08/20] arm64: dts: imx8mm: Add system counter node

2019-08-26 Thread Daniel Lezcano
From: Anson Huang 

Add i.MX8MM system counter node to enable timer-imx-sysctr
broadcast timer driver.

Signed-off-by: Anson Huang 
Signed-off-by: Daniel Lezcano 
---
 arch/arm64/boot/dts/freescale/imx8mm.dtsi | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi 
b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
index 232a7412755a..89ef22a8f81e 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
@@ -510,6 +510,14 @@
#pwm-cells = <2>;
status = "disabled";
};
+
+   system_counter: timer@306a {
+   compatible = "nxp,sysctr-timer";
+   reg = <0x306a 0x2>;
+   interrupts = ;
+   clocks = <_24m>;
+   clock-names = "per";
+   };
};
 
aips3: bus@3080 {
-- 
2.17.1



[PATCH 03/20] dt-bindings: timer: Add missing compatibles

2019-08-26 Thread Daniel Lezcano
From: Maxime Ripard 

Newer Allwinner SoCs have a different number of interrupts, so let's add
different compatibles for all of them to deal with this properly.

Signed-off-by: Maxime Ripard 
Reviewed-by: Rob Herring 
Signed-off-by: Daniel Lezcano 
---
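Note (not part of the patch): the per-compatible interrupt counts enforced
by this schema can be summarised as a small lookup table. The following is
only a sketch in C; the counts are taken from the if/then rules of
allwinner,sun4i-a10-timer.yaml, while the struct, table and function names
are hypothetical and exist purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table: compatible string -> number of interrupts required
 * by the allOf/if/then rules in allwinner,sun4i-a10-timer.yaml.
 */
struct timer_irq_rule {
	const char *compatible;
	int num_irqs;
};

static const struct timer_irq_rule timer_irq_rules[] = {
	{ "allwinner,sun4i-a10-timer",     6 },
	{ "allwinner,sun8i-a23-timer",     2 },
	{ "allwinner,sun8i-v3s-timer",     3 },
	{ "allwinner,suniv-f1c100s-timer", 3 },
};

/* Returns the expected interrupt count, or -1 for an unknown compatible. */
int expected_timer_irqs(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(timer_irq_rules) / sizeof(timer_irq_rules[0]); i++) {
		if (strcmp(timer_irq_rules[i].compatible, compatible) == 0)
			return timer_irq_rules[i].num_irqs;
	}
	return -1;
}
```

A node whose interrupts property has a different length than the value
returned for its compatible would be rejected by the schema.
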
 .../timer/allwinner,sun4i-a10-timer.yaml  | 26 +++
 1 file changed, 26 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml 
b/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
index 7292a424092c..20adc1c8e9cc 100644
--- a/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
+++ b/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
@@ -14,6 +14,8 @@ properties:
   compatible:
     enum:
       - allwinner,sun4i-a10-timer
+      - allwinner,sun8i-a23-timer
+      - allwinner,sun8i-v3s-timer
       - allwinner,suniv-f1c100s-timer
 
   reg:
@@ -39,6 +41,30 @@ allOf:
           minItems: 6
           maxItems: 6
 
+  - if:
+      properties:
+        compatible:
+          items:
+            const: allwinner,sun8i-a23-timer
+
+    then:
+      properties:
+        interrupts:
+          minItems: 2
+          maxItems: 2
+
+  - if:
+      properties:
+        compatible:
+          items:
+            const: allwinner,sun8i-v3s-timer
+
+    then:
+      properties:
+        interrupts:
+          minItems: 3
+          maxItems: 3
+
   - if:
       properties:
         compatible:
-- 
2.17.1



Re: [RFC PATCH 0/2] Add predictive memory reclamation and compaction

2019-08-26 Thread Bharath Vedartham
Hi Michal,

Here are some of my thoughts,
On Wed, Aug 21, 2019 at 04:06:32PM +0200, Michal Hocko wrote:
> On Thu 15-08-19 14:51:04, Khalid Aziz wrote:
> > Hi Michal,
> > 
> > The smarts for tuning these knobs can be implemented in userspace and
> > more knobs added to allow for what is missing today, but we get back to
> > the same issue as before. That does nothing to make kernel self-tuning
> > and adds possibly even more knobs to userspace. Something so fundamental
> > to kernel memory management as making free pages available when they are
> > needed really should be taken care of in the kernel itself. Moving it to
> > userspace just means the kernel is hobbled unless one installs and tunes
> > a userspace package correctly.
> 
> From my past experience the existing autotunig works mostly ok for a
> vast variety of workloads. A more clever tuning is possible and people
> are doing that already. Especially for cases when the machine is heavily
> overcommited. There are different ways to achieve that. Your new
> in-kernel auto tuning would have to be tested on a large variety of
> workloads to be proven and riskless. So I am quite skeptical to be
> honest.
Could you give some references to such works regarding tuning the kernel? 

Essentially, our idea here is to foresee potential memory exhaustion.
This is done by observing the workload and its memory usage. Based on
these observations, we predict whether or not memory exhaustion could
occur. If we predict memory exhaustion, we reclaim some more memory.
kswapd stops reclaim when hwmark is reached. hwmark is usually set to a
fairly low percentage of total memory; on my system, for zone Normal,
hwmark is 13% of total pages. So there is scope for reclaiming more pages
to make sure the system does not suffer from a lack of pages.

Since we are "predicting", there could be mistakes in our prediction.
The question is how bad are the mistakes? How much does a wrong
prediction cost? 

A right prediction would be a win. We correctly predict that there could be
exhaustion, and this leads us to reclaim more memory (than hwmark) or compact
memory beforehand (unlike kcompactd, which does it on demand).

A wrong prediction, on the other hand, falls into one of 2 situations:
(i) We foresee memory exhaustion but there is no memory exhaustion in
the future. In this case, we would be reclaiming more memory for not a lot
of use. This situation is not entirely bad, but we definitely waste a few
clock cycles.
(ii) We don't foresee memory exhaustion but there is memory exhaustion
in the future. This is a bad case where we may end up going into direct
compaction/reclaim. But it could be the case that the memory exhaustion
is far in the future and, even though we didn't see it, kswapd could have
reclaimed that memory or drop_cache occurred.

How often we hit wrong predictions of type (ii) would really determine our
efficiency. 
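
To make the two failure modes concrete, here is a minimal sketch of how
prediction outcomes could be classified and how the type (ii) rate could
be tracked. This is not code from the RFC patches; every name below is
hypothetical and it only illustrates the bookkeeping described above.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; (i) and (ii) match the cases listed above. */
enum prediction_outcome {
	PRED_HIT,          /* exhaustion predicted and it happened */
	PRED_QUIET,        /* nothing predicted, nothing happened */
	PRED_FALSE_ALARM,  /* type (i): predicted, but did not happen */
	PRED_MISS,         /* type (ii): not predicted, but happened */
};

enum prediction_outcome classify_prediction(bool predicted, bool exhausted)
{
	if (predicted)
		return exhausted ? PRED_HIT : PRED_FALSE_ALARM;
	return exhausted ? PRED_MISS : PRED_QUIET;
}

/* Percentage of real exhaustion events that were not predicted. */
unsigned int miss_rate_percent(unsigned int hits, unsigned int misses)
{
	unsigned int total = hits + misses;

	return total ? (misses * 100) / total : 0;
}
```

With 3 hits and 1 miss, miss_rate_percent() gives 25, i.e. one in four
exhaustion events would still end up in direct reclaim/compaction.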

Coming to your situation of provisioning VMs: a situation where our work
would pay off is a cloud burst. When the demand for VMs is very high, our
algorithm could adapt to the increase in demand and reclaim/compact more
memory to reduce allocation stalls and improve performance.
> Therefore I would really focus on discussing whether we have sufficient
> APIs to tune the kernel to do the right thing when needed. That requires
> to identify gaps in that area. 
One thing that comes to my mind is based on the issue Khalid mentioned
earlier, where his desktop took more than 30 seconds to boot because
the caches were using up a lot of memory.
Rather than allowing any unused memory to become page cache, would it be
a good idea to fix a size for the caches and elastically change the size
based on the workload?

Thank you
Bharath

> -- 
> Michal Hocko
> SUSE Labs
> 


[PATCH 02/20] dt-bindings: timer: Convert Allwinner A10 Timer to a schema

2019-08-26 Thread Daniel Lezcano
From: Maxime Ripard 

The older Allwinner SoCs have a Timer supported in Linux, with a matching
Device Tree binding.

While the original binding only mentions one interrupt, the timer actually
has 6 of them.

Now that we have the DT validation in place, let's convert the device tree
bindings for that controller over to a YAML schema.

Signed-off-by: Maxime Ripard 
Reviewed-by: Rob Herring 
Signed-off-by: Daniel Lezcano 
---
 .../timer/allwinner,sun4i-a10-timer.yaml  | 76 +++
 .../bindings/timer/allwinner,sun4i-timer.txt  | 19 -
 2 files changed, 76 insertions(+), 19 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
 delete mode 100644 
Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt

diff --git 
a/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml 
b/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
new file mode 100644
index ..7292a424092c
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
@@ -0,0 +1,76 @@
+# SPDX-License-Identifier: GPL-2.0
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/timer/allwinner,sun4i-a10-timer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Allwinner A10 Timer Device Tree Bindings
+
+maintainers:
+  - Chen-Yu Tsai 
+  - Maxime Ripard 
+
+properties:
+  compatible:
+    enum:
+      - allwinner,sun4i-a10-timer
+      - allwinner,suniv-f1c100s-timer
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    description:
+      List of timers interrupts
+
+  clocks:
+    maxItems: 1
+
+allOf:
+  - if:
+      properties:
+        compatible:
+          items:
+            const: allwinner,sun4i-a10-timer
+
+    then:
+      properties:
+        interrupts:
+          minItems: 6
+          maxItems: 6
+
+  - if:
+      properties:
+        compatible:
+          items:
+            const: allwinner,suniv-f1c100s-timer
+
+    then:
+      properties:
+        interrupts:
+          minItems: 3
+          maxItems: 3
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+
+additionalProperties: false
+
+examples:
+  - |
+timer {
+compatible = "allwinner,sun4i-a10-timer";
+reg = <0x01c20c00 0x400>;
+interrupts = <22>,
+ <23>,
+ <24>,
+ <25>,
+ <67>,
+ <68>;
+clocks = <>;
+};
+
+...
diff --git a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt 
b/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
deleted file mode 100644
index 3da9d515c03a..
--- a/Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-Allwinner A1X SoCs Timer Controller
-
-Required properties:
-
-- compatible : should be one of the following:
-  "allwinner,sun4i-a10-timer"
-  "allwinner,suniv-f1c100s-timer"
-- reg : Specifies base physical address and size of the registers.
-- interrupts : The interrupt of the first timer
-- clocks: phandle to the source clock (usually a 24 MHz fixed clock)
-
-Example:
-
-timer {
-   compatible = "allwinner,sun4i-a10-timer";
-   reg = <0x01c20c00 0x400>;
-   interrupts = <22>;
-   clocks = <>;
-};
-- 
2.17.1



[PATCH 07/20] clocksource/drivers/imx-sysctr: Add internal clock divider handle

2019-08-26 Thread Daniel Lezcano
From: Anson Huang 

The system counter block guide states that the base clock is
internally divided by 3 before use. This means the clock input of the
system counter defined in DT should be the base clock, which normally
comes from the OSC, and the driver must divide that rate by 3.

Signed-off-by: Anson Huang 
Signed-off-by: Daniel Lezcano 
---
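Note (not part of the patch): a worked example of the divide-by-3. The
sketch below assumes a 24 MHz base clock purely for illustration; the real
rate comes from the clock referenced in DT. SYS_CTR_CLK_DIV has the same
value as in the patch, the helper names are hypothetical.

```c
#include <stdint.h>

#define SYS_CTR_CLK_DIV 0x3 /* same value as the define added by the patch */

/* Hypothetical helper: rate the counter actually ticks at, in Hz. */
uint32_t sysctr_effective_rate(uint32_t base_clk_hz)
{
	return base_clk_hz / SYS_CTR_CLK_DIV;
}

/* Hypothetical helper: nanoseconds per counter tick, 0 if the rate is 0. */
uint32_t sysctr_ns_per_tick(uint32_t base_clk_hz)
{
	uint32_t rate = sysctr_effective_rate(base_clk_hz);

	return rate ? 1000000000u / rate : 0;
}
```

With a 24 MHz input the counter runs at 8 MHz, i.e. 125 ns per tick.
Without the division the timer would be programmed as if it ran three
times faster than it does.
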
 drivers/clocksource/timer-imx-sysctr.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/clocksource/timer-imx-sysctr.c 
b/drivers/clocksource/timer-imx-sysctr.c
index fd7d68066efb..b7c80a368a1b 100644
--- a/drivers/clocksource/timer-imx-sysctr.c
+++ b/drivers/clocksource/timer-imx-sysctr.c
@@ -20,6 +20,8 @@
 #define SYS_CTR_EN 0x1
 #define SYS_CTR_IRQ_MASK   0x2
 
+#define SYS_CTR_CLK_DIV    0x3
+
 static void __iomem *sys_ctr_base;
 static u32 cmpcr;
 
@@ -134,6 +136,9 @@ static int __init sysctr_timer_init(struct device_node *np)
if (ret)
return ret;
 
+   /* system counter clock is divided by 3 internally */
+   to_sysctr.of_clk.rate /= SYS_CTR_CLK_DIV;
+
sys_ctr_base = timer_of_base(&to_sysctr);
cmpcr = readl(sys_ctr_base + CMPCR);
cmpcr &= ~SYS_CTR_EN;
-- 
2.17.1



[PATCH 01/20] clocksource: Remove dev_err() usage after platform_get_irq()

2019-08-26 Thread Daniel Lezcano
From: Stephen Boyd 

We don't need dev_err() messages when platform_get_irq() fails now that
platform_get_irq() prints an error message itself when something goes
wrong. Let's remove these prints with a simple semantic patch.

// 
@@
expression ret;
struct platform_device *E;
@@

ret =
(
platform_get_irq(E, ...)
|
platform_get_irq_byname(E, ...)
);

if ( \( ret < 0 \| ret <= 0 \) )
{
(
-if (ret != -EPROBE_DEFER)
-{ ...
-dev_err(...);
-... }
|
...
-dev_err(...);
)
...
}
// 

While we're here, remove braces on if statements that only have one
statement (manually).

Cc: Greg Kroah-Hartman 
Cc: Daniel Lezcano 
Cc: Thomas Gleixner 
Signed-off-by: Stephen Boyd 
Reviewed-by: Geert Uytterhoeven 
Signed-off-by: Daniel Lezcano 
---
 drivers/clocksource/em_sti.c | 4 +---
 drivers/clocksource/sh_cmt.c | 5 +
 drivers/clocksource/sh_tmu.c | 5 +
 3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/clocksource/em_sti.c b/drivers/clocksource/em_sti.c
index 8e12b11e81b0..9039df4f90e2 100644
--- a/drivers/clocksource/em_sti.c
+++ b/drivers/clocksource/em_sti.c
@@ -291,10 +291,8 @@ static int em_sti_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, p);
 
irq = platform_get_irq(pdev, 0);
-   if (irq < 0) {
-   dev_err(&pdev->dev, "failed to get irq\n");
+   if (irq < 0)
return irq;
-   }
 
/* map memory, let base point to the STI instance */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/clocksource/sh_cmt.c b/drivers/clocksource/sh_cmt.c
index 55d3e03f2cd4..f6424b61e212 100644
--- a/drivers/clocksource/sh_cmt.c
+++ b/drivers/clocksource/sh_cmt.c
@@ -776,11 +776,8 @@ static int sh_cmt_register_clockevent(struct 
sh_cmt_channel *ch,
int ret;
 
irq = platform_get_irq(ch->cmt->pdev, ch->index);
-   if (irq < 0) {
-   dev_err(&ch->cmt->pdev->dev, "ch%u: failed to get irq\n",
-   ch->index);
+   if (irq < 0)
return irq;
-   }
 
ret = request_irq(irq, sh_cmt_interrupt,
  IRQF_TIMER | IRQF_IRQPOLL | IRQF_NOBALANCING,
diff --git a/drivers/clocksource/sh_tmu.c b/drivers/clocksource/sh_tmu.c
index 49f1c805fc95..8c4f3753b36e 100644
--- a/drivers/clocksource/sh_tmu.c
+++ b/drivers/clocksource/sh_tmu.c
@@ -462,11 +462,8 @@ static int sh_tmu_channel_setup(struct sh_tmu_channel *ch, 
unsigned int index,
ch->base = tmu->mapbase + 8 + ch->index * 12;
 
ch->irq = platform_get_irq(tmu->pdev, index);
-   if (ch->irq < 0) {
-   dev_err(&tmu->pdev->dev, "ch%u: failed to get irq\n",
-   ch->index);
+   if (ch->irq < 0)
return ch->irq;
-   }
 
ch->cs_enabled = false;
ch->enable_count = 0;
-- 
2.17.1



Re: [PATCH] cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available

2019-08-26 Thread Marcelo Tosatti
On Tue, Aug 13, 2019 at 08:55:29AM +0800, Wanpeng Li wrote:
> On Sun, 4 Aug 2019 at 04:21, Marcelo Tosatti  wrote:
> >
> > On Thu, Aug 01, 2019 at 06:54:49PM +0200, Paolo Bonzini wrote:
> > > On 01/08/19 18:51, Rafael J. Wysocki wrote:
> > > > On 8/1/2019 9:06 AM, Wanpeng Li wrote:
> > > >> From: Wanpeng Li 
> > > >>
> > > >> The downside of guest side polling is that polling is performed even
> > > >> with other runnable tasks in the host. However, even if poll in kvm
> > > >> can aware whether or not other runnable tasks in the same pCPU, it
> > > >> can still incur extra overhead in over-subscribe scenario. Now we can
> > > >> just enable guest polling when dedicated pCPUs are available.
> > > >>
> > > >> Cc: Rafael J. Wysocki 
> > > >> Cc: Paolo Bonzini 
> > > >> Cc: Radim Krčmář 
> > > >> Cc: Marcelo Tosatti 
> > > >> Signed-off-by: Wanpeng Li 
> > > >
> > > > Paolo, Marcelo, any comments?
> > >
> > > Yes, it's a good idea.
> > >
> > > Acked-by: Paolo Bonzini 
> > >
> > > Paolo
> >
> 
> Hi Marcelo,
> 
> Sorry for the late response.
> 
> > I think KVM_HINTS_REALTIME is being abused somewhat.
> > It has no clear meaning and used in different locations
> > for different purposes.
> 
> ================== = ================================
> KVM_HINTS_REALTIME 0 guest checks this feature bit to
>                      determine that vCPUs are never
>                      preempted for an unlimited time

Does unlimited time mean infinite time, or does it mean
10s? 1s?

The previous definition was much better IMO: HINTS_DEDICATED.


>                      allowing optimizations
> ================== = ================================
> 
> Now it disables pv queued spinlock, 

OK. 

> pv tlb shootdown, 

OK.

> pv sched yield

"The idea is from Xen, when sending a call-function IPI-many to vCPUs,
yield if any of the IPI target vCPUs was preempted. 17% performance
increasement of ebizzy benchmark can be observed in an over-subscribe
environment. (w/ kvm-pv-tlb disabled, testing TLB flush call-function
IPI-many since call-function is not easy to be trigged by userspace
workload)."

This can probably hurt if vcpus are rarely preempted. 

> which are not expected present in vCPUs are never preempted for an
> unlimited time scenario.
> 
> >
> > For example, i think that using pv queued spinlocks and
> > haltpoll is a desired scenario, which the patch below disallows.
> 
> So even if dedicated pCPU is available, pv queued spinlocks should
> still be chose if something like vhost-kthreads are used instead of
> DPDK/vhost-user. 

Can't you enable the individual features you need for optimizing 
the overcommitted case? This is how things have been done historically:
If a new feature is available, you enable it to get the desired
performance. x2apic, invariant-tsc, cpuidle haltpoll...

So in your case: enable pv schedyield, enable pv tlb shootdown.

> kvm adaptive halt-polling will compete with
> vhost-kthreads, however, poll in guest unaware other runnable tasks in
> the host which will defeat vhost-kthreads.

It depends on how much work vhost-kthreads needs to do, how successful 
halt-poll in the guest is, and what improvement halt-polling brings.
The amount of polling will be reduced to zero if polling 
is not successful.



[PATCH 1/4] mdev: Introduce sha1 based mdev alias

2019-08-26 Thread Parav Pandit
Generate an mdev alias whenever a parent requests one.
The alias is an optional attribute that a parent can request
for each of its child mdevs.
The mdev alias is generated by applying sha1 to the mdev name.

Signed-off-by: Parav Pandit 
---
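Note (not part of the patch): the length handling in generate_alias() is
the subtle part, so here is a userspace sketch of just that step. It
assumes the digest has already been computed elsewhere (the sha1 step is
deliberately left out), and alias_from_digest() is a hypothetical name
used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of the truncation step in generate_alias().
 * digest: hash bytes computed elsewhere.
 * out:    must hold at least roundup(max_alias_len, 2) + 1 bytes.
 * Returns the number of characters in the resulting alias.
 */
size_t alias_from_digest(const unsigned char *digest, size_t digest_size,
			 size_t max_alias_len, char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t aligned_len = (max_alias_len + 1) / 2 * 2; /* round up to even */
	size_t nbytes = aligned_len / 2;
	size_t i;

	/* Never read past the digest (the patch uses min_t() for this). */
	if (nbytes > digest_size)
		nbytes = digest_size;

	/* Same effect as bin2hex(): two lowercase hex digits per byte. */
	for (i = 0; i < nbytes; i++) {
		out[2 * i] = hex[digest[i] >> 4];
		out[2 * i + 1] = hex[digest[i] & 0x0f];
	}
	out[2 * nbytes] = '\0';

	/* Odd length requested: drop the extra hex digit that was written. */
	if (max_alias_len % 2 && max_alias_len < 2 * nbytes)
		out[max_alias_len] = '\0';

	return strlen(out);
}
```

For example, with digest bytes de ad be ef and a requested length of 5,
three bytes are converted ("deadbe") and the last digit is cut, leaving
"deadb".
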
 drivers/vfio/mdev/mdev_core.c| 98 +++-
 drivers/vfio/mdev/mdev_private.h |  5 +-
 drivers/vfio/mdev/mdev_sysfs.c   | 13 +++--
 include/linux/mdev.h |  4 ++
 4 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..e825ff38b037 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -10,9 +10,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #include "mdev_private.h"
 
@@ -27,6 +29,8 @@ static struct class_compat *mdev_bus_compat_class;
 static LIST_HEAD(mdev_list);
 static DEFINE_MUTEX(mdev_list_lock);
 
+static struct crypto_shash *alias_hash;
+
 struct device *mdev_parent_dev(struct mdev_device *mdev)
 {
return mdev->parent->dev;
@@ -164,6 +168,18 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
goto add_dev_err;
}
 
+   if (ops->get_alias_length) {
+   unsigned int digest_size;
+   unsigned int aligned_len;
+
+   aligned_len = roundup(ops->get_alias_length(), 2);
+   digest_size = crypto_shash_digestsize(alias_hash);
+   if (aligned_len / 2 > digest_size) {
+   ret = -EINVAL;
+   goto add_dev_err;
+   }
+   }
+
parent = kzalloc(sizeof(*parent), GFP_KERNEL);
if (!parent) {
ret = -ENOMEM;
@@ -259,6 +275,7 @@ static void mdev_device_free(struct mdev_device *mdev)
mutex_unlock(&mdev_list_lock);
 
dev_dbg(&mdev->dev, "MDEV: destroying\n");
+   kvfree(mdev->alias);
kfree(mdev);
 }
 
@@ -269,18 +286,86 @@ static void mdev_device_release(struct device *dev)
mdev_device_free(mdev);
 }
 
-int mdev_device_create(struct kobject *kobj,
-  struct device *dev, const guid_t *uuid)
+static const char *
+generate_alias(const char *uuid, unsigned int max_alias_len)
+{
+   struct shash_desc *hash_desc;
+   unsigned int digest_size;
+   unsigned char *digest;
+   unsigned int alias_len;
+   char *alias;
+   int ret = 0;
+
+   /* Align to multiple of 2 as bin2hex will generate
+* even number of bytes.
+*/
+   alias_len = roundup(max_alias_len, 2);
+   alias = kvzalloc(alias_len + 1, GFP_KERNEL);
+   if (!alias)
+   return NULL;
+
+   /* Allocate and init descriptor */
+   hash_desc = kvzalloc(sizeof(*hash_desc) +
+crypto_shash_descsize(alias_hash),
+GFP_KERNEL);
+   if (!hash_desc)
+   goto desc_err;
+
+   hash_desc->tfm = alias_hash;
+
+   digest_size = crypto_shash_digestsize(alias_hash);
+
+   digest = kvzalloc(digest_size, GFP_KERNEL);
+   if (!digest) {
+   ret = -ENOMEM;
+   goto digest_err;
+   }
+   crypto_shash_init(hash_desc);
+   crypto_shash_update(hash_desc, uuid, UUID_STRING_LEN);
+   crypto_shash_final(hash_desc, digest);
+   bin2hex(&alias[0], digest,
+   min_t(unsigned int, digest_size, alias_len / 2));
+   /* When alias length is odd, zero out the additional last byte
+* that bin2hex has copied.
+*/
+   if (max_alias_len % 2)
+   alias[max_alias_len] = 0;
+
+   kvfree(digest);
+   kvfree(hash_desc);
+   return alias;
+
+digest_err:
+   kvfree(hash_desc);
+desc_err:
+   kvfree(alias);
+   return NULL;
+}
+
+int mdev_device_create(struct kobject *kobj, struct device *dev,
+  const char *uuid_str, const guid_t *uuid)
 {
int ret;
struct mdev_device *mdev, *tmp;
struct mdev_parent *parent;
struct mdev_type *type = to_mdev_type(kobj);
+   unsigned int alias_len = 0;
+   const char *alias = NULL;
 
parent = mdev_get_parent(type->parent);
if (!parent)
return -EINVAL;
 
+   if (parent->ops->get_alias_length)
+   alias_len = parent->ops->get_alias_length();
+   if (alias_len) {
+   alias = generate_alias(uuid_str, alias_len);
+   if (!alias) {
+   ret = -ENOMEM;
+   goto alias_fail;
+   }
+   }
+
mutex_lock(&mdev_list_lock);
 
/* Check for duplicate */
@@ -300,6 +385,8 @@ int mdev_device_create(struct kobject *kobj,
}
 
guid_copy(&mdev->uuid, uuid);
+   mdev->alias = alias;
+   alias = NULL;
list_add(&mdev->next, &mdev_list);
mutex_unlock(&mdev_list_lock);
 
@@ -346,6 +433,8 @@ int mdev_device_create(struct kobject 

[PATCH 4/4] mtty: Optionally support mtty alias

2019-08-26 Thread Parav Pandit
Provide a module parameter that sets the alias length, so that an mdev
alias can optionally be generated.

Example of requesting an mdev alias:
$ modprobe mtty alias_length=12

Signed-off-by: Parav Pandit 
---
 samples/vfio-mdev/mtty.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 92e770a06ea2..92208245b057 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -1410,6 +1410,15 @@ static struct attribute_group *mdev_type_groups[] = {
NULL,
 };
 
+static unsigned int mtty_alias_length;
+module_param_named(alias_length, mtty_alias_length, uint, 0444);
+MODULE_PARM_DESC(alias_length, "mdev alias length; default=0");
+
+static unsigned int mtty_get_alias_length(void)
+{
+   return mtty_alias_length;
+}
+
 static const struct mdev_parent_ops mdev_fops = {
.owner  = THIS_MODULE,
.dev_attr_groups= mtty_dev_groups,
@@ -1422,6 +1431,7 @@ static const struct mdev_parent_ops mdev_fops = {
.read   = mtty_read,
.write  = mtty_write,
.ioctl  = mtty_ioctl,
+   .get_alias_length   = mtty_get_alias_length
 };
 
 static void mtty_device_release(struct device *dev)
-- 
2.19.2



[GIT PULL] timers drivers v5.5

2019-08-26 Thread Daniel Lezcano
The following changes since commit 08a3c192c93f4359a94bf47971e55b0324b72b8b:

  posix-timers: Prepare for PREEMPT_RT (2019-08-01 20:51:25 +0200)

are available in the Git repository at:

  https://git.linaro.org/people/daniel.lezcano/linux.git tags/timers-v5.5

for you to fetch changes up to befd04abfbe4b933515dddb5659d0744be9dba6a:

  clocksource/drivers/sh_cmt: Document "cmt-48" as deprecated (2019-08-23 07:38:34 +0200)


- Remove dev_err() when used with platform_get_irq (Stephen Boyd)

- Add DT binding and new compatible for Allwinner sun4i (Maxime Ripard)

- Register the Atmel tcb clocksource for delays (Alexandre Belloni)

- Add a clock divider for the Freescale imx platforms and new timer node
  in the DT (Anson Huang)

- Use DIV_ROUND_CLOSEST macro for the Renesas OSTM (Geert Uytterhoeven)

- Fix GENMASK and timer operation for the npcm timer (Avi Fishman)

- Fix timer-of showing an error message when EPROBE_DEFER is
  returned (Jon Hunter)

- Add new SoC DT binding and match for Renesas timers (Magnus Damm)


Alexandre Belloni (1):
  clocksource/drivers/tcb_clksrc: Register delay timer

Anson Huang (3):
  clocksource/drivers/imx-sysctr: Add internal clock divider handle
  arm64: dts: imx8mm: Add system counter node
  arm64: dts: imx8mq: Add system counter node

Avi Fishman (1):
  clocksource/drivers/npcm: Fix GENMASK and timer operation

Geert Uytterhoeven (1):
  clocksource/drivers/renesas-ostm: Use DIV_ROUND_CLOSEST() helper

Jon Hunter (2):
  clocksource/drivers/timer-of: Do not warn on deferred probe
  clocksource/drivers: Do not warn on probe defer

Magnus Damm (7):
  dt-bindings: timer: renesas, cmt: Add CMT0234 to sh73a0 and r8a7740
  dt-bindings: timer: renesas, cmt: Update CMT1 on sh73a0 and r8a7740
  dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a7792
  dt-bindings: timer: renesas, cmt: Add CMT0 and CMT1 to r8a77995
  dt-bindings: timer: renesas, cmt: Update R-Car Gen3 CMT1 usage
  clocksource/drivers/sh_cmt: r8a7740 and sh73a0 SoC-specific match
  clocksource/drivers/sh_cmt: Document "cmt-48" as deprecated

Maxime Ripard (4):
  dt-bindings: timer: Convert Allwinner A10 Timer to a schema
  dt-bindings: timer: Add missing compatibles
  clocksource: sun4i: Add missing compatibles
  dt-bindings: timer: Convert Allwinner A13 HSTimer to a schema

Stephen Boyd (1):
  clocksource: Remove dev_err() usage after platform_get_irq()

 .../bindings/timer/allwinner,sun4i-a10-timer.yaml  | 102 +
 .../bindings/timer/allwinner,sun4i-timer.txt   |  19 
 .../bindings/timer/allwinner,sun5i-a13-hstimer.txt |  26 --
 .../timer/allwinner,sun5i-a13-hstimer.yaml |  79 
 .../devicetree/bindings/timer/renesas,cmt.txt  |  40 
 arch/arm64/boot/dts/freescale/imx8mm.dtsi  |   8 ++
 arch/arm64/boot/dts/freescale/imx8mq.dtsi  |   8 ++
 drivers/clocksource/Kconfig|   2 +-
 drivers/clocksource/em_sti.c   |   4 +-
 drivers/clocksource/renesas-ostm.c |   2 +-
 drivers/clocksource/sh_cmt.c   |  19 +++-
 drivers/clocksource/sh_tmu.c   |   5 +-
 drivers/clocksource/timer-atmel-tcb.c  |  18 
 drivers/clocksource/timer-imx-sysctr.c |   5 +
 drivers/clocksource/timer-npcm7xx.c|   9 +-
 drivers/clocksource/timer-of.c |   6 +-
 drivers/clocksource/timer-probe.c  |   4 +-
 drivers/clocksource/timer-sun4i.c  |   4 +
 18 files changed, 275 insertions(+), 85 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/timer/allwinner,sun4i-a10-timer.yaml
 delete mode 100644 Documentation/devicetree/bindings/timer/allwinner,sun4i-timer.txt
 delete mode 100644 Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.txt
 create mode 100644 Documentation/devicetree/bindings/timer/allwinner,sun5i-a13-hstimer.yaml

-- 
  Linaro.org │ Open source software for ARM SoCs




[PATCH 2/4] mdev: Make mdev alias unique among all mdevs

2019-08-26 Thread Parav Pandit
An mdev alias should be unique among all mdevs, so that when the alias
is used by mdev users to derive other objects, there is no
collision in a given system.

Signed-off-by: Parav Pandit 
---
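Note (not part of the patch): in patch 1/4 the local alias pointer stays
NULL when the parent does not implement get_alias_length() or returns 0,
so any comparison against it needs a guard on both sides. A minimal
userspace sketch of a NULL-safe check follows; mdev_alias_collides() is a
hypothetical helper, not something this series adds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only when both aliases exist and are equal.
 * Either side may be NULL when no alias was requested for that mdev.
 */
bool mdev_alias_collides(const char *existing, const char *candidate)
{
	if (!existing || !candidate)
		return false;

	return strcmp(existing, candidate) == 0;
}
```

Two mdevs without an alias never collide under this rule, which matches
the intent that the alias is optional.
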
 drivers/vfio/mdev/mdev_core.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index e825ff38b037..6eb37f0c6369 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -375,6 +375,11 @@ int mdev_device_create(struct kobject *kobj, struct device 
*dev,
ret = -EEXIST;
goto mdev_fail;
}
+   if (tmp->alias && strcmp(tmp->alias, alias) == 0) {
+   mutex_unlock(&mdev_list_lock);
+   ret = -EEXIST;
+   goto mdev_fail;
+   }
}
 
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
-- 
2.19.2
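
The uniqueness check added by this patch can be modeled in user space as a linear scan over existing entries. This is only a minimal sketch: `struct mdev_entry` and `alias_check_unique()` are hypothetical names, an array stands in for the kernel's mdev list, and locking is left out. The NULL guard on the new alias is an addition of the sketch; whether the caller guarantees a non-NULL alias is not visible in the hunk.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry on the mdev list; not the kernel struct. */
struct mdev_entry {
	const char *alias;	/* NULL when the parent did not request an alias */
};

/*
 * Return -EEXIST if any existing entry already carries the same alias,
 * 0 otherwise. Entries without an alias never collide.
 */
static int alias_check_unique(const struct mdev_entry *list, size_t n,
			      const char *alias)
{
	size_t i;

	if (!alias)
		return 0;

	for (i = 0; i < n; i++) {
		if (list[i].alias && strcmp(list[i].alias, alias) == 0)
			return -EEXIST;
	}
	return 0;
}
```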



[PATCH 3/4] mdev: Expose mdev alias in sysfs tree

2019-08-26 Thread Parav Pandit
Expose mdev alias as string in a sysfs tree so that such attribute can
be used to generate netdevice name by systemd/udev or can be used to
match other kernel objects based on the alias of the mdev.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 43afe0e80b76..59f4e3cc5233 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -246,7 +246,20 @@ static ssize_t remove_store(struct device *dev, struct 
device_attribute *attr,
 
 static DEVICE_ATTR_WO(remove);
 
+static ssize_t alias_show(struct device *device,
+ struct device_attribute *attr, char *buf)
+{
+   struct mdev_device *dev = mdev_from_dev(device);
+
+   if (!dev->alias)
+   return -EOPNOTSUPP;
+
+   return sprintf(buf, "%s\n", dev->alias);
+}
+static DEVICE_ATTR_RO(alias);
+
 static const struct attribute *mdev_device_attrs[] = {
+   &dev_attr_alias.attr,
&dev_attr_remove.attr,
NULL,
 };
-- 
2.19.2
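
The behaviour of the new attribute can be sketched outside the kernel as follows. `alias_show_model()` and its `buf_size` parameter are assumptions of this sketch; the real callback receives a sysfs buffer and takes no size argument.

```c
#include <errno.h>
#include <stdio.h>

/*
 * User-space model of the show callback: writes "<alias>\n" into buf and
 * returns the number of bytes written, or -EOPNOTSUPP when no alias exists.
 * buf_size is an addition of this sketch.
 */
static long alias_show_model(const char *alias, char *buf, size_t buf_size)
{
	int len;

	if (!alias)
		return -EOPNOTSUPP;

	len = snprintf(buf, buf_size, "%s\n", alias);
	if (len < 0 || (size_t)len >= buf_size)
		return -EINVAL;	/* truncated or failed; model-only error path */
	return len;
}
```

A reader of the attribute would therefore see the alias followed by a newline, or an error when the parent did not request alias generation.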



[PATCH 0/4] Introduce variable length mdev alias

2019-08-26 Thread Parav Pandit
To have consistent naming for the netdevice of a mdev and for the
devlink port [1] of a mdev, which is formed using the phys_port_name of
the devlink port, the current UUID is not usable because it is too long.

A UUID is 36 characters long in string format and 128 bits in binary.
Neither format fits within the 15-character limit of a netdev name.

It is desirable to derive mdev device naming consistently from the UUID,
so that widely used user space frameworks such as ovs [2] can make use
of mdev representors in the same way as PCIe SR-IOV VF and PF representors.

Hence,
(a) An mdev alias is created, derived using sha1 from the mdev name.
(b) The vendor driver describes how long an alias should be for the child
mdev created for a given parent.
(c) Mdev aliases are unique at system level.
(d) The alias is created optionally, only when the parent requests it.
This ensures that non-networking mdev parents can function without alias
creation overhead.

This design is discussed at [3].

An example systemd/udev extension will have,

1. netdev name created using mdev alias available in sysfs.

mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
mdev 12 character alias=cd5b146a80a5

netdev name of this mdev = enmcd5b146a80a5
Here en = Ethernet link
m = mediated device

2. devlink port phys_port_name created using mdev alias.
devlink phys_port_name=pcd5b146a80a5

This patchset enables mdev core to maintain unique alias for a mdev.
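
The naming scheme in the example above can be checked with a small user-space sketch. `name_from_alias()` and `NETDEV_NAME_MAX` are hypothetical names, and applying the netdev limit to the phys_port_name as well is an assumption made only to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

#define NETDEV_NAME_MAX 15	/* usable characters; IFNAMSIZ (16) minus the NUL */

/*
 * Build "<prefix><alias>" into out. Returns 0 on success, -1 if the result
 * would exceed the 15-character netdev name limit or the output buffer.
 */
static int name_from_alias(const char *prefix, const char *alias,
			   char *out, size_t out_size)
{
	size_t need = strlen(prefix) + strlen(alias);

	if (need > NETDEV_NAME_MAX || need >= out_size)
		return -1;
	snprintf(out, out_size, "%s%s", prefix, alias);
	return 0;
}
```

With the "enm" prefix a 12-character alias uses exactly 15 characters, while the 36-character UUID string cannot fit under any prefix.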

Patch-1 Introduces mdev alias using sha1.
Patch-2 Ensures that mdev alias is unique in a system.
Patch-3 Exposes mdev alias in a sysfs hierarchy.
Patch-4 Extends mtty driver to optionally provide alias generation.
This also makes it possible to test UUID-based sha1 collisions and trigger
error handling for duplicate sha1 results.

In future when networking driver wants to use mdev alias, mdev_alias()
API will be added to derive devlink port name.

[1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
[2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
[3] https://patchwork.kernel.org/cover/11084231/

Parav Pandit (4):
  mdev: Introduce sha1 based mdev alias
  mdev: Make mdev alias unique among all mdevs
  mdev: Expose mdev alias in sysfs tree
  mtty: Optionally support mtty alias

 drivers/vfio/mdev/mdev_core.c| 103 ++-
 drivers/vfio/mdev/mdev_private.h |   5 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  26 ++--
 include/linux/mdev.h |   4 ++
 samples/vfio-mdev/mtty.c |  10 +++
 5 files changed, 139 insertions(+), 9 deletions(-)

-- 
2.19.2



Re: [PATCH 02/19] dax: Pass dax_dev to dax_writeback_mapping_range()

2019-08-26 Thread Vivek Goyal
On Mon, Aug 26, 2019 at 04:53:16AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 01:57:03PM -0400, Vivek Goyal wrote:
> > Right now dax_writeback_mapping_range() is passed a bdev and dax_dev
> > is searched from that bdev name.
> > 
> > virtio-fs does not have a bdev. So pass in dax_dev also to
> > dax_writeback_mapping_range(). If dax_dev is passed in, bdev is not
> > used otherwise dax_dev is searched using bdev.
> 
> Please just pass in only the dax_device and get rid of the block device.
> The callers should have one at hand easily, e.g. for XFS just call
> xfs_find_daxdev_for_inode instead of xfs_find_bdev_for_inode.

Sure. Here is the updated patch.

This patch can probably go upstream independently. If you are fine with
the patch, I can post it separately for inclusion.


Subject: dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()

As of now dax_writeback_mapping_range() takes "struct block_device" as a
parameter and dax_dev is searched from bdev name. This also involves taking
a fresh reference on dax_dev and putting that reference at the end of
function.

We are developing a new filesystem virtio-fs and using dax to access host
page cache directly. But there is no block device. IOW, we want to make
use of dax but want to get rid of this assumption that there is always
a block device associated with dax_dev.

So pass in "struct dax_device" as parameter instead of bdev.

ext2/ext4/xfs are current users and they already have a reference on
dax_device. So there is no need to take reference and drop reference to
dax_device on each call of this function.

Suggested-by: Christoph Hellwig 
Signed-off-by: Vivek Goyal 
---
 fs/dax.c|8 +---
 fs/ext2/inode.c |5 +++--
 fs/ext4/inode.c |2 +-
 fs/xfs/xfs_aops.c   |2 +-
 include/linux/dax.h |2 +-
 5 files changed, 7 insertions(+), 12 deletions(-)

Index: rhvgoyal-linux-fuse/fs/dax.c
===
--- rhvgoyal-linux-fuse.orig/fs/dax.c   2019-08-26 11:20:36.545009968 -0400
+++ rhvgoyal-linux-fuse/fs/dax.c2019-08-26 11:24:43.973009968 -0400
@@ -936,12 +936,11 @@ static int dax_writeback_one(struct xa_s
  * on persistent storage prior to completion of the operation.
  */
 int dax_writeback_mapping_range(struct address_space *mapping,
-   struct block_device *bdev, struct writeback_control *wbc)
+   struct dax_device *dax_dev, struct writeback_control *wbc)
 {
XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT);
struct inode *inode = mapping->host;
pgoff_t end_index = wbc->range_end >> PAGE_SHIFT;
-   struct dax_device *dax_dev;
void *entry;
int ret = 0;
unsigned int scanned = 0;
@@ -952,10 +951,6 @@ int dax_writeback_mapping_range(struct a
if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
return 0;
 
-   dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
-   if (!dax_dev)
-   return -EIO;
-
trace_dax_writeback_range(inode, xas.xa_index, end_index);
 
tag_pages_for_writeback(mapping, xas.xa_index, end_index);
@@ -976,7 +971,6 @@ int dax_writeback_mapping_range(struct a
xas_lock_irq(&xas);
}
xas_unlock_irq(&xas);
-   put_dax(dax_dev);
trace_dax_writeback_range_done(inode, xas.xa_index, end_index);
return ret;
 }
Index: rhvgoyal-linux-fuse/include/linux/dax.h
===
--- rhvgoyal-linux-fuse.orig/include/linux/dax.h2019-08-26 
11:20:36.545009968 -0400
+++ rhvgoyal-linux-fuse/include/linux/dax.h 2019-08-26 11:26:08.384009968 
-0400
@@ -141,7 +141,7 @@ static inline void fs_put_dax(struct dax
 
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
 int dax_writeback_mapping_range(struct address_space *mapping,
-   struct block_device *bdev, struct writeback_control *wbc);
+   struct dax_device *dax_dev, struct writeback_control *wbc);
 
 struct page *dax_layout_busy_page(struct address_space *mapping);
 dax_entry_t dax_lock_page(struct page *page);
Index: rhvgoyal-linux-fuse/fs/xfs/xfs_aops.c
===
--- rhvgoyal-linux-fuse.orig/fs/xfs/xfs_aops.c  2019-08-26 11:20:36.545009968 
-0400
+++ rhvgoyal-linux-fuse/fs/xfs/xfs_aops.c   2019-08-26 11:34:51.085009968 
-0400
@@ -1120,7 +1120,7 @@ xfs_dax_writepages(
 {
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
return dax_writeback_mapping_range(mapping,
-   xfs_find_bdev_for_inode(mapping->host), wbc);
+   xfs_find_daxdev_for_inode(mapping->host), wbc);
 }
 
 STATIC int
Index: rhvgoyal-linux-fuse/fs/ext4/inode.c
===
--- rhvgoyal-linux-fuse.orig/fs/ext4/inode.c2019-08-26 11:20:36.545009968 
-0400
+++ 

Re: [PATCH 1/2] x86/microcode: Update late microcode in parallel

2019-08-26 Thread Raj, Ashok
On Mon, Aug 26, 2019 at 08:53:05AM -0400, Boris Ostrovsky wrote:
> On 8/24/19 4:53 AM, Borislav Petkov wrote:
> >  
> > +wait_for_siblings:
> > +   if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
> > +   panic("Timeout during microcode update!\n");
> > +
> > /*
> > -* Increase the wait timeout to a safe value here since we're
> > -* serializing the microcode update and that could take a while on a
> > -* large number of CPUs. And that is fine as the *actual* timeout will
> > -* be determined by the last CPU finished updating and thus cut short.
> > +* At least one thread has completed update on each core.
> > +* For others, simply call the update to make sure the
> > +* per-cpu cpuinfo can be updated with right microcode
> > +* revision.
> 
> 
> What is the advantage of having those other threads go through
> find_patch() and (in Intel case) intel_get_microcode_revision() (which
> involves two MSR accesses) vs. having the master sibling update slaves'
> microcode revisions? There are only two things that need to be updated,
> uci->cpu_sig.rev and c->microcode.
> 

True, yes we could do that. But there is some warm and comfy feeling in
actually reading the revision from the thread-local copy and seeing that
it matches what was updated in the other thread sibling, rather than
just hardcoding the fixups. The code looks clean in the sense that it looks
like you are attempting to upgrade, but the new revision is already
reflected correctly and you skip the update.

But if you feel compelled, I'm not opposed to it as long as Boris is
happy with the changes :-).

Cheers,
Ashok





[PATCH v2] soc: xilinx: Set CAP_UNUSABLE requirement for versal while powering down domain

2019-08-26 Thread Jolly Shah
From: Tejas Patel 

For the "0" requirement, which is used to inform firmware that a device is
not currently required by the master, the Versal PLM (Platform Loader and
Manager), which runs on the Platform Management Controller and is
responsible for platform management of devices, disables the clock, powers
the device down and resets it. genpd_power_off() is also called during
runtime suspend. So, if any device goes to the runtime suspend state, it
needs to be re-initialized when it resumes. It is possible that
drivers do not reinitialize the device upon every resume from runtime
suspend, and so don't want it to be powered down or reset
during runtime suspend.

In the Versal PLM a new PM_CAP_UNUSABLE capability is added, which disables
the clock only and avoids power down and reset during runtime suspend. Power
and reset will be gated with core suspend. So, this patch sets the
CAPABILITY_UNUSABLE requirement during gpd_power_off()
if the platform is other than zynqmp.

Signed-off-by: Tejas Patel 
Signed-off-by: Jolly Shah 
---
 drivers/soc/xilinx/zynqmp_pm_domains.c | 10 --
 include/linux/firmware/xlnx-zynqmp.h   |  3 ++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/soc/xilinx/zynqmp_pm_domains.c 
b/drivers/soc/xilinx/zynqmp_pm_domains.c
index 600f57c..23d90cb 100644
--- a/drivers/soc/xilinx/zynqmp_pm_domains.c
+++ b/drivers/soc/xilinx/zynqmp_pm_domains.c
@@ -2,7 +2,7 @@
 /*
  * ZynqMP Generic PM domain support
  *
- *  Copyright (C) 2015-2018 Xilinx, Inc.
+ *  Copyright (C) 2015-2019 Xilinx, Inc.
  *
  *  Davorin Mista 
  *  Jolly Shah 
@@ -25,6 +25,8 @@
 
 static const struct zynqmp_eemi_ops *eemi_ops;
 
+static int min_capability;
+
 /**
  * struct zynqmp_pm_domain - Wrapper around struct generic_pm_domain
  * @gpd:   Generic power domain
@@ -106,7 +108,7 @@ static int zynqmp_gpd_power_off(struct generic_pm_domain 
*domain)
int ret;
struct pm_domain_data *pdd, *tmp;
struct zynqmp_pm_domain *pd;
-   u32 capabilities = 0;
+   u32 capabilities = min_capability;
bool may_wakeup;
 
if (!eemi_ops->set_requirement)
@@ -283,6 +285,10 @@ static int zynqmp_gpd_probe(struct platform_device *pdev)
if (!domains)
return -ENOMEM;
 
+   if (!of_device_is_compatible(dev->parent->of_node,
+"xlnx,zynqmp-firmware"))
+   min_capability = ZYNQMP_PM_CAPABILITY_UNUSABLE;
+
for (i = 0; i < ZYNQMP_NUM_DOMAINS; i++, pd++) {
pd->node_id = 0;
pd->gpd.name = kasprintf(GFP_KERNEL, "domain%d", i);
diff --git a/include/linux/firmware/xlnx-zynqmp.h 
b/include/linux/firmware/xlnx-zynqmp.h
index 778abbb..b8a7c22 100644
--- a/include/linux/firmware/xlnx-zynqmp.h
+++ b/include/linux/firmware/xlnx-zynqmp.h
@@ -2,7 +2,7 @@
 /*
  * Xilinx Zynq MPSoC Firmware layer
  *
- *  Copyright (C) 2014-2018 Xilinx
+ *  Copyright (C) 2014-2019 Xilinx
  *
  *  Michal Simek 
  *  Davorin Mista 
@@ -46,6 +46,7 @@
 #define ZYNQMP_PM_CAPABILITY_ACCESS    0x1U
 #define ZYNQMP_PM_CAPABILITY_CONTEXT   0x2U
 #define ZYNQMP_PM_CAPABILITY_WAKEUP    0x4U
+#define ZYNQMP_PM_CAPABILITY_UNUSABLE  0x8U
 
 /*
  * Firmware FPGA Manager flags
-- 
2.7.4
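
The platform-dependent choice made in the probe hunk can be reduced to the sketch below. `min_capability_for()` and `PM_CAPABILITY_UNUSABLE` are hypothetical names, and a plain string stands in for the device-tree compatible lookup on the parent node.

```c
#include <string.h>

#define PM_CAPABILITY_UNUSABLE 0x8U	/* value taken from the patch above */

/*
 * Pick the requirement used at power-off: 0 on ZynqMP firmware,
 * CAPABILITY_UNUSABLE on any other platform.
 * "compatible" stands in for the parent node's compatible string.
 */
static unsigned int min_capability_for(const char *compatible)
{
	if (strcmp(compatible, "xlnx,zynqmp-firmware") == 0)
		return 0;
	return PM_CAPABILITY_UNUSABLE;
}
```

Keeping the decision in one place means the power-off path only has to start from the precomputed minimum instead of testing the platform on every call.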



Re: [PATCH v5 1/3] dt-bindings: opp: Introduce opp-peak-kBps and opp-avg-kBps bindings

2019-08-26 Thread Saravana Kannan
On Wed, Aug 21, 2019 at 1:33 PM Rob Herring  wrote:
>
> On Wed,  7 Aug 2019 15:31:09 -0700, Saravana Kannan wrote:
> > Interconnects often quantify their performance points in terms of
> > bandwidth. So, add opp-peak-kBps (required) and opp-avg-kBps (optional) to
> > allow specifying Bandwidth OPP tables in DT.
> >
> > opp-peak-kBps is a required property that replaces opp-hz for Bandwidth OPP
> > tables.
> >
> > opp-avg-kBps is an optional property that can be used in Bandwidth OPP
> > tables.
> >
> > Signed-off-by: Saravana Kannan 
> > ---
> >  Documentation/devicetree/bindings/opp/opp.txt | 15 ---
> >  .../devicetree/bindings/property-units.txt|  4 
> >  2 files changed, 16 insertions(+), 3 deletions(-)
> >
>
> Reviewed-by: Rob Herring 

Thanks Rob!

-Saravana


Re: [PATCH 1/2] x86/microcode: Update late microcode in parallel

2019-08-26 Thread Raj, Ashok
Hi Boris

Minor nit: Small commit log fixup below. 

On Sat, Aug 24, 2019 at 10:53:00AM +0200, Borislav Petkov wrote:
> From: Ashok Raj 
> Date: Thu, 22 Aug 2019 23:43:47 +0300
> 
> Microcode update was changed to be serialized due to restrictions after
> Spectre days. Updating serially on a large multi-socket system can be
> painful since it is being done on one CPU at a time.
> 
> Cloud customers have expressed discontent as services disappear for a
> prolonged time. The restriction is that only one core goes through the
s/one core/one thread of a core/

> update while other cores are quiesced.
s/cores/other thread(s) of the core

> 
> Do the microcode update only on the first thread of each core while
> other siblings simply wait for this to complete.
> 
>  [ bp: Simplify, massage, cleanup comments. ]
> 

Cheers,
Ashok


Re: [PATCH] net/mlx5: fix a -Wstringop-truncation warning

2019-08-26 Thread Saeed Mahameed
On Fri, 2019-08-23 at 15:18 -0700, David Miller wrote:
> Saeed, I assume I'll get this from you.

Yes, I will handle it.


Re: [PATCH v2 3/3] dwc: PCI: intel: Intel PCIe RC controller driver

2019-08-26 Thread Martin Blumenstingl
Hi Dilip,

On Mon, Aug 26, 2019 at 8:42 AM Dilip Kota  wrote:
[...]
> intel_pcie_port structure is having "struct dw_pcie" as mentioned below:
>
> struct intel_pcie_port {
> struct dw_pcie  *pci;
> unsigned intid; /* Physical RC Index */
> void __iomem*app_base;
> struct gpio_desc*reset_gpio;
> [...]
> };
>
> Almost all the drivers are following the same way. I don't see any issue in 
> this way.
> Please help me with more description if you see an issue here.
>
> struct qcom_pcie {
> struct dw_pcie *pci;
> Ref: 
> https://elixir.bootlin.com/linux/v5.3-rc6/source/drivers/pci/controller/dwc/pcie-qcom.c
>
> struct armada8k_pcie {
> struct dw_pcie *pci;
> Ref: 
> https://elixir.bootlin.com/linux/v5.3-rc6/source/drivers/pci/controller/dwc/pcie-armada8k.c
>
> struct artpec6_pcie {
> struct dw_pcie *pci;
> Ref: 
> https://elixir.bootlin.com/linux/v5.3-rc6/source/drivers/pci/controller/dwc/pcie-artpec6.c
>
> struct kirin_pcie {
> struct dw_pcie *pci;
> Ref: 
> https://elixir.bootlin.com/linux/v5.3-rc6/source/drivers/pci/controller/dwc/pcie-kirin.c
>
> struct spear13xx_pcie {
> struct dw_pcie *pci;
> Ref: 
> https://elixir.bootlin.com/linux/v5.3-rc6/source/drivers/pci/controller/dwc/pcie-spear13xx.c
Thank you for this detailed list.
It seems that I picked the minority of drivers as "reference", where
it's implemented differently:

first example: pci-meson
  struct meson_pcie {
struct dw_pcie pci;
...
  };

second example: pcie-tegra194 (only in -next, will be part of v5.4)
  struct tegra_pcie_dw {
...
struct dw_pcie pci;
...
  };

So some drivers store a pointer to the dw_pcie struct, while others
embed the dw_pcie struct directly.
As far as I know the result will be equal, except that with embedding
you don't have to use a second devm_kzalloc for struct dw_pcie (thus
reducing memory fragmentation).


Martin
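
The two layouts compared above can be sketched in user space as follows. Everything here is a mock under stated assumptions: `struct dw_pcie` is reduced to one field, `calloc()` stands in for `devm_kzalloc()`, and `port_with_ptr`, `port_embedded`, and `port_from_pci()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Mock of the DesignWare core state; the real struct dw_pcie is much larger. */
struct dw_pcie {
	int id;
};

/* Variant 1: driver state holds a pointer, so two allocations are needed. */
struct port_with_ptr {
	struct dw_pcie *pci;
	unsigned int id;
};

/* Variant 2: driver state embeds the struct, so one allocation is enough. */
struct port_embedded {
	struct dw_pcie pci;
	unsigned int id;
};

/* Recover the enclosing driver struct from the embedded member. */
#define port_from_pci(p) \
	((struct port_embedded *)((char *)(p) - offsetof(struct port_embedded, pci)))

static struct port_with_ptr *alloc_with_ptr(void)
{
	struct port_with_ptr *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->pci = calloc(1, sizeof(*port->pci));	/* second allocation */
	if (!port->pci) {
		free(port);
		return NULL;
	}
	return port;
}

static struct port_embedded *alloc_embedded(void)
{
	return calloc(1, sizeof(struct port_embedded));	/* single allocation */
}
```

The embedded variant also lets code that only has the `struct dw_pcie` pointer get back to the driver state with pointer arithmetic, which is the pattern the kernel's container_of() macro provides.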


[PATCH 0/5] mmu notifier debug annotations

2019-08-26 Thread Daniel Vetter
Hi all,

Next round. Changes:

- I kept the two lockdep annotation patches, since linux-next didn't yet
  have them when I rebased this before retesting. Otherwise unchanged
  except for a trivial conflict.

- Ack from Peter Z. on the kernel.h patch.

- Added annotations for non_block to invalidate_range_end. I can't test
  that readily since i915 doesn't use it.

- Added might_sleep annotations to also make sure the mm side keeps up
  its side of the contract here around what's allowed and what's not.

Comments, feedback, review as usual very much appreciated.

Cheers, Daniel

Daniel Vetter (5):
  mm, notifier: Add a lockdep map for invalidate_range_start/end
  mm, notifier: Prime lockdep
  kernel.h: Add non_block_start/end()
  mm, notifier: Catch sleeping/blocking for !blockable
  mm, notifier: annotate with might_sleep()

 include/linux/kernel.h   | 25 -
 include/linux/mmu_notifier.h | 13 +
 include/linux/sched.h|  4 
 kernel/sched/core.c  | 19 ++-
 mm/mmu_notifier.c| 31 +--
 5 files changed, 84 insertions(+), 8 deletions(-)

-- 
2.23.0



Re: [PATCH v1 net-next 4/4] net: stmmac: setup higher frequency clk support for EHL & TGL

2019-08-26 Thread Andrew Lunn
On Mon, Aug 26, 2019 at 12:55:31PM -0700, Florian Fainelli wrote:
> On 8/26/19 6:38 PM, Voon Weifeng wrote:
> > EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk,
> > ptp clock and ptp_max_adj to 200MHz.
> > 
> > Signed-off-by: Voon Weifeng 
> > Signed-off-by: Ong Boon Leong 
> > ---
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 21 +
> >  drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c |  3 +++
> >  include/linux/stmmac.h   |  1 +
> >  3 files changed, 25 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c 
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> > index e969dc9bb9f0..20906287b6d4 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> > @@ -9,6 +9,7 @@
> >Author: Giuseppe Cavallaro 
> >  
> > ***/
> >  
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -174,6 +175,19 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
> > plat->axi->axi_blen[1] = 8;
> > plat->axi->axi_blen[2] = 16;
> >  
> > +   plat->ptp_max_adj = plat->clk_ptp_rate;
> > +
> > +   /* Set system clock */
> > +   plat->stmmac_clk = clk_register_fixed_rate(>dev,
> > +  "stmmac-clk", NULL, 0,
> > +  plat->clk_ptp_rate);
> > +
> > +   if (IS_ERR(plat->stmmac_clk)) {
> > +   dev_warn(>dev, "Fail to register stmmac-clk\n");
> > +   plat->stmmac_clk = NULL;
> 
> Don't you need to propagate at least EPROBE_DEFER here?

Hi Florian

Isn't a fixed rate clock a complete fake? There is no hardware behind
it. So can it return EPROBE_DEFER?

Andrew


Re: [PATCH] IB/mlx5: Convert to use vm_map_pages_zero()

2019-08-26 Thread Souptick Joarder
On Mon, Aug 26, 2019 at 5:50 PM Jason Gunthorpe  wrote:
>
> On Mon, Aug 26, 2019 at 01:32:09AM +0530, Souptick Joarder wrote:
> > On Mon, Aug 26, 2019 at 1:13 AM Jason Gunthorpe  wrote:
> > >
> > > On Sun, Aug 25, 2019 at 11:37:27AM +0530, Souptick Joarder wrote:
> > > > First, length passed to mmap is checked explicitly against
> > > > PAGE_SIZE.
> > > >
> > > > Second, if vma->vm_pgoff is passed as non zero, it would return
> > > > error. It appears like driver is expecting vma->vm_pgoff to
> > > > be passed as 0 always.
> > >
> > > ? pg_off is not zero
> >
> > Sorry, I mean, driver has a check against non zero to return error 
> > -EOPNOTSUPP
> > which means in true scenario driver is expecting vma->vm_pgoff should be 
> > passed
> > as 0.
>
> get_index is masking vm_pgoff, it is not 0

Sorry, I missed this part. Looking further into the code,
in mlx5_ib_mmap(), vma->vm_pgoff is used to get the command, and
inside mlx5_ib_mmap_clock_info_page() the entire *dev->mdev->clock_info*
is mapped.

Considering that, with the modification below the vma length
error check is handled inside vm_map_pages_zero(), and an extra check
for vma length is not needed.

diff --git a/drivers/infiniband/hw/mlx5/main.c
b/drivers/infiniband/hw/mlx5/main.c
index 0569bca..c3e3bfe 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2071,8 +2071,9 @@ static int mlx5_ib_mmap_clock_info_page(struct
mlx5_ib_dev *dev,
struct vm_area_struct *vma,
struct mlx5_ib_ucontext *context)
 {
-   if ((vma->vm_end - vma->vm_start != PAGE_SIZE) ||
-   !(vma->vm_flags & VM_SHARED))
+   struct page *pages;
+
+   if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

if (get_index(vma->vm_pgoff) != MLX5_IB_CLOCK_INFO_V1)
@@ -2084,9 +2085,9 @@ static int mlx5_ib_mmap_clock_info_page(struct
mlx5_ib_dev *dev,

if (!dev->mdev->clock_info)
return -EOPNOTSUPP;
+   pages = virt_to_page(dev->mdev->clock_info);

-   return vm_insert_page(vma, vma->vm_start,
- virt_to_page(dev->mdev->clock_info));
+   return vm_map_pages_zero(vma, &pages, 1);
 }

If this is fine, I can post it as v2. Otherwise I will drop this patch.
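
The length check that the message above relies on can be modeled as below. This is a simplified user-space sketch, assuming the helper rejects a VMA that spans more pages than the caller supplied; `map_pages_zero_check()` and `MODEL_PAGE_SIZE` are hypothetical names and the actual page insertion is omitted.

```c
#include <errno.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/*
 * Simplified model of the range check: the mapping must not ask for more
 * pages than the caller supplied. With one clock-info page, any VMA longer
 * than one page is rejected.
 */
static int map_pages_zero_check(unsigned long vm_start, unsigned long vm_end,
				unsigned long num_pages)
{
	unsigned long count = (vm_end - vm_start) / MODEL_PAGE_SIZE;

	if (num_pages == 0 || count > num_pages)
		return -ENXIO;
	return 0;
}
```

One visible difference from the removed explicit check is the error code: the old test returned -EINVAL for a wrong length, while this model returns -ENXIO.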


Re: [PATCH v2] powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB

2019-08-26 Thread Christophe Leroy




On 26/08/2019 at 18:50, Greg Kroah-Hartman wrote:

On Wed, Aug 21, 2019 at 10:19:27AM +1000, Alastair D'Silva wrote:

From: Alastair D'Silva 

The upstream commit:
22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
has a similar effect, but since it is a rewrite of the assembler to C, is
too invasive for stable. This patch is a minimal fix to address the issue in
assembler.

This patch applies cleanly to v5.2, v4.19 & v4.14.

When calling flush_(inval_)dcache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Changelog:
v2
   - Add related upstream commit

Signed-off-by: Alastair D'Silva 
---
  arch/powerpc/kernel/misc_64.S | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 1ad4089dd110..d4d096f80f4b 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -130,7 +130,7 @@ _GLOBAL_TOC(flush_dcache_range)
        subf    r8,r6,r4                /* compute length */
        add     r8,r8,r5                /* ensure we get enough */
        lwz     r9,DCACHEL1LOGBLOCKSIZE(r10)    /* Get log-2 of dcache block size */
-       srw.    r8,r8,r9                /* compute line count */
+       srd.    r8,r8,r9                /* compute line count */
        beqlr                           /* nothing to do? */
        mtctr   r8
 0:     dcbst   0,r6
@@ -148,7 +148,7 @@ _GLOBAL(flush_inval_dcache_range)
        subf    r8,r6,r4                /* compute length */
        add     r8,r8,r5                /* ensure we get enough */
        lwz     r9,DCACHEL1LOGBLOCKSIZE(r10)    /* Get log-2 of dcache block size */
-       srw.    r8,r8,r9                /* compute line count */
+       srd.    r8,r8,r9                /* compute line count */
        beqlr                           /* nothing to do? */
        sync
        isync


I need an ack from the powerpc maintainer(s) before I can take this.


I think you already got an ack (on v1). See 
https://patchwork.ozlabs.org/patch/1147403/#2239663


Christophe
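
The truncation fixed by the quoted patch can be reproduced in plain C. This is a model only, assuming a block-size shift smaller than 32; `line_count_srw()` and `line_count_srd()` are hypothetical names that mimic the word and doubleword shifts.

```c
#include <stdint.h>

/* Model of srw: only the low 32 bits of the length take part in the shift. */
static uint64_t line_count_srw(uint64_t len, unsigned int log2_block)
{
	return (uint32_t)len >> log2_block;
}

/* Model of srd: the full 64-bit length is shifted. */
static uint64_t line_count_srd(uint64_t len, unsigned int log2_block)
{
	return len >> log2_block;
}
```

For a length of 4GB plus 256 bytes and 128-byte cache blocks, the word shift yields 2 lines while the doubleword shift yields 33554434, so the old code would flush only a tiny part of the requested range.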


[PATCH] mtd: rawnand: brcmnand: Fix ecc chunk calculation for erased page bitflips

2019-08-26 Thread Kamal Dasu
From: Claire Lin 

In brcmstb_nand_verify_erased_page(), fix ecc chunk pointer calculation
while correcting erased page bitflip.

Fixes: 02b88eea9f9c ("mtd: brcmnand: Add check for erased page bitflips")
Signed-off-by: Claire Lin 
Reviewed-by: Ray Jui 
Signed-off-by: Kamal Dasu 
---
 drivers/mtd/nand/raw/brcmnand/brcmnand.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c 
b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 33310b8..15ef30b 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1792,6 +1792,7 @@ static int brcmstb_nand_verify_erased_page(struct 
mtd_info *mtd,
int bitflips = 0;
int page = addr >> chip->page_shift;
int ret;
+   void *ecc_chunk;
 
if (!buf)
buf = nand_get_data_buf(chip);
@@ -1804,7 +1805,9 @@ static int brcmstb_nand_verify_erased_page(struct 
mtd_info *mtd,
return ret;
 
for (i = 0; i < chip->ecc.steps; i++, oob += sas) {
-   ret = nand_check_erased_ecc_chunk(buf, chip->ecc.size,
+   ecc_chunk = buf + chip->ecc.size * i;
+   ret = nand_check_erased_ecc_chunk(ecc_chunk,
+ chip->ecc.size,
  oob, sas, NULL, 0,
  chip->ecc.strength);
if (ret < 0)
-- 
1.9.0.138.g2de3478
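
The pointer calculation this patch corrects can be illustrated with a user-space model. It is a sketch under assumptions: `chunk_bitflips()` and `page_max_bitflips()` are hypothetical helpers, and the OOB area and the strength threshold handled by the real check are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Count zero bits in one chunk; an erased chunk should read back as all 0xff. */
static int chunk_bitflips(const uint8_t *chunk, size_t len)
{
	int flips = 0;
	size_t i;
	unsigned int bit;

	for (i = 0; i < len; i++)
		for (bit = 0; bit < 8; bit++)
			if (!(chunk[i] & (1u << bit)))
				flips++;
	return flips;
}

/*
 * Return the worst per-chunk bitflip count across the page. The chunk
 * pointer advances by ecc_size each step, which is the calculation the
 * patch adds; checking "buf" every time would only ever inspect chunk 0.
 */
static int page_max_bitflips(const uint8_t *buf, size_t ecc_size, int steps)
{
	int max = 0;
	int i;

	for (i = 0; i < steps; i++) {
		const uint8_t *ecc_chunk = buf + ecc_size * (size_t)i;
		int flips = chunk_bitflips(ecc_chunk, ecc_size);

		if (flips > max)
			max = flips;
	}
	return max;
}
```

A flip located in the second chunk is invisible if every iteration inspects the start of the buffer, which is the failure mode the fix addresses.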



[RFC] perf/x86/amd: add support for Large Increment per Cycle Events

2019-08-26 Thread Kim Phillips
The core AMD PMU has a 4-bit wide per-cycle increment for each
performance monitor counter.  That works for most counters, but
now with AMD Family 17h and above processors, for some, more than 15
events can occur in a cycle.  Those events are called "Large
Increment per Cycle" events, and one example is the number of
SSE/AVX FLOPs retired (event code 0x003).  In order to count these
events, two adjacent h/w PMCs get their count signals merged
to form 8 bits per cycle total.  In addition, the PERF_CTR count
registers are merged to be able to count up to 64 bits.

Normally, events like instructions retired, get programmed on a single
counter like so:

PERF_CTL0 (MSR 0xc0010200) 0x0053ff0c # event 0x0c, umask 0xff
PERF_CTR0 (MSR 0xc0010201) 0x8001 # r/w 48-bit count

The next counter at MSRs 0xc0010202-3 remains unused, or can be used
independently to count something else.

When counting Large Increment per Cycle events, such as FLOPs,
however, we now have to reserve the next counter and program the
PERF_CTL (config) register with the Merge event (0xFFF), like so:

PERF_CTL0 (msr 0xc0010200) 0x0053ff03 # FLOPs event, umask 0xff
PERF_CTR0 (msr 0xc0010201) 0x8001 # read 64-bit count, wr low 48b
PERF_CTL1 (msr 0xc0010202) 0x000f004000ff # Merge event, enable bit
PERF_CTR1 (msr 0xc0010203) 0x # write higher 16-bits of count

The count is widened from the normal 48 bits to 64 bits by having the
second counter carry the higher 16 bits of the count in the lower 16
bits of its counter register.  Mixed 48- and 64-bit counting
is not supported in this version.
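
The register layout described above can be written down as a pair of helpers. This is only a sketch of the bit layout: `split_count()` and `merge_count()` are hypothetical names, no MSR access is modeled, and since a read of PERF_CTR0 is said to return the full 64-bit count already, `merge_count()` exists purely to check the layout.

```c
#include <stdint.h>

#define LOW48_MASK ((1ULL << 48) - 1)

/* Split a 64-bit count: low 48 bits for the even counter, high 16 for the next. */
static void split_count(uint64_t count, uint64_t *ctr_even, uint64_t *ctr_odd)
{
	*ctr_even = count & LOW48_MASK;
	*ctr_odd = count >> 48;		/* lands in the low 16 bits */
}

/* Reassemble the 64-bit count from the two register values. */
static uint64_t merge_count(uint64_t ctr_even, uint64_t ctr_odd)
{
	return ((ctr_odd & 0xffffULL) << 48) | (ctr_even & LOW48_MASK);
}
```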

For more details, search a Family 17h PPR for the "Large Increment per
Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this version:

https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip

In order to support reserving the extra counter for a single Large
Increment per Cycle event in the perf core, we:

1. Add a f17h get_event_constraints() that returns only an even counter
bitmask, since Large Increment events can only be placed on counters 0,
2, and 4 out of the currently available 0-5.

2. We add a commit_scheduler hook that adds the Merge event (0xFFF) to
any Large Increment event being scheduled.  If the event being scheduled
is not a Large Increment event, we check for, and remove any
pre-existing Large Increment events on the next counter.

3. In the main x86 scheduler, we reduce the number of available
counters by the number of Large Increment per Cycle events being added.
This improves the counter scheduler success rate.

4. In perf_assign_events(), if a counter is assigned to a Large
Increment event, we increment the current counter variable, so the
counter used for the Merge event is skipped.

5. In find_counter(), if a counter has been found for the
Large Increment event, we set the next counter as used, to
prevent other events from using it.
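
Steps 1 and 5 above amount to an even-counter constraint plus reserving the neighbouring counter. The sketch below models only that bookkeeping: `even_counter_mask()` and `claim_counter_pair()` are hypothetical helpers, not the functions touched by the patch, and fewer than 64 counters are assumed.

```c
#include <stdint.h>

/* Bitmask of even-numbered counters among num_counters (0x15 for six). */
static uint64_t even_counter_mask(unsigned int num_counters)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < num_counters; i += 2)
		mask |= 1ULL << i;
	return mask;
}

/*
 * Claim the first free even counter whose odd neighbour is also free, and
 * mark both as used. Returns the even index, or -1 if no pair is available.
 */
static int claim_counter_pair(uint64_t *used, unsigned int num_counters)
{
	unsigned int i;

	for (i = 0; i + 1 < num_counters; i += 2) {
		uint64_t pair = 3ULL << i;

		if (!(*used & pair)) {
			*used |= pair;
			return (int)i;
		}
	}
	return -1;
}
```

With six counters at most three Large Increment events fit at once, and each one removes two counters from the pool, which is why the scheduler has to shrink its count of available counters.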

A side-effect of assigning a new get_constraints function for f17h is
that it disables calling the old (prior to f15h) amd_get_event_constraints
implementation left enabled by commit e40ed1542dd7 ("perf/x86: Add perf
support for AMD family-17h processors"), which is no longer
necessary since those North Bridge events are obsolete.

Simple invocation example:

perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/, \
cpu/event=0x03,umask=0xff/ 

 Performance counter stats for '':

   800,000,000  cpu/fp_ret_sse_avx_ops.all/u
   300,042,101  cpu/instructions/u
   800,000,000  cpu/event=0x03,umask=0xff/u

   0.041359898 seconds time elapsed

   0.04120 seconds user
   0.0 seconds sys

Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Kim Phillips 
Cc: Janakarajan Natarajan 
Cc: Suravee Suthikulpanit 
Cc: Tom Lendacky 
Cc: Stephane Eranian 
Cc: Martin Liska 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
---
RFC because I'd like input on the approach, including how to add support
for mixed-width (48- and 64-bit) counting for a single PMU.  Plus there
are bugs:

 - with nmi_watchdog=0, single invocations work, but it fails to count
   correctly when invoking two simultaneous perfs pinned to the
   same cpu

 - it fails to count correctly under certain conditions with
   nmi_watchdog=1

 arch/x86/events/amd/core.c   | 102 +++
 arch/x86/events/core.c   |  39 +-
 arch/x86/events/perf_event.h |  46 
 3 files changed, 163 insertions(+), 24 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e7d35f60d53f..351e72449fb8 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -12,6 +12,10 @@
 
 

[PATCH] perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops

2019-08-26 Thread Kim Phillips
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
sample bias, IBS hardware preloads the least significant 7 bits of
current count (IbsOpCurCnt) with random values, such that, after the
interrupt is handled and counting resumes, the next sample taken
will be slightly perturbed.

The current count bitfield is in the IBS execution control h/w register,
alongside the maximum count field.

Currently, the IBS driver writes that register with the maximum count,
leaving zeroes to fill the current count field, thereby overwriting
the random bits the hardware preloaded for itself.

Fix the driver to actually retain and carry those random bits from the
read of the IBS control register, through to its write, instead of
overwriting the lower current count bits with zeroes.

Tested with:

perf record -c 11 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 

'perf annotate' output before:

 15.70  65:   addsd %xmm0,%xmm1
 17.30add   $0x1,%rax
 15.88cmp   %rdx,%rax
  je82
 17.32  72:   test  $0x1,%al
  jne   7c
  7.52movapd%xmm1,%xmm0
  5.90jmp   65
  8.23  7c:   sqrtsd%xmm1,%xmm0
 12.15jmp   65

'perf annotate' output after:

 16.63  65:   addsd %xmm0,%xmm1
 16.82add   $0x1,%rax
 16.81cmp   %rdx,%rax
  je82
 16.69  72:   test  $0x1,%al
  jne   7c
  8.30movapd%xmm1,%xmm0
  8.13jmp   65
  8.24  7c:   sqrtsd%xmm1,%xmm0
  8.39jmp   65

Tested on Family 15h and 17h machines.

Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
affect their operation.

It is unknown why commit db98c5faf8cb ("perf/x86: Implement 64-bit
counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
field; the number of preloaded random bits has always been 7, AFAICT.

Signed-off-by: Kim Phillips 
Cc: Stephane Eranian 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: sta...@vger.kernel.org
---
 arch/x86/events/amd/ibs.c | 11 +--
 arch/x86/include/asm/perf_event.h |  9 ++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 62f317c9113a..f2625b4a5a8b 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -663,8 +663,15 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 out:
if (throttle)
perf_ibs_stop(event, 0);
-   else
-   perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
+   else {
+   period >>= 4;
+
+   if ((ibs_caps & IBS_CAPS_RDWROPCNT) &&
+   (*config & IBS_OP_CNT_CTL))
+   period |= *config & IBS_OP_CUR_CNT_RAND;
+
+   perf_ibs_enable_event(perf_ibs, hwc, period);
+   }
 
perf_event_update_userpage(event);
 
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1392d5e6e8d6..67d94696a1d6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -259,9 +259,12 @@ struct pebs_lbr {
 #define IBS_FETCH_CNT  0xULL
 #define IBS_FETCH_MAX_CNT  0xULL
 
-/* ibs op bits/masks */
-/* lower 4 bits of the current count are ignored: */
-#define IBS_OP_CUR_CNT (0x0ULL<<32)
+/* ibs op bits/masks
+ * The lower 7 bits of the current count are random bits
+ * preloaded by hardware and ignored in software
+ */
+#define IBS_OP_CUR_CNT (0xFFF80ULL<<32)
+#define IBS_OP_CUR_CNT_RAND(0x0007FULL<<32)
 #define IBS_OP_CNT_CTL (1ULL<<19)
 #define IBS_OP_VAL (1ULL<<18)
 #define IBS_OP_ENABLE  (1ULL<<17)
-- 
2.23.0



Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code

2019-08-26 Thread Cristian Marussi

Hi

On 8/26/19 4:32 PM, Christoph Hellwig wrote:

>> +config ARCH_USE_COMMON_SMP_STOP
>> +   def_bool y if SMP
>
> The option belongs in common code and the arch code should only
> select it.



In fact that was my first approach, but then I noticed that the kernel/
top-level directory has no generic Kconfig, only subsystem-specific ones:

Kconfig.freezer  Kconfig.hz  Kconfig.locks  Kconfig.preempt

Looking instead at the archs' top-level Kconfig files, besides the usual
arch/Kconfig selects, I found a similar sort of "reversed" approach, in
which the arch defines and selects a CONFIG that is then used only in
common code, as in:

20:37 $ egrep -R ARCH_HAS_CACHE_LINE_SIZE .
./arch/arc/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./arch/x86/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./arch/arm64/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./include/linux/cache.h:#ifndef CONFIG_ARCH_HAS_CACHE_LINE_SIZE

20:39 $ egrep -R ARCH_HAS_KEXEC_PURGATORY .
./arch/powerpc/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/x86/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/s390/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/s390/purgatory/Makefile:obj-$(CONFIG_ARCH_HAS_KEXEC_PURGATORY) += kexec-purgatory.o
./arch/s390/Kbuild:obj-$(CONFIG_ARCH_HAS_KEXEC_PURGATORY) += purgatory/
./kernel/kexec_file.c:  if (!IS_ENABLED(CONFIG_ARCH_HAS_KEXEC_PURGATORY))

So I thought it was an acceptable option and went for it, to avoid
introducing a new kernel/Kconfig.smp just for this new config option; but
I may well have missed the real reason underlying these two different
choices.

Thanks

Cristian


Re: [PATCH v1 net-next 4/4] net: stmmac: setup higher frequency clk support for EHL & TGL

2019-08-26 Thread Florian Fainelli
On 8/26/19 6:38 PM, Voon Weifeng wrote:
> EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk,
> ptp clock and ptp_max_adj to 200MHz.
> 
> Signed-off-by: Voon Weifeng 
> Signed-off-by: Ong Boon Leong 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 21 +
>  drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c |  3 +++
>  include/linux/stmmac.h   |  1 +
>  3 files changed, 25 insertions(+)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> index e969dc9bb9f0..20906287b6d4 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
> @@ -9,6 +9,7 @@
>Author: Giuseppe Cavallaro 
>  
> ***/
>  
> +#include 
>  #include 
>  #include 
>  
> @@ -174,6 +175,19 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
>   plat->axi->axi_blen[1] = 8;
>   plat->axi->axi_blen[2] = 16;
>  
> + plat->ptp_max_adj = plat->clk_ptp_rate;
> +
> + /* Set system clock */
> + plat->stmmac_clk = clk_register_fixed_rate(>dev,
> +"stmmac-clk", NULL, 0,
> +plat->clk_ptp_rate);
> +
> + if (IS_ERR(plat->stmmac_clk)) {
> + dev_warn(>dev, "Fail to register stmmac-clk\n");
> + plat->stmmac_clk = NULL;

Don't you need to propagate at least EPROBE_DEFER here?
-- 
Florian


[PATCH v2 1/3] coresight: tmc: Make memory width mask computation into a function

2019-08-26 Thread Mathieu Poirier
Make the computation of a memory mask representing the width of the memory
bus into a function so that it can be re-used by the ETR driver.
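
To make the alignment rule concrete, here is a small user-space sketch of the same mask computation. The enum values and the GENMASK32() macro are hypothetical stand-ins for the kernel's TMC_MEM_INTF_WIDTH_* constants and GENMASK(); only the mask choices themselves follow the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the TMC_MEM_INTF_WIDTH_* values. */
enum mem_width { WIDTH_32BITS, WIDTH_64BITS, WIDTH_128BITS, WIDTH_256BITS };

/* Minimal 32-bit equivalent of the kernel's GENMASK(high, low). */
#define GENMASK32(h, l) \
    (((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h))))

/* Same selection as in the patch: 16-byte or 32-byte alignment. */
static uint32_t memwidth_mask(enum mem_width w)
{
    switch (w) {
    case WIDTH_32BITS:
    case WIDTH_64BITS:
    case WIDTH_128BITS:
        return GENMASK32(31, 4);  /* frame boundary, four LSBs cleared */
    case WIDTH_256BITS:
        return GENMASK32(31, 5);  /* 256-bit bus, five LSBs cleared */
    }
    return 0;
}

/* Round a byte count down to the required alignment. */
static uint32_t align_to_memwidth(uint32_t size, enum mem_width w)
{
    return size & memwidth_mask(w);
}
```

For instance, a request of 1008 bytes stays at 1008 on a 128-bit bus but is rounded down to 992 on a 256-bit bus.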

Signed-off-by: Mathieu Poirier 
---
 .../hwtracing/coresight/coresight-tmc-etf.c   | 23 ++-
 drivers/hwtracing/coresight/coresight-tmc.c   | 28 +++
 drivers/hwtracing/coresight/coresight-tmc.h   |  1 +
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 23b7ff00af5c..807416b75ecc 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -479,30 +479,11 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
 * traces.
 */
if (!buf->snapshot && to_read > handle->size) {
-   u32 mask = 0;
-
-   /*
-* The value written to RRP must be byte-address aligned to
-* the width of the trace memory databus _and_ to a frame
-* boundary (16 byte), whichever is the biggest. For example,
-* for 32-bit, 64-bit and 128-bit wide trace memory, the four
-* LSBs must be 0s. For 256-bit wide trace memory, the five
-* LSBs must be 0s.
-*/
-   switch (drvdata->memwidth) {
-   case TMC_MEM_INTF_WIDTH_32BITS:
-   case TMC_MEM_INTF_WIDTH_64BITS:
-   case TMC_MEM_INTF_WIDTH_128BITS:
-   mask = GENMASK(31, 4);
-   break;
-   case TMC_MEM_INTF_WIDTH_256BITS:
-   mask = GENMASK(31, 5);
-   break;
-   }
+   u32 mask = tmc_get_memwidth_mask(drvdata);
 
/*
 * Make sure the new size is aligned in accordance with the
-* requirement explained above.
+* requirement explained in function tmc_get_memwidth_mask().
 */
to_read = handle->size & mask;
/* Move the RAM read pointer up */
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c
index 3055bf8e2236..1cf82fa58289 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.c
+++ b/drivers/hwtracing/coresight/coresight-tmc.c
@@ -70,6 +70,34 @@ void tmc_disable_hw(struct tmc_drvdata *drvdata)
writel_relaxed(0x0, drvdata->base + TMC_CTL);
 }
 
+u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata)
+{
+   u32 mask = 0;
+
+   /*
+* When moving RRP or an offset address forward, the new values must
+* be byte-address aligned to the width of the trace memory databus
+* _and_ to a frame boundary (16 byte), whichever is the biggest. For
+* example, for 32-bit, 64-bit and 128-bit wide trace memory, the four
+* LSBs must be 0s. For 256-bit wide trace memory, the five LSBs must
+* be 0s.
+*/
+   switch (drvdata->memwidth) {
+   case TMC_MEM_INTF_WIDTH_32BITS:
+   /* fallthrough */
+   case TMC_MEM_INTF_WIDTH_64BITS:
+   /* fallthrough */
+   case TMC_MEM_INTF_WIDTH_128BITS:
+   mask = GENMASK(31, 4);
+   break;
+   case TMC_MEM_INTF_WIDTH_256BITS:
+   mask = GENMASK(31, 5);
+   break;
+   }
+
+   return mask;
+}
+
 static int tmc_read_prepare(struct tmc_drvdata *drvdata)
 {
int ret = 0;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 9dbcdf453e22..71de978575f3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -255,6 +255,7 @@ void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata);
 void tmc_flush_and_stop(struct tmc_drvdata *drvdata);
 void tmc_enable_hw(struct tmc_drvdata *drvdata);
 void tmc_disable_hw(struct tmc_drvdata *drvdata);
+u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata);
 
 /* ETB/ETF functions */
 int tmc_read_prepare_etb(struct tmc_drvdata *drvdata);
-- 
2.17.1



[PATCH v2 3/3] coresight: tmc-etr: Add barrier packets when moving offset forward

2019-08-26 Thread Mathieu Poirier
This patch adds barrier packets in the trace stream when the offset in the
data buffer needs to be moved forward.  Otherwise the decoder isn't aware
of the break in the stream and can't synchronise itself with the trace
data.
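
The offset arithmetic in this patch can be sketched in user space as follows. This is only a model under stated assumptions: struct sync_plan and plan_sync() are hypothetical names, and the parameters stand in for etr_buf->size, etr_buf->offset, etr_buf->len, etr_buf->full, etr_perf->snapshot, handle->size and the mask returned by tmc_get_memwidth_mask().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: where to start copying, how much, and
 * whether trace data was dropped. */
struct sync_plan {
    unsigned long offset;
    unsigned long size;
    bool lost;
};

static struct sync_plan plan_sync(unsigned long buf_size,
                                  unsigned long buf_offset,
                                  unsigned long buf_len, bool buf_full,
                                  bool snapshot, unsigned long handle_size,
                                  uint32_t mask)
{
    struct sync_plan p = { buf_offset, buf_len, buf_full };

    /* Outside snapshot mode, keep only the newest data that fits. */
    if (!snapshot && p.size > handle_size) {
        p.size = handle_size & mask;              /* aligned copy size */
        p.offset = buf_offset + buf_len - p.size; /* skip the oldest data */
        if (p.offset >= buf_size)                 /* wrap in the circular buffer */
            p.offset -= buf_size;
        p.lost = true;                            /* older trace was dropped */
    }

    return p;
}
```

With a full 4096-byte buffer starting at offset 3000 and 1030 bytes of perf ring buffer space, the sketch yields an aligned size of 1024 and a wrapped start offset of 1976.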

Signed-off-by: Mathieu Poirier 
Tested-by: Yabin Cui 
---
 .../hwtracing/coresight/coresight-tmc-etr.c   | 29 +++
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index bae47272de98..625882bc8b08 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1418,10 +1418,11 @@ static void tmc_free_etr_buffer(void *config)
  * buffer to the perf ring buffer.
  */
 static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
+unsigned long src_offset,
 unsigned long to_copy)
 {
long bytes;
-   long pg_idx, pg_offset, src_offset;
+   long pg_idx, pg_offset;
unsigned long head = etr_perf->head;
char **dst_pages, *src_buf;
struct etr_buf *etr_buf = etr_perf->etr_buf;
@@ -1430,7 +1431,6 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
pg_idx = head >> PAGE_SHIFT;
pg_offset = head & (PAGE_SIZE - 1);
dst_pages = (char **)etr_perf->pages;
-   src_offset = etr_buf->offset + etr_buf->len - to_copy;
 
while (to_copy > 0) {
/*
@@ -1478,7 +1478,7 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
  void *config)
 {
bool lost = false;
-   unsigned long flags, size = 0;
+   unsigned long flags, offset, size = 0;
struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
struct etr_perf_buffer *etr_perf = config;
struct etr_buf *etr_buf = etr_perf->etr_buf;
@@ -1506,16 +1506,35 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
spin_unlock_irqrestore(>spinlock, flags);
 
lost = etr_buf->full;
+   offset = etr_buf->offset;
size = etr_buf->len;
+
+   /*
+* The ETR buffer may be bigger than the space available in the
+* perf ring buffer (handle->size).  If so advance the offset so that we
+* get the latest trace data.  In snapshot mode none of that matters
+* since we are expected to clobber stale data in favour of the latest
+* traces.
+*/
if (!etr_perf->snapshot && size > handle->size) {
-   size = handle->size;
+   u32 mask = tmc_get_memwidth_mask(drvdata);
+
+   /*
+* Make sure the new size is aligned in accordance with the
+* requirement explained in function tmc_get_memwidth_mask().
+*/
+   size = handle->size & mask;
+   offset = etr_buf->offset + etr_buf->len - size;
+
+   if (offset >= etr_buf->size)
+   offset -= etr_buf->size;
lost = true;
}
 
/* Insert barrier packets at the beginning, if there was an overflow */
if (lost)
tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
-   tmc_etr_sync_perf_buffer(etr_perf, size);
+   tmc_etr_sync_perf_buffer(etr_perf, offset, size);
 
/*
 * In snapshot mode we simply increment the head by the number of byte
-- 
2.17.1



[PATCH v2 0/3] coresight: Add barrier packet when moving offset forward

2019-08-26 Thread Mathieu Poirier
This set builds on top of an original patch by Yabin Cui[1] that deals with
cases where the ETR buffer is bigger than the space available in the perf
ring buffer.  The work herein complements Yabin's by inserting barrier
packets after the head of the memory buffer has been moved forward in order
for the trace decoder to still synchronise with the trace stream.  

Applies cleanly to the coresight next branch.

Thanks,
Mathieu

[1]. https://lkml.org/lkml/2019/8/14/1336

New to V2:
- Added Yabin's Tested-by.
- Addressed Leo's comment about extending the solution to the sysfs
  interface.
- Split the work in 3 patches rather than 2.

Mathieu Poirier (3):
  coresight: tmc: Make memory width mask computation into a function
  coresight: tmc-etr: Decouple buffer sync and barrier packet insertion
  coresight: tmc-etr: Add barrier packets when moving offset forward

 .../hwtracing/coresight/coresight-tmc-etf.c   | 23 +
 .../hwtracing/coresight/coresight-tmc-etr.c   | 47 ++-
 drivers/hwtracing/coresight/coresight-tmc.c   | 28 +++
 drivers/hwtracing/coresight/coresight-tmc.h   |  1 +
 4 files changed, 67 insertions(+), 32 deletions(-)

-- 
2.17.1



[PATCH v2 2/3] coresight: tmc-etr: Decouple buffer sync and barrier packet insertion

2019-08-26 Thread Mathieu Poirier
If less space is available in the perf ring buffer than the ETR buffer,
barrier packets inserted in the trace stream by tmc_sync_etr_buf() are
skipped over when the head of the buffer is moved forward, resulting in
traces that can't be decoded.

This patch decouples the process of syncing ETR buffers and the addition
of barrier packets in order to perform the latter once the offset in the
trace buffer has been properly computed.

Signed-off-by: Mathieu Poirier 
---
 .../hwtracing/coresight/coresight-tmc-etr.c| 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 4f000a03152e..bae47272de98 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -946,10 +946,6 @@ static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata)
WARN_ON(!etr_buf->ops || !etr_buf->ops->sync);
 
etr_buf->ops->sync(etr_buf, rrp, rwp);
-
-   /* Insert barrier packets at the beginning, if there was an overflow */
-   if (etr_buf->full)
-   tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
 }
 
 static void __tmc_etr_enable_hw(struct tmc_drvdata *drvdata)
@@ -1086,6 +1082,13 @@ static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
drvdata->sysfs_buf = NULL;
} else {
tmc_sync_etr_buf(drvdata);
+   /*
+* Insert barrier packets at the beginning, if there was
+* an overflow.
+*/
+   if (etr_buf->full)
+   tmc_etr_buf_insert_barrier_packet(etr_buf,
+ etr_buf->offset);
}
 }
 
@@ -1502,11 +1505,16 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
CS_LOCK(drvdata->base);
spin_unlock_irqrestore(>spinlock, flags);
 
+   lost = etr_buf->full;
size = etr_buf->len;
if (!etr_perf->snapshot && size > handle->size) {
size = handle->size;
lost = true;
}
+
+   /* Insert barrier packets at the beginning, if there was an overflow */
+   if (lost)
+   tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset);
tmc_etr_sync_perf_buffer(etr_perf, size);
 
/*
@@ -1517,8 +1525,6 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
 */
if (etr_perf->snapshot)
handle->head += size;
-
-   lost |= etr_buf->full;
 out:
/*
 * Don't set the TRUNCATED flag in snapshot mode because 1) the
-- 
2.17.1



Re: [patch V3 28/38] posix-cpu-timers: Restructure expiry array

2019-08-26 Thread Frederic Weisbecker
On Mon, Aug 26, 2019 at 08:22:24PM +0200, Thomas Gleixner wrote:
> Now that the abused struct task_cputime is gone, it's more natural to
> bundle the expiry cache and the list head of each clock into a struct and
> have an array of those structs.
> 
> Follow the hrtimer naming convention of 'bases' and rename the expiry cache
> to 'nextevt' and adapt all usage sites.
> 
> Also generates better code: .text size shrinks by 80 bytes.
> 
> Suggested-by: Ingo Molnar 
> Signed-off-by: Thomas Gleixner 
> ---
> V2: New patch
> V3: Address review feedback from Frederic

Reviewed-by: Frederic Weisbecker 


Re: [PATCH v2 04/15] kvm: x86: Add per-VM APICv state debugfs

2019-08-26 Thread Suthikulpanit, Suravee
Alex,

On 8/19/2019 4:57 AM, Alexander Graf wrote:
> 
> 
> On 15.08.19 18:25, Suthikulpanit, Suravee wrote:
>> Currently, there is no way to tell whether APICv is active
>> on a particular VM. This often causes confusion since APICv
>> can be deactivated at runtime.
>>
>> Introduce a debugfs entry to report APICv state of a VM.
>> This creates a read-only file:
>>
>>     /sys/kernel/debug/kvm/70860-14/apicv-state
>>
>> Signed-off-by: Suravee Suthikulpanit 
> 
> Shouldn't this first and foremost be a VM ioctl so that user space can 
> inquire its own state?
> 
> 
> Alex

I introduced this mainly for debugging, similar to how KVM currently provides
some per-VCPU information:

 /sys/kernel/debug/kvm/15957-14/vcpu0/
 lapic_timer_advance_ns
 tsc-offset
 tsc-scaling-ratio
 tsc-scaling-ratio-frac-bits

I'm not sure this needs to be a VM ioctl at this point. If this information is
useful for a user-space tool to query via ioctl, we can also provide it.

Thanks,
Suravee


[PATCH 03/10] mm/oom_debug: Add Tasks Summary

2019-08-26 Thread Edward Chron
Add a config option and code to print a process/thread summary when an
OOM event occurs. The information provided includes the number of
processes and threads active, the number of oom eligible and oom
ineligible tasks, the total number of forks since the system booted,
and the number of runnable and I/O blocked processes. All values are
taken at the time of the OOM event.

Configuring this Debug Option (DEBUG_OOM_TASKS_SUMMARY)
-------------------------------------------------------
To get the tasks summary, this option must be configured. It is
controlled by the CONFIG_DEBUG_OOM_TASKS_SUMMARY kernel config option,
found under: Kernel hacking, Memory Debugging, OOM Debugging. The
config option to select is: DEBUG_OOM_TASKS_SUMMARY.

Dynamically disabling or re-enabling this OOM Debug option
----------------------------------------------------------
The oom debugfs base directory is found at: /sys/kernel/debug/oom.
The oom debugfs prefix for this option is: tasks_summary_
and there is just one file for this option, the enabled file.

The option may be disabled or re-enabled at runtime through that file:
/sys/kernel/debug/oom/tasks_summary_enabled
A value of 1 enables the facility and a value of 0 disables it. When
configured, the option is enabled by default.

Content and format of Tasks Summary Output
------------------------------------------
One line of output that includes:
  - Number of Threads
  - Number of processes
  - Forks since boot
  - Processes that are runnable
  - Processes that are in iowait

Sample Output:
--------------
Sample Tasks Summary message output:

Aug 13 18:52:48 yoursystem kernel: Threads: 492 Processes: 248
 forks_since_boot: 7786 procs_runable: 4 procs_iowait: 0
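
For readers who want to parse or reproduce this line, the sketch below formats the same fields in user space. The format string is taken from the pr_info() call in this patch; format_tasks_summary() and its parameters are hypothetical names for this example. Note that the format string as posted puts no space after each colon, unlike the sample above.

```c
#include <stdio.h>

/*
 * Hypothetical user-space helper: build the summary line using the
 * format string from the patch's pr_info() call (without the newline).
 * Returns the number of characters snprintf() would have written.
 */
static int format_tasks_summary(char *buf, size_t len, int threads,
                                int processes, unsigned long forks,
                                unsigned long runnable, unsigned long iowait)
{
    return snprintf(buf, len,
                    "Threads:%d Processes:%d forks_since_boot:%lu "
                    "procs_runable:%lu procs_iowait:%lu",
                    threads, processes, forks, runnable, iowait);
}
```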


Signed-off-by: Edward Chron 
---
 mm/Kconfig.debug| 16 
 mm/oom_kill_debug.c | 27 +++
 2 files changed, 43 insertions(+)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index dbe599b67a3b..fcbc5f9aa146 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -147,3 +147,19 @@ config DEBUG_OOM_SYSTEM_STATE
  A value of 1 is enabled (default) and a value of 0 is disabled.
 
  If unsure, say N.
+
+config DEBUG_OOM_TASKS_SUMMARY
+   bool "Debug OOM System Tasks Summary"
+   depends on DEBUG_OOM
+   help
+ When enabled, provides a kernel process/thread summary recording
+ the system's process/thread activity at the time of an OOM event.
+ The number of processes and of threads, the number of runnable
+ and I/O blocked threads, the number of forks since boot and the
+ number of oom eligible and oom ineligible tasks are provided in
+ the output. If configured it is enabled/disabled by setting the
+ enabled file entry in the debugfs OOM interface at:
+ /sys/kernel/debug/oom/tasks_summary_enabled
+ A value of 1 is enabled (default) and a value of 0 is disabled.
+
+ If unsure, say N.
diff --git a/mm/oom_kill_debug.c b/mm/oom_kill_debug.c
index 6eeaad86fca8..395b3307f822 100644
--- a/mm/oom_kill_debug.c
+++ b/mm/oom_kill_debug.c
@@ -152,6 +152,10 @@
 #include 
 #endif
 
+#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
+#include 
+#endif
+
 #define OOMD_MAX_FNAME 48
 #define OOMD_MAX_OPTNAME 32
 
@@ -182,6 +186,12 @@ static struct oom_debug_option oom_debug_options_table[] = {
.option_name= "system_state_summary_",
.support_tpercent = false,
},
+#endif
+#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
+   {
+   .option_name= "tasks_summary_",
+   .support_tpercent = false,
+   },
 #endif
{}
 };
@@ -190,6 +200,9 @@ static struct oom_debug_option oom_debug_options_table[] = {
 enum oom_debug_options_index {
 #ifdef CONFIG_DEBUG_OOM_SYSTEM_STATE
SYSTEM_STATE,
+#endif
+#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
+   TASKS_STATE,
 #endif
OUT_OF_BOUNDS
 };
@@ -320,6 +333,15 @@ static void oom_kill_debug_system_summary_prt(void)
 }
 #endif /* CONFIG_DEBUG_OOM_SYSTEM_STATE */
 
+#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
+static void oom_kill_debug_tasks_summary_print(void)
+{
+   pr_info("Threads:%d Processes:%d forks_since_boot:%lu procs_runable:%lu procs_iowait:%lu\n",
+   nr_threads, nr_processes(),
+   total_forks, nr_running(), nr_iowait());
+}
+#endif /* CONFIG_DEBUG_OOM_TASKS_SUMMARY */
+
 u32 oom_kill_debug_oom_event_is(void)
 {
++oom_kill_debug_oom_events;
@@ -329,6 +351,11 @@ u32 oom_kill_debug_oom_event_is(void)
oom_kill_debug_system_summary_prt();
 #endif
 
+#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
+   if (oom_kill_debug_enabled(TASKS_STATE))
+   oom_kill_debug_tasks_summary_print();
