Re: [GIT pull] x86/pti for 5.4-rc1

2019-09-25 Thread Ingo Molnar


* Song Liu  wrote:

> 
> 
> > On Sep 17, 2019, at 4:35 PM, Linus Torvalds  
> > wrote:
> > 
> > On Tue, Sep 17, 2019 at 4:29 PM Song Liu  wrote:
> >> 
> >> How about we just do:
> >> 
> >> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> >> index b196524759ec..0437f65250db 100644
> >> --- i/arch/x86/mm/pti.c
> >> +++ w/arch/x86/mm/pti.c
> >> @@ -341,6 +341,7 @@ pti_clone_pgtable(unsigned long start, unsigned long 
> >> end,
> >>}
> >> 
> >>if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> >> +   WARN_ON_ONCE(addr & ~PMD_MASK);
> >>target_pmd = pti_user_pagetable_walk_pmd(addr);
> >>if (WARN_ON(!target_pmd))
> >>return;
> >> 
> >> So it is a "warn and continue" check just for unaligned PMD address.
> > 
> > The problem there is that the "continue" part can be wrong.
> > 
> > Admittedly it requires a pretty crazy setup: you first hit a
> > pmd_large() entry, but the *next* pmd is regular, so you start doing
> > the per-page cloning.
> > 
> > And that per-page cloning will be wrong, because it will start in the
> > middle of the next pmd, because addr wasn't aligned, and the previous
> > pmd-only clone did
> > 
> >addr += PMD_SIZE;
> > 
> > to go to the next case.
> > 
> > See?
> 
> I see. This is tricky. 
> 
> Maybe we should skip clone of the first unaligned large pmd?
> 
> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> index 7f2140414440..1dfa69f8196b 100644
> --- i/arch/x86/mm/pti.c
> +++ w/arch/x86/mm/pti.c
> @@ -343,6 +343,11 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> }
> 
> if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> +   if (WARN_ON_ONCE(addr & ~PMD_MASK)) {
> +   addr = round_up(addr, PMD_SIZE);
> +   continue;
> +   }
> +
> target_pmd = pti_user_pagetable_walk_pmd(addr);
> if (WARN_ON(!target_pmd))
> return;

No, we should do a proper iteration of the page table structures.

> Or we can round_down the addr and copy the whole PMD properly:
> 
> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> index 7f2140414440..bee9881f2e85 100644
> --- i/arch/x86/mm/pti.c
> +++ w/arch/x86/mm/pti.c
> @@ -343,6 +343,9 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> }
> 
> if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> +   if (WARN_ON_ONCE(addr & ~PMD_MASK))
> +   addr &= PMD_MASK;
> +
> target_pmd = pti_user_pagetable_walk_pmd(addr);
> if (WARN_ON(!target_pmd))
> return;
> 
> I think the latter is better, but I am not sure. 

While this works, it's the wrong iterator pattern I believe.

In this function we iterate by passing in a 'random' [start,end) virtual 
memory address range with no particular alignment assumptions, then look 
up all pagetable entries covered by that range.

The iteration's principle is straightforward: we look up the first 
address (byte granular) then continue iterating according to the observed 
structure of the kernel pagetables, by skipping the range we have just 
looked up:

- If the current PUD is not mapped, then we set 'addr' to the first byte 
  after the virtual memory range represented by the current PUD entry:

addr = round_up(addr + 1, PUD_SIZE);

- If the current PMD is not mapped, then the next byte is:

addr = round_up(addr + 1, PMD_SIZE);

The part that Linus correctly pointed out as still iterating incorrectly, 
and as potentially fragile, is:

addr += PMD_SIZE;

This is buggy because it doesn't step to the next byte after the current 
mapped PMD, but potentially somewhere into the middle of the next 
PMD-sized range of virtual memory (which might or might not be covered by 
a PMD entry). The iterations after that might be similarly offset and 
buggy as well.

The right fix is to *fix the address iterator* to follow the basic 
principle of the function, using the same exact calculation pattern we 
use in the other cases:

addr = round_down(addr, PMD_SIZE) + PMD_SIZE;

BTW., I'd also suggest using this new round_down() pattern in the other 
two cases as well:

addr = round_down(addr, PUD_SIZE) + PUD_SIZE;
...
addr = round_down(addr, PMD_SIZE) + PMD_SIZE;

Why? Because this:

addr = round_up(addr + 1, PUD_SIZE);

will iterate incorrectly if 'addr' (which is byte granular) is the last 
*byte* of a PUD range: it will incorrectly skip the next PUD range...

Is a page-unaligned address likely to be passed in to this function? With 
the current users I really hope it won't happen, but it costs nothing to 
use clean iterators and think through all cases - it also makes the 

Re: [RFC 1/2] mmc: sdhci-msm: Add support for bus bandwidth voting

2019-09-25 Thread ppvk

On 2019-09-12 18:26, Georgi Djakov wrote:

Hi Pradeep,

Thanks for the patch!

On 9/6/19 15:47, Pradeep P V K wrote:

Vote for the MSM bus bandwidth required by the SDHC driver
based on the clock frequency and bus width of the card.
Otherwise, the system clocks may run at the minimum clock
speed, affecting performance.

This change is based on Georgi Djakov's [RFC]
(https://lkml.org/lkml/2018/10/11/499)


I am just wondering whether we really need to predefine the bandwidth
values in DT. Can't we use the computations from the above patch, or is
there any problem with that approach?

Thanks,
Georgi


Hi Georgi,

By using the direct required bandwidth (bw / 1000) values, it is not
guaranteed that all the NOC clocks are running in the same voltage corner
as required, which is crucial for power-sensitive devices such as mobile
phones. It is also not guaranteed that the value passed lies in the proper
clock-plan domain, thereby affecting the requested bandwidth. I think you
are already aware of these consequences of using direct bandwidth values
on RPMh-based devices.

The values we pass in DT make sure that all the NOC clocks between the
endpoints run in the required voltage corners, and also guarantee that
the requested bandwidth for the clients is obtained.

Hence the reason for passing predefined bandwidth values in DT.

Thanks and Regards,
Pradeep


Re: [PATCH] ieee802154: mcr20a: simplify a bit 'mcr20a_handle_rx_read_buf_complete()'

2019-09-25 Thread Stefan Schmidt

Hello.

On 24.09.19 23:40, Xue Liu wrote:

On Sat, 21 Sep 2019 at 13:52, Stefan Schmidt  wrote:


Hello Xue.

On 20.09.19 21:45, Christophe JAILLET wrote:

Use a 'skb_put_data()' variant instead of rewriting it.
The __skb_put_data() variant is safe here: it is obvious that the skb
cannot overflow, since it has just been allocated a few lines above with
the same 'len'.

Signed-off-by: Christophe JAILLET 
---
  drivers/net/ieee802154/mcr20a.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ieee802154/mcr20a.c b/drivers/net/ieee802154/mcr20a.c
index 17f2300e63ee..8dc04e2590b1 100644
--- a/drivers/net/ieee802154/mcr20a.c
+++ b/drivers/net/ieee802154/mcr20a.c
@@ -800,7 +800,7 @@ mcr20a_handle_rx_read_buf_complete(void *context)
   if (!skb)
   return;

- memcpy(skb_put(skb, len), lp->rx_buf, len);
+ __skb_put_data(skb, lp->rx_buf, len);
   ieee802154_rx_irqsafe(lp->hw, skb, lp->rx_lqi[0]);

   print_hex_dump_debug("mcr20a rx: ", DUMP_PREFIX_OFFSET, 16, 1,



Could you please review and ACK this? If you are happy I will take it
through my tree.

regards
Stefan Schmidt


Acked-by: Xue Liu 



This patch has been applied to the wpan tree and will be
part of the next pull request to net. Thanks!

regards
Stefan Schmidt


[PATCH xfstests v3] overlay: Enable character device to be the base fs partition

2019-09-25 Thread Zhihao Cheng
When running overlay tests using character devices as base fs partitions,
all overlay test results become 'notrun'. The function
'_overlay_config_override' (common/config) detects that the current base
fs partition is not a block device and sets FSTYP to the base fs type.
Each overlay test checks the current FSTYP, and if it is neither 'overlay'
nor 'generic', the test is skipped.

For example, using UBIFS as base fs skips all overlay usecases:

  FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
  MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
  MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch

  overlay/001   [not run] not suitable for this filesystem type: ubifs
  overlay/002   [not run] not suitable for this filesystem type: ubifs
  overlay/003   [not run] not suitable for this filesystem type: ubifs

Now, when the base fs partition is found to be either a block or a
character device, FSTYP is overridden to 'overlay'. This allows the base
fs partition to be a character device, so that overlay tests can also run
on filesystems such as UBIFS.
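The heart of the patch is a two-operator device check; it can be
exercised with a small standalone shell sketch (is_valid_base_dev is a
made-up helper name, not part of xfstests):

```shell
#!/bin/sh
# Mirrors the patch's extended check: a base fs device may be either a
# block device (-b) or a character device (-c); anything else bails out.
is_valid_base_dev() {
	[ -b "$1" ] || [ -c "$1" ]
}

is_valid_base_dev /dev/null && echo "/dev/null: accepted (chardev)"

tmpf=$(mktemp)
is_valid_base_dev "$tmpf" || echo "regular file: rejected"
rm -f "$tmpf"
```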

Signed-off-by: Zhihao Cheng 
Signed-off-by: Amir Goldstein 
---
 common/config | 6 +++---
 common/rc | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/common/config b/common/config
index 4c86a49..4eda36c 100644
--- a/common/config
+++ b/common/config
@@ -532,7 +532,7 @@ _canonicalize_mountpoint()
 # When SCRATCH/TEST_* vars are defined in evironment and not
 # in config file, this function is called after vars have already
 # been overriden in the previous test.
-# In that case, TEST_DEV is a directory and not a blockdev and
+# In that case, TEST_DEV is a directory and not a blockdev/chardev and
 # the function will return without overriding the SCRATCH/TEST_* vars.
 _overlay_config_override()
 {
@@ -550,7 +550,7 @@ _overlay_config_override()
#the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
#of the configured base fs and SCRATCH/TEST_DEV vars are set to the
#overlayfs base and mount dirs inside base fs mount.
-   [ -b "$TEST_DEV" ] || return 0
+   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
 
# Config file may specify base fs type, but we obay -overlay flag
[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
@@ -570,7 +570,7 @@ _overlay_config_override()
export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
 
-   [ -b "$SCRATCH_DEV" ] || return 0
+   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
 
# Store original base fs vars
export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
diff --git a/common/rc b/common/rc
index 66c7fd4..8d57c37 100644
--- a/common/rc
+++ b/common/rc
@@ -3100,7 +3100,7 @@ _require_scratch_shutdown()
# SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
# will be null, so check OVL_BASE_SCRATCH_DEV before
# running shutdown to avoid shutting down base fs accidently.
-   _notrun "$SCRATCH_DEV is not a block device"
-   _notrun "$SCRATCH_DEV is not a block device"
+   _notrun "This test requires a valid $OVL_BASE_SCRATCH_DEV as ovl base fs"
else
src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
|| _notrun "Underlying filesystem does not support shutdown"
-- 
2.7.4



Re: [PATCH v9 03/11] pwm: mediatek: remove a property "has-clks"

2019-09-25 Thread Uwe Kleine-König
On Fri, Sep 20, 2019 at 06:49:03AM +0800, Sam Shih wrote:
> We can use a fixed-clock to fix mt7628 pwm configuration from
> userspace. The SoC is a legacy MIPS and has no complex clock tree.
> Since we can get the clock frequency for the period calculation from
> the DT fixed-clock, we can remove the has-clks property and directly
> use devm_clk_get() and clk_get_rate().
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 
> Acked-by: Uwe Kleine-Kö 
> ---
> Changes since v9:
> Added an Acked-by tag

Argh, my name was cropped and ended up in this state in
5c50982af47ffe36df3e31bc9e11be5a067ddd18. Thierry, any chance to repair
that? Something like

git filter-branch --msg-filter 'sed "s/Kleine-Kö /Kleine-König /"' 
linus/master..

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH V2] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Mike Rapoport
On Tue, Sep 24, 2019 at 04:09:32PM +0800, Yunfeng Ye wrote:
> sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
> for the page management structures; if the allocation fails on the
> specified node, it falls back to allocating from other nodes.
> 
> Normally, the page management structures do not exceed 2% of total memory,
> but a large contiguous block allocation is needed. In most cases the
> allocation from the specified node succeeds, but it can fail when the
> node's memory has become highly fragmented. In that case we would rather
> allocate memory section by section than allocate a large block of memory
> from other NUMA nodes.
> 
> Add memblock_alloc_exact_nid_raw() for this situation, which allocates
> boot memory blocks on the exact node. If a large contiguous block
> allocation fails in sparse_buffer_init(), it falls back to allocating
> small per-section blocks.
> 
> Signed-off-by: Yunfeng Ye 
> ---
> v1 -> v2:
>  - use memblock_alloc_exact_nid_raw() rather than using a flag
> 
>  include/linux/memblock.h |  3 +++
>  mm/memblock.c| 66 
> 
>  mm/sparse.c  |  2 +-
>  3 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index f491690..b38bbef 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -358,6 +358,9 @@ static inline phys_addr_t memblock_phys_alloc(phys_addr_t 
> size,
>MEMBLOCK_ALLOC_ACCESSIBLE);
>  }
> 
> +void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
> +  phys_addr_t min_addr, phys_addr_t max_addr,
> +  int nid);
>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>phys_addr_t min_addr, phys_addr_t max_addr,
>int nid);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7d4f61a..a71869e 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
> *zone,
>   * @start: the lower bound of the memory region to allocate (phys address)
>   * @end: the upper bound of the memory region to allocate (phys address)
>   * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
> + * @need_exact_nid: control the allocation fall back to other nodes
>   *
>   * The allocation is performed from memory region limited by
>   * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
>   *
> - * If the specified node can not hold the requested memory the
> - * allocation falls back to any node in the system
> + * If the specified node can not hold the requested memory and 
> @need_exact_nid
> + * is zero, the allocation falls back to any node in the system
>   *
>   * For systems with memory mirroring, the allocation is attempted first
>   * from the regions with mirroring enabled and then retried from any
> @@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
> *zone,
>   */
>  static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>   phys_addr_t align, phys_addr_t start,
> - phys_addr_t end, int nid)
> + phys_addr_t end, int nid,
> + int need_exact_nid)

Please make it 'bool exact_nid'

>  {
>   enum memblock_flags flags = choose_memblock_flags();
>   phys_addr_t found;
> @@ -1365,7 +1367,7 @@ static phys_addr_t __init 
> memblock_alloc_range_nid(phys_addr_t size,
>   if (found && !memblock_reserve(found, size))
>   goto done;
> 
> - if (nid != NUMA_NO_NODE) {
> + if (nid != NUMA_NO_NODE && !need_exact_nid) {
>   found = memblock_find_in_range_node(size, align, start,
>   end, NUMA_NO_NODE,
>   flags);
> @@ -1413,7 +1415,8 @@ phys_addr_t __init 
> memblock_phys_alloc_range(phys_addr_t size,
>phys_addr_t start,
>phys_addr_t end)
>  {
> - return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
> + return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
> + 0);
>  }
> 
>  /**
> @@ -1432,7 +1435,7 @@ phys_addr_t __init 
> memblock_phys_alloc_range(phys_addr_t size,
>  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t 
> align, int nid)
>  {
>   return memblock_alloc_range_nid(size, align, 0,
> - MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> + MEMBLOCK_ALLOC_ACCESSIBLE, nid, 0);
>  }
> 
>  /**
> @@ -1442,6 +1445,7 @@ phys_addr_t __init 
> 

[PATCH v2 0/2] Enable Goldfish RTC for RISC-V

2019-09-25 Thread Anup Patel
We will be using the Goldfish RTC device for real date-time on the QEMU
RISC-V virt machine, so this series:
1. Allows GOLDFISH kconfig option to be enabled for RISC-V
2. Enables GOLDFISH RTC driver in RISC-V defconfigs

This series can be found in goldfish_rtc_v2 branch at:
https://github.com/avpatel/linux.git

For the QEMU patches adding Goldfish RTC to virt machine refer:
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg05465.html

Changes since v1:
 - Updated PATCH1 to allow goldfish drivers for all archs with IOMEM
   and DMA support

Anup Patel (2):
  platform: goldfish: Allow goldfish drivers for archs with IOMEM and
DMA
  RISC-V: defconfig: Enable Goldfish RTC driver

 arch/riscv/configs/defconfig  | 3 +++
 arch/riscv/configs/rv32_defconfig | 3 +++
 drivers/platform/goldfish/Kconfig | 3 +--
 3 files changed, 7 insertions(+), 2 deletions(-)

--
2.17.1


[PATCH v2 2/2] RISC-V: defconfig: Enable Goldfish RTC driver

2019-09-25 Thread Anup Patel
We have the Goldfish RTC device available on the QEMU RISC-V virt
machine, hence enable the required driver in the RV32 and RV64 defconfigs.

Signed-off-by: Anup Patel 
---
 arch/riscv/configs/defconfig  | 3 +++
 arch/riscv/configs/rv32_defconfig | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 3efff552a261..57b4f67b0c0b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -73,7 +73,10 @@ CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
 CONFIG_MMC=y
 CONFIG_MMC_SPI=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_GOLDFISH=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_GOLDFISH=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_AUTOFS4_FS=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 7da93e494445..50716c1395aa 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -69,7 +69,10 @@ CONFIG_USB_OHCI_HCD=y
 CONFIG_USB_OHCI_HCD_PLATFORM=y
 CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_GOLDFISH=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_GOLDFISH=y
 CONFIG_SIFIVE_PLIC=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
-- 
2.17.1



[PATCH v2 1/2] platform: goldfish: Allow goldfish drivers for archs with IOMEM and DMA

2019-09-25 Thread Anup Patel
We don't need an explicit dependency of the Goldfish kconfig option on
particular architectures. Instead, the Goldfish kconfig option should only
depend on HAS_IOMEM and HAS_DMA, which is sufficient for all Goldfish
devices.

Signed-off-by: Anup Patel 
---
 drivers/platform/goldfish/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/platform/goldfish/Kconfig 
b/drivers/platform/goldfish/Kconfig
index 77b35df3a801..f3d09b1631e3 100644
--- a/drivers/platform/goldfish/Kconfig
+++ b/drivers/platform/goldfish/Kconfig
@@ -1,8 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 menuconfig GOLDFISH
bool "Platform support for Goldfish virtual devices"
-   depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
-   depends on HAS_IOMEM
+   depends on HAS_IOMEM && HAS_DMA
help
  Say Y here to get to see options for the Goldfish virtual platform.
  This option alone does not add any kernel code.
-- 
2.17.1



Re: [PATCH v2 00/21] Refine memblock API

2019-09-25 Thread Mike Rapoport
(updated CC)

Hi,

On Tue, Sep 24, 2019 at 12:52:35PM -0500, Adam Ford wrote:
> On Mon, Jan 21, 2019 at 2:05 AM Mike Rapoport  wrote:
> >
> > Hi,
> >
> > v2 changes:
> > * replace some more %lu with %zu
> > * remove panics where they are not needed in s390 and in printk
> > * collect Acked-by and Reviewed-by.
> >
> >
> > Christophe Leroy (1):
> >   powerpc: use memblock functions returning virtual address
> >
> > Mike Rapoport (20):
> >   openrisc: prefer memblock APIs returning virtual address
> >   memblock: replace memblock_alloc_base(ANYWHERE) with memblock_phys_alloc
> >   memblock: drop memblock_alloc_base_nid()
> >   memblock: emphasize that memblock_alloc_range() returns a physical address
> >   memblock: memblock_phys_alloc_try_nid(): don't panic
> >   memblock: memblock_phys_alloc(): don't panic
> >   memblock: drop __memblock_alloc_base()
> >   memblock: drop memblock_alloc_base()
> >   memblock: refactor internal allocation functions
> >   memblock: make memblock_find_in_range_node() and choose_memblock_flags() 
> > static
> >   arch: use memblock_alloc() instead of memblock_alloc_from(size, align, 0)
> >   arch: don't memset(0) memory returned by memblock_alloc()
> >   ia64: add checks for the return value of memblock_alloc*()
> >   sparc: add checks for the return value of memblock_alloc*()
> >   mm/percpu: add checks for the return value of memblock_alloc*()
> >   init/main: add checks for the return value of memblock_alloc*()
> >   swiotlb: add checks for the return value of memblock_alloc*()
> >   treewide: add checks for the return value of memblock_alloc*()
> >   memblock: memblock_alloc_try_nid: don't panic
> >   memblock: drop memblock_alloc_*_nopanic() variants
> >
> I know it's rather late, but this patch broke the Etnaviv 3D graphics
> in my i.MX6Q.
 
Can you identify the exact patch from the series that caused the
regression?

> When I try to use the 3D, it returns some errors and the dmesg log
> shows some memory allocation errors too:
> [3.682347] etnaviv etnaviv: bound 13.gpu (ops gpu_ops)
> [3.688669] etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
> [3.695099] etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
> [3.700800] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108
> [3.723013] etnaviv-gpu 13.gpu: command buffer outside valid
> memory window
> [3.731308] etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
> [3.752437] etnaviv-gpu 134000.gpu: command buffer outside valid
> memory window
> [3.760583] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
> [3.766766] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
> [3.776131] [drm] Initialized etnaviv 1.2.0 20151214 for etnaviv on minor 0
> 
> # glmark2-es2-drm
> Error creating gpu
> Error: eglCreateWindowSurface failed with error: 0x3009
> Error: eglCreateWindowSurface failed with error: 0x3009
> Error: CanvasGeneric: Invalid EGL state
> Error: main: Could not initialize canvas
> 
> 
> Before this patch:
> 
> [3.691995] etnaviv etnaviv: bound 13.gpu (ops gpu_ops)
> [3.698356] etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
> [3.704792] etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
> [3.710488] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108
> [3.733649] etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
> [3.756115] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
> [3.762250] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
> [3.771432] [drm] Initialized etnaviv 1.2.0 20151214 for etnaviv on minor 0
> 
> and the 3D demos work without this.
> 
> I don't know enough about the i.MX6 nor the 3D accelerator to know how
> to fix it.
> I am hoping someone in the know might have some suggestions.

Can you please add "memblock=debug" to your kernel command line and send
kernel logs for both working and failing versions? 

-- 
Sincerely yours,
Mike.



Re: + mm-thp-extract-split_queue_-into-a-struct.patch added to -mm tree

2019-09-25 Thread Michal Hocko
On Tue 24-09-19 09:26:37, Yang Shi wrote:
> 
> 
> On 9/24/19 6:56 AM, Michal Hocko wrote:
> > Do we really need this if deferred list is going to be shrunk more
> > pro-actively as discussed already - I am sorry I do not have a link handy
> > but in short the deferred list would be drained from a kworker context
> > more pro-actively rather than wait for the memory pressure to happen.
> 
> From our experience I really didn't see that the current
> wait-for-memory-pressure approach is a problem; it does work well and is
> still a good compromise. And I suppose we all agree that the side effects
> incurred by the more proactive kworker approach are definitely a concern
> (i.e. it may waste cpu cycles, break isolation, etc.) according to our
> discussion.
>
> And we do have other much simpler ways to shrink THPs more proactively, for
> example, waking up kswapd more aggressively via tuning
> watermark_scale_factor, and/or shrinking harder, etc.

No, we do not want to make THPs even more tricky to configure than they
are now. There are many howtos out there to recommend disabling THPs
because they might have performance or other subtle side effects. I do
not want to feed that cargo cult even more. Really users shouldn't even
notice that THPs are split because that is so much of an internal
implementation detail. Now the existing implementation really hides a
lot of memory and it requires some expertise to understand where that
memory went. And the later part is something users tend to care about
from experience.

So I really do think that waiting for the memory pressure is simply a
wrong thing to do. It causes unnecessary reclaim because some
of the pages are going to be dropped before the shrinker can act.

> Even though we have to drain THPs more proactively by whatever means in the
> future, I'd prefer it is memcg aware as well.

OK, let's keep it then until we have another means for pro-active
draining.
-- 
Michal Hocko
SUSE Labs


[PATCH v1] ARM: dts: realtek: Add support for Realtek evaluation

2019-09-25 Thread James Tai
From: "james.tai" 

This patch adds a generic devicetree board file and a dtsi for the
Realtek RTD16XX and RTD13XX platforms.

Signed-off-by: james.tai 
---
Changes since last version:
- Add RTD13XX platform.
- Add PSCI support.
- Add aliases.
- move uart0 to dtsi file.
---
 arch/arm/boot/dts/Makefile|   4 +
 arch/arm/boot/dts/rtd1319-pymparticle.dts |  30 ++
 arch/arm/boot/dts/rtd13xx.dtsi| 105 +++
 arch/arm/boot/dts/rtd1619-mjolnir.dts |  31 ++
 arch/arm/boot/dts/rtd16xx.dtsi| 119 ++
 5 files changed, 289 insertions(+)
 create mode 100644 arch/arm/boot/dts/rtd1319-pymparticle.dts
 create mode 100644 arch/arm/boot/dts/rtd13xx.dtsi
 create mode 100644 arch/arm/boot/dts/rtd1619-mjolnir.dts
 create mode 100644 arch/arm/boot/dts/rtd16xx.dtsi

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 9159fa2cea90..c401184622cd 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -927,6 +927,9 @@ dtb-$(CONFIG_ARCH_ROCKCHIP) += \
rk3288-veyron-pinky.dtb \
rk3288-veyron-speedy.dtb \
rk3288-vyasa.dtb
+ dtb-$(CONFIG_ARCH_REALTEK) += \
+   rtd1619-mjolnir.dtb \
+   rtd1319-pymparticle.dtb
 dtb-$(CONFIG_ARCH_S3C24XX) += \
s3c2416-smdk2416.dtb
 dtb-$(CONFIG_ARCH_S3C64XX) += \
@@ -1286,3 +1289,4 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-opp-zaius.dtb \
aspeed-bmc-portwell-neptune.dtb \
aspeed-bmc-quanta-q71l.dtb
+
diff --git a/arch/arm/boot/dts/rtd1319-pymparticle.dts 
b/arch/arm/boot/dts/rtd1319-pymparticle.dts
new file mode 100644
index ..887ab894bba1
--- /dev/null
+++ b/arch/arm/boot/dts/rtd1319-pymparticle.dts
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2019 Realtek Semiconductor Corp.
+ */
+
+/dts-v1/;
+
+#include "rtd13xx.dtsi"
+
+/ {
+   model= "Realtek Pymparticle Evaluation Board";
+
+   aliases {
+   serial0 = 
+   };
+
+   chosen {
+   bootargs = "console=ttyS0,460800 earlycon";
+   };
+
+   memory@0 {
+   device_type = "memory";
+   reg = <0x 0x 0x 0x8000>;
+   };
+};
+
+ {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/rtd13xx.dtsi b/arch/arm/boot/dts/rtd13xx.dtsi
new file mode 100644
index ..9a565919ab83
--- /dev/null
+++ b/arch/arm/boot/dts/rtd13xx.dtsi
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (c) 2019 Realtek Semiconductor Corp.
+ */
+
+#include 
+#include 
+
+/{
+   compatible = "realtek,rtd1319";
+   interrupt-parent = <>;
+   #address-cells = <0x2>;
+   #size-cells = <0x2>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   A55_0: cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a55", "arm,armv8";
+   reg = <0x000>;
+   next-level-cache = <_l2>;
+   };
+
+   A55_1: cpu@1 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a55", "arm,armv8";
+   reg = <0x100>;
+   enable-method = "psci";
+   };
+
+   A55_2: cpu@2 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a55", "arm,armv8";
+   reg = <0x200>;
+   enable-method = "psci";
+   };
+
+   A55_3: cpu@3 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a55", "arm,armv8";
+   reg = <0x300>;
+   enable-method = "psci";
+   };
+
+   a55_l2: l2-cache {
+   compatible = "cache";
+   };
+   };
+
+   psci {
+   compatible = "arm,psci-1.0", "arm,psci-0.2";
+   method = "smc";
+   };
+
+   timer {
+   compatible = "arm,armv8-timer";
+   interrupts = ,
+   ,
+   ,
+   ;
+   clock-frequency = <2700>;
+   };
+
+   osc27M: osc27M {
+   compatible = "fixed-clock";
+   #clock-cells = <0>;
+   clock-frequency = <2700>;
+   clock-output-names = "osc27M";
+   };
+
+   soc {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   compatible = "simple-bus";
+   ranges;
+
+   gic: interrupt-controller@ff10 {
+   compatible = "arm,gic-v3";
+   #interrupt-cells = <3>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+

Re: [PATCH V2] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Yunfeng Ye



On 2019/9/25 14:36, Mike Rapoport wrote:
> On Tue, Sep 24, 2019 at 04:09:32PM +0800, Yunfeng Ye wrote:
>> sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
>> for the page management structures; if the allocation fails on the
>> specified node, it falls back to allocating from other nodes.
>>
>> Normally, the page management structures do not exceed 2% of total memory,
>> but a large contiguous block allocation is needed. In most cases the
>> allocation from the specified node succeeds, but it can fail when the
>> node's memory has become highly fragmented. In that case we would rather
>> allocate memory section by section than allocate a large block of memory
>> from other NUMA nodes.
>>
>> Add memblock_alloc_exact_nid_raw() for this situation, which allocates
>> boot memory blocks on the exact node. If a large contiguous block
>> allocation fails in sparse_buffer_init(), it falls back to allocating
>> small per-section blocks.
>>
>> Signed-off-by: Yunfeng Ye 
>> ---
>> v1 -> v2:
>>  - use memblock_alloc_exact_nid_raw() rather than using a flag
>>
>>  include/linux/memblock.h |  3 +++
>>  mm/memblock.c| 66 
>> 
>>  mm/sparse.c  |  2 +-
>>  3 files changed, 59 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f491690..b38bbef 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -358,6 +358,9 @@ static inline phys_addr_t 
>> memblock_phys_alloc(phys_addr_t size,
>>   MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>> + phys_addr_t min_addr, phys_addr_t max_addr,
>> + int nid);
>>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>>   phys_addr_t min_addr, phys_addr_t max_addr,
>>   int nid);
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7d4f61a..a71869e 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
>> *zone,
>>   * @start: the lower bound of the memory region to allocate (phys address)
>>   * @end: the upper bound of the memory region to allocate (phys address)
>>   * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
>> + * @need_exact_nid: control the allocation fall back to other nodes
>>   *
>>   * The allocation is performed from memory region limited by
>>   * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
>>   *
>> - * If the specified node can not hold the requested memory the
>> - * allocation falls back to any node in the system
>> + * If the specified node can not hold the requested memory and 
>> @need_exact_nid
>> + * is zero, the allocation falls back to any node in the system
>>   *
>>   * For systems with memory mirroring, the allocation is attempted first
>>   * from the regions with mirroring enabled and then retried from any
>> @@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
>> *zone,
>>   */
>>  static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>>  phys_addr_t align, phys_addr_t start,
>> -phys_addr_t end, int nid)
>> +phys_addr_t end, int nid,
>> +int need_exact_nid)
> 
> Please make it 'bool exact_nid'
> 
ok, I will modify as your suggestion, thanks.

>>  {
>>  enum memblock_flags flags = choose_memblock_flags();
>>  phys_addr_t found;
>> @@ -1365,7 +1367,7 @@ static phys_addr_t __init 
>> memblock_alloc_range_nid(phys_addr_t size,
>>  if (found && !memblock_reserve(found, size))
>>  goto done;
>>
>> -if (nid != NUMA_NO_NODE) {
>> +if (nid != NUMA_NO_NODE && !need_exact_nid) {
>>  found = memblock_find_in_range_node(size, align, start,
>>  end, NUMA_NO_NODE,
>>  flags);
>> @@ -1413,7 +1415,8 @@ phys_addr_t __init 
>> memblock_phys_alloc_range(phys_addr_t size,
>>   phys_addr_t start,
>>   phys_addr_t end)
>>  {
>> -return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
>> +return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
>> +0);
>>  }
>>
>>  /**
>> @@ -1432,7 +1435,7 @@ phys_addr_t __init 
>> memblock_phys_alloc_range(phys_addr_t size,
>>  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, 
>> phys_addr_t align, int nid)
>>  {
>>  return memblock_alloc_range_nid(size, align, 0,
>> -

Re: [PATCH V2] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-25 Thread Michal Hocko
On Wed 25-09-19 08:27:53, Anshuman Khandual wrote:
> Currently during memory hot add procedure, memory gets into memblock before
> calling arch_add_memory() which creates its linear mapping.
> 
> add_memory_resource() {
>   ..
>   memblock_add_node()
>   ..
>   arch_add_memory()
>   ..
> }
> 
> But during memory hot remove procedure, removal from memblock happens first,
> before its linear mapping gets torn down with arch_remove_memory(), which
> is not consistent. Resources should be removed in the reverse order in which
> they were added. However, this does not pose any problem for now, unless
> there is an assumption regarding the linear mapping. One example was a subtle
> failure on the arm64 platform [1], though this has now found a different
> solution.
> 
> try_remove_memory() {
>   ..
>   memblock_free()
>   memblock_remove()
>   ..
>   arch_remove_memory()
>   ..
> }
> 
> This changes the sequence of resource removal during memory hot remove,
> including memblock and linear mapping tear down, so that it is now the
> reverse of the order in which the resources were added during memory hot
> add. The changed removal order looks like the following.
> 
> try_remove_memory() {
>   ..
>   arch_remove_memory()
>   ..
>   memblock_free()
>   memblock_remove()
>   ..
> }
> 
> [1] https://patchwork.kernel.org/patch/11127623/
> 
> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Dan Williams 
> Signed-off-by: Anshuman Khandual 

Acked-by: Michal Hocko 

> ---
> Changes in V2:
> 
> - Changed the commit message as per Michal and David 
> 
> Changed in V1: https://patchwork.kernel.org/patch/11146361/
> 
> Original patch https://lkml.org/lkml/2019/9/3/327
> 
> Memory hot remove now works on arm64 without this because of a recent commit,
> 60bb462fc7ad ("drivers/base/node.c: simplify
> unregister_memory_block_under_nodes()").
> 
> David mentioned that re-ordering should still make sense for consistency
> purposes (removing stuff in the reverse order it was added). This patch
> is now detached from the arm64 hot-remove series.
> 
> https://lkml.org/lkml/2019/9/3/326
> 
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 49f7bf91c25a..4f7d426a84d0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1763,13 +1763,13 @@ static int __ref try_remove_memory(int nid, u64 
> start, u64 size)
>  
>   /* remove memmap entry */
>   firmware_map_remove(start, start + size, "System RAM");
> - memblock_free(start, size);
> - memblock_remove(start, size);
>  
>   /* remove memory block devices before removing memory */
>   remove_memory_block_devices(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + memblock_free(start, size);
> + memblock_remove(start, size);
>   __release_memory_resource(start, size);
>  
>   try_offline_node(nid);
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs
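The consistency rule being acked - tear resources down in the reverse of the order they were set up - is the usual unwind pattern; a minimal sketch (step ids here are arbitrary labels, not the kernel calls themselves):

```c
/* Teardown runs in the reverse order of setup - the same rule the
 * patch applies to memblock_add_node()/arch_add_memory() vs their
 * removal counterparts. */
void teardown_order(const int *setup, int n, int *teardown)
{
	for (int i = 0; i < n; i++)
		teardown[i] = setup[n - 1 - i];
}
```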


Re: [RFC] mm: memcg: add priority for soft limit reclaiming

2019-09-25 Thread Michal Hocko
On Wed 25-09-19 10:35:30, Hillf Danton wrote:
> 
> On Tue, 24 Sep 2019 17:23:35 + from Roman Gushchin
> > 
> > On Tue, Sep 24, 2019 at 03:30:16PM +0200, Michal Hocko wrote:
> > >
> > > But really, make sure you look into the existing feature set that memcg
> > > v2 provides already and come back if you find it unsuitable and we can
> > > move from there. Soft limit reclaim is dead and we should let it RIP.
> > 
> > Can't agree more here.
> > 
> > Cgroup v2 memory protection mechanisms (memory.low/min) should perfectly
> > solve the described problem. If not, let's fix them rather than extend soft
> > reclaim which is already dead.
> > 
> Hehe, IIUC memory.low/min is essentially drawing a line that reclaimers
> would try their best not to cross. Page preemption OTOH is near ten miles
> away from that line though it is now on the shoulder of soft reclaiming.

Dynamic low limit tuning would achieve exactly what you are after - aka
prioritizing some memory consumers over others.
-- 
Michal Hocko
SUSE Labs


[PATCH] x86/mm: Clean up the pmd_read_atomic() comments

2019-09-25 Thread Ingo Molnar


* Wei Yang  wrote:

> To be honest, I have a question on how this works.
> 
> As the comment says, we need to call pmd_read_atomic before using
> pte_offset_map_lock to avoid data corruption.

There's only a risk of data corruption if mmap_sem is held for reading. 
If it's held for writing then the pagetable contents should be stable.

> For example, in function swapin_walk_pmd_entry:
> 
> pmd_none_or_trans_huge_or_clear_bad(pmd)
> pmd_read_atomic(pmd)  ---   1
> pte_offset_map_lock(mm, pmd, ...) ---   2
> 
> At point 1, we are assured the content is intact, while at point 2, we would
> read the pmd again to calculate the pte address. How do we ensure the content
> is intact this time? Is it because pmd_none_or_trans_huge_or_clear_bad()
> ensures the pte is stable, so that the content won't be changed?

Indeed pte_offset_map_lock() will take a non-atomic *pmd value in 
pte_lockptr() before taking the pte spinlock.

I believe the rule here is that if pmd_none_or_trans_huge_or_clear_bad() 
finds the pmd 'stable' while holding the mmap_sem read-locked, then the 
pmd cannot change while we are continuously holding the mmap_sem.

Hence the followup pte_offset_map_lock() and other iterators can look at 
the value of the pmd without locking. (Individual pte entries still need 
the pte-lock, because they might be faulted-in in parallel.)

So the pmd use pattern in swapin_walk_pmd_entry() should be safe.
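The torn-read scenario the comment guards against can be modeled deterministically - a reader that loads the two 32-bit halves non-atomically, with a 64-bit update landing in between, observes a value that was never stored (a little-endian half layout is assumed and the values are made up):

```c
#include <stdint.h>

/* A pmd on 32-bit PAE is 64 bits, which GCC may read as two 32-bit
 * loads. This models the race: the reader is "preempted" between its
 * two half-loads while a writer stores a new 64-bit value atomically. */
typedef union {
	uint64_t whole;
	uint32_t half[2];
} pmd64;

uint64_t torn_read(pmd64 *p, uint64_t new_val)
{
	uint32_t lo = p->half[0];	/* reader: load low half of OLD value */
	p->whole = new_val;		/* writer: whole-pmd update lands here */
	uint32_t hi = p->half[1];	/* reader: load high half of NEW value */

	/* the combined result was never actually stored in the pmd */
	return ((uint64_t)hi << 32) | lo;
}
```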

I'm not 100% sure though - so I've added a few more Cc:s ...

I've also cleaned up the pmd_read_atomic() comments some more to make
them more readable - see the patch below.

Thanks,

Ingo

==>
From: Ingo Molnar 
Date: Wed, 25 Sep 2019 08:38:57 +0200
Subject: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments

Fix spelling, consistent parenthesis and grammar - and also clarify
the language where needed.

Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Wei Yang 
Link: 
https://lkml.kernel.org/r/20190925014453.20236-1-richardw.y...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgtable-3level.h | 44 ++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h 
b/arch/x86/include/asm/pgtable-3level.h
index 1796462ff143..5afb5e0fe903 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
 
 #define pmd_read_atomic pmd_read_atomic
 /*
- * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
- * a "*pmdp" dereference done by gcc. Problem is, in certain places
- * where pte_offset_map_lock is called, concurrent page faults are
+ * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
+ * a "*pmdp" dereference done by GCC. Problem is, in certain places
+ * where pte_offset_map_lock() is called, concurrent page faults are
  * allowed, if the mmap_sem is hold for reading. An example is mincore
  * vs page faults vs MADV_DONTNEED. On the page fault side
- * pmd_populate rightfully does a set_64bit, but if we're reading the
+ * pmd_populate() rightfully does a set_64bit(), but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
- * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pte_offset_map_lock() while holding the
+ * because GCC will not read the 64-bit value of the pmd atomically.
+ *
+ * To fix this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
- * function to know if the pmd is null nor not, and in turn to know if
+ * function to know if the pmd is null or not, and in turn to know if
  * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
  * operations.
  *
- * Without THP if the mmap_sem is hold for reading, the pmd can only
- * transition from null to not null while pmd_read_atomic runs. So
+ * Without THP if the mmap_sem is held for reading, the pmd can only
+ * transition from null to not null while pmd_read_atomic() runs. So
  * we can always return atomic pmd values with this function.
  *
- * With THP if the mmap_sem is hold for reading, the pmd can become
+ * With THP if the mmap_sem is held for reading, the pmd can become
  * trans_huge or none or point to a pte (and in turn become "stable")
- * at any time under pmd_read_atomic. We could read it really
- * atomically here with a atomic64_read for the THP enabled case (and
+ * at any time under pmd_read_atomic(). We could read it truly
+ * atomically here with an atomic64_read() for the THP enabled case (and
  * it would be a whole lot simpler), but to avoid using cmpxchg8b we
  * only return an atomic pmdval if the low part of the pmdval is later
- * found stable (i.e. 

Re: [PATCH V2] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-25 Thread David Hildenbrand
On 25.09.19 04:57, Anshuman Khandual wrote:
> Currently during memory hot add procedure, memory gets into memblock before
> calling arch_add_memory() which creates its linear mapping.
> 
> add_memory_resource() {
>   ..
>   memblock_add_node()
>   ..
>   arch_add_memory()
>   ..
> }
> 
> But during memory hot remove procedure, removal from memblock happens first,
> before its linear mapping gets torn down with arch_remove_memory(), which
> is not consistent. Resources should be removed in the reverse order in which
> they were added. However, this does not pose any problem for now, unless
> there is an assumption regarding the linear mapping. One example was a subtle
> failure on the arm64 platform [1], though this has now found a different
> solution.
> 
> try_remove_memory() {
>   ..
>   memblock_free()
>   memblock_remove()
>   ..
>   arch_remove_memory()
>   ..
> }
> 
> This changes the sequence of resource removal during memory hot remove,
> including memblock and linear mapping tear down, so that it is now the
> reverse of the order in which the resources were added during memory hot
> add. The changed removal order looks like the following.
> 
> try_remove_memory() {
>   ..
>   arch_remove_memory()
>   ..
>   memblock_free()
>   memblock_remove()
>   ..
> }
> 
> [1] https://patchwork.kernel.org/patch/11127623/
> 
> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Dan Williams 
> Signed-off-by: Anshuman Khandual 
> ---
> Changes in V2:
> 
> - Changed the commit message as per Michal and David 
> 
> Changed in V1: https://patchwork.kernel.org/patch/11146361/
> 
> Original patch https://lkml.org/lkml/2019/9/3/327
> 
> Memory hot remove now works on arm64 without this because of a recent commit,
> 60bb462fc7ad ("drivers/base/node.c: simplify
> unregister_memory_block_under_nodes()").
> 
> David mentioned that re-ordering should still make sense for consistency
> purposes (removing stuff in the reverse order it was added). This patch
> is now detached from the arm64 hot-remove series.
> 
> https://lkml.org/lkml/2019/9/3/326
> 
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 49f7bf91c25a..4f7d426a84d0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1763,13 +1763,13 @@ static int __ref try_remove_memory(int nid, u64 
> start, u64 size)
>  
>   /* remove memmap entry */
>   firmware_map_remove(start, start + size, "System RAM");
> - memblock_free(start, size);
> - memblock_remove(start, size);
>  
>   /* remove memory block devices before removing memory */
>   remove_memory_block_devices(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + memblock_free(start, size);
> + memblock_remove(start, size);
>   __release_memory_resource(start, size);
>  
>   try_offline_node(nid);
> 

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb


Re: [RFC PATCH 0/4] Use 1st-level for DMA remapping in guest

2019-09-25 Thread Peter Xu
On Wed, Sep 25, 2019 at 10:48:32AM +0800, Lu Baolu wrote:
> Hi Kevin,
> 
> On 9/24/19 3:00 PM, Tian, Kevin wrote:
> > > > > 
> > > > > This patch series only aims to achieve the first goal, a.k.a using
> > first goal? then what are other goals? I didn't spot such information.
> > 
> 
> The overall goal is to use IOMMU nested mode to avoid the shadow page table
> and VMEXITs when mapping a gIOVA. This includes the below 4 steps (maybe not
> accurate, but you can get the point.)
> 
> 1) GIOVA mappings over 1st-level page table;
> 2) binding vIOMMU 1st level page table to the pIOMMU;
> 3) using pIOMMU second level for GPA->HPA translation;
> 4) enable nested (a.k.a. dual stage) translation in host.
> 
> This patch set aims to achieve 1).

Would it make sense to use the 1st level even on bare metal, to replace
the 2nd level?

What I'm thinking of is DPDK apps - they already have the MMU page
table there for the huge pages, so if they can use the 1st level as the
default device page table then they do not even need to map anything,
because they can simply bind the process root page table pointer to the
1st-level page root pointer of the device contexts they use.
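The dual-stage idea itself is just the composition of two translations; in toy form (page-frame numbers and mappings invented for illustration):

```c
/* Toy two-stage walk: the 1st level maps gIOVA -> GPA and the 2nd
 * level maps GPA -> HPA; nested mode composes them in hardware.
 * Page-frame numbers and mappings are invented for illustration. */
#define NPAGES 4

static const int first_level[NPAGES]  = { 2, 3, 0, 1 }; /* gIOVA -> GPA */
static const int second_level[NPAGES] = { 1, 0, 3, 2 }; /* GPA -> HPA */

int translate(int giova_pfn)
{
	return second_level[first_level[giova_pfn]];
}
```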

Regards,

-- 
Peter Xu


Re: [PATCH v1] mm/memory_hotplug: Don't take the cpu_hotplug_lock

2019-09-25 Thread David Hildenbrand
On 24.09.19 20:54, Qian Cai wrote:
> On Tue, 2019-09-24 at 17:11 +0200, Michal Hocko wrote:
>> On Tue 24-09-19 11:03:21, Qian Cai wrote:
>> [...]
>>> While at it, it might be a good time to rethink the whole locking over
>>> there, as right now reading files under /sys/kernel/slab/ could trigger
>>> a possible deadlock anyway.
>>>
>>>
>>
>> [...]
>>> [  442.452090][ T5224] -> #0 (mem_hotplug_lock.rw_sem){}:
>>> [  442.459748][ T5224]validate_chain+0xd10/0x2bcc
>>> [  442.464883][ T5224]__lock_acquire+0x7f4/0xb8c
>>> [  442.469930][ T5224]lock_acquire+0x31c/0x360
>>> [  442.474803][ T5224]get_online_mems+0x54/0x150
>>> [  442.479850][ T5224]show_slab_objects+0x94/0x3a8
>>> [  442.485072][ T5224]total_objects_show+0x28/0x34
>>> [  442.490292][ T5224]slab_attr_show+0x38/0x54
>>> [  442.495166][ T5224]sysfs_kf_seq_show+0x198/0x2d4
>>> [  442.500473][ T5224]kernfs_seq_show+0xa4/0xcc
>>> [  442.505433][ T5224]seq_read+0x30c/0x8a8
>>> [  442.509958][ T5224]kernfs_fop_read+0xa8/0x314
>>> [  442.515007][ T5224]__vfs_read+0x88/0x20c
>>> [  442.519620][ T5224]vfs_read+0xd8/0x10c
>>> [  442.524060][ T5224]ksys_read+0xb0/0x120
>>> [  442.528586][ T5224]__arm64_sys_read+0x54/0x88
>>> [  442.533634][ T5224]el0_svc_handler+0x170/0x240
>>> [  442.538768][ T5224]el0_svc+0x8/0xc
>>
>> I believe the lock is not really needed here. We do not deallocate the
>> pgdat of a hotremoved node nor destroy the slab state because existing
>> slabs would prevent hotremove from continuing in the first place.
>>
>> There are likely details to be checked of course but the lock just seems
>> bogus.
> 
> Check 03afc0e25f7f ("slab: get_online_mems for
> kmem_cache_{create,destroy,shrink}"). It actually talks about the races during
> memory as well as cpu hotplug, so it might even be that the cpu_hotplug_lock
> removal is problematic?
> 

Which removal are you referring to? get_online_mems() does not mess with
the cpu hotplug lock (and therefore not with this patch).

-- 

Thanks,

David / dhildenb


[PATCH] arm64: dts: qcom: c630: Enable adsp, cdsp and mpss

2019-09-25 Thread Bjorn Andersson
Specify the firmware-name for the adsp, cdsp and mpss and enable the
nodes.

Signed-off-by: Bjorn Andersson 
---
 .../boot/dts/qcom/sdm850-lenovo-yoga-c630.dts  | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts 
b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts
index ded120d3aef5..d3c841628d66 100644
--- a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts
+++ b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts
@@ -20,6 +20,11 @@
};
 };
 
+&adsp_pas {
+   firmware-name = "qcom/c630/qcadsp850.mbn";
+   status = "okay";
+};
+
 &apps_rsc {
pm8998-rpmh-regulators {
compatible = "qcom,pm8998-rpmh-regulators";
@@ -229,6 +234,11 @@
status = "disabled";
 };
 
+&cdsp_pas {
+   firmware-name = "qcom/c630/qccdsp850.mbn";
+   status = "okay";
+};
+
 &gcc {
protected-clocks = ,
   ,
@@ -296,6 +306,10 @@
};
 };
 
+&mss_pil {
+   firmware-name = "qcom/c630/qcdsp1v2850.mbn", "qcom/c630/qcdsp2850.mbn";
+};
+
 &qup_i2c12_default {
drive-strength = <2>;
bias-disable;
-- 
2.18.0



Re: [patch for-5.3 0/4] revert immediate fallback to remote hugepages

2019-09-25 Thread Michal Hocko
Let me revive this thread as there was no follow-up.

On Mon 09-09-19 21:30:20, Michal Hocko wrote:
[...]
> I believe it would be best to start by explaining why we do not see
> the same problem with order-0 requests. We do not enter the slow path,
> and thus memory reclaim, if there is any other node that passes the
> watermark check as well, right? So essentially we are relying on kswapd
> to keep nodes balanced so that the allocation request can be satisfied
> from a local node. We do have kcompactd to do background compaction. Why
> do we want to rely on direct compaction instead? What is the fundamental
> difference?

I am especially interested in this part. The more I think about this,
the more I am convinced that the underlying problem really is the
premature fallback in the fast path. Does the almost-patch below help your
workload? It effectively reduces the fast path for higher-order
allocations to the local/requested node. The justification is that the
watermark check might be too strict for those requests as it is primarily
order-0 oriented. The low watermark target simply has no meaning for
higher-order requests AFAIU. The min-low gap is giving kswapd a chance
to balance and be more local-node friendly while we do not have anything
like that in compaction.
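The point that an order-0 watermark says nothing useful about a high-order request can be put in toy form - plenty of free pages can pass the watermark while no contiguous block of the requested order exists (free-list shapes and numbers are invented):

```c
#include <stdbool.h>

/* free_by_order[o] = number of free blocks of 2^o pages on a node */
bool can_satisfy(const long *free_by_order, int max_order,
		 long mark, int order)
{
	long free_pages = 0;

	for (int o = 0; o <= max_order; o++)
		free_pages += free_by_order[o] << o;

	if (free_pages <= mark)		/* the classic order-0 watermark */
		return false;

	/* a high-order request additionally needs a large-enough block */
	for (int o = order; o <= max_order; o++)
		if (free_by_order[o])
			return true;
	return false;
}
```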

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff5484fdbdf9..09036cf55fca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4685,7 +4685,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
order, int preferred_nid,
 {
struct page *page;
unsigned int alloc_flags = ALLOC_WMARK_LOW;
-   gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
+   gfp_t fastpath_mask, alloc_mask; /* The gfp_t that was actually used 
for allocation */
struct alloc_context ac = { };
 
/*
@@ -4698,7 +4698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
order, int preferred_nid,
}
 
gfp_mask &= gfp_allowed_mask;
-   alloc_mask = gfp_mask;
+   fastpath_mask = alloc_mask = gfp_mask;
if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, , 
_mask, _flags))
return NULL;
 
@@ -4710,8 +4710,17 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int 
order, int preferred_nid,
 */
alloc_flags |= alloc_flags_nofragment(ac.preferred_zoneref->zone, 
gfp_mask);
 
-   /* First allocation attempt */
-   page = get_page_from_freelist(alloc_mask, order, alloc_flags, );
+   /*
+* First allocation attempt. If we have a high order allocation then do 
not fall
+* back to a remote node just based on the watermark check on the 
requested node
+* because compaction might easily free up a requested order and then 
it would be
+* better to simply go to the slow path.
+* TODO: kcompactd should help here but nobody has woken it up unless 
we hit the
+* slow path so we might need some tuning there as well.
+*/
+   if (order && (gfp_mask & __GFP_DIRECT_RECLAIM))
+   fastpath_mask |= __GFP_THISNODE;
+   page = get_page_from_freelist(fastpath_mask, order, alloc_flags, );
if (likely(page))
goto out;
 
-- 
Michal Hocko
SUSE Labs


[PATCH v4] perf record: Add support for limit perf output file size

2019-09-25 Thread Jiwei Sun
The patch adds a new option to limit the output file size; based on it,
we can then create a wrapper around the perf command that uses the option
to keep an unwary user from exhausting the disk space.

In order to keep perf.data parsable, we only limit the sample data
size. Since perf.data consists of many headers, sample data and
other data, the actual size of the recorded file will be bigger than
the configured value.
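The B/K/M/G unit convention the option accepts can be sketched in isolation (a standalone illustration of the convention only - not perf's parse_tag_value()):

```c
#include <stdlib.h>

/* Parse "10M"-style sizes with a mandatory B/K/M/G unit, returning the
 * size in bytes or -1 on error. */
long long parse_size(const char *str)
{
	char *end;
	long long val = strtoll(str, &end, 10);

	if (end == str || val < 0)
		return -1;

	switch (*end) {
	case 'B': return val;
	case 'K': return val << 10;
	case 'M': return val << 20;
	case 'G': return val << 30;
	default:  return -1;	/* missing or unknown unit */
	}
}
```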

Testing it:

 # ./perf record -a -g --max-size=10M
Couldn't synthesize bpf events.
WARNING: The perf data has already reached the limit, stop recording!
[ perf record: Woken up 30 times to write data ]
[ perf record: Captured and wrote 10.233 MB perf.data (175650 samples) ]
Terminated

 # ls -lh perf.data
-rw--- 1 root root 11M Jul 17 14:01 perf.data

 # ./perf record -a -g --max-size=10K
WARNING: The perf data has already reached the limit, stop recording!
Couldn't synthesize bpf events.
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 1.824 MB perf.data (67 samples) ]
Terminated

 # ls -lh perf.data
-rw--- 1 root root 1.9M Jul 17 14:05 perf.data

Signed-off-by: Jiwei Sun 
---
v4 changes:
  - Just show one WARNING message after reaching the limit.

v3 changes:
  - add a test result
  - add the new option to tools/perf/Documentation/perf-record.txt

v2 changes:
  - make patch based on latest Arnaldo's perf/core,
  - display a warning message when the limit is reached.
---
 tools/perf/Documentation/perf-record.txt |  4 +++
 tools/perf/builtin-record.c  | 42 
 2 files changed, 46 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index c6f9f31b6039..f1c6113fbc82 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -571,6 +571,10 @@ config terms. For example: 'cycles/overwrite/' and 
'instructions/no-overwrite/'.
 
 Implies --tail-synthesize.
 
+--max-size=::
+Limit the sample data max size,  is expected to be a number with
+appended unit character - B/K/M/G
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 48600c90cc7e..30904d2a3407 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -91,6 +91,7 @@ struct record {
struct switch_outputswitch_output;
unsigned long long  samples;
cpu_set_t   affinity_mask;
+   unsigned long   output_max_size;/* = 0: unlimited */
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -120,6 +121,12 @@ static bool switch_output_time(struct record *rec)
   trigger_is_ready(_output_trigger);
 }
 
+static bool record__output_max_size_exceeded(struct record *rec)
+{
+   return rec->output_max_size &&
+  (rec->bytes_written >= rec->output_max_size);
+}
+
 static int record__write(struct record *rec, struct mmap *map __maybe_unused,
 void *bf, size_t size)
 {
@@ -132,6 +139,12 @@ static int record__write(struct record *rec, struct mmap 
*map __maybe_unused,
 
rec->bytes_written += size;
 
+   if (record__output_max_size_exceeded(rec)) {
+   WARN_ONCE(1, "WARNING: The perf data has already reached "
+"the limit, stop recording!\n");
+   raise(SIGTERM);
+   }
+
if (switch_output_size(rec))
trigger_hit(_output_trigger);
 
@@ -1936,6 +1949,33 @@ static int record__parse_affinity(const struct option 
*opt, const char *str, int
return 0;
 }
 
+static int parse_output_max_size(const struct option *opt,
+const char *str, int unset)
+{
+   unsigned long *s = (unsigned long *)opt->value;
+   static struct parse_tag tags_size[] = {
+   { .tag  = 'B', .mult = 1   },
+   { .tag  = 'K', .mult = 1 << 10 },
+   { .tag  = 'M', .mult = 1 << 20 },
+   { .tag  = 'G', .mult = 1 << 30 },
+   { .tag  = 0 },
+   };
+   unsigned long val;
+
+   if (unset) {
+   *s = 0;
+   return 0;
+   }
+
+   val = parse_tag_value(str, tags_size);
+   if (val != (unsigned long) -1) {
+   *s = val;
+   return 0;
+   }
+
+   return -1;
+}
+
 static int record__parse_mmap_pages(const struct option *opt,
const char *str,
int unset __maybe_unused)
@@ -2262,6 +2302,8 @@ static struct option __record_options[] = {
"n", "Compressed records using specified level 
(default: 1 - fastest compression, 22 - greatest compression)",
record__parse_comp_level),
 #endif
+   OPT_CALLBACK(0, "max-size", _max_size,
+"size", "Limit the maximum size of the output file", 

[PATCH 1/3] kbuild: update compile-test header list for v5.4-rc1

2019-09-25 Thread Masahiro Yamada
Some build errors were fixed. Add more headers into the test coverage.

Signed-off-by: Masahiro Yamada 
---

 usr/include/Makefile | 9 -
 1 file changed, 9 deletions(-)

diff --git a/usr/include/Makefile b/usr/include/Makefile
index 05c71ef42f51..4ef9946775ee 100644
--- a/usr/include/Makefile
+++ b/usr/include/Makefile
@@ -35,7 +35,6 @@ header-test- += linux/errqueue.h
 header-test- += linux/fsmap.h
 header-test- += linux/hdlc/ioctl.h
 header-test- += linux/ivtv.h
-header-test- += linux/jffs2.h
 header-test- += linux/kexec.h
 header-test- += linux/matroxfb.h
 header-test- += linux/netfilter_bridge/ebtables.h
@@ -56,20 +55,12 @@ header-test- += linux/v4l2-mediabus.h
 header-test- += linux/v4l2-subdev.h
 header-test- += linux/videodev2.h
 header-test- += linux/vm_sockets.h
-header-test- += scsi/scsi_bsg_fc.h
-header-test- += scsi/scsi_netlink.h
-header-test- += scsi/scsi_netlink_fc.h
 header-test- += sound/asequencer.h
 header-test- += sound/asoc.h
 header-test- += sound/asound.h
 header-test- += sound/compress_offload.h
 header-test- += sound/emu10k1.h
 header-test- += sound/sfnt_info.h
-header-test- += sound/sof/eq.h
-header-test- += sound/sof/fw.h
-header-test- += sound/sof/header.h
-header-test- += sound/sof/manifest.h
-header-test- += sound/sof/trace.h
 header-test- += xen/evtchn.h
 header-test- += xen/gntdev.h
 header-test- += xen/privcmd.h
-- 
2.17.1



[PATCH 2/3] kbuild: detect missing header include guard

2019-09-25 Thread Masahiro Yamada
Our convention is to surround the whole of the header content with an
include guard. This avoids the same header being parsed over again
when it is included multiple times.

The header-test-y syntax allows comprehensive sanity checks of
headers. This commit adds another check: if the include guard is missing,
the header will fail to build due to the redefinition of something.

More headers must be excluded from the test coverage for now.
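For reference, the convention being enforced looks like this (the guard macro and header name here are made up):

```c
/* Conventional include guard: the entire header body sits between the
 * #ifndef/#define pair and the closing #endif, so a second inclusion
 * of the same file expands to nothing and nothing is redefined. */
#ifndef _LINUX_EXAMPLE_H
#define _LINUX_EXAMPLE_H

struct example {
	int x;
};

#endif /* _LINUX_EXAMPLE_H */
```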

Signed-off-by: Masahiro Yamada 
Reviewed-by: Sam Ravnborg 
---

 include/Kbuild | 26 ++
 scripts/Makefile.build |  3 ++-
 usr/include/Makefile   |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/Kbuild b/include/Kbuild
index ffba79483cc5..4c49b9ae4b4d 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -133,6 +133,7 @@ header-test-+= linux/cyclades.h
 header-test-   += linux/dcookies.h
 header-test-   += linux/delayacct.h
 header-test-   += linux/delayed_call.h
+header-test-   += linux/device_cgroup.h
 header-test-   += linux/device-mapper.h
 header-test-   += linux/devpts_fs.h
 header-test-   += linux/dio.h
@@ -146,12 +147,15 @@ header-test-  += linux/dma/sprd-dma.h
 header-test-   += linux/dns_resolver.h
 header-test-   += linux/drbd_genl.h
 header-test-   += linux/drbd_genl_api.h
+header-test-   += linux/dsa/lan9303.h
+header-test-   += linux/dtlk.h
 header-test-   += linux/dw_apb_timer.h
 header-test-   += linux/dynamic_debug.h
 header-test-   += linux/dynamic_queue_limits.h
 header-test-   += linux/ecryptfs.h
 header-test-   += linux/edma.h
 header-test-   += linux/eeprom_93cx6.h
+header-test-   += linux/eeprom_93xx46.h
 header-test-   += linux/efs_vh.h
 header-test-   += linux/elevator.h
 header-test-   += linux/elfcore-compat.h
@@ -169,6 +173,7 @@ header-test-+= 
linux/firmware/trusted_foundations.h
 header-test-   += linux/firmware/xlnx-zynqmp.h
 header-test-   += linux/fixp-arith.h
 header-test-   += linux/flat.h
+header-test-   += linux/fs_pin.h
 header-test-   += linux/fs_types.h
 header-test-   += linux/fs_uart_pd.h
 header-test-   += linux/fsi-occ.h
@@ -189,6 +194,7 @@ header-test-+= linux/gpio/aspeed.h
 header-test-   += linux/gpio/gpio-reg.h
 header-test-   += linux/hid-debug.h
 header-test-   += linux/hiddev.h
+header-test-   += linux/hil_mlc.h
 header-test-   += linux/hippidevice.h
 header-test-   += linux/hmm.h
 header-test-   += linux/hp_sdc.h
@@ -237,6 +243,7 @@ header-test-+= linux/input/bu21013.h
 header-test-   += linux/input/cma3000.h
 header-test-   += linux/input/kxtj9.h
 header-test-   += linux/input/lm8333.h
+header-test-   += linux/input/navpoint.h
 header-test-   += linux/input/sparse-keymap.h
 header-test-   += linux/input/touchscreen.h
 header-test-   += linux/input/tps6507x-ts.h
@@ -327,6 +334,7 @@ header-test-+= linux/mfd/lpc_ich.h
 header-test-   += linux/mfd/max77693.h
 header-test-   += linux/mfd/max8998-private.h
 header-test-   += linux/mfd/menelaus.h
+header-test-   += linux/mfd/motorola-cpcap.h
 header-test-   += linux/mfd/mt6397/core.h
 header-test-   += linux/mfd/palmas.h
 header-test-   += linux/mfd/pcf50633/backlight.h
@@ -448,6 +456,7 @@ header-test-+= 
linux/platform_data/ads7828.h
 header-test-   += linux/platform_data/apds990x.h
 header-test-   += linux/platform_data/arm-ux500-pm.h
 header-test-   += linux/platform_data/asoc-s3c.h
+header-test-   += linux/platform_data/asoc-s3c24xx_simtec.h
 header-test-   += linux/platform_data/at91_adc.h
 header-test-   += linux/platform_data/ata-pxa.h
 header-test-   += linux/platform_data/atmel.h
@@ -479,14 +488,17 @@ header-test-  += 
linux/platform_data/i2c-imx.h
 header-test-   += linux/platform_data/i2c-mux-reg.h
 header-test-   += linux/platform_data/i2c-ocores.h
 header-test-   += linux/platform_data/i2c-xiic.h
+header-test-   += linux/platform_data/ina2xx.h
 header-test-  

[PATCH 3/3] kbuild: stop using wildcard patterns for in-kernel header test

2019-09-25 Thread Masahiro Yamada
This compile-test started from the strong belief that we should be
able to compile (almost) all headers as standalone units, but this
requirement seems to be annoying.

I believe it is nice to compile-test all the exported headers. On the
other hand, in-kernel headers are not necessarily always compilable.
Actually, some headers are only included under a certain combination
of CONFIG options.

Currently, newly added headers are compile-tested by default. This
sometimes catches (non-fatal) bugs, but sometimes raises false
positives.

This commit converts the negative list to a positive list. New
headers must now be added to header-test-y manually if somebody wants
them in the test coverage.
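
The negative-to-positive list conversion amounts to plain set arithmetic; the sketch below illustrates it with made-up header names and text files (the real mechanism is the header-test-y / header-test- make variables, not files):

```shell
# Sketch only: real Kbuild uses make variables, not text files.
tmp=$(mktemp -d)
printf '%s\n' acpi/acpi.h linux/input/navpoint.h linux/mfd/palmas.h > "$tmp/all"
printf '%s\n' linux/input/navpoint.h > "$tmp/blacklist"

# Old scheme (negative list): coverage = all headers minus the blacklist.
comm -23 "$tmp/all" "$tmp/blacklist" > "$tmp/neg"

# New scheme (positive list): coverage = exactly the whitelisted headers.
printf '%s\n' acpi/acpi.h linux/mfd/palmas.h > "$tmp/pos"

diff "$tmp/neg" "$tmp/pos" && echo "same coverage, expressed positively"
```

The coverage is identical either way; what changes is the default for a newly added header (tested unless excluded, versus untested unless listed).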

Signed-off-by: Masahiro Yamada 
---

 include/Kbuild | 1231 +---
 include/acpi/Kbuild|   18 +
 include/clocksource/Kbuild |8 +
 include/crypto/Kbuild  |   63 ++
 include/drm/Kbuild |   89 ++
 include/keys/Kbuild|   10 +
 include/kvm/Kbuild |5 +
 include/linux/Kbuild   | 1175 ++
 include/linux/byteorder/Kbuild |4 +
 include/linux/ceph/Kbuild  |   19 +
 include/linux/i3c/Kbuild   |5 +
 include/linux/iio/Kbuild   |   22 +
 include/linux/mfd/Kbuild   |  154 
 include/linux/mmc/Kbuild   |   14 +
 include/linux/mtd/Kbuild   |   31 +
 include/linux/pinctrl/Kbuild   |   10 +
 include/linux/platform_data/Kbuild |  148 
 include/linux/regulator/Kbuild |   26 +
 include/linux/sched/Kbuild |   28 +
 include/linux/spi/Kbuild   |   18 +
 include/linux/sunrpc/Kbuild|   28 +
 include/linux/usb/Kbuild   |   41 +
 include/math-emu/Kbuild|6 +
 include/media/Kbuild   |  100 +++
 include/misc/Kbuild|5 +
 include/net/Kbuild |  239 ++
 include/pcmcia/Kbuild  |6 +
 include/ras/Kbuild |3 +
 include/rdma/Kbuild|   38 +
 include/scsi/Kbuild|   19 +
 include/soc/Kbuild |   26 +
 include/sound/Kbuild   |   93 +++
 include/target/Kbuild  |6 +
 include/trace/Kbuild   |   85 ++
 include/vdso/Kbuild|4 +
 include/video/Kbuild   |   32 +
 include/xen/Kbuild |9 +
 37 files changed, 2609 insertions(+), 1209 deletions(-)
 create mode 100644 include/acpi/Kbuild
 create mode 100644 include/clocksource/Kbuild
 create mode 100644 include/crypto/Kbuild
 create mode 100644 include/drm/Kbuild
 create mode 100644 include/keys/Kbuild
 create mode 100644 include/kvm/Kbuild
 create mode 100644 include/linux/Kbuild
 create mode 100644 include/linux/byteorder/Kbuild
 create mode 100644 include/linux/ceph/Kbuild
 create mode 100644 include/linux/i3c/Kbuild
 create mode 100644 include/linux/iio/Kbuild
 create mode 100644 include/linux/mfd/Kbuild
 create mode 100644 include/linux/mmc/Kbuild
 create mode 100644 include/linux/mtd/Kbuild
 create mode 100644 include/linux/pinctrl/Kbuild
 create mode 100644 include/linux/platform_data/Kbuild
 create mode 100644 include/linux/regulator/Kbuild
 create mode 100644 include/linux/sched/Kbuild
 create mode 100644 include/linux/spi/Kbuild
 create mode 100644 include/linux/sunrpc/Kbuild
 create mode 100644 include/linux/usb/Kbuild
 create mode 100644 include/math-emu/Kbuild
 create mode 100644 include/media/Kbuild
 create mode 100644 include/misc/Kbuild
 create mode 100644 include/net/Kbuild
 create mode 100644 include/pcmcia/Kbuild
 create mode 100644 include/ras/Kbuild
 create mode 100644 include/rdma/Kbuild
 create mode 100644 include/scsi/Kbuild
 create mode 100644 include/soc/Kbuild
 create mode 100644 include/sound/Kbuild
 create mode 100644 include/target/Kbuild
 create mode 100644 include/trace/Kbuild
 create mode 100644 include/vdso/Kbuild
 create mode 100644 include/video/Kbuild
 create mode 100644 include/xen/Kbuild

diff --git a/include/Kbuild b/include/Kbuild
index 4c49b9ae4b4d..aee20dcd18b8 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -1,1211 +1,24 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-# Add header-test-$(CONFIG_...) guard to headers that are only compiled
-# for particular architectures.
-#
-# Headers listed in header-test- are excluded from the test coverage.
-# Many headers are excluded for now because they fail to build. Please
-# consider to fix headers first before adding new ones to the blacklist.
-#
-# Sorted alphabetically.
-header-test-   += acpi/acbuffer.h
-header-test-   += acpi/acpi.h
-header-test-   += acpi/acpi_bus.h
-header-test-   += acpi/acpi_drivers.h
-header-test-   += acpi/acpi_io.h
-header-test-   += acpi/acpi_lpat.h
-header-test-   += acpi/acpiosxf.h

Re: KASAN: use-after-free Read in usb_anchor_resume_wakeups

2019-09-25 Thread Johan Hovold
On Mon, Sep 23, 2019 at 05:41:54PM +0800, Peter Chen wrote:
> > On Tue, Jul 9, 2019 at 2:27 PM syzbot
> >  wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:7829a896 usb-fuzzer: main usb gadget fuzzer driver
> > > git tree:   https://github.com/google/kasan.git usb-fuzzer
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=118d136da0
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f6d4561982f71f63
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=58e201002fe1e775e1ae
> > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+58e201002fe1e775e...@syzkaller.appspotmail.com
> > >
> > > dummy_hcd dummy_hcd.5: no ep configured for urb c6093b7b
> > > xpad 6-1:0.169: xpad_irq_out - usb_submit_urb failed with result -19
> > > ==
> > > BUG: KASAN: use-after-free in debug_spin_lock_before
> > > kernel/locking/spinlock_debug.c:83 [inline]
> > > BUG: KASAN: use-after-free in do_raw_spin_lock+0x24d/0x280
> > > kernel/locking/spinlock_debug.c:112
> > > Read of size 4 at addr 8881d0e584dc by task kworker/1:4/2786
> > >
> 
> It should be due to the URB being freed at xpad_disconnect(), while
> xpad_irq_out still tries to access the freed URB.
> 
> Peter
> 
> #syz test: https://github.com/google/kasan.git 7829a896
> 
> diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c
> index 6b40a1c68f9f..32b7a199b580 100644
> --- a/drivers/input/joystick/xpad.c
> +++ b/drivers/input/joystick/xpad.c
> @@ -1850,6 +1850,7 @@ static void xpad_disconnect(struct usb_interface *intf)
> 
> xpad_deinit_input(xpad);
> 
> +   usb_kill_urb(xpad->irq_out);

I'm not sure this is the right fix. The interrupt URB should have been
stopped by xpad_stop_output() just above. Perhaps the type test in that
function is broken, or we may have an unhandled race where another thread
submits the URB after we have tried to stop it.

Didn't check that closely, though.

Johan


[PATCH V3] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Yunfeng Ye
sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
for the page management structures; if the allocation fails on the
specified node, it falls back to allocating from other nodes.

Normally the page management structures do not exceed 2% of total memory,
but they need a large contiguous allocation. In most cases the allocation
from the specified node succeeds, but it fails when the node's memory is
highly fragmented. In that case we would rather allocate memory per
section on the specified node than allocate one large block from other
NUMA nodes.

Add memblock_alloc_exact_nid_raw() for this situation, which allocates a
boot memory block on the exact node only. If a large contiguous
allocation fails in sparse_buffer_init(), it falls back to per-section
allocations.

Signed-off-by: Yunfeng Ye 
---
v2 -> v3:
 - use "bool exact_nid" instead of "int need_exact_nid"
 - remove the comment "without panicking"

v1 -> v2:
 - use memblock_alloc_exact_nid_raw() rather than using a flag

 include/linux/memblock.h |  3 +++
 mm/memblock.c| 65 
 mm/sparse.c  |  2 +-
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f491690..b38bbef 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -358,6 +358,9 @@ static inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
 MEMBLOCK_ALLOC_ACCESSIBLE);
 }

+void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
+phys_addr_t min_addr, phys_addr_t max_addr,
+int nid);
 void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
 phys_addr_t min_addr, phys_addr_t max_addr,
 int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61a..0de9d83 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
  * @start: the lower bound of the memory region to allocate (phys address)
  * @end: the upper bound of the memory region to allocate (phys address)
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @exact_nid: control the allocation fall back to other nodes
  *
  * The allocation is performed from memory region limited by
  * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
  *
- * If the specified node can not hold the requested memory the
- * allocation falls back to any node in the system
+ * If the specified node can not hold the requested memory and @exact_nid
+ * is zero, the allocation falls back to any node in the system
  *
  * For systems with memory mirroring, the allocation is attempted first
  * from the regions with mirroring enabled and then retried from any
@@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
  */
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
-   phys_addr_t end, int nid)
+   phys_addr_t end, int nid,
+   bool exact_nid)
 {
enum memblock_flags flags = choose_memblock_flags();
phys_addr_t found;
@@ -1365,7 +1367,7 @@ static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
if (found && !memblock_reserve(found, size))
goto done;

-   if (nid != NUMA_NO_NODE) {
+   if (nid != NUMA_NO_NODE && !exact_nid) {
found = memblock_find_in_range_node(size, align, start,
end, NUMA_NO_NODE,
flags);
@@ -1413,7 +1415,8 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 phys_addr_t start,
 phys_addr_t end)
 {
-   return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
+   return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
+   false);
 }

 /**
@@ -1432,7 +1435,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 {
return memblock_alloc_range_nid(size, align, 0,
-   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+   MEMBLOCK_ALLOC_ACCESSIBLE, nid, false);
 }

 /**
@@ -1442,6 +1445,7 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
  * @min_addr: the lower bound of the memory region to allocate (phys address)
  * @max_addr: the upper bound of the memory 

Re: [PATCH xfstests v3] overlay: Enable character device to be the base fs partition

2019-09-25 Thread Amir Goldstein
On Wed, Sep 25, 2019 at 9:29 AM Zhihao Cheng  wrote:
>
> When running overlay tests using character devices as base fs partitions,
> all overlay test results become 'notrun'. The function
> '_overlay_config_override' (common/config) detects that the current base
> fs partition is not a block device and sets FSTYP to the base fs type.
> Each overlay test then checks FSTYP and, if it is not 'overlay' or
> 'generic', skips execution.
>
> For example, using UBIFS as base fs skips all overlay usecases:
>
>   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
>   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
>
>   overlay/001   [not run] not suitable for this filesystem type: ubifs
>   overlay/002   [not run] not suitable for this filesystem type: ubifs
>   overlay/003   [not run] not suitable for this filesystem type: ubifs
>
> By checking that the base fs partition is a block or character device,
> FSTYP is correctly overridden to 'overlay'. This patch thus allows the
> base fs partition to be a character device as well (such as ubifs), so
> the overlay tests can run on it too.
>
> Signed-off-by: Zhihao Cheng 
> Signed-off-by: Amir Goldstein 

Looks fine.
Eryu, you may change this to Reviewed-by

> ---
>  common/config | 6 +++---
>  common/rc | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/common/config b/common/config
> index 4c86a49..4eda36c 100644
> --- a/common/config
> +++ b/common/config
> @@ -532,7 +532,7 @@ _canonicalize_mountpoint()
>  # When SCRATCH/TEST_* vars are defined in evironment and not
>  # in config file, this function is called after vars have already
>  # been overriden in the previous test.
> -# In that case, TEST_DEV is a directory and not a blockdev and
> +# In that case, TEST_DEV is a directory and not a blockdev/chardev and
>  # the function will return without overriding the SCRATCH/TEST_* vars.
>  _overlay_config_override()
>  {
> @@ -550,7 +550,7 @@ _overlay_config_override()
> #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
> #of the configured base fs and SCRATCH/TEST_DEV vars are set to 
> the
> #overlayfs base and mount dirs inside base fs mount.
> -   [ -b "$TEST_DEV" ] || return 0
> +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
>
> # Config file may specify base fs type, but we obay -overlay flag
> [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> @@ -570,7 +570,7 @@ _overlay_config_override()
> export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
>
> -   [ -b "$SCRATCH_DEV" ] || return 0
> +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
>
> # Store original base fs vars
> export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> diff --git a/common/rc b/common/rc
> index 66c7fd4..8d57c37 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3100,7 +3100,7 @@ _require_scratch_shutdown()
> # SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
> # will be null, so check OVL_BASE_SCRATCH_DEV before
> # running shutdown to avoid shutting down base fs 
> accidently.
> -   _notrun "$SCRATCH_DEV is not a block device"
> +   _notrun "This test requires a valid 
> $OVL_BASE_SCRATCH_DEV as ovl base fs"
> else
> src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
> || _notrun "Underlying filesystem does not support 
> shutdown"
> --
> 2.7.4
>
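
The device-type check the patch relaxes is ordinary shell test syntax; a minimal sketch of the accept-block-or-char pattern (helper name invented here):

```shell
# Accept either a block device ([ -b ]) or a character device ([ -c ])
# as an overlay base fs "device", as the patch does for TEST_DEV/SCRATCH_DEV.
is_base_dev() {
    [ -b "$1" ] || [ -c "$1" ]
}

is_base_dev /dev/null && echo "/dev/null accepted (character device)"
is_base_dev /tmp      || echo "/tmp rejected (directory)"
```

On a UBIFS setup, /dev/ubi0_1 is a character device, so the old `[ -b ]`-only test returned early and FSTYP was never overridden to 'overlay'.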


Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

2019-09-25 Thread Dave Chinner
On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:
> On 24/09/2019 10.39, Dave Chinner wrote:
> > On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> > > On 23/09/2019 17.52, Tejun Heo wrote:
> > > > Hello, Konstantin.
> > > > 
> > > > On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> > > > > With vm.dirty_write_behind 1 or 2 files are written even faster and
> > > > 
> > > > Is the faster speed reproducible?  I don't quite understand why this
> > > > would be.
> > > 
> > > Writing to disk simply starts earlier.
> > 
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> > 
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
> > 
> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> > 
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
> > 
> > So that writers are directly throttled at 200MB of dirty pages in
> > memory?
> > 
> > This effectively gives us global writebehind behaviour with a
> > 100-200MB cache write burst for initial writes.
> 
> Global limits affect all dirty pages including memory-mapped and
> randomly touched. Write-behind aims only into sequential streams.

There are apps that do sequential writes via mmap()d files.
They should do writebehind too, yes?

> > And really, such strict writebehind behaviour is going to cause all
> > sorts of unintended problems with filesystems because there will be
> > adverse interactions with delayed allocation. We need a substantial
> > amount of dirty data to be cached for writeback for fragmentation
> > minimisation algorithms to be able to do their job
> 
> I think most sequentially written files never change after close.

There are lots of apps that write zeros to initialise and allocate
space, then go write real data to them. Database WAL files are
commonly initialised like this...

> Except of knowing final size of huge files (>16Mb in my patch)
> there should be no difference for delayed allocation.

There is, because you throttle the writes down such that there is
only 16MB of dirty data in memory. Hence filesystems will only
typically allocate in 16MB chunks as that's all the delalloc range
spans.

I'm not so concerned for XFS here, because our speculative
preallocation will handle this just fine, but for ext4 and btrfs
it's going to interleave the allocate of concurrent streaming writes
and fragment the crap out of the files.

In general, the smaller you make the individual file writeback
window, the worse the fragmentation problem gets.

> Probably write behind could provide hint about streaming pattern:
> pass something like "MSG_MORE" into writeback call.

How does that help when we've only got dirty data and block
reservations up to EOF which is no more than 16MB away?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
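
The global-writeback tuning Dave suggests is two sysctl writes; since /proc/sys/vm requires root, the sketch below only computes and prints the commands rather than executing them:

```shell
# Thresholds from the discussion: start background writeback at ~100MB
# of dirty pages, throttle writers directly at ~200MB.
background_bytes=$((100 * 1000 * 1000))
throttle_bytes=$((200 * 1000 * 1000))

echo "echo $background_bytes > /proc/sys/vm/dirty_background_bytes"
echo "echo $throttle_bytes > /proc/sys/vm/dirty_bytes"
```

Setting the *_bytes variants automatically disables the corresponding *_ratio knobs, so this gives the fixed 100-200MB dirty-cache window described in the thread.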


Re: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments

2019-09-25 Thread Wei Yang
On Wed, Sep 25, 2019 at 08:55:14AM +0200, Ingo Molnar wrote:
>
>* Wei Yang  wrote:
>
>> To be honest, I have a question on how this works.
>> 
>> As the comment says, we need to call pmd_read_atomic before using
>> pte_offset_map_lock to avoid data corruption.
>
>There's only a risk of data corruption if mmap_sem is held for reading. 
>If it's held for writing then the pagetable contents should be stable.
>
>> For example, in function swapin_walk_pmd_entry:
>> 
>> pmd_none_or_trans_huge_or_clear_bad(pmd)
>> pmd_read_atomic(pmd)  ---   1
>> pte_offset_map_lock(mm, pmd, ...) ---   2
>> 
>> At point 1, we are assured the content is intact. While in point 2, we would
>> read pmd again to calculate the pte address. How we ensure this time the
>> content is intact? Because pmd_none_or_trans_huge_or_clear_bad() ensures the
>> pte is stable, so that the content won't be changed?
>
>Indeed pte_offset_map_lock() will take a non-atomic *pmd value in 
>pte_lockptr() before taking the pte spinlock.
>
>I believe the rule here is that if pmd_none_or_trans_huge_or_clear_bad() 
>finds the pmd 'stable' while holding the mmap_sem read-locked, then the 
>pmd cannot change while we are continuously holding the mmap_sem.
>

I have the same gut feeling as you.

Then I have another question: why don't the page fault routines use
pmd_read_atomic() to retrieve the pmd value? Since we support concurrent
page faults and only hold the mmap_sem read-locked during a page fault,
the value could be corrupted too.

For example in __handle_mm_fault()

pmd_t orig_pmd = *vmf.pmd;
barrier();

This is why the barrier() is here?

>Hence the followup pte_offset_map_lock() and other iterators can look at 
>the value of the pmd without locking. (Individual pte entries still need 
>the pte-lock, because they might be faulted-in in parallel.)
>
>So the pmd use pattern in swapin_walk_pmd_entry() should be safe.
>
>I'm not 100% sure though - so I've added a few more Cc:s ...
>
>I've also cleaned up the pmd_read_atomic() some more to make it more 
>readable - see the patch below.
>
>Thanks,
>
>   Ingo
>
>==>
>From: Ingo Molnar 
>Date: Wed, 25 Sep 2019 08:38:57 +0200
>Subject: [PATCH] x86/mm: Clean up the pmd_read_atomic() comments
>
>Fix spelling, consistent parenthesis and grammar - and also clarify
>the language where needed.
>
>Cc: Andy Lutomirski 
>Cc: Borislav Petkov 
>Cc: Dave Hansen 
>Cc: H. Peter Anvin 
>Cc: Linus Torvalds 
>Cc: Peter Zijlstra 
>Cc: Rik van Riel 
>Cc: Thomas Gleixner 
>Cc: Wei Yang 
>Link: 
>https://lkml.kernel.org/r/20190925014453.20236-1-richardw.y...@linux.intel.com
>Signed-off-by: Ingo Molnar 

The following change looks good to me.

Reviewed-by: Wei Yang 

>---
> arch/x86/include/asm/pgtable-3level.h | 44 ++-
> 1 file changed, 23 insertions(+), 21 deletions(-)
>
>diff --git a/arch/x86/include/asm/pgtable-3level.h 
>b/arch/x86/include/asm/pgtable-3level.h
>index 1796462ff143..5afb5e0fe903 100644
>--- a/arch/x86/include/asm/pgtable-3level.h
>+++ b/arch/x86/include/asm/pgtable-3level.h
>@@ -36,39 +36,41 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
> 
> #define pmd_read_atomic pmd_read_atomic
> /*
>- * pte_offset_map_lock on 32bit PAE kernels was reading the pmd_t with
>- * a "*pmdp" dereference done by gcc. Problem is, in certain places
>- * where pte_offset_map_lock is called, concurrent page faults are
>+ * pte_offset_map_lock() on 32-bit PAE kernels was reading the pmd_t with
>+ * a "*pmdp" dereference done by GCC. Problem is, in certain places
>+ * where pte_offset_map_lock() is called, concurrent page faults are
>  * allowed, if the mmap_sem is hold for reading. An example is mincore
>  * vs page faults vs MADV_DONTNEED. On the page fault side
>- * pmd_populate rightfully does a set_64bit, but if we're reading the
>+ * pmd_populate() rightfully does a set_64bit(), but if we're reading the
>  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
>- * because gcc will not read the 64bit of the pmd atomically. To fix
>- * this all places running pte_offset_map_lock() while holding the
>+ * because GCC will not read the 64-bit value of the pmd atomically.
>+ *
>+ * To fix this all places running pte_offset_map_lock() while holding the
>  * mmap_sem in read mode, shall read the pmdp pointer using this
>- * function to know if the pmd is null nor not, and in turn to know if
>+ * function to know if the pmd is null or not, and in turn to know if
>  * they can run pte_offset_map_lock() or pmd_trans_huge() or other pmd
>  * operations.
>  *
>- * Without THP if the mmap_sem is hold for reading, the pmd can only
>- * transition from null to not null while pmd_read_atomic runs. So
>+ * Without THP if the mmap_sem is held for reading, the pmd can only
>+ * transition from null to not null while pmd_read_atomic() runs. So
>  * we can always return atomic pmd values with this function.
>  *
>- * 

Re: [v1, 1/1] gpio: dts: aspeed: Add SGPIO driver

2019-09-25 Thread Bartosz Golaszewski
On Tue, 24 Sep 2019 at 21:07 Hongwei Zhang  wrote:
>
> Add SGPIO driver support for Aspeed AST2500 SoC.
>
> Signed-off-by: Hongwei Zhang 
> ---
>  arch/arm/Kconfig |  2 ++
>  arch/arm/boot/dts/aspeed-g5.dtsi | 16 +++-
>  drivers/gpio/Kconfig |  8 
>  drivers/gpio/Makefile|  1 +
>  4 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 2436021..c9f08ab 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1460,6 +1460,8 @@ config ARCH_NR_GPIO
> default 416 if ARCH_SUNXI
> default 392 if ARCH_U8500
> default 352 if ARCH_VT8500
> +   default 312 if MACH_ASPEED_G5
> +   default 304 if MACH_ASPEED_G4
> default 288 if ARCH_ROCKCHIP
> default 264 if MACH_H4700
> default 0
> diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi 
> b/arch/arm/boot/dts/aspeed-g5.dtsi
> index 00f05bd..85da7ea 100644
> --- a/arch/arm/boot/dts/aspeed-g5.dtsi
> +++ b/arch/arm/boot/dts/aspeed-g5.dtsi
> @@ -311,7 +311,7 @@
> #gpio-cells = <2>;
> gpio-controller;
> compatible = "aspeed,ast2500-gpio";
> -   reg = <0x1e78 0x1000>;
> +   reg = <0x1e78 0x200>;
> interrupts = <20>;
> gpio-ranges = < 0 0 232>;
> clocks = < ASPEED_CLK_APB>;
> @@ -319,6 +319,20 @@
> #interrupt-cells = <2>;
> };
>
> +   sgpio: sgpio@1e780200 {
> +   #gpio-cells = <2>;
> +   compatible = "aspeed,ast2500-sgpio";
> +   gpio-controller;
> +   interrupts = <40>;
> +   reg = <0x1e780200 0x0100>;
> +   clocks = < ASPEED_CLK_APB>;
> +   interrupt-controller;
> +   ngpios = <8>;
> +   bus-frequency = <1200>;
> +   pinctrl-names = "default";
> +   pinctrl-0 = <_sgpm_default>;
> +   };
> +
> rtc: rtc@1e781000 {
> compatible = "aspeed,ast2500-rtc";
> reg = <0x1e781000 0x18>;
> diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> index bb13c26..e94f903 100644
> --- a/drivers/gpio/Kconfig
> +++ b/drivers/gpio/Kconfig
> @@ -120,6 +120,14 @@ config GPIO_ASPEED
> help
>   Say Y here to support Aspeed AST2400 and AST2500 GPIO controllers.
>
> +config SGPIO_ASPEED
> +   bool "Aspeed SGPIO support"
> +   depends on (ARCH_ASPEED || COMPILE_TEST) && OF_GPIO
> +   select GPIO_GENERIC
> +   select GPIOLIB_IRQCHIP
> +   help
> + Say Y here to support Aspeed AST2500 SGPIO functionality.
> +
>  config GPIO_ATH79
> tristate "Atheros AR71XX/AR724X/AR913X GPIO support"
> default y if ATH79
> diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
> index a4e9117..bebbd82 100644
> --- a/drivers/gpio/Makefile
> +++ b/drivers/gpio/Makefile
> @@ -32,6 +32,7 @@ obj-$(CONFIG_GPIO_AMD_FCH)+= gpio-amd-fch.o
>  obj-$(CONFIG_GPIO_AMDPT)   += gpio-amdpt.o
>  obj-$(CONFIG_GPIO_ARIZONA) += gpio-arizona.o
>  obj-$(CONFIG_GPIO_ASPEED)  += gpio-aspeed.o
> +obj-$(CONFIG_SGPIO_ASPEED) += sgpio-aspeed.o
>  obj-$(CONFIG_GPIO_ATH79)   += gpio-ath79.o
>  obj-$(CONFIG_GPIO_BCM_KONA)+= gpio-bcm-kona.o
>  obj-$(CONFIG_GPIO_BD70528) += gpio-bd70528.o
> --
> 2.7.4
>

This should be split into separate patches with one extending the
binding document and one adding actual support.

Bart


Re: [PATCH V3] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Mike Rapoport
On Wed, Sep 25, 2019 at 03:18:43PM +0800, Yunfeng Ye wrote:
> sparse_buffer_init() use memblock_alloc_try_nid_raw() to allocate memory
> for page management structure, if memory allocation fails from specified
> node, it will fall back to allocate from other nodes.
> 
> Normally, the page management structure will not exceed 2% of the total
> memory, but a large continuous block of allocation is needed. In most
> cases, memory allocation from the specified node will success always,
> but a node memory become highly fragmented will fail. we expect to
> allocate memory base section rather than by allocating a large block of
> memory from other NUMA nodes
> 
> Add memblock_alloc_exact_nid_raw() for this situation, which allocate
> boot memory block on the exact node. If a large contiguous block memory
> allocate fail in sparse_buffer_init(), it will fall back to allocate
> small block memory base section.
> 
> Signed-off-by: Yunfeng Ye 

One nit below, otherwise

Reviewed-by: Mike Rapoport 

> ---
> v2 -> v3:
>  - use "bool exact_nid" instead of "int need_exact_nid"
>  - remove the comment "without panicking"
> 
> v1 -> v2:
>  - use memblock_alloc_exact_nid_raw() rather than using a flag
> 
>  include/linux/memblock.h |  3 +++
>  mm/memblock.c| 65 
> 
>  mm/sparse.c  |  2 +-
>  3 files changed, 58 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index f491690..b38bbef 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -358,6 +358,9 @@ static inline phys_addr_t memblock_phys_alloc(phys_addr_t 
> size,
>MEMBLOCK_ALLOC_ACCESSIBLE);
>  }
> 
> +void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
> +  phys_addr_t min_addr, phys_addr_t max_addr,
> +  int nid);
>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>phys_addr_t min_addr, phys_addr_t max_addr,
>int nid);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7d4f61a..0de9d83 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
> *zone,
>   * @start: the lower bound of the memory region to allocate (phys address)
>   * @end: the upper bound of the memory region to allocate (phys address)
>   * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
> + * @exact_nid: control the allocation fall back to other nodes
>   *
>   * The allocation is performed from memory region limited by
>   * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
>   *
> - * If the specified node can not hold the requested memory the
> - * allocation falls back to any node in the system
> + * If the specified node can not hold the requested memory and @exact_nid
> + * is zero, the allocation falls back to any node in the system

 ^ is false  ;-)

>   *
>   * For systems with memory mirroring, the allocation is attempted first
>   * from the regions with mirroring enabled and then retried from any
> @@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
> *zone,
>   */
>  static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>   phys_addr_t align, phys_addr_t start,
> - phys_addr_t end, int nid)
> + phys_addr_t end, int nid,
> + bool exact_nid)
>  {
>   enum memblock_flags flags = choose_memblock_flags();
>   phys_addr_t found;
> @@ -1365,7 +1367,7 @@ static phys_addr_t __init 
> memblock_alloc_range_nid(phys_addr_t size,
>   if (found && !memblock_reserve(found, size))
>   goto done;
> 
> - if (nid != NUMA_NO_NODE) {
> + if (nid != NUMA_NO_NODE && !exact_nid) {
>   found = memblock_find_in_range_node(size, align, start,
>   end, NUMA_NO_NODE,
>   flags);
> @@ -1413,7 +1415,8 @@ phys_addr_t __init 
> memblock_phys_alloc_range(phys_addr_t size,
>phys_addr_t start,
>phys_addr_t end)
>  {
> - return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
> + return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
> + false);
>  }
> 
>  /**
> @@ -1432,7 +1435,7 @@ phys_addr_t __init 
> memblock_phys_alloc_range(phys_addr_t size,
>  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t 
> align, int nid)
>  {
>   return memblock_alloc_range_nid(size, align, 0,
> - MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> +  

Re: [PATCH v1] powerpc/pseries: CMM: Drop page array

2019-09-25 Thread David Hildenbrand
On 10.09.19 18:39, David Hildenbrand wrote:
> We can simply store the pages in a list (page->lru), no need for a
> separate data structure (+ complicated handling). This is how most
> other balloon drivers store allocated pages without additional tracking
> data.
> 
> For the notifiers, use page_to_pfn() to check if a page is in the
> applicable range. plpar_page_set_loaned()/plpar_page_set_active() were
> called with __pa(page_address()) for now, I assume we can simply switch
> to page_to_phys() here. The pfn_to_kaddr() handling is now mostly gone.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Arun KS 
> Cc: Pavel Tatashin 
> Cc: Thomas Gleixner 
> Cc: Andrew Morton 
> Cc: Vlastimil Babka 
> Signed-off-by: David Hildenbrand 
> ---
> 
> Only compile-tested. I hope the page_to_phys() thingy is correct and I
> didn't mess up something else / ignoring something important why the array
> is needed.
> 
> I stumbled over this while looking at how the memory isolation notifier is
> used - and wondered why the additional array is necessary. Also, I think
> by switching to the generic balloon compaction mechanism, we could get
> rid of the memory hotplug notifier and the memory isolation notifier in
> this code, as the migration capability of the inflated pages is the real
> requirement:
>   commit 14b8a76b9d53346f2871bf419da2aaf219940c50
>   Author: Robert Jennings 
>   Date:   Thu Dec 17 14:44:52 2009 +
>   
>   powerpc: Make the CMM memory hotplug aware
>   
>   The Collaborative Memory Manager (CMM) module allocates individual 
> pages
>   over time that are not migratable.  On a long running system this 
> can
>   severely impact the ability to find enough pages to support a 
> hotplug
>   memory remove operation.
>   [...]
> 
> Thoughts?

Ping, is this feature still used at all?

If nobody can test, any advice on which HW I need and how to trigger it?

-- 

Thanks,

David / dhildenb


Re: [PATCH 2/3] task: RCU protect tasks on the runqueue

2019-09-25 Thread Peter Zijlstra
On Mon, Sep 09, 2019 at 07:22:15AM -0500, Eric W. Biederman wrote:

> Let me see if I can explain my confusion in terms of task_numa_compare.
> 
> The function task_numa_compare now does:
> 
>   rcu_read_lock();
>   cur = rcu_dereference(dst_rq->curr);
> 
> Then it proceeds to examine a few fields of cur.
> 
>   cur->flags;
> cur->cpus_ptr;
> cur->numa_group;
> cur->numa_faults[];
> cur->total_numa_faults;
> cur->se.cfs_rq;
> 
> My concern here is the ordering between setting rq->curr and setting
> those other fields.  If we read values of those fields that were not
> current when we set cur then I think task_numa_compare would give an
> incorrect answer.

That code is explicitly without serialization. One memory barrier isn't
going to make any difference whatsoever.

Specifically, those numbers will change _after_ we make it current, not
before.

> From what I can see pick_next_task_fair does not mess with any of
> the fields that task_numa_compare uses.  So the existing memory barrier
> should be more than sufficient for that case.

There should not be, but even if there is, load-balancing in general
(very much including the NUMA balancing) is completely unserialized and
prone to reading stale data at all times.

This is on purpose; it minimizes the interference of load-balancing, and
if we make a 'wrong' choice, the next pass can fix it up.

> So I think I just need to write up a patch 4/3 that replaces my
> "rcu_assign_pointer(rq->curr, next)" with "RCU_INIT_POINTER(rq->curr,
> next)".  And includes a great big comment to that effect.
> 
> 
> 
> Now maybe I am wrong.  Maybe we can afford to see data that was changed
> before the task became rq->curr in task_numa_compare.  But I don't think
> so.  I really think it is that full memory barrier just a bit earlier
> in __schedule that is keeping us safe.
> 
> Am I wrong?

Not insofar as we now both seem to agree on using
RCU_INIT_POINTER(); but we're still disagreeing on why.

My argument is that since we can already obtain any task_struct pointer
through find_task_by_vpid() and friends, any access to its individual
members must already have serialization rules (or explicitly none, as for
load-balancing).

I'm viewing this as just another variant of find_task_by_vpid(), except
we index by CPU instead of PID. There can be changes to task_struct
between two invocations of find_task_by_vpid(); any user will have to
somehow deal with that. This is no different.

Take for instance p->cpus_ptr: it has very specific serialization rules
(which that NUMA balancer thing blatantly ignores), as do a lot of other
task_struct fields.

The memory barrier in question isn't going to help with any of that.

Specifically, if we care about the consistency of the numbers changed by
pick_next_task(), we must acquire rq->lock (of course, once we do that,
we can immediately use rq->curr without further RCU use).


Re: [PATCH V3] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Yunfeng Ye



On 2019/9/25 15:32, Mike Rapoport wrote:
> On Wed, Sep 25, 2019 at 03:18:43PM +0800, Yunfeng Ye wrote:
>> sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
>> for the page management structure; if memory allocation from the specified
>> node fails, it falls back to allocating from other nodes.
>>
>> Normally, the page management structure will not exceed 2% of the total
>> memory, but it requires a large contiguous allocation. In most cases,
>> memory allocation from the specified node succeeds, but it can fail on a
>> node whose memory has become highly fragmented. We would rather allocate
>> memory per section than allocate a large block of memory from other
>> NUMA nodes.
>>
>> Add memblock_alloc_exact_nid_raw() for this situation, which allocates
>> boot memory on the exact node. If a large contiguous block allocation
>> fails in sparse_buffer_init(), it falls back to allocating smaller,
>> section-sized blocks.
>>
>> Signed-off-by: Yunfeng Ye 
> 
> One nit below, otherwise
> 
> Reviewed-by: Mike Rapoport 
> 
ok.

>> ---
>> v2 -> v3:
>>  - use "bool exact_nid" instead of "int need_exact_nid"
>>  - remove the comment "without panicking"
>>
>> v1 -> v2:
>>  - use memblock_alloc_exact_nid_raw() rather than using a flag
>>
>>  include/linux/memblock.h |  3 +++
>>  mm/memblock.c| 65 
>> 
>>  mm/sparse.c  |  2 +-
>>  3 files changed, 58 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f491690..b38bbef 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -358,6 +358,9 @@ static inline phys_addr_t 
>> memblock_phys_alloc(phys_addr_t size,
>>   MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>> + phys_addr_t min_addr, phys_addr_t max_addr,
>> + int nid);
>>  void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>>   phys_addr_t min_addr, phys_addr_t max_addr,
>>   int nid);
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7d4f61a..0de9d83 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
>> *zone,
>>   * @start: the lower bound of the memory region to allocate (phys address)
>>   * @end: the upper bound of the memory region to allocate (phys address)
>>   * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
>> + * @exact_nid: control the allocation fall back to other nodes
>>   *
>>   * The allocation is performed from memory region limited by
>>   * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
>>   *
>> - * If the specified node can not hold the requested memory the
>> - * allocation falls back to any node in the system
>> + * If the specified node can not hold the requested memory and @exact_nid
>> + * is zero, the allocation falls back to any node in the system
> 
>  ^ is false  ;-)
> 
sorry my negligence, thanks.

>>   *
>>   * For systems with memory mirroring, the allocation is attempted first
>>   * from the regions with mirroring enabled and then retried from any
>> @@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
>> *zone,
>>   */
>>  static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
>>  phys_addr_t align, phys_addr_t start,
>> -phys_addr_t end, int nid)
>> +phys_addr_t end, int nid,
>> +bool exact_nid)
>>  {
>>  enum memblock_flags flags = choose_memblock_flags();
>>  phys_addr_t found;
>> @@ -1365,7 +1367,7 @@ static phys_addr_t __init 
>> memblock_alloc_range_nid(phys_addr_t size,
>>  if (found && !memblock_reserve(found, size))
>>  goto done;
>>
>> -if (nid != NUMA_NO_NODE) {
>> +if (nid != NUMA_NO_NODE && !exact_nid) {
>>  found = memblock_find_in_range_node(size, align, start,
>>  end, NUMA_NO_NODE,
>>  flags);
>> @@ -1413,7 +1415,8 @@ phys_addr_t __init 
>> memblock_phys_alloc_range(phys_addr_t size,
>>   phys_addr_t start,
>>   phys_addr_t end)
>>  {
>> -return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
>> +return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
>> +false);
>>  }
>>
>>  /**
>> @@ -1432,7 +1435,7 @@ phys_addr_t __init 
>> memblock_phys_alloc_range(phys_addr_t size,
>>  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, 
>> 

Re: [PATCH] selftests: kvm: Fix libkvm build error

2019-09-25 Thread Paolo Bonzini
On 24/09/19 22:14, Shuah Khan wrote:
> Fix the following build error:
> 
> libkvm.a(assert.o): relocation R_X86_64_32 against `.rodata.str1.1' can not 
> be used when making a PIE object; recompile with -fPIC
> 
> Add -fPIC to CFLAGS to fix it.

This is wrong; these testcases cannot be position-independent
executables.  Can you include the failing command line from "V=1"
output?

The problem seems to be that these definitions are not working properly:

no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
$(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -no-pie -x c - -o 
"$$TMP", -no-pie)

LDFLAGS += -pthread $(no-pie-option)
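A quick way to check what that try-run probe decides on a given toolchain (a hypothetical standalone reproduction, not the Makefile itself):

```shell
# Reproduce the Makefile's try-run probe by hand: if the compiler accepts
# -no-pie, $(no-pie-option) becomes non-empty and the final link is not a
# PIE, so -fno-PIE objects link fine without needing -fPIC.
echo 'int main(void) { return 0; }' > /tmp/probe.c
if ${CC:-cc} -Werror -no-pie /tmp/probe.c -o /tmp/probe 2>/dev/null; then
    echo "-no-pie accepted"
else
    echo "-no-pie rejected (toolchain may default to PIE links)"
fi
```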

Thanks,

Paolo

> Signed-off-by: Shuah Khan 
> ---
>  tools/testing/selftests/kvm/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/Makefile 
> b/tools/testing/selftests/kvm/Makefile
> index 62c591f87dab..b4a55d300e75 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -44,7 +44,7 @@ INSTALL_HDR_PATH = $(top_srcdir)/usr
>  LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
>  LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include
>  CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
> - -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
> + -fno-stack-protector -fPIC -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
>   -I$(LINUX_HDR_PATH) -Iinclude -I$(  
>  no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \
> 



[PATCH V4] mm: Support memblock alloc on the exact node for sparse_buffer_init()

2019-09-25 Thread Yunfeng Ye
sparse_buffer_init() uses memblock_alloc_try_nid_raw() to allocate memory
for the page management structure; if memory allocation from the specified
node fails, it falls back to allocating from other nodes.

Normally, the page management structure will not exceed 2% of the total
memory, but it requires a large contiguous allocation. In most cases,
memory allocation from the specified node succeeds, but it can fail on a
node whose memory has become highly fragmented. We would rather allocate
memory per section than allocate a large block of memory from other
NUMA nodes.

Add memblock_alloc_exact_nid_raw() for this situation, which allocates
boot memory on the exact node. If a large contiguous block allocation
fails in sparse_buffer_init(), it falls back to allocating smaller,
section-sized blocks.

Signed-off-by: Yunfeng Ye 
Reviewed-by: Mike Rapoport 
---
v3 -> v4:
 - add Reviewed-by
 - modify the comment "@exact_nid is false" instead of "@exact_nid is zero"

v2 -> v3:
 - use "bool exact_nid" instead of "int need_exact_nid"
 - remove the comment "without panicking"

v1 -> v2:
 - use memblock_alloc_exact_nid_raw() rather than using a flag

 include/linux/memblock.h |  3 +++
 mm/memblock.c| 65 
 mm/sparse.c  |  2 +-
 3 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f491690..b38bbef 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -358,6 +358,9 @@ static inline phys_addr_t memblock_phys_alloc(phys_addr_t 
size,
 MEMBLOCK_ALLOC_ACCESSIBLE);
 }

+void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
+phys_addr_t min_addr, phys_addr_t max_addr,
+int nid);
 void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
 phys_addr_t min_addr, phys_addr_t max_addr,
 int nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61a..0de9d83 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1323,12 +1323,13 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
*zone,
  * @start: the lower bound of the memory region to allocate (phys address)
  * @end: the upper bound of the memory region to allocate (phys address)
  * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ * @exact_nid: control the allocation fall back to other nodes
  *
  * The allocation is performed from memory region limited by
  * memblock.current_limit if @max_addr == %MEMBLOCK_ALLOC_ACCESSIBLE.
  *
- * If the specified node can not hold the requested memory the
- * allocation falls back to any node in the system
+ * If the specified node can not hold the requested memory and @exact_nid
+ * is false, the allocation falls back to any node in the system
  *
  * For systems with memory mirroring, the allocation is attempted first
  * from the regions with mirroring enabled and then retried from any
@@ -1342,7 +1343,8 @@ __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
  */
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t start,
-   phys_addr_t end, int nid)
+   phys_addr_t end, int nid,
+   bool exact_nid)
 {
enum memblock_flags flags = choose_memblock_flags();
phys_addr_t found;
@@ -1365,7 +1367,7 @@ static phys_addr_t __init 
memblock_alloc_range_nid(phys_addr_t size,
if (found && !memblock_reserve(found, size))
goto done;

-   if (nid != NUMA_NO_NODE) {
+   if (nid != NUMA_NO_NODE && !exact_nid) {
found = memblock_find_in_range_node(size, align, start,
end, NUMA_NO_NODE,
flags);
@@ -1413,7 +1415,8 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t 
size,
 phys_addr_t start,
 phys_addr_t end)
 {
-   return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE);
+   return memblock_alloc_range_nid(size, align, start, end, NUMA_NO_NODE,
+   false);
 }

 /**
@@ -1432,7 +1435,7 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t 
size,
 phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t 
align, int nid)
 {
return memblock_alloc_range_nid(size, align, 0,
-   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+   MEMBLOCK_ALLOC_ACCESSIBLE, nid, false);
 }

 /**
@@ -1442,6 +1445,7 @@ phys_addr_t __init 
memblock_phys_alloc_try_nid(phys_addr_t size, 

Re: [PATCH 15/17] KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers

2019-09-25 Thread Paolo Bonzini
On 24/09/19 23:46, Andrea Arcangeli wrote:
>>>
>>> I would keep only EXIT_REASON_MSR_WRITE, EXIT_REASON_PREEMPTION_TIMER,
>>> EXIT_REASON_EPT_MISCONFIG and add EXIT_REASON_IO_INSTRUCTION.
>> Intuition doesn't work great when it comes to CPU speculative
>> execution runtime. I can however run additional benchmarks to verify
>> your theory that keeping around frequent retpolines will still perform
>> ok.
> On one most recent CPU model there's no measurable difference with
> your list or my list with a hrtimer workload (no cpuid). It's
> challenging to measure any difference below 0.5%.

Let's keep the short list then.

Paolo


Re: [PATCH v2 0/4] task: Making tasks on the runqueue rcu protected

2019-09-25 Thread Peter Zijlstra
On Tue, Sep 17, 2019 at 12:38:04PM -0500, Eric W. Biederman wrote:
> Linus Torvalds  writes:

> > Can anybody see anything wrong with the series? Because I'd love to
> > have it for 5.4,
> 
> Peter,
> 
> I am more than happy for these to come through your tree.  However
> if this is one thing to many I will be happy to send Linus a pull
> request myself early next week.

Yeah, sorry for being late, I fell ill after LPC and am only now
getting back to things.

I see nothing wrong with these patches; if they've not been picked up
(and I'm not seeing them in Linus' tree yet) I'll pick them up now and
munge them together with Mathieu's membarrier patches and get them to
Linus in a few days.


Re: [PATCH v9 03/11] pwm: mediatek: remove a property "has-clks"

2019-09-25 Thread Thierry Reding
On Wed, Sep 25, 2019 at 08:30:03AM +0200, Uwe Kleine-König wrote:
> On Fri, Sep 20, 2019 at 06:49:03AM +0800, Sam Shih wrote:
> > We can use a fixed-clock to fix mt7628 pwm configuration from
> > userspace. The SoC is legacy MIPS and has no complex clock tree.
> > Since we can get the clock frequency for period calculation from the
> > DT fixed-clock, we can remove the has-clks property and directly
> > use devm_clk_get() and clk_get_rate().
> > 
> > Signed-off-by: Ryder Lee 
> > Signed-off-by: Sam Shih 
> > Acked-by: Uwe Kleine-Kö 
> > ---
> > Changes since v9:
> > Added an Acked-by tag
> 
> Argh, my name was croped and ended up in this state in
> 5c50982af47ffe36df3e31bc9e11be5a067ddd18. Thierry, any chance to repair
> that? Something
> like
> 
>   git filter-branch --msg-filter 'sed "s/Kleine-Kö /Kleine-König /"' 
> linus/master..

Done, though I ended up doing it manually. I don't trust my git
filter-branch skills. =)

Thierry


signature.asc
Description: PGP signature


Re: [PATCH 14/17] KVM: monolithic: x86: inline more exit handlers in vmx.c

2019-09-25 Thread Paolo Bonzini
On 24/09/19 03:55, Andrea Arcangeli wrote:
>> So it's forty bytes.  I think we can leave this out.
> This commit I reverted adds literally 3 inlines called by 3 functions,
> in a very fast path, how many bytes of .text difference did you expect
> by dropping some call/ret from a very fast path when you asked me to
> test it? I mean it's just a couple of insn each.

Actually I was expecting the difference to be zero, meaning GCC
was already inlining them.

I think it is not inlining the functions because they are still
referenced by vmx_exit_handlers.  After patch 15 you could drop them
from the array, and then GCC should inline them.

Paolo


Re: acer_wmi: Unknown function(s) on Acer Nitro 5 (AN515-43-R8BF)

2019-09-25 Thread Joey Lee
Hi Gabriel,

On Mon, Sep 23, 2019 at 09:45:05PM +0200, Gabriel C wrote:
> Hi guys,
> 
> I noticed some warning in dmesg on this Laptop.
> 
> Fn+right, Fn+left is BrightnessDown/Up and produce the following warning:
> 
> acer_wmi: Unknown function number - 4 - 0
> 
> The brightness has some other issue on this laptop, but I'm not sure
> what to blame for this. Probably amdgpu?
> 
> /sys/class/backlight/amdgpu_bl1/brightness <-> actual_brightness
> seem to mismatch, e.g. when brightness is 0, actual_brightness is still 5140.
>

Based on the _BCM and _BQC in your DSDT, the backlight control is handled
by the EC. But on some Acer machines the _BCM is broken. You can try to
modify brightness by echoing to /sys/class/backlight/acpi_video0/brightness.
 
> Unplugging the AC gives the following warning:
> 
> acer_wmi: Unknown function number - 8 - 0
> 
> When plugging the AC back I see;
> 
> acer_wmi: Unknown function number - 8 - 1.
> 
> I uploaded a dump of the acpi tables and dmidecode of the box.
> 
> https://www.frugalware.org/~crazy/nitro5/ACPI
> https://www.frugalware.org/~crazy/nitro5/DMI
> 
> Please let me know if you need any other information.
>

Thanks for your report of the behavior of function 4 and function 8.
Maybe we can use the platform event to do something, e.g. expose a key
code to userland. Unfortunately my work list is so long that I do not
have time for it currently.

Thanks a lot!
Joey Lee 


[PATCH v1] ARM: config: Add Realtek defconfig

2019-09-25 Thread James Tai
From: "james.tai" 

Add a defconfig for the Realtek RTD16XX and RTD13XX platforms.

Signed-off-by: james.tai 
---
Changes since last version:
- disable CONFIG_EMBEDDED.
- disable CONFIG_ARM_THUMBEE.
- disable CONFIG_SCHED_SMT.
- disable CONFIG_OABI_COMPAT.
- disable CONFIG_EFI.
- disable CONFIG_QORIQ_CPUFREQ.
- disable CONFIG_NET_DSA.
- disable CONFIG_CAN.
- disable CONFIG_CAN_FLEXCAN.
- disable CONFIG_CAN_RCAR.
- disable CONFIG_BT.
- disable CONFIG_BT_HCIUART.
- disable CONFIG_BT_HCIUART_BCM.
- disable CONFIG_BT_MRVL.
- disable CONFIG_BT_MRVL_SDIO.
- disable CONFIG_MTD.
- disable CONFIG_MTD_CMDLINE_PARTS.
- disable CONFIG_MTD_BLOCK.
- disable CONFIG_MTD_CFI.
- disable CONFIG_MTD_PHYSMAP.
- disable CONFIG_MTD_PHYSMAP_OF.
- disable CONFIG_MTD_RAW_NAND.
- disable CONFIG_MTD_NAND_DENALI_DT.
- disable CONFIG_MTD_NAND_BRCMNAND.
- disable CONFIG_BLK_DEV_RAM_SIZE.
- disable CONFIG_USB_PEGASUS.
- disable CONFIG_USB_RTL8152.
- disable CONFIG_USB_LAN78XX.
- disable CONFIG_USB_USBNET.
- disable CONFIG_USB_NET_SMSC75XX.
- disable CONFIG_USB_NET_SMSC95XX.
- disable CONFIG_BRCMFMAC.
- disable CONFIG_MWIFIEX.
- disable CONFIG_MWIFIEX_SDIO.
- disable CONFIG_RT2X00.
- disable CONFIG_RT2800USB.
- enable CONFIG_NET_VENDOR_3COM.
- enable CONFIG_NET_VENDOR_ADAPTEC.
- enable CONFIG_NET_VENDOR_AGERE.
- enable CONFIG_NET_VENDOR_ALACRITECH.
- enable CONFIG_NET_VENDOR_ALTEON.
---
 arch/arm/configs/realtek_defconfig | 390 +
 1 file changed, 390 insertions(+)
 create mode 100644 arch/arm/configs/realtek_defconfig

diff --git a/arch/arm/configs/realtek_defconfig 
b/arch/arm/configs/realtek_defconfig
new file mode 100644
index ..5effebf14c02
--- /dev/null
+++ b/arch/arm/configs/realtek_defconfig
@@ -0,0 +1,390 @@
+CONFIG_SYSVIPC=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_CGROUPS=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_PERF_EVENTS=y
+CONFIG_ARCH_REALTEK=y
+CONFIG_ARCH_RTD13XX=y
+CONFIG_ARCH_RTD16XX=y
+# CONFIG_CACHE_L2X0 is not set
+# CONFIG_ARM_ERRATA_643719 is not set
+CONFIG_ARM_ERRATA_814220=y
+CONFIG_SMP=y
+CONFIG_SCHED_MC=y
+CONFIG_HAVE_ARM_ARCH_TIMER=y
+CONFIG_MCPM=y
+CONFIG_NR_CPUS=6
+CONFIG_HZ_250=y
+CONFIG_HIGHMEM=y
+CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_SECCOMP=y
+CONFIG_ARM_APPENDED_DTB=y
+CONFIG_ARM_ATAG_DTB_COMPAT=y
+CONFIG_KEXEC=y
+CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_STAT=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_CPU_FREQ_GOV_POWERSAVE=m
+CONFIG_CPU_FREQ_GOV_USERSPACE=m
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
+CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
+CONFIG_CPUFREQ_DT=y
+CONFIG_CPU_IDLE=y
+CONFIG_ARM_CPUIDLE=y
+CONFIG_VFP=y
+CONFIG_NEON=y
+CONFIG_KERNEL_MODE_NEON=y
+CONFIG_TRUSTED_FOUNDATIONS=y
+CONFIG_EFI_VARS=m
+CONFIG_EFI_CAPSULE_LOADER=m
+CONFIG_ARM_CRYPTO=y
+CONFIG_CRYPTO_SHA1_ARM_NEON=m
+CONFIG_CRYPTO_SHA1_ARM_CE=m
+CONFIG_CRYPTO_SHA2_ARM_CE=m
+CONFIG_CRYPTO_SHA512_ARM=m
+CONFIG_CRYPTO_AES_ARM=m
+CONFIG_CRYPTO_AES_ARM_BS=m
+CONFIG_CRYPTO_AES_ARM_CE=m
+CONFIG_CRYPTO_GHASH_ARM_CE=m
+CONFIG_CRYPTO_CRC32_ARM_CE=m
+CONFIG_CRYPTO_CHACHA20_NEON=m
+CONFIG_JUMP_LABEL=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_CMDLINE_PARTITION=y
+CONFIG_CMA=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_IPV6_TUNNEL=m
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_CFG80211=m
+CONFIG_MAC80211=m
+CONFIG_RFKILL=y
+CONFIG_RFKILL_INPUT=y
+CONFIG_RFKILL_GPIO=y
+CONFIG_PCI=y
+CONFIG_PCIEPORTBUS=y
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+CONFIG_MTD_SPI_NOR=y
+CONFIG_MTD_UBI=y
+CONFIG_OF_OVERLAY=y
+CONFIG_VIRTIO_BLK=y
+CONFIG_AD525X_DPOT=y
+CONFIG_AD525X_DPOT_I2C=y
+CONFIG_SRAM=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_AHCI_PLATFORM=y
+# CONFIG_ATA_SFF is not set
+CONFIG_NETDEVICES=y
+CONFIG_VIRTIO_NET=y
+CONFIG_B53_MDIO_DRIVER=m
+CONFIG_B53_MMAP_DRIVER=m
+CONFIG_B53_SRAB_DRIVER=m
+CONFIG_B53_SERDES=m
+CONFIG_NET_DSA_BCM_SF2=m
+CONFIG_NET_VENDOR_3COM=y
+CONFIG_NET_VENDOR_ADAPTEC=y
+CONFIG_NET_VENDOR_AGERE=y
+CONFIG_NET_VENDOR_ALACRITECH=y
+CONFIG_NET_VENDOR_ALTEON=y
+# CONFIG_NET_VENDOR_AMAZON is not set
+# CONFIG_NET_VENDOR_AMD is not set
+# CONFIG_NET_VENDOR_AQUANTIA is not set
+# CONFIG_NET_VENDOR_ARC is not set
+# CONFIG_NET_VENDOR_ATHEROS is not set
+# CONFIG_NET_VENDOR_AURORA is not set
+# CONFIG_NET_VENDOR_BROADCOM is not set
+# CONFIG_NET_VENDOR_BROCADE is not set
+# CONFIG_NET_VENDOR_CADENCE is not set
+# CONFIG_NET_VENDOR_CAVIUM is not set
+# 

RE: [PATCH] dimlib: make DIMLIB a hidden symbol

2019-09-25 Thread Kiyanovski, Arthur
> -Original Message-
> From: Tal Gilboa 
> Sent: Friday, September 20, 2019 8:02 PM
> To: Uwe Kleine-König ; Saeed Mahameed
> ; Kiyanovski, Arthur 
> Cc: linux-kernel@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [PATCH] dimlib: make DIMLIB a hidden symbol
> 
> On 9/20/2019 4:31 PM, Uwe Kleine-König wrote:
> > According to Tal Gilboa the only benefit from DIM comes from a driver
> > that uses it. So it doesn't make sense to make this symbol user-visible;
> > instead, all drivers that use it should select it (as is already the case
> > AFAICT).
> >
> > Signed-off-by: Uwe Kleine-König 
> > ---
> >   lib/Kconfig | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index cc04124ed8f7..9fe8a21fd183 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -555,8 +555,7 @@ config SIGNATURE
> >   Implementation is done using GnuPG MPI library
> >
> >   config DIMLIB
> > -   bool "DIM library"
> > -   default y
> > +   bool
> > help
> >   Dynamic Interrupt Moderation library.
> >   Implements an algorithm for dynamically change CQ moderation
> values
> >
> There's a pending series using DIM which didn't add the select clause
> [1]. Arthur, FYI. Other than that LGTM.
> 
> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg314304.html

Thanks Tal, I missed that.
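For reference, the select-based pattern Uwe describes would look like this in a consuming driver's Kconfig (FOO_ETH is a hypothetical symbol):

```kconfig
config FOO_ETH
	tristate "Hypothetical FOO ethernet driver"
	select DIMLIB
	help
	  Example driver entry: selecting DIMLIB pulls in the dynamic
	  interrupt moderation library, so the hidden symbol needs no
	  user-visible prompt of its own.
```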


Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

2019-09-25 Thread Dave Chinner
On Tue, Sep 24, 2019 at 12:08:04PM -0700, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner  wrote:
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
> 
> Our dirty_background stuff is very questionable, but it exists (and
> has those insane defaults) because of various legacy reasons.

That's not what I was asking about.  The context is in the previous
lines you didn't quote:

> > > > Is the faster speed reproducible?  I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:

i.e. I'm asking about the reasons for the performance differential,
not asking for an explanation of what writebehind is. If the
performance differential really is caused by writeback starting
sooner, then winding down dirty_background_bytes should produce
exactly the same performance because it will start writeback -much
faster-.

If it doesn't, then the assertion that the difference is caused by
earlier writeout is questionable and the code may not actually be
doing what is claimed.

Basically, I'm asking for proof that the explanation is correct.

> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
> 
> The thing is, that also accounts for dirty shared mmap pages. And it
> really will kill some benchmarks that people take very very seriously.

Yes, I know that. I'm not suggesting that we do this,

[snip]

> Anyway, the end result of all this is that we have that
> balance_dirty_pages() that is pretty darn complex and I suspect very
> few people understand everything that goes on in that function.

I'd agree with you there - most of the ground work for the
balance_dirty_pages IO throttling feedback loop was based on
concepts I developed to solve dirty page writeback thrashing
problems on Irix back in 2003.  The code we have in Linux was
written by Fengguang Wu with help from a lot of people, but the
underlying concepts of delegating IO to dedicated writeback threads
that calculate and track page cleaning rates (BDI writeback rates)
and then throttling the incoming page dirtying rate to the page
cleaning rate all came out of my head.

So, much as it may surprise you, I am one of the few people who do
actually understand how that whole complex mass of accounting and
feedback is supposed to work. :)

> Now, whether write-behind really _does_ help that, or whether it's
> just yet another tweak and complication, I can't actually say.

Neither can I at this point - I lack the data and that's why I was
asking if there was a perf difference with the existing limits wound
right down. Knowing whether the performance difference is simply a
result of starting writeback IO sooner tells me an awful lot about
what other behaviour is happening as a result of the changes in this
patch.

> But I
> don't think 'dirty_background_bytes' is really an argument against
> write-behind, it's just one knob on the very complex dirty handling we
> have.

Never said it was - just trying to determine if a one-line
explanation is true or not.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH trivial] usb: Fix Kconfig indentation

2019-09-25 Thread Johan Hovold
On Mon, Sep 23, 2019 at 05:49:56PM +0200, Krzysztof Kozlowski wrote:
> Adjust indentation from spaces to tab (+optional two spaces) as per the
> coding style, with a command like:
> $ sed -e 's/^/\t/' -i */Kconfig
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/usb/dwc3/Kconfig   |  4 +-
>  drivers/usb/gadget/legacy/Kconfig  | 20 -
>  drivers/usb/gadget/udc/Kconfig |  2 +-
>  drivers/usb/host/Kconfig   | 68 +++---
>  drivers/usb/misc/Kconfig   |  8 ++--
>  drivers/usb/misc/sisusbvga/Kconfig |  2 +-
>  drivers/usb/serial/Kconfig | 44 +--
>  7 files changed, 74 insertions(+), 74 deletions(-)

Acked-by: Johan Hovold 


Re: [PATCH] platform/x86: dell-laptop: fix phantom kbd backlight on Inspiron 10xx

2019-09-25 Thread Andy Shevchenko
On Mon, Sep 23, 2019 at 4:24 PM  wrote:
> > From: Pali Rohár 
> > Sent: Sunday, September 22, 2019 8:43 AM
> > To: Pacien TRAN-GIRARD
> > Cc: Matthew Garrett; Darren Hart; Andy Shevchenko; platform-driver-
> > x...@vger.kernel.org; linux-kernel@vger.kernel.org; Limonciello, Mario
> > Subject: Re: [PATCH] platform/x86: dell-laptop: fix phantom kbd backlight on
> > Inspiron 10xx

> > We need to wait what Mario wrote about this particular problem.
> >
>
> I agree an Inspiron is unlikely to be updated 9 years later.  I think the
> right thing to do in this instance is to blacklist this particular
> platform in the kernel driver.

Does it mean you are okay with the proposed patch? Can you give your tag then?

-- 
With Best Regards,
Andy Shevchenko


Re: [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups

2019-09-25 Thread Peter Zijlstra
On Mon, Sep 23, 2019 at 10:55:32AM -0400, Mathieu Desnoyers wrote:
> - On Sep 23, 2019, at 5:06 AM, Peter Zijlstra pet...@infradead.org wrote:
> 
> > On Thu, Sep 19, 2019 at 01:36:58PM -0400, Mathieu Desnoyers wrote:
> >> Hi,
> >> 
> >> Those series of fixes and cleanups are initially motivated by the report
> >> of race in membarrier, which can load p->mm->membarrier_state after mm
> >> has been freed (use-after-free).
> >> 
> > 
> > The lot looks good to me; what do you want done with them (them being
> > RFC and all) ?
> 
> I can either re-send them without the RFC tag, or you can pick them directly
> through the scheduler tree.

I've picked them up (and fixed them up, they didn't apply to tip) and
merge them with Eric's task_rcu_dereference() patches.

I'll push it out in a bit.


Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

2019-09-25 Thread Konstantin Khlebnikov

On 25/09/2019 10.18, Dave Chinner wrote:

On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:

On 24/09/2019 10.39, Dave Chinner wrote:

On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:

On 23/09/2019 17.52, Tejun Heo wrote:

Hello, Konstantin.

On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:

With vm.dirty_write_behind 1 or 2 files are written even faster and


Is the faster speed reproducible?  I don't quite understand why this
would be.


Writing to disk simply starts earlier.


Stupid question: how is this any different to simply winding down
our dirty writeback and throttling thresholds like so:

# echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes

to start background writeback when there's 100MB of dirty pages in
memory, and then:

# echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes

So that writers are directly throttled at 200MB of dirty pages in
memory?

This effectively gives us global writebehind behaviour with a
100-200MB cache write burst for initial writes.


Global limits affect all dirty pages, including memory-mapped and
randomly touched ones. Write-behind aims only at sequential streams.


There are  apps that do sequential writes via mmap()d files.
They should do writebehind too, yes?


I see no reason for that. This is a different scenario.

Mmap has no clear signal about "end of write", only a page fault at the
beginning. Theoretically we could implement a similar sliding window and
start writeback on consecutive page faults.

But applications that use memory-mapped files probably know better what
to do with their data. I prefer to leave them alone for now.




And, really, such strict writebehind behaviour is going to cause all
sorts of unintended problems with filesystems because there will be
adverse interactions with delayed allocation. We need a substantial
amount of dirty data to be cached for writeback for fragmentation
minimisation algorithms to be able to do their job.


I think most sequentially written files never change after close.


There are lots of apps that write zeros to initialise and allocate
space, then go write real data to them. Database WAL files are
commonly initialised like this...


Those zeros are just a bunch of dirty pages which have to be written.
Sync and memory pressure will write them out anyway, so why shouldn't
write-behind?




Except for knowing the final size of huge files (>16MB in my patch),
there should be no difference for delayed allocation.


There is, because you throttle the writes down such that there is
only 16MB of dirty data in memory. Hence filesystems will only
typically allocate in 16MB chunks as that's all the delalloc range
spans.

I'm not so concerned for XFS here, because our speculative
preallocation will handle this just fine, but for ext4 and btrfs
it's going to interleave the allocate of concurrent streaming writes
and fragment the crap out of the files.

In general, the smaller you make the individual file writeback
window, the worse the fragmentation problem gets.


AFAIR ext4 already preallocates extents beyond EOF too.

But this must be carefully tested on all modern filesystems, for sure.




Probably write-behind could provide a hint about the streaming pattern:
pass something like "MSG_MORE" into the writeback call.


How does that help when we've only got dirty data and block
reservations up to EOF which is no more than 16MB away?


The block allocator should interpret this flag as "more data is
expected" and preallocate an extent bigger than the data, beyond EOF.



Cheers,

Dave.



Re: [PATCH v6 1/7] media: v4l2-core: Implement v4l2_ctrl_new_std_compound

2019-09-25 Thread Jacopo Mondi
Hi Ricardo,

On Fri, Sep 20, 2019 at 03:51:31PM +0200, Ricardo Ribalda Delgado wrote:
> Currently compound controls do not have a simple way of initializing their
> values. This results in obfuscated code with type_ops init.
>
> This patch introduces a new field on the control with the default value
> for the compound control that can be set with the brand new
> v4l2_ctrl_new_std_compound function
>
> Suggested-by: Hans Verkuil 
> Signed-off-by: Ricardo Ribalda Delgado 
> ---
>  drivers/media/v4l2-core/v4l2-ctrls.c | 50 
>  include/media/v4l2-ctrls.h   | 22 +++-
>  2 files changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
> b/drivers/media/v4l2-core/v4l2-ctrls.c
> index 1d8f38824631..219d8aeefa20 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -29,6 +29,8 @@
>  #define call_op(master, op) \
>   (has_op(master, op) ? master->ops->op(master) : 0)
>
> +static const union v4l2_ctrl_ptr ptr_null;
> +
>  /* Internal temporary helper struct, one for each v4l2_ext_control */
>  struct v4l2_ctrl_helper {
>   /* Pointer to the control reference of the master control */
> @@ -1530,7 +1532,10 @@ static void std_init_compound(const struct v4l2_ctrl 
> *ctrl, u32 idx,
>   struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
>   void *p = ptr.p + idx * ctrl->elem_size;
>
> - memset(p, 0, ctrl->elem_size);
> + if (ctrl->p_def.p)
> + memcpy(p, ctrl->p_def.p, ctrl->elem_size);
> + else
> + memset(p, 0, ctrl->elem_size);
>
>   /*
>* The cast is needed to get rid of a gcc warning complaining that
> @@ -2354,7 +2359,8 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
> v4l2_ctrl_handler *hdl,
>   s64 min, s64 max, u64 step, s64 def,
>   const u32 dims[V4L2_CTRL_MAX_DIMS], u32 elem_size,
>   u32 flags, const char * const *qmenu,
> - const s64 *qmenu_int, void *priv)
> + const s64 *qmenu_int, const union v4l2_ctrl_ptr p_def,
> + void *priv)
>  {
>   struct v4l2_ctrl *ctrl;
>   unsigned sz_extra;
> @@ -2460,6 +2466,9 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
> v4l2_ctrl_handler *hdl,
>is_array)
>   sz_extra += 2 * tot_ctrl_size;
>
> + if (type >= V4L2_CTRL_COMPOUND_TYPES && p_def.p)
> + sz_extra += elem_size;
> +

I'm not sure I get how sz_extra is used in this function and what its
purpose is; just be aware that the previous if() condition already does

sz_extra += 2 * tot_ctrl_size

for compound controls.

Are you just making space for the new p_def elements? Shouldn't you
then use tot_ctrl_size? In patch 3 you add support for AREA, which has
a single element, but could p_def in future transport multiple values?

>   ctrl = kvzalloc(sizeof(*ctrl) + sz_extra, GFP_KERNEL);
>   if (ctrl == NULL) {
>   handler_set_err(hdl, -ENOMEM);
> @@ -2503,6 +2512,12 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
> v4l2_ctrl_handler *hdl,
>   ctrl->p_new.p = &ctrl->val;
>   ctrl->p_cur.p = &ctrl->cur.val;
>   }
> +
> + if (type >= V4L2_CTRL_COMPOUND_TYPES && p_def.p) {
> + ctrl->p_def.p = ctrl->p_cur.p + tot_ctrl_size;
> + memcpy(ctrl->p_def.p, p_def.p, elem_size);
> + }
> +
>   for (idx = 0; idx < elems; idx++) {
>   ctrl->type_ops->init(ctrl, idx, ctrl->p_cur);
>   ctrl->type_ops->init(ctrl, idx, ctrl->p_new);
> @@ -2554,7 +2569,7 @@ struct v4l2_ctrl *v4l2_ctrl_new_custom(struct 
> v4l2_ctrl_handler *hdl,
>   type, min, max,
>   is_menu ? cfg->menu_skip_mask : step, def,
>   cfg->dims, cfg->elem_size,
> - flags, qmenu, qmenu_int, priv);
> + flags, qmenu, qmenu_int, ptr_null, priv);
>   if (ctrl)
>   ctrl->is_private = cfg->is_private;
>   return ctrl;
> @@ -2579,7 +2594,7 @@ struct v4l2_ctrl *v4l2_ctrl_new_std(struct 
> v4l2_ctrl_handler *hdl,
>   }
>   return v4l2_ctrl_new(hdl, ops, NULL, id, name, type,
>min, max, step, def, NULL, 0,
> -  flags, NULL, NULL, NULL);
> +  flags, NULL, NULL, ptr_null, NULL);
>  }
>  EXPORT_SYMBOL(v4l2_ctrl_new_std);
>
> @@ -2612,7 +2627,7 @@ struct v4l2_ctrl *v4l2_ctrl_new_std_menu(struct 
> v4l2_ctrl_handler *hdl,
>   }
>   return v4l2_ctrl_new(hdl, ops, NULL, id, name, type,
>0, max, mask, def, NULL, 0,
> -  flags, qmenu, qmenu_int, NULL);
> +  flags, qmenu, qmenu_int, ptr_null, NULL);
>  }
>  EXPORT_SYMBOL(v4l2_ctrl_new_std_menu);
>
> @@ -2644,11 +2659,32 @@ struct v4l2_ctrl *v4l2_ctrl_new_std_menu_items(struct 
> v4l2_ctrl_handler *hdl,
>   

Re: [v7 1/2] mtd: rawnand: Add new Cadence NAND driver to MTD subsystem (fwd)

2019-09-25 Thread Miquel Raynal
Hi Piotr,

Can you fix the below issue reported by Julia? Either convert the
structure member to a signed type or use an intermediate signed
variable.

Thanks,
Miquèl

Julia Lawall  wrote on Wed, 18 Sep 2019 21:04:37
+0200 (CEST):

> -- Forwarded message --
> Date: Wed, 18 Sep 2019 23:17:29 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [v7 1/2] mtd: rawnand: Add new Cadence NAND driver to MTD 
> subsystem
> 
> CC: kbuild-...@01.org
> In-Reply-To: <20190918123115.30510-1-pio...@cadence.com>
> References: <20190918123115.30510-1-pio...@cadence.com>
> TO: Piotr Sroka 
> CC: Kazuhiro Kasai , Piotr Sroka 
> , Miquel Raynal , Richard 
> Weinberger , David Woodhouse , Brian 
> Norris , Marek Vasut , 
> Vignesh Raghavendra , Mauro Carvalho Chehab 
> , "David S. Miller" , Greg 
> Kroah-Hartman , Linus Walleij 
> , Nicolas Ferre , 
> "Paul E. McKenney" , Boris Brezillon 
> , Thomas Gleixner , Paul 
> Cercueil , Arnd Bergmann , Marcel 
> Ziswiler , Liang Yang , 
> Anders Roxell , linux-kernel@vger.kernel.org, 
> linux-...@lists.infradead.org
> 
> Hi Piotr,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [cannot apply to v5.3 next-20190917]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Piotr-Sroka/mtd-rawnand-Add-new-Cadence-NAND-driver-to-MTD-subsystem/20190918-204505
> :: branch date: 3 hours ago
> :: commit date: 3 hours ago
> 
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
> Reported-by: Julia Lawall 
> 
> >> drivers/mtd/nand/raw/cadence-nand-controller.c:2644:5-28: WARNING: 
> >> Unsigned expression compared with zero: cdns_chip -> corr_str_idx < 0  
> 
> # 
> https://github.com/0day-ci/linux/commit/3235ae79d58b8d95b44d5d3773f59065f04d4f00
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 3235ae79d58b8d95b44d5d3773f59065f04d4f00
> vim +2644 drivers/mtd/nand/raw/cadence-nand-controller.c
> 
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2584
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2585  int 
> cadence_nand_attach_chip(struct nand_chip *chip)
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2586  {
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2587   struct cdns_nand_ctrl 
> *cdns_ctrl = to_cdns_nand_ctrl(chip->controller);
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2588   struct cdns_nand_chip 
> *cdns_chip = to_cdns_nand_chip(chip);
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2589   u32 ecc_size = 
> cdns_chip->sector_count * chip->ecc.bytes;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2590   struct mtd_info *mtd = 
> nand_to_mtd(chip);
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2591   u32 max_oob_data_size;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2592   int ret;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2593
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2594   if (chip->options & 
> NAND_BUSWIDTH_16) {
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2595   ret = 
> cadence_nand_set_access_width16(cdns_ctrl, true);
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2596   if (ret)
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2597   goto free_buf;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2598   }
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2599
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2600   chip->bbt_options |= 
> NAND_BBT_USE_FLASH;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2601   chip->bbt_options |= 
> NAND_BBT_NO_OOB;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2602   chip->ecc.mode = NAND_ECC_HW;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2603
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2604   chip->options |= 
> NAND_NO_SUBPAGE_WRITE;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2605
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2606   cdns_chip->bbm_offs = 
> chip->badblockpos;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2607   if (chip->options & 
> NAND_BUSWIDTH_16) {
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2608   cdns_chip->bbm_offs &= 
> ~0x01;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2609   cdns_chip->bbm_len = 2;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2610   } else {
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2611   cdns_chip->bbm_len = 1;
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2612   }
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2613
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2614   ret = nand_ecc_choose_conf(chip,
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2615  
> &cdns_ctrl->ecc_caps,
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2616  
> mtd->oobsize - cdns_chip->bbm_len);
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2617   if (ret) {
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2618   dev_err(cdns_ctrl->dev, 
> "ECC configuration failed\n");
> 3235ae79d58b8d Piotr Sroka 2019-09-18  2619   goto free_buf;
> 3235ae79d58b8d Piotr Sroka 

Re: [PATCH v6 2/7] Documentation: v4l2_ctrl_new_std_compound

2019-09-25 Thread Jacopo Mondi
Hi Ricardo,

On Fri, Sep 20, 2019 at 03:51:32PM +0200, Ricardo Ribalda Delgado wrote:
> Function for initializing compound controls with a default value.
>
> Suggested-by: Hans Verkuil 
> Signed-off-by: Ricardo Ribalda Delgado 
> ---
>  Documentation/media/kapi/v4l2-controls.rst | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/Documentation/media/kapi/v4l2-controls.rst 
> b/Documentation/media/kapi/v4l2-controls.rst
> index ebe2a55908be..b20800cae3f2 100644
> --- a/Documentation/media/kapi/v4l2-controls.rst
> +++ b/Documentation/media/kapi/v4l2-controls.rst
> @@ -140,6 +140,15 @@ Menu controls with a driver specific menu are added by 
> calling
> const struct v4l2_ctrl_ops *ops, u32 id, s32 max,
> s32 skip_mask, s32 def, const char * const *qmenu);
>
> +Standard compound controls can be added by calling

Standard compound controls with a driver-provided default value can be...

Or is it unnecessary to re-state it?

In any case:
Reviewed-by: Jacopo Mondi 

Thanks
  j
> +:c:func:`v4l2_ctrl_new_std_compound`:
> +
> +.. code-block:: c
> +
> +   struct v4l2_ctrl *v4l2_ctrl_new_std_compound(struct v4l2_ctrl_handler 
> *hdl,
> +   const struct v4l2_ctrl_ops *ops, u32 id,
> +   const union v4l2_ctrl_ptr p_def);
> +
>  Integer menu controls with a driver specific menu can be added by calling
>  :c:func:`v4l2_ctrl_new_int_menu`:
>
> --
> 2.23.0
>




Re: [PATCH] platform/x86: dell-laptop: fix phantom kbd backlight on Inspiron 10xx

2019-09-25 Thread Pali Rohár
On Wednesday 25 September 2019 11:07:35 Andy Shevchenko wrote:
> On Mon, Sep 23, 2019 at 4:24 PM  wrote:
> > > From: Pali Rohár 
> > > Sent: Sunday, September 22, 2019 8:43 AM
> > > To: Pacien TRAN-GIRARD
> > > Cc: Matthew Garrett; Darren Hart; Andy Shevchenko; platform-driver-
> > > x...@vger.kernel.org; linux-kernel@vger.kernel.org; Limonciello, Mario
> > > Subject: Re: [PATCH] platform/x86: dell-laptop: fix phantom kbd backlight 
> > > on
> > > Inspiron 10xx
> 
> > > We need to wait what Mario wrote about this particular problem.
> > >
> >
> > I agree an Inspiron is unlikely to be updated 9 years later.  I think the 
> > right thing
> > to do in this instance is to blacklist this particular platform in kernel 
> > driver.
> 
> Does it mean you are okay with the proposed patch? Can you give your tag then?

I would rather use a different name than kbd_phantom_backlight, e.g.
kbd_broken_backlight or something which indicates non-working / disabled
support.

Also the commit message says that the problem is with the KBD_LED_OFF_TOKEN
and KBD_LED_ON_TOKEN tokens, but the proposed patch does not blacklist these
tokens; it rather turns off the whole kbd backlight functionality. So the
proposed patch does not match the commit message, nor does it explain it.

-- 
Pali Rohár
pali.ro...@gmail.com


[PATCH v1] drivers/base/memory.c: Drop the mem_sysfs_mutex

2019-09-25 Thread David Hildenbrand
The mem_sysfs_mutex isn't really helpful. Also, it's not really clear what
the mutex protects at all.

The device lists of the memory subsystem are protected separately. We don't
need that mutex when looking up, creating, or removing independent
devices. find_memory_block_by_id() will perform locking on its own and
grab a reference of the returned device.

At the time memory_dev_init() is called, we cannot have concurrent
hot(un)plug operations yet - we're still fairly early during boot. We
don't need any locking.

The creation/removal of memory block devices should be protected
on a higher level - especially using the device hotplug lock to avoid
documented issues (see Documentation/core-api/memory-hotplug.rst) - or
if that is reworked, using similar locking.

Protecting in the context of these functions only doesn't really make
sense. Especially, if we would have a situation where the same memory
blocks are created/deleted at the same time, there is something horribly
going wrong (imagining adding/removing a DIMM at the same time from two
call paths) - after the functions succeeded something else in the
callers would blow up (e.g., create_memory_block_devices() succeeded but
there are no memory block devices anymore).

All relevant call paths (except when adding memory early during boot
via ACPI, which is now documented) hold the device hotplug lock when
adding memory, and when removing memory. Let's document that instead.

Add a simple safety net to create_memory_block_devices() in case we
would actually remove memory blocks while adding them, so we'll never
dereference a NULL pointer. Simplify memory_dev_init() now that the
lock is gone.

Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Signed-off-by: David Hildenbrand 
---

Tested using my usual x86-64 DIMM based hot(un)plug setup.

---
 drivers/base/memory.c | 33 ++---
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 6bea4f3f8040..634aab8e1e19 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -19,15 +19,12 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
 #include 
 #include 
 
-static DEFINE_MUTEX(mem_sysfs_mutex);
-
 #define MEMORY_CLASS_NAME  "memory"
 
 #define to_memory_block(dev) container_of(dev, struct memory_block, dev)
@@ -702,6 +699,8 @@ static void unregister_memory(struct memory_block *memory)
  * Create memory block devices for the given memory area. Start and size
  * have to be aligned to memory block granularity. Memory block devices
  * will be initialized as offline.
+ *
+ * Called under device_hotplug_lock.
  */
 int create_memory_block_devices(unsigned long start, unsigned long size)
 {
@@ -715,7 +714,6 @@ int create_memory_block_devices(unsigned long start, 
unsigned long size)
 !IS_ALIGNED(size, memory_block_size_bytes())))
return -EINVAL;
 
-   mutex_lock(&mem_sysfs_mutex);
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
@@ -727,11 +725,12 @@ int create_memory_block_devices(unsigned long start, 
unsigned long size)
for (block_id = start_block_id; block_id != end_block_id;
 block_id++) {
mem = find_memory_block_by_id(block_id);
+   if (WARN_ON_ONCE(!mem))
+   continue;
mem->section_count = 0;
unregister_memory(mem);
}
}
-   mutex_unlock(&mem_sysfs_mutex);
return ret;
 }
 
@@ -739,6 +738,8 @@ int create_memory_block_devices(unsigned long start, 
unsigned long size)
  * Remove memory block devices for the given memory area. Start and size
  * have to be aligned to memory block granularity. Memory block devices
  * have to be offline.
+ *
+ * Called under device_hotplug_lock.
  */
 void remove_memory_block_devices(unsigned long start, unsigned long size)
 {
@@ -751,7 +752,6 @@ void remove_memory_block_devices(unsigned long start, 
unsigned long size)
 !IS_ALIGNED(size, memory_block_size_bytes())))
return;
 
-   mutex_lock(&mem_sysfs_mutex);
for (block_id = start_block_id; block_id != end_block_id; block_id++) {
mem = find_memory_block_by_id(block_id);
if (WARN_ON_ONCE(!mem))
@@ -760,7 +760,6 @@ void remove_memory_block_devices(unsigned long start, 
unsigned long size)
unregister_memory_block_under_nodes(mem);
unregister_memory(mem);
}
-   mutex_unlock(&mem_sysfs_mutex);
 }
 
 /* return true if the memory block is offlined, otherwise, return false */
@@ -794,12 +793,13 @@ static const struct attribute_group 
*memory_root_attr_groups[] = {
 };
 
 /*
- * Initialize the sysfs support for memory 

Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely

2019-09-25 Thread Peter Zijlstra
On Tue, Sep 24, 2019 at 05:24:58PM -0600, Yu Zhao wrote:
> We don't want to expose a non-hugetlb page to the fast gup running
> on a remote CPU before all local non-atomic ops on the page flags
> are visible first.
> 
> For an anon page that isn't in swap cache, we need to make sure all
> prior non-atomic ops, especially __SetPageSwapBacked() in
> page_add_new_anon_rmap(), are ordered before set_pte_at() to prevent
> the following race:
> 
>   CPU 1   CPU 2
>   set_pte_at()get_user_pages_fast()
> page_add_new_anon_rmap()gup_pte_range()
> __SetPageSwapBacked() SetPageReferenced()
> 
> This demonstrates a non-fatal scenario. Though haven't been directly
> observed, the fatal ones can exist, e.g., PG_lock set by fast gup
> caller and then overwritten by __SetPageSwapBacked().
> 
> For an anon page that is already in swap cache or a file page, we
> don't need smp_wmb() before set_pte_at() because adding to swap or
> file cache serves as a valid write barrier. Using non-atomic ops
> thereafter is a bug, obviously.
> 
> smp_wmb() is added following 11 of total 12 page_add_new_anon_rmap()
> call sites, with the only exception being
> do_huge_pmd_wp_page_fallback() because of an existing smp_wmb().
> 

I'm thinking this patch makes stuff rather fragile. Should we instead
stick the barrier in set_p*d_at()? Or rather, make that store a
store-release?




Re: [PATCH v6 3/7] media: add V4L2_CTRL_TYPE_AREA control type

2019-09-25 Thread Jacopo Mondi
Hi Ricardo,

On Fri, Sep 20, 2019 at 03:51:33PM +0200, Ricardo Ribalda Delgado wrote:
> This type contains the width and the height of a rectangular area.
>
> Signed-off-by: Ricardo Ribalda Delgado 
> ---
>  drivers/media/v4l2-core/v4l2-ctrls.c | 21 ++
>  include/media/v4l2-ctrls.h   | 41 
>  include/uapi/linux/videodev2.h   |  6 
>  3 files changed, 68 insertions(+)
>
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
> b/drivers/media/v4l2-core/v4l2-ctrls.c
> index 219d8aeefa20..b9a46f536406 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls.c
> @@ -1678,6 +1678,7 @@ static int std_validate_compound(const struct v4l2_ctrl 
> *ctrl, u32 idx,
>   struct v4l2_ctrl_mpeg2_slice_params *p_mpeg2_slice_params;
>   struct v4l2_ctrl_vp8_frame_header *p_vp8_frame_header;
>   void *p = ptr.p + idx * ctrl->elem_size;
> + struct v4l2_area *area;
>
>   switch ((u32)ctrl->type) {
>   case V4L2_CTRL_TYPE_MPEG2_SLICE_PARAMS:
> @@ -1753,6 +1754,11 @@ static int std_validate_compound(const struct 
> v4l2_ctrl *ctrl, u32 idx,
>   zero_padding(p_vp8_frame_header->entropy_header);
>   zero_padding(p_vp8_frame_header->coder_state);
>   break;
> + case V4L2_CTRL_TYPE_AREA:
> + area = p;
> + if (!area->width || !area->height)
> + return -EINVAL;
> + break;
>   default:
>   return -EINVAL;
>   }
> @@ -2427,6 +2433,9 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
> v4l2_ctrl_handler *hdl,
>   case V4L2_CTRL_TYPE_VP8_FRAME_HEADER:
>   elem_size = sizeof(struct v4l2_ctrl_vp8_frame_header);
>   break;
> + case V4L2_CTRL_TYPE_AREA:
> + elem_size = sizeof(struct v4l2_area);
> + break;
>   default:
>   if (type < V4L2_CTRL_COMPOUND_TYPES)
>   elem_size = sizeof(s32);
> @@ -4116,6 +4125,18 @@ int __v4l2_ctrl_s_ctrl_string(struct v4l2_ctrl *ctrl, 
> const char *s)
>  }
>  EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl_string);
>
> +int __v4l2_ctrl_s_ctrl_area(struct v4l2_ctrl *ctrl,
> + const struct v4l2_area *area)
> +{
> + lockdep_assert_held(ctrl->handler->lock);
> +
> + /* It's a driver bug if this happens. */
> + WARN_ON(ctrl->type != V4L2_CTRL_TYPE_AREA);
> + memcpy(ctrl->p_new.p_area, area, sizeof(*area));
> + return set_ctrl(NULL, ctrl, 0);
> +}
> +EXPORT_SYMBOL(__v4l2_ctrl_s_ctrl_area);
> +
>  void v4l2_ctrl_request_complete(struct media_request *req,
>   struct v4l2_ctrl_handler *main_hdl)
>  {
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index 4b356df850a1..746969559ef3 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -50,6 +50,7 @@ struct poll_table_struct;
>   * @p_h264_slice_params: Pointer to a struct v4l2_ctrl_h264_slice_params.
>   * @p_h264_decode_params:Pointer to a struct 
> v4l2_ctrl_h264_decode_params.
>   * @p_vp8_frame_header:  Pointer to a VP8 frame header structure.
> + * @p_area:  Pointer to an area.
>   * @p:   Pointer to a compound value.
>   */
>  union v4l2_ctrl_ptr {
> @@ -68,6 +69,7 @@ union v4l2_ctrl_ptr {
>   struct v4l2_ctrl_h264_slice_params *p_h264_slice_params;
>   struct v4l2_ctrl_h264_decode_params *p_h264_decode_params;
>   struct v4l2_ctrl_vp8_frame_header *p_vp8_frame_header;
> + struct v4l2_area *p_area;
>   void *p;
>  };
>
> @@ -1085,6 +1087,45 @@ static inline int v4l2_ctrl_s_ctrl_string(struct 
> v4l2_ctrl *ctrl, const char *s)
>   return rval;
>  }
>
> +/**
> + * __v4l2_ctrl_s_ctrl_area() - Unlocked variant of v4l2_ctrl_s_ctrl_area().
> + *
> + * @ctrl:The control.
> + * @area:The new area.
> + *
> + * This sets the control's new area safely by going through the control
> + * framework. This function assumes the control's handler is already locked,
> + * allowing it to be used from within the _ctrl_ops functions.
> + *
> + * This function is for area type controls only.
> + */
> +int __v4l2_ctrl_s_ctrl_area(struct v4l2_ctrl *ctrl,
> + const struct v4l2_area *area);
> +
> +/**
> + * v4l2_ctrl_s_ctrl_area() - Helper function to set a control's area value
> + *from within a driver.
> + *
> + * @ctrl:The control.
> + * @s:   The new area.
> + *
> + * This sets the control's new area safely by going through the control
> + * framework. This function will lock the control's handler, so it cannot be
> + * used from within the _ctrl_ops functions.
> + *
> + * This function is for area type controls only.
> + */
> +static inline int v4l2_ctrl_s_ctrl_area(struct v4l2_ctrl *ctrl,
> + const struct v4l2_area *area)
> +{
> + int rval;
> +
> + v4l2_ctrl_lock(ctrl);
> + 

WireGuard to port to existing Crypto API

2019-09-25 Thread Jason A. Donenfeld
Hi folks,

I'm at the Kernel Recipes conference now and got a chance to talk with
DaveM a bit about WireGuard upstreaming. His viewpoint has recently
solidified: in order to go upstream, WireGuard must port to the
existing crypto API, and handle the Zinc project separately. As DaveM
is the upstream network tree maintainer, his opinion is quite
instructive.

I've long resisted the idea of porting to the existing crypto API,
because I think there are serious problems with it, in terms of
primitives, API, performance, and overall safety. I didn't want to
ship WireGuard in a form that I thought was sub-optimal from a
security perspective, since WireGuard is a security-focused project.

But it seems like with or without us, WireGuard will get ported to the
existing crypto API. So it's probably better that we just fully
embrace it, and afterwards work evolutionarily to get Zinc into Linux
piecemeal. I've ported WireGuard already several times as a PoC to the
API and have a decent idea of the ways it can go wrong and generally
how to do it in the least-bad way.

I realize this kind of compromise might come as a disappointment for
some folks. But it's probably better that as a project we remain
intimately involved with our Linux kernel users and the security of
the implementation, rather than slinking away in protest because we
couldn't get it all in at once. So we'll work with upstream, port to
the crypto API, and get the process moving again. We'll pick up the
Zinc work after that's done.

I also understand there might be interested folks out there who enjoy
working with the crypto API quite a bit and would be happy to work on
the WireGuard port. Please do get in touch if you'd like to
collaborate.

Jason


[PATCH] Documentation/process: Clarify disclosure rules

2019-09-25 Thread Thomas Gleixner
The role of the contact list provided by the disclosing party and how it
affects the disclosure process and the ability to include experts into
the development process is not really well explained.

Neither is it entirely clear when the disclosing party will be informed
about the fact that a developer who is not covered by an employer NDA needs
to be brought in and disclosed.

Explain the role of the contact list and the information policy along with
an eventual conflict resolution better.

Reported-by: Dave Hansen 
Signed-off-by: Thomas Gleixner 
---
 Documentation/process/embargoed-hardware-issues.rst |   40 
 1 file changed, 33 insertions(+), 7 deletions(-)

--- a/Documentation/process/embargoed-hardware-issues.rst
+++ b/Documentation/process/embargoed-hardware-issues.rst
@@ -143,6 +143,20 @@ via their employer, they cannot enter in
 in their role as Linux kernel developers. They will, however, agree to
 adhere to this documented process and the Memorandum of Understanding.
 
+The disclosing party should provide a list of contacts for all other
+entities who have already been, or should be, informed about the issue.
+This serves several purposes:
+
+ - The list of disclosed entities allows communication across the
+   industry, e.g. other OS vendors, HW vendors, etc.
+
+ - The disclosed entities can be contacted to name experts who should
+   participate in the mitigation development.
+
+ - If an expert who is required to handle an issue is employed by, or a
+   member of, a listed entity, then the response teams can request the
+   disclosure of that expert from that entity. This ensures that the
+   expert is also part of the entity's response team.
 
 Disclosure
 ""
@@ -158,10 +172,7 @@ Mitigation development
 ""
 
 The initial response team sets up an encrypted mailing-list or repurposes
-an existing one if appropriate. The disclosing party should provide a list
-of contacts for all other parties who have already been, or should be,
-informed about the issue. The response team contacts these parties so they
-can name experts who should be subscribed to the mailing-list.
+an existing one if appropriate.
 
 Using a mailing-list is close to the normal Linux development process and
 has been successfully used in developing mitigations for various hardware
@@ -175,9 +186,24 @@ development branch against the mainline
 stable kernel versions as necessary.
 
 The initial response team will identify further experts from the Linux
-kernel developer community as needed and inform the disclosing party about
-their participation. Bringing in experts can happen at any time of the
-development process and often needs to be handled in a timely manner.
+kernel developer community as needed. Bringing in experts can happen at any
+time of the development process and needs to be handled in a timely manner.
+
+If an expert is employed by or member of an entity on the disclosure list
+provided by the disclosing party, then participation will be requested from
+the relevant entity.
+
+If not, then the disclosing party will be informed about the expert's
+participation. The expert is covered by the Memorandum of Understanding
+and the disclosing party is requested to acknowledge the participation. If
+the disclosing party has a compelling reason to object, then the objection
+has to be raised within five work days and resolved with the incident
+team immediately. If the disclosing party does not react within five
+work days, this is taken as silent acknowledgement.
+
+After acknowledgement or resolution of an objection the expert is disclosed
+by the incident team and brought into the development process.
+
 
 Coordinated release
 """


Re: [PATCH v6 4/7] Documentation: media: Document V4L2_CTRL_TYPE_AREA

2019-09-25 Thread Jacopo Mondi
Hi Ricardo,

On Fri, Sep 20, 2019 at 03:51:34PM +0200, Ricardo Ribalda Delgado wrote:
> From: Ricardo Ribalda Delgado 
>
> A struct v4l2_area containing the width and the height of a rectangular
> area.
>
> Reviewed-by: Philipp Zabel 
> Signed-off-by: Ricardo Ribalda Delgado 
> ---
>  Documentation/media/uapi/v4l/vidioc-queryctrl.rst | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst 
> b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> index a3d56ffbf4cc..33aff21b7d11 100644
> --- a/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> +++ b/Documentation/media/uapi/v4l/vidioc-queryctrl.rst
> @@ -443,6 +443,12 @@ See also the examples in :ref:`control`.
>- n/a
>- A struct :c:type:`v4l2_ctrl_mpeg2_quantization`, containing MPEG-2
>   quantization matrices for stateless video decoders.
> +* - ``V4L2_CTRL_TYPE_AREA``
> +  - n/a
> +  - n/a
> +  - n/a
> +  - A struct :c:type:`v4l2_area`, containing the width and the height
> +of a rectangular area. Units depend on the use case.

I recall Hans too was in favour of having min, max and step defined
(and applied to both width and height).

Really a minor issue from my side, feel free to keep it the way it is
Reviewed-by: Jacopo Mondi 

Thanks
   j
>  * - ``V4L2_CTRL_TYPE_H264_SPS``
>- n/a
>- n/a
> --
> 2.23.0
>




Re: [PATCH 0/5] hugetlbfs: Disable PMD sharing for large systems

2019-09-25 Thread Peter Zijlstra
On Fri, Sep 13, 2019 at 11:50:43AM +1000, Dave Chinner wrote:
> On Wed, Sep 11, 2019 at 04:05:32PM +0100, Waiman Long wrote:
> > A customer with large SMP systems (up to 16 sockets) running an application
> > that uses a large amount of static hugepages (~500-1500GB) is experiencing
> > random multisecond delays. These delays were caused by the long time it
> > took to scan the VMA interval tree with mmap_sem held.
> > 
> > To fix this problem while preserving existing behavior as much as
> > possible, we need to allow a timeout in down_write() and disable PMD
> > sharing when it is taking too long to do so. Since a transaction can
> > involve touching multiple huge pages, timing out for each of the huge
> > page interactions does not completely solve the problem. So a threshold
> > is set to completely disable PMD sharing if too many timeouts happen.
> > 
> > The first 4 patches of this 5-patch series adds a new
> > down_write_timedlock() API which accepts a timeout argument and return
> > true is locking is successful or false otherwise. It works more or less
> > than a down_write_trylock() but the calling thread may sleep.
> 
> Just on general principle, this is a non-starter. If a lock is being
> held too long, then whatever the lock is protecting needs fixing.
> Adding timeouts to locks and sysctls to tune them is not a viable
> solution to address latencies caused by algorithm scalability
> issues.

I'm very much agreeing here. Lock functions with timeouts are a sign of
horrific design.


Re: [PATCH] tty:vt: Add check the return value of kzalloc to avoid oops

2019-09-25 Thread Xiaoming Ni



On 2019/9/23 11:50, Nicolas Pitre wrote:
> On Sat, 21 Sep 2019, Xiaoming Ni wrote:
> 
>> @ Nicolas Pitre
>> Can I make a v2 patch based on your advice ?
>> Or you will submit a patch for "GFP_WONTFAIL" yourself ?
> 
> Here's a patch implementing what I had in mind. This is compile tested 
> only.
> 
> -- >8 --
> 
> Subject: [PATCH] mm: add __GFP_WONTFAIL and GFP_ONBOOT
> 
> Some memory allocations are very unlikely to fail during system boot.
> Because of that, the code often doesn't bother to check for allocation
> failure, but this gets reported anyway.
> 
> As an alternative to adding code to check for NULL that has almost no
> chance of ever being exercised, let's use a GFP flag to identify those
> cases and panic the kernel if allocation failure ever occurs.
> 
> Conversion of one such instance is also included.
> 
> Signed-off-by: Nicolas Pitre 
> 
.


>  /**
> @@ -285,6 +293,9 @@ struct vm_area_struct;
>   * available and will not wake kswapd/kcompactd on failure. The _LIGHT
>   * version does not attempt reclaim/compaction at all and is by default used
>   * in page fault path, while the non-light is used by khugepaged.
> + *
> + * %GFP_ONBOOT is for relatively small allocations that are not expected
> + * to fail while the system is booting.
>   */
>  #define GFP_ATOMIC   (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
>  #define GFP_KERNEL   (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> @@ -300,6 +311,7 @@ struct vm_area_struct;
>  #define GFP_TRANSHUGE_LIGHT  ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
>__GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
>  #define GFP_TRANSHUGE(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
> +#define GFP_ONBOOT   (GFP_NOWAIT | __GFP_WONTFAIL)
>  

Is it a good idea to tie GFP_ONBOOT to GFP_NOWAIT?
An allocation at boot time does not necessarily have to be GFP_NOWAIT.

>  /* Convert GFP flags to their corresponding migrate type */
>  #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ff5484fdbd..36dee09f7f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4625,6 +4625,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
> order,
>  fail:
>   warn_alloc(gfp_mask, ac->nodemask,
>   "page allocation failure: order:%u", order);
> + if (gfp_mask & __GFP_WONTFAIL) {

Is it more intuitive to use __GFP_DIE_IF_FAIL as the flag name?

> + /*
> +  * The assumption was wrong. This is never supposed to happen.
> +  * Caller most likely won't check for a returned NULL either.
> +  * So the only reasonable thing to do is to panic.
> +  */
> + panic("Failed to allocate memory despite GFP_WONTFAIL\n");
> + }
>  got_pg:
>   return page;
>  }
> 
> .
> 

thanks
Xiaoming Ni



Re: [PATCH 02/15] fs: Introduce i_blocks_per_page

2019-09-25 Thread Dave Chinner
On Tue, Sep 24, 2019 at 05:52:01PM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> This helper is useful for both large pages in the page cache and for
> supporting block size larger than page size.  Convert some example
> users (we have a few different ways of writing this idiom).
> 
> Signed-off-by: Matthew Wilcox (Oracle) 

I'm actually working on abstracting this code from both block size
and page size via the helpers below. We have a need to support block
size > page size, and that requires touching a bunch of the same
code as this patchset. I'm currently trying to combine your
last patch set with my patchset so I can easily test allocating 64k
page cache pages on a 64k block size filesystem on a 4k page size
machine with XFS.

/*
 * Return the chunk size we should use for page cache based operations.
 * This supports both large block sizes and variable page sizes based on the
 * restriction that order-n blocks and page cache pages are order-n file offset
 * aligned.
 *
 * This will return the inode block size for block size < page_size(page),
 * otherwise it will return page_size(page).
 */
static inline unsigned
iomap_chunk_size(struct inode *inode, struct page *page)
{
return min_t(unsigned, page_size(page), i_blocksize(inode));
}

static inline unsigned
iomap_chunk_bits(struct inode *inode, struct page *page)
{
return min_t(unsigned, page_shift(page), inode->i_blkbits);
}

static inline unsigned
iomap_chunks_per_page(struct inode *inode, struct page *page)
{
return page_size(page) >> inode->i_blkbits;
}

Basically, the process is to convert the iomap code over to
iterating "chunks" rather than blocks or pages, and then allocate
a struct iomap_page according to the difference between page and
block size.

> ---
>  fs/iomap/buffered-io.c  |  4 ++--
>  fs/jfs/jfs_metapage.c   |  2 +-
>  fs/xfs/xfs_aops.c   |  8 
>  include/linux/pagemap.h | 13 +
>  4 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e25901ae3ff4..0e76a4b6d98a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -24,7 +24,7 @@ iomap_page_create(struct inode *inode, struct page *page)
>  {
>   struct iomap_page *iop = to_iomap_page(page);
>  
> - if (iop || i_blocksize(inode) == PAGE_SIZE)
> + if (iop || i_blocks_per_page(inode, page) <= 1)
>   return iop;

That also means checks like these become:

if (iop || iomap_chunks_per_page(inode, page) <= 1)

as a single file can now have multiple pages per block, a page per
block and multiple blocks per page as the page size changes...

I'd like to only have to make one pass over this code to abstract
out page and block sizes, so I'm guessing we'll need to do some
co-ordination here.

> @@ -636,4 +636,17 @@ static inline unsigned long dir_pages(struct inode 
> *inode)
>  PAGE_SHIFT;
>  }
>  
> +/**
> + * i_blocks_per_page - How many blocks fit in this page.
> + * @inode: The inode which contains the blocks.
> + * @page: The (potentially large) page.
> + *
> + * Context: Any context.
> + * Return: The number of filesystem blocks covered by this page.
> + */
> +static inline
> +unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
> +{
> + return page_size(page) >> inode->i_blkbits;
> +}
>  #endif /* _LINUX_PAGEMAP_H */

It also means that we largely don't need to touch mm headers as
all the helpers end up being iomap specific and private...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v5 3/3] leds: Add control of the voltage/current regulator to the LED core

2019-09-25 Thread Jean-Jacques Hiblot



On 24/09/2019 20:58, Jacek Anaszewski wrote:

Hi Jean,

Thank you for the patch.

I must say I'm not a big fan of this change.
It adds a bunch of code to the LED core and gives little
functionality in return.


I disagree. I remember having to tweak DTS in the past to force some 
regulator outputs, and then they would always be turned on. If all 
sub-systems were power-supply/regulator-aware, that kind of hack would 
not be needed.




  It may also influence maximum
software blinking rate, so I'd rather avoid calling
regulator_enable/disable when timer trigger is set.

It will of course require more code.


True. We might be able to mitigate this by delaying turning off the 
regulator: turning it on would be an immediate action, while turning it 
off would be delayed by a configurable number of ms. That should take 
care of the max blinking rate and reduce the overhead for a LED that 
changes state quite often (like a mass storage indicator).


JJ



Since AFAIR Pavel was original proponent of this change then
I'd like to see his opinion before we move on to discussing
possible improvements to this patch.

Best regards,
Jacek Anaszewski

On 9/23/19 12:20 PM, Jean-Jacques Hiblot wrote:

A LED is usually powered by a voltage/current regulator. Let the LED core
know about it. This allows the LED core to turn on or off the power supply
as needed.

Signed-off-by: Jean-Jacques Hiblot
---
  drivers/leds/led-class.c | 17 +++
  drivers/leds/led-core.c  | 65 ++--
  drivers/leds/leds.h  |  3 ++
  include/linux/leds.h |  5 
  4 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index e11177d77b4c..d122b6982efd 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -352,6 +352,7 @@ int led_classdev_register_ext(struct device *parent,
char final_name[LED_MAX_NAME_SIZE];
const char *proposed_name = composed_name;
int ret;
+   struct regulator *regulator;
  
  	if (init_data) {

if (init_data->devname_mandatory && !init_data->devicename) {
@@ -387,6 +388,22 @@ int led_classdev_register_ext(struct device *parent,
dev_warn(parent, "Led %s renamed to %s due to name collision",
led_cdev->name, dev_name(led_cdev->dev));
  
+	regulator = devm_regulator_get_optional(led_cdev->dev, "power");

+   if (IS_ERR(regulator)) {
+   if (regulator != ERR_PTR(-ENODEV)) {
+   dev_err(led_cdev->dev, "Cannot get the power supply for 
%s\n",
+   led_cdev->name);
+   device_unregister(led_cdev->dev);
+   mutex_unlock(&led_cdev->led_access);
+   return PTR_ERR(regulator);
+   }
+   led_cdev->regulator = NULL;
+   } else {
+   led_cdev->regulator = regulator;
+   led_cdev->regulator_state = REG_OFF;
+   atomic_set(&led_cdev->target_regulator_state, REG_UNKNOWN);
+   }
+
if (led_cdev->flags & LED_BRIGHT_HW_CHANGED) {
ret = led_add_brightness_hw_changed(led_cdev);
if (ret) {
diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
index d318f9b0382d..155a158c7b8d 100644
--- a/drivers/leds/led-core.c
+++ b/drivers/leds/led-core.c
@@ -37,6 +37,43 @@ const char * const led_colors[LED_COLOR_ID_MAX] = {
  };
  EXPORT_SYMBOL_GPL(led_colors);
  
+static int __led_handle_regulator(struct led_classdev *led_cdev)

+{
+   int rc;
+   int target_state = led_cdev->delayed_set_value ?  REG_ON : REG_OFF;
+
+   if (!led_cdev->regulator)
+   return 0;
+
+   /*
+* if the current regulator state is not the target state, we
+* need to update it.
+* note: No need for spinlock or atomic here because
+* led_cdev->regulator_state is modified only in the context of
+* the workqueue
+*/
+   if (led_cdev->regulator_state != target_state) {
+
+   if (target_state == REG_ON)
+   rc = regulator_enable(led_cdev->regulator);
+   else
+   rc = regulator_disable(led_cdev->regulator);
+   if (rc) {
+   /*
+* If something went wrong with the regulator update,
+* make sure that led_set_brightness_nosleep() does not
+* assume that the regulator is in the right state.
+*/
+   atomic_set(&led_cdev->target_regulator_state,
+  REG_UNKNOWN);
+   return rc;
+   }
+
+   led_cdev->regulator_state = target_state;
+   }
+   return 0;
+}
+
  static int __led_set_brightness(struct led_classdev *led_cdev,
enum led_brightness value)
  {
@@ -135,6 +172,11 @@ static 

Re: [PATCH] sched/fair: Speed-up energy-aware wake-ups

2019-09-25 Thread Peter Zijlstra
On Fri, Sep 20, 2019 at 01:23:00PM +0200, Quentin Perret wrote:
> On Friday 20 Sep 2019 at 16:03:38 (+0530), Pavan Kondeti wrote:
> > +1. Looks good to me.
> 
> Cool, thanks.
> 
> Peter/Ingo, is there anything else I should do ? Should I resend the
> patch 'properly' (that is, not inline) ?

Got it, thanks!


[PATCH v2 0/5] LogiCVC mfd and GPIO support

2019-09-25 Thread Paul Kocialkowski
This series introduces support for the LogiCVC GPIO block to the syscon GPIO
driver, with dt bindings documentation also including the top-level mfd
component.

Changes since v1:
- Converted dt bindings documentation to dt schemas;
- Used BIT macro and removed version from structure name;
- Improved documentation example with gpio-line-names;
- Added vendor prefix to dt bindings;
- Added mfd component dt bindings documentation.

Cheers,

Paul

Paul Kocialkowski (5):
  dt-bindings: Add Xylon vendor prefix
  dt-bindings: mfd: Document the Xylon LogiCVC multi-function device
  gpio: syscon: Add support for a custom get operation
  dt-bindings: gpio: Document the Xylon LogiCVC GPIO controller
  gpio: syscon: Add support for the Xylon LogiCVC GPIOs

 .../bindings/gpio/xylon,logicvc-gpio.yaml | 70 +
 .../bindings/mfd/xylon,logicvc.yaml   | 60 +++
 .../devicetree/bindings/vendor-prefixes.yaml  |  2 +
 drivers/gpio/gpio-syscon.c| 75 ++-
 4 files changed, 204 insertions(+), 3 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml
 create mode 100644 Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml

-- 
2.23.0



[PATCH v2 3/5] gpio: syscon: Add support for a custom get operation

2019-09-25 Thread Paul Kocialkowski
Some drivers might need a custom get operation to match custom
behavior implemented in the set operation.

Add plumbing for supporting that.

Signed-off-by: Paul Kocialkowski 
---
 drivers/gpio/gpio-syscon.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-syscon.c b/drivers/gpio/gpio-syscon.c
index 31f332074d7d..05c537ed73f1 100644
--- a/drivers/gpio/gpio-syscon.c
+++ b/drivers/gpio/gpio-syscon.c
@@ -43,8 +43,9 @@ struct syscon_gpio_data {
unsigned intbit_count;
unsigned intdat_bit_offset;
unsigned intdir_bit_offset;
-   void(*set)(struct gpio_chip *chip,
-  unsigned offset, int value);
+   int (*get)(struct gpio_chip *chip, unsigned offset);
+   void(*set)(struct gpio_chip *chip, unsigned offset,
+  int value);
 };
 
 struct syscon_gpio_priv {
@@ -252,7 +253,7 @@ static int syscon_gpio_probe(struct platform_device *pdev)
priv->chip.label = dev_name(dev);
priv->chip.base = -1;
priv->chip.ngpio = priv->data->bit_count;
-   priv->chip.get = syscon_gpio_get;
+   priv->chip.get = priv->data->get ? : syscon_gpio_get;
if (priv->data->flags & GPIO_SYSCON_FEAT_IN)
priv->chip.direction_input = syscon_gpio_dir_in;
if (priv->data->flags & GPIO_SYSCON_FEAT_OUT) {
-- 
2.23.0



[PATCH v2 5/5] gpio: syscon: Add support for the Xylon LogiCVC GPIOs

2019-09-25 Thread Paul Kocialkowski
The LogiCVC display hardware block comes with GPIO capabilities
that must be exposed separately from the main driver (as GPIOs) for
use with regulators and panels. A syscon is used to share the same
regmap across the two drivers.

Since the GPIO capabilities are pretty simple, add them to the syscon
GPIO driver.

Signed-off-by: Paul Kocialkowski 
---
 drivers/gpio/gpio-syscon.c | 68 ++
 1 file changed, 68 insertions(+)

diff --git a/drivers/gpio/gpio-syscon.c b/drivers/gpio/gpio-syscon.c
index 05c537ed73f1..cdcb913b8f0c 100644
--- a/drivers/gpio/gpio-syscon.c
+++ b/drivers/gpio/gpio-syscon.c
@@ -190,6 +190,70 @@ static const struct syscon_gpio_data keystone_dsp_gpio = {
.set= keystone_gpio_set,
 };
 
+#define LOGICVC_CTRL_REG   0x40
+#define LOGICVC_CTRL_GPIO_SHIFT11
+#define LOGICVC_CTRL_GPIO_BITS 5
+
+#define LOGICVC_POWER_CTRL_REG 0x78
+#define LOGICVC_POWER_CTRL_GPIO_SHIFT  0
+#define LOGICVC_POWER_CTRL_GPIO_BITS   4
+
+static void logicvc_gpio_offset(struct syscon_gpio_priv *priv,
+   unsigned offset, unsigned int *reg,
+   unsigned int *bit)
+{
+   if (offset >= LOGICVC_CTRL_GPIO_BITS) {
+   *reg = LOGICVC_POWER_CTRL_REG;
+
+   /* To the (virtual) power ctrl offset. */
+   offset -= LOGICVC_CTRL_GPIO_BITS;
+   /* To the actual bit offset in reg. */
+   offset += LOGICVC_POWER_CTRL_GPIO_SHIFT;
+   } else {
+   *reg = LOGICVC_CTRL_REG;
+
+   /* To the actual bit offset in reg. */
+   offset += LOGICVC_CTRL_GPIO_SHIFT;
+   }
+
+   *bit = BIT(offset);
+}
+
+static int logicvc_gpio_get(struct gpio_chip *chip, unsigned offset)
+{
+   struct syscon_gpio_priv *priv = gpiochip_get_data(chip);
+   unsigned int reg;
+   unsigned int bit;
+   unsigned int value;
+   int ret;
+
+   logicvc_gpio_offset(priv, offset, &reg, &bit);
+
+   ret = regmap_read(priv->syscon, reg, &value);
+   if (ret)
+   return ret;
+
+   return !!(value & bit);
+}
+
+static void logicvc_gpio_set(struct gpio_chip *chip, unsigned offset, int val)
+{
+   struct syscon_gpio_priv *priv = gpiochip_get_data(chip);
+   unsigned int reg;
+   unsigned int bit;
+
+   logicvc_gpio_offset(priv, offset, &reg, &bit);
+
+   regmap_update_bits(priv->syscon, reg, bit, val ? bit : 0);
+}
+
+static const struct syscon_gpio_data logicvc_gpio = {
+   .flags  = GPIO_SYSCON_FEAT_OUT,
+   .bit_count  = LOGICVC_CTRL_GPIO_BITS + LOGICVC_POWER_CTRL_GPIO_BITS,
+   .get= logicvc_gpio_get,
+   .set= logicvc_gpio_set,
+};
+
 static const struct of_device_id syscon_gpio_ids[] = {
{
.compatible = "cirrus,ep7209-mctrl-gpio",
@@ -203,6 +267,10 @@ static const struct of_device_id syscon_gpio_ids[] = {
.compatible = "rockchip,rk3328-grf-gpio",
.data   = &rockchip_rk3328_gpio_mute,
},
+   {
+   .compatible = "xylon,logicvc-3.02.a-gpio",
.data   = &logicvc_gpio,
+   },
{ }
 };
 MODULE_DEVICE_TABLE(of, syscon_gpio_ids);
-- 
2.23.0



[PATCH v2 2/5] dt-bindings: mfd: Document the Xylon LogiCVC multi-function device

2019-09-25 Thread Paul Kocialkowski
The LogiCVC is a display engine which also exposes GPIO functionality.
For this reason, it is described as a multi-function device that is expected
to provide register access to its children nodes for gpio and display.

Signed-off-by: Paul Kocialkowski 
---
 .../bindings/mfd/xylon,logicvc.yaml   | 60 +++
 1 file changed, 60 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml

diff --git a/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml 
b/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml
new file mode 100644
index ..f7a2a13d7918
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/xylon,logicvc.yaml
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+# Copyright 2019 Bootlin
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/mfd/xylon,logicvc.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Xylon LogiCVC multi-function device
+
+maintainers:
+  - Paul Kocialkowski 
+
+description: |
+  The LogiCVC is a display controller that also contains a GPIO controller.
+  As a result, a multi-function device is exposed as parent of the display
+  and GPIO blocks.
+
+properties:
+  compatible:
+items:
+  - enum:
+  - xylon,logicvc-3.02.a
+  - const: syscon
+  - const: simple-mfd
+
+  reg:
+maxItems: 1
+
+select:
+  properties:
+compatible:
+  contains:
+enum:
+  - xylon,logicvc-3.02.a
+
+  required:
+- compatible
+
+required:
+  - compatible
+  - reg
+
+examples:
+  - |
+ {
+  logicvc: logicvc@43c0 {
+compatible = "xylon,logicvc-3.02.a", "syscon", "simple-mfd";
+reg = <0x43c0 0x6000>;
+#address-cells = <1>;
+#size-cells = <1>;
+
+logicvc_gpio: gpio@40 {
+  compatible = "xylon,logicvc-3.02.a-gpio";
+  reg = <0x40 0x40>;
+  ...
+};
+...
+  };
+  ...
+};
-- 
2.23.0



[PATCH v2 1/5] dt-bindings: Add Xylon vendor prefix

2019-09-25 Thread Paul Kocialkowski
Xylon is an electronics company that produces FPGA hardware block designs
optimized for Xilinx FPGAs.

Signed-off-by: Paul Kocialkowski 
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml 
b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 29dcc6f8a64a..6c9913244e4c 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -1036,6 +1036,8 @@ patternProperties:
 description: Xilinx
   "^xunlong,.*":
 description: Shenzhen Xunlong Software CO.,Limited
+  "^xylon,.*":
+description: Xylon
   "^yones-toptech,.*":
 description: Yones Toptech Co., Ltd.
   "^ysoft,.*":
-- 
2.23.0



[PATCH v2 4/5] dt-bindings: gpio: Document the Xylon LogiCVC GPIO controller

2019-09-25 Thread Paul Kocialkowski
The Xylon LogiCVC display controller exports some GPIOs, which are
exposed as a separate entity.

Signed-off-by: Paul Kocialkowski 
---
 .../bindings/gpio/xylon,logicvc-gpio.yaml | 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml

diff --git a/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml 
b/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml
new file mode 100644
index ..1503c922f845
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/xylon,logicvc-gpio.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+# Copyright 2019 Bootlin
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/gpio/xylon,logicvc-gpio.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: Xylon LogiCVC GPIO controller
+
+maintainers:
+  - Paul Kocialkowski 
+
+description: |
+  The LogiCVC GPIO describes the GPIO block included in the LogiCVC display
+  controller. These are meant to be used for controlling display-related
+  signals.
+
+  The controller exposes GPIOs from the display and power control registers,
+  which are mapped by the driver as follows:
+  - GPIO[4:0] (display control) mapped to index 0-4
+  - EN_BLIGHT (power control) mapped to index 5
+  - EN_VDD (power control) mapped to index 6
+  - EN_VEE (power control) mapped to index 7
+  - V_EN (power control) mapped to index 8
+
+properties:
+  $nodename:
+pattern: "^gpio@[0-9a-f]+$"
+
+  compatible:
+enum:
+  - xylon,logicvc-3.02.a-gpio
+
+  reg:
+maxItems: 1
+
+  "#gpio-cells":
+const: 2
+
+  gpio-controller: true
+
+  gpio-line-names:
+minItems: 1
+maxItems: 9
+
+required:
+  - compatible
+  - reg
+  - "#gpio-cells"
+  - gpio-controller
+
+examples:
+  - |
+ {
+  logicvc: logicvc@43c0 {
+compatible = "xylon,logicvc-3.02.a", "syscon", "simple-mfd";
+reg = <0x43c0 0x6000>;
+...
+
+logicvc_gpio: gpio@40 {
+  compatible = "xylon,logicvc-3.02.a-gpio";
+  reg = <0x40 0x40>;
+  gpio-controller;
+  #gpio-cells = <2>;
+  gpio-line-names = "GPIO0", "GPIO1", "GPIO2", "GPIO3", "GPIO4",
+ "EN_BLIGHT", "EN_VDD", "EN_VEE", "V_EN";
+};
+  };
+  ...
+};
-- 
2.23.0



[PATCH] staging: exfat: Use kvzalloc() instead of kzalloc() for exfat_sb_info

2019-09-25 Thread jiayeli
From: Jia-Ye Li 

Fix mount failed "Cannot allocate memory".

When memory gets fragmented, kzalloc() might fail to allocate
physically contiguous pages for the struct exfat_sb_info (its size is
about 34KiB) even if the total free memory is enough.
Use kvzalloc() to solve this problem.

Reviewed-by: Ethan Wu 
Signed-off-by: Jia-Ye Li 
---
 drivers/staging/exfat/exfat_super.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/exfat/exfat_super.c 
b/drivers/staging/exfat/exfat_super.c
index 5f6caee819a6..bfad2a6bbcb3 100644
--- a/drivers/staging/exfat/exfat_super.c
+++ b/drivers/staging/exfat/exfat_super.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include <linux/mm.h>
 #include 
 #include 
 #include 
@@ -3450,7 +3451,7 @@ static void exfat_free_super(struct exfat_sb_info *sbi)
kfree(sbi->options.iocharset);
/* mutex_init is in exfat_fill_super function. only for 3.7+ */
mutex_destroy(>s_lock);
-   kfree(sbi);
+   kvfree(sbi);
 }
 
 static void exfat_put_super(struct super_block *sb)
@@ -3845,7 +3846,7 @@ static int exfat_fill_super(struct super_block *sb, void 
*data, int silent)
 * the filesystem, since we're only just about to mount
 * it and have no inodes etc active!
 */
-   sbi = kzalloc(sizeof(struct exfat_sb_info), GFP_KERNEL);
+   sbi = kvzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
mutex_init(>s_lock);
-- 
2.17.1



Re: [PATCH v22 02/24] x86/cpufeatures: x86/msr: Intel SGX Launch Control hardware bits

2019-09-25 Thread Borislav Petkov
On Tue, Sep 24, 2019 at 01:22:10PM -0700, Sean Christopherson wrote:
> The approach we chose (patch 04, which we were discussing) is to disable
> SGX if SGX_LE_WR is not set, i.e. disallow SGX unless the hash MSRs exist
> and are fully writable.

Hmm, so I see

+   if (!(fc & FEATURE_CONTROL_SGX_LE_WR)) {
+   pr_info_once("sgx: The launch control MSRs are not writable\n");
+   goto err_msrs_rdonly;

which clears only X86_FEATURE_SGX_LC but leaves the other three feature
bits set?!

If you'd want to disable SGX then you'd need to jump to the
err_unsupported label and get rid of the err_msrs_rdonly one.

Or am I missing something?

> WRMSR will #GP if FEATURE_CONTROL is locked (bit 0), e.g. attempting to
> set SGX_LE_WR will trap if FEATURE_CONTROL was locked by BIOS.

Right.

> And conversely, the various enable bits in FEATURE_CONTROL don't
> take effect until FEATURE_CONTROL is locked, e.g. the LE hash MSRs
> aren't writable if FEATURE_CONTROL is unlocked, regardless of whether
> SGX_LE_WR is set.

Ok. We want them writable.

> Sadly, because FEATURE_CONTROL must be locked to fully enable SGX, the
> reality is that any BIOS that supports SGX will lock FEATURE_CONTROL.

That's fine. The question is, would OEMs leave the hash MSRs writable?

If, as you say above, we clear SGX feature bit - not only
X86_FEATURE_SGX_LC - when the MSRs are not writable, then we're fine.
Platforms sticking their own hash in there won't be supported but I
guess their aim is not to be supported in Linux anyway.

> That's the status quo today as well since VMX (and SMX/TXT) is also
> enabled via FEATURE_CONTROL.  KVM does have logic to enable VMX and lock
> FEATURE_CONTROL if the MSR isn't locked, but AIUI that exists only to work
> with old BIOSes.
> 
> If we want to support setting and locking FEATURE_CONTROL in the extremely
> unlikely scenario that BIOS left it unlocked, the proper change would be

I wouldn't be too surprised if this happened. BIOS is very inventive.

> One note on Launch Control that isn't covered in the SDM: the LE hash
> MSRs can also be written before SGX is activated.  SGX activation must
> occur before FEATURE_CONTROL is locked, meaning BIOS can set the LE
> hash MSRs to a non-intel and then lock FEATURE_CONTROL with SGX_LE_WR=0.

This is exactly what I'm afraid of. The OEM vendors locking this down.

> Heh, why stop at 4?  12_EBX, 12_1_ECX and 12_1_EDX are effectively feature
> leafs as well, although the kernel can ignore them for the most part.

Yeah, we're mentally prepared for the feature bit space explosion. :)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: WireGuard to port to existing Crypto API

2019-09-25 Thread Toke Høiland-Jørgensen
"Jason A. Donenfeld"  writes:

> Hi folks,
>
> I'm at the Kernel Recipes conference now and got a chance to talk with
> DaveM a bit about WireGuard upstreaming. His viewpoint has recently
> solidified: in order to go upstream, WireGuard must port to the
> existing crypto API, and handle the Zinc project separately. As DaveM
> is the upstream network tree maintainer, his opinion is quite
> instructive.
>
> I've long resisted the idea of porting to the existing crypto API,
> because I think there are serious problems with it, in terms of
> primitives, API, performance, and overall safety. I didn't want to
> ship WireGuard in a form that I thought was sub-optimal from a
> security perspective, since WireGuard is a security-focused project.
>
> But it seems like with or without us, WireGuard will get ported to the
> existing crypto API. So it's probably better that we just fully
> embrace it, and afterwards work evolutionarily to get Zinc into Linux
> piecemeal. I've ported WireGuard already several times as a PoC to the
> API and have a decent idea of the ways it can go wrong and generally
> how to do it in the least-bad way.
>
> I realize this kind of compromise might come as a disappointment for
> some folks. But it's probably better that as a project we remain
> intimately involved with our Linux kernel users and the security of
> the implementation, rather than slinking away in protest because we
> couldn't get it all in at once. So we'll work with upstream, port to
> the crypto API, and get the process moving again. We'll pick up the
> Zinc work after that's done.

On the contrary, kudos on taking the pragmatic route! Much as I have
enjoyed watching your efforts on Zinc, I always thought it was a shame
it had to hold back the upstreaming of WireGuard. So as far as I'm
concerned, doing that separately sounds like the right approach at this
point, and I'll look forward to seeing the patches land :)

-Toke


[PATCH] KVM: vmx: fix a build warning in hv_enable_direct_tlbflush() on i386

2019-09-25 Thread Vitaly Kuznetsov
The following was reported on i386:

  arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
  arch/x86/kvm/vmx/vmx.c:503:10: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]

The particular pr_debug() causing it is more or less useless, let's just
remove it. Also, simplify the condition a little bit.

Reported-by: kbuild test robot 
Signed-off-by: Vitaly Kuznetsov 
---
 arch/x86/kvm/vmx/vmx.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a7c9922e3905..812553b7270f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -495,13 +495,11 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu 
*vcpu)
 * Synthetic VM-Exit is not enabled in current code and so All
 * evmcs in singe VM shares same assist page.
 */
-   if (!*p_hv_pa_pg) {
+   if (!*p_hv_pa_pg)
*p_hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
-   if (!*p_hv_pa_pg)
-   return -ENOMEM;
-   pr_debug("KVM: Hyper-V: allocated PA_PG for %llx\n",
-  (u64)&vcpu->kvm);
-   }
+
+   if (!*p_hv_pa_pg)
+   return -ENOMEM;
 
evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
 
-- 
2.20.1



[PATCH] can: peakcan: report bus recovery as well

2019-09-25 Thread Jeroen Hofstee
While the state changes are reported when the error counters increase
and decrease, there is no event when the bus recovers and the error
counters decrease again. So add those as well.

Change the state going downward to be ERROR_PASSIVE -> ERROR_WARNING ->
ERROR_ACTIVE instead of directly to ERROR_ACTIVE again.

Signed-off-by: Jeroen Hofstee 
Cc: Stephane Grosjean 
---
 drivers/net/can/usb/peak_usb/pcan_usb.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb.c 
b/drivers/net/can/usb/peak_usb/pcan_usb.c
index 617da295b6c1..dd2a7f529012 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb.c
@@ -436,8 +436,8 @@ static int pcan_usb_decode_error(struct 
pcan_usb_msg_context *mc, u8 n,
}
if ((n & PCAN_USB_ERROR_BUS_LIGHT) == 0) {
/* no error (back to active state) */
-   mc->pdev->dev.can.state = CAN_STATE_ERROR_ACTIVE;
-   return 0;
+   new_state = CAN_STATE_ERROR_ACTIVE;
+   break;
}
break;
 
@@ -460,9 +460,9 @@ static int pcan_usb_decode_error(struct 
pcan_usb_msg_context *mc, u8 n,
}
 
if ((n & PCAN_USB_ERROR_BUS_HEAVY) == 0) {
-   /* no error (back to active state) */
-   mc->pdev->dev.can.state = CAN_STATE_ERROR_ACTIVE;
-   return 0;
+   /* no error (back to warning state) */
+   new_state = CAN_STATE_ERROR_WARNING;
+   break;
}
break;
 
@@ -501,6 +501,11 @@ static int pcan_usb_decode_error(struct 
pcan_usb_msg_context *mc, u8 n,
mc->pdev->dev.can.can_stats.error_warning++;
break;
 
+   case CAN_STATE_ERROR_ACTIVE:
+   cf->can_id |= CAN_ERR_CRTL;
+   cf->data[1] = CAN_ERR_CRTL_ACTIVE;
+   break;
+
default:
/* CAN_STATE_MAX (trick to handle other errors) */
cf->can_id |= CAN_ERR_CRTL;
-- 
2.17.1



Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression

2019-09-25 Thread Xing Zhengjun




On 8/30/2019 8:43 AM, Xing Zhengjun wrote:



On 8/7/2019 3:56 PM, Xing Zhengjun wrote:



On 7/24/2019 1:17 PM, Xing Zhengjun wrote:



On 7/12/2019 2:42 PM, Xing Zhengjun wrote:

Hi Trond,

 I attached the perf-profile entries with the biggest changes; hope it is
useful for analyzing the issue.


Ping...


ping...


ping...


ping...






In testcase: fsmark
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 
3.00GHz with 384G memory

with following parameters:

 iterations: 20x
 nr_threads: 64t
 disk: 1BRD_48G
 fs: xfs
 fs2: nfsv4
 filesize: 4M
 test_size: 80G
 sync_method: fsyncBeforeClose
 cpufreq_governor: performance

test-description: fsmark is a file system benchmark for testing 
synchronous write workloads, for example mail server workloads.

test-url: https://sourceforge.net/projects/fsmark/

commit:
   e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
   0472e47660 ("SUNRPC: Convert socket page send code to use 
iov_iter()")


e791f8e9380d945e 0472e476604998c127f3c80d291
 ---
  %stddev %change %stddev
  \  |    \
 527.29   -22.6% 407.96    fsmark.files_per_sec
   1.97 ± 11%  +0.9    2.88 ±  4% 
perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry 

   0.00    +0.9    0.93 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.11 ± 10%  +0.9    3.05 ±  4% 
perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary 

   5.29 ±  2%  +1.2    6.46 ±  7% 
perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork 

  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
   0.00    +3.4    3.41 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg 

   0.00    +3.4    3.44 ±  4% 
perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg 

   0.00    +3.5    3.54 ±  4% 
perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread 

   2.30 ±  5%  +3.7    6.02 ±  3% 
perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork 

   1.81 ±  4%  +3.8    5.59 ±  4% 
perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread 

   1.80 ±  3%  +3.8    5.59 ±  3% 
perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work 

   1.73 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule 

   1.72 ±  4%  +3.8    5.54 ±  4% 
perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute 

   0.00    +5.4    5.42 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request 

   0.00    +5.5    5.52 ±  4% 
perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit 

   0.00    +5.5    5.53 ±  4% 
perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit 

   9.61 ±  5%  +3.1   12.70 ±  2% 
perf-profile.children.cycles-pp.worker_thread
   9.27 ±  5%  +3.1   12.40 ±  2% 
perf-profile.children.cycles-pp.process_one_work
   6.19    +3.2    9.40 ±  4% 
perf-profile.children.cycles-pp.memcpy_erms
  34.53 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.ret_from_fork
  34.52 ±  4%  +3.3   37.78 ±  2% 
perf-profile.children.cycles-pp.kthread
   0.00    +3.5    3.46 ±  4% 
perf-profile.children.cycles-pp.memcpy_from_page
   0.00    +3.6    3.56 ±  4% 
perf-profile.children.cycles-pp._copy_from_iter_full
   2.47 ±  4%  +3.7    6.18 ±  3% 

Re: [PATCH v2] netfilter: use __u8 instead of uint8_t in uapi header

2019-09-25 Thread Pablo Neira Ayuso
On Tue, Sep 24, 2019 at 07:40:06AM +0900, Masahiro Yamada wrote:
> When CONFIG_UAPI_HEADER_TEST=y, exported headers are compile-tested to
> make sure they can be included from user-space.
> 
> Currently, linux/netfilter_bridge/ebtables.h is excluded from the test
> coverage. To make it join the compile-test, we need to fix the build
> errors attached below.
> 
> For a case like this, we decided to use __u{8,16,32,64} variable types
> in this discussion:
> 
>   https://lkml.org/lkml/2019/6/5/18
> 
> Build log:
> 
>   CC  usr/include/linux/netfilter_bridge/ebtables.h.s
> In file included from :32:0:
> ./usr/include/linux/netfilter_bridge/ebtables.h:126:4: error: unknown type 
> name ‘uint8_t’
> uint8_t revision;
> ^~~
> ./usr/include/linux/netfilter_bridge/ebtables.h:139:4: error: unknown type 
> name ‘uint8_t’
> uint8_t revision;
> ^~~
> ./usr/include/linux/netfilter_bridge/ebtables.h:152:4: error: unknown type 
> name ‘uint8_t’
> uint8_t revision;
> ^~~

Applied.


RE: [PATCH] KVM: vmx: fix a build warning in hv_enable_direct_tlbflush() on i386

2019-09-25 Thread Tianyu Lan
There is another warning in the report.

arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
arch/x86/kvm/vmx/vmx.c:507:20: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]
  evmcs->hv_vm_id = (u64)vcpu->kvm;
^
The following change can fix it.
-   evmcs->hv_vm_id = (u64)vcpu->kvm;
+   evmcs->hv_vm_id = (unsigned long)vcpu->kvm;
evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;

-Original Message-
From: Vitaly Kuznetsov  
Sent: Wednesday, September 25, 2019 4:53 PM
To: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini ; Radim 
Krčmář ; Sean Christopherson 
; Jim Mattson ; Tianyu 
Lan 
Subject: [PATCH] KVM: vmx: fix a build warning in hv_enable_direct_tlbflush() 
on i386

The following was reported on i386:

  arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
  arch/x86/kvm/vmx/vmx.c:503:10: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]

The particular pr_debug() causing it is more or less useless, let's just remove 
it. Also, simplify the condition a little bit.

Reported-by: kbuild test robot 
Signed-off-by: Vitaly Kuznetsov 
---
 arch/x86/kvm/vmx/vmx.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 
a7c9922e3905..812553b7270f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -495,13 +495,11 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu 
*vcpu)
 * Synthetic VM-Exit is not enabled in current code and so All
 * evmcs in singe VM shares same assist page.
 */
-   if (!*p_hv_pa_pg) {
+   if (!*p_hv_pa_pg)
*p_hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
-   if (!*p_hv_pa_pg)
-   return -ENOMEM;
-   pr_debug("KVM: Hyper-V: allocated PA_PG for %llx\n",
-  (u64)&vcpu->kvm);
-   }
+
+   if (!*p_hv_pa_pg)
+   return -ENOMEM;
 
evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
 
--
2.20.1



[PATCH v2 0/2] spi: Fix problem with interrupted slave transmission

2019-09-25 Thread Lukasz Majewski
This patch series fixes problem with recovering Vybrid's NXP DSPI
controller state after master transmission distortion.

It was tested with a setup where /dev/spidevX.Y devices were used in
a loopback mode (provided by proper HW connections). During this test
the distortion was introduced and the system was assessed if it is
possible to continue correct SPI transmission.

This series applies cleanly on v5.2 (tag) and current mainline:
SHA1: 351c8a09b00b5c51c8f58b016fffe51f87e2d820

Lukasz Majewski (2):
  spi: Add call to spi_slave_abort() function when spidev driver is
released
  spi: Introduce dspi_slave_abort() function for NXP's dspi SPI driver

 drivers/spi/spi-fsl-dspi.c | 20 
 drivers/spi/spidev.c   |  3 +++
 2 files changed, 23 insertions(+)

-- 
2.20.1



[PATCH v2 2/2] spi: Introduce dspi_slave_abort() function for NXP's dspi SPI driver

2019-09-25 Thread Lukasz Majewski
This change provides the dspi_slave_abort() function, which is a callback
for slave_abort() method of SPI controller generic driver.

As in SPI slave mode the transmission is driven by the master, any
distortion may cause the slave to enter an undefined internal state.
To avoid this problem the dspi_slave_abort() terminates all pending and
ongoing DMA transactions (with sync) and clears the internal FIFOs.

Signed-off-by: Lukasz Majewski 

---
Changes for v2:
- None
---
 drivers/spi/spi-fsl-dspi.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index bec758e978fb..2c0f211eed87 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -1006,6 +1006,25 @@ static void dspi_init(struct fsl_dspi *dspi)
 SPI_CTARE_FMSZE(0) | SPI_CTARE_DTCP(1));
 }
 
+static int dspi_slave_abort(struct spi_master *master)
+{
+   struct fsl_dspi *dspi = spi_master_get_devdata(master);
+
+   /*
+* Terminate all pending DMA transactions for the SPI working
+* in SLAVE mode.
+*/
+   dmaengine_terminate_sync(dspi->dma->chan_rx);
+   dmaengine_terminate_sync(dspi->dma->chan_tx);
+
+   /* Clear the internal DSPI RX and TX FIFO buffers */
+   regmap_update_bits(dspi->regmap, SPI_MCR,
+  SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF,
+  SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF);
+
+   return 0;
+}
+
 static int dspi_probe(struct platform_device *pdev)
 {
struct device_node *np = pdev->dev.of_node;
@@ -1030,6 +1049,7 @@ static int dspi_probe(struct platform_device *pdev)
ctlr->dev.of_node = pdev->dev.of_node;
 
ctlr->cleanup = dspi_cleanup;
+   ctlr->slave_abort = dspi_slave_abort;
ctlr->mode_bits = SPI_CPOL | SPI_CPHA | SPI_LSB_FIRST;
 
pdata = dev_get_platdata(&pdev->dev);
-- 
2.20.1



[PATCH v2 1/2] spi: Add call to spi_slave_abort() function when spidev driver is released

2019-09-25 Thread Lukasz Majewski
This change is necessary for spidev devices (e.g. /dev/spidev3.0) working
in the slave mode (like NXP's dspi driver for Vybrid SoC).

When SPI HW works in this mode - the master is responsible for providing
CS and CLK signals. However, when some fault happens - like for example
distortion on SPI lines - the SPI Linux driver needs a chance to recover
from this abnormal situation and prepare itself for next (correct)
transmission.

This change doesn't pose any threat to drivers working in master mode,
as the spi_slave_abort() function checks if SPI slave mode is supported.

Signed-off-by: Lukasz Majewski 
Link: https://lore.kernel.org/r/20190924110547.14770-2-lu...@denx.de
Signed-off-by: Mark Brown 
Reported-by: kbuild test robot 

---
Changes for v2:
- Add #ifdef CONFIG_SPI_SLAVE to not compile in spi_slave_abort() on systems
  which are only using master mode.
  (The spi_slave_abort() is 'protected' by this Kconfig define in drivers/spi.c
  file).
  This error was spotted by kbuild test robot from Intel.
---
 drivers/spi/spidev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 255786f2e844..3ea9d8a3e6e8 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -627,6 +627,9 @@ static int spidev_release(struct inode *inode, struct file 
*filp)
if (dofree)
kfree(spidev);
}
+#ifdef CONFIG_SPI_SLAVE
+   spi_slave_abort(spidev->spi);
+#endif
mutex_unlock(&device_list_lock);
 
return 0;
-- 
2.20.1



RE: [PATCH v2 00/13] vfio_pci: wrap pci device as a mediated device

2019-09-25 Thread Liu, Yi L
Hi Alex,

Any comments on it? :-) With this version, the vfio-mdev-pci driver
could work with non-singleton groups, and it works with vfio-pci
as well.

Regards,
Yi Liu

> From: Liu, Yi L
> Sent: Thursday, September 5, 2019 3:59 PM
> To: alex.william...@redhat.com; kwankh...@nvidia.com
> Subject: [PATCH v2 00/13] vfio_pci: wrap pci device as a mediated device
> 
> This patchset aims to add a vfio-pci-like meta driver as a demo
> user of the vfio changes introduced in "vfio/mdev: IOMMU aware
> mediated device" patchset from Baolu Lu. Besides the test purpose,
> per Alex's comments, it could also be a good base driver for
> experimenting with device specific mdev migration.
> 
> Specific interface tested in this proposal:
>  *) int mdev_set_iommu_device(struct device *dev,
>   struct device *iommu_device)
> introduced in the patch as below:
> "[PATCH v5 6/8] vfio/mdev: Add iommu related member in mdev_device"
> 
> Patch Overview:
>  *) patch 1 ~ 7: code refactor for existing vfio-pci module
>  move the common codes from vfio_pci.c to
>  vfio_pci_common.c
>  *) patch 8: add protection to perm_bits alloc/free
>  *) patch 9: add vfio-mdev-pci sample driver
>  *) patch 10: refine the sample driver
>  *) patch 11 - 13: make the sample driver work for non-singleton groups
>also work for vfio-pci and vfio-mdev-pci mixed usage
>includes vfio-mdev-pci driver change and vfio_iommu_type1
>changes.
> 
> Links:
>  *) Link of "vfio/mdev: IOMMU aware mediated device"
>  https://lwn.net/Articles/780522/
>  *) Previous versions:
>  RFC v1: https://lkml.org/lkml/2019/3/4/529
>  RFC v2: https://lkml.org/lkml/2019/3/13/113
>  RFC v3: https://lkml.org/lkml/2019/4/24/495
>  Patch v1: https://www.spinics.net/lists/kvm/msg188952.html
>  *) may try it with the codes in below repo
> current version is branch "v5.3-rc7-pci-mdev":
>  https://github.com/luxis1999/vfio-mdev-pci-sample-driver.git
> 
> Test done on two NICs which share an iommu group.
> 
> ---
>  |  NIC0   |  NIC1  | vIOMMU  | VMs  | Passthru
> ---
>   Test#0 |  vfio-pci   |  vfio-pci  | no  |  1   | pass
> ---
>   Test#1 |  vfio-pci   |  vfio-pci  | no  |  2   | fail[1]
> ---
>   Test#2 |  vfio-pci   |  vfio-pci  | yes |  1   | fail[2]
> ---
>   Test#3 |  vfio-pci-mdev  |  vfio-pci  | no  |  1   | pass
> ---
>   Test#4 |  vfio-pci-mdev  |  vfio-pci  | no  |  2   | fail[3]
> ---
>   Test#5 |  vfio-pci-mdev  |  vfio-pci  | yes |  1   | fail[4]
> ---
> Tips:
> [1] qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,addr=0x6:
>  vfio :01:00.1: failed to open /dev/vfio/1: Device or resource busy
> [2] qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,addr=0x6:
>  vfio :01:00.1: group 1 used in multiple address spaces
> [3] qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,addr=0x6:
>  vfio :01:00.1: failed to setup container for group 1: Failed to set
>  iommu for container: Device or resource busy
> [4] qemu-system-x86_64: -device vfio-pci,host=01:00.1,id=hostdev0,addr=0x6:
>  vfio :01:00.1: failed to setup container for group 1: Failed to set
>  iommu for container: Device or resource busy
> Or
> qemu-system-x86_64: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/
>  83b8f4f2-509f-382f-3c1e-e6bfe0fa1003: vfio 83b8f4f2-509f-382f-3c1e-
>  e6bfe0fa1003: failed to setup container for group 11: Failed to set iommu
>  for container: Device or resource busy
> Some other tests are not listed. Like bind NIC0 to vfio-pci-mdev and try to
> passthru it with "vfio-pci,host=01:00.0", kernel will throw a warn log and
> fail the operation.
> 
> Please feel free give your comments.
> 
> Thanks,
> Yi Liu
> 
> Change log:
>   patch v1 -> patch v2:
>   - the sample driver implementation refined
>   - the sample driver can work on non-singleton iommu groups
>   - the sample driver can work with vfio-pci, devices from a non-singleton
> group can either be bound to vfio-mdev-pci or vfio-pci, and the
> assignment of this group still follows current vfio assignment rule.
> 
>   RFC v3 -> patch v1:
>   - split the patchset from 3 patches to 9 patches to better demonstrate
> the changes step by step
> 
>   v2->v3:
>   - use vfio-mdev-pci instead of vfio-pci-mdev

Re: [RFC PATCH] interconnect: Replace of_icc_get() with icc_get() and reduce DT binding

2019-09-25 Thread Maxime Ripard
Hi Stephen,

On Tue, Sep 24, 2019 at 10:41:33PM -0700, Stephen Boyd wrote:
> The DT binding could also be simplified somewhat. Currently a path needs
> to be specified in DT for each and every use case that is possible for a
> device to want. Typically the path is to memory, which looks to be
> reserved for in the binding with the "dma-mem" named path, but sometimes
> the path is from a device to the CPU or more generically from a device
> to another device which could be a CPU, cache, DMA master, or another
> device if some sort of DMA to DMA scenario is happening. Let's remove
> the pair part of the binding so that we just list out a device's
> possible endpoints on the bus or busses that it's connected to.
>
> If the kernel wants to figure out what the path is to memory or the CPU
> or a cache or something else it should be able to do that by finding the
> node for the "destination" endpoint, extracting that node's
> "interconnects" property, and deriving the path in software. For
> example, we shouldn't need to write out each use case path by path in DT
> for each endpoint node that wants to set a bandwidth to memory. We
> should just be able to indicate what endpoint(s) a device sits on based
> on the interconnect provider in the system and then walk the various
> interconnects to find the path from that source endpoint to the
> destination endpoint.

The dma-mem name is used by the OF core to adjust the mapping of the
devices as well. So, any solution needs to be generic (or provide a
generic helper).

Maxime


signature.asc
Description: PGP signature


Re: [PATCH v6 7/7] media: imx214: Add new control with V4L2_CID_UNIT_CELL_SIZE

2019-09-25 Thread Jacopo Mondi
Hi Ricardo,

On Fri, Sep 20, 2019 at 03:51:37PM +0200, Ricardo Ribalda Delgado wrote:
> From: Ricardo Ribalda Delgado 
>
> According to the product brief, the unit cell size is 1120 nanometers^2.
>
> https://www.sony-semicon.co.jp/products_en/IS/sensor1/img/products/ProductBrief_IMX214_20150428.pdf
>
> Signed-off-by: Ricardo Ribalda Delgado 
> ---
>  drivers/media/i2c/imx214.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c
> index 159a3a604f0e..57562e20c4ca 100644
> --- a/drivers/media/i2c/imx214.c
> +++ b/drivers/media/i2c/imx214.c
> @@ -47,6 +47,7 @@ struct imx214 {
>   struct v4l2_ctrl *pixel_rate;
>   struct v4l2_ctrl *link_freq;
>   struct v4l2_ctrl *exposure;
> + struct v4l2_ctrl *unit_size;
>
>   struct regulator_bulk_data  supplies[IMX214_NUM_SUPPLIES];
>
> @@ -948,6 +949,13 @@ static int imx214_probe(struct i2c_client *client)
>   static const s64 link_freq[] = {
>   IMX214_DEFAULT_LINK_FREQ,
>   };
> + struct v4l2_area unit_size = {
> + .width = 1120,
> + .height = 1120,
> + };
> + union v4l2_ctrl_ptr p_def = {
> + .p_area = &unit_size,
> + };
>   int ret;
>
>   ret = imx214_parse_fwnode(dev);
> @@ -1029,6 +1037,10 @@ static int imx214_probe(struct i2c_client *client)
>V4L2_CID_EXPOSURE,
>0, 3184, 1, 0x0c70);
>
> + imx214->unit_size = v4l2_ctrl_new_std_compound(&imx214->ctrls,
> +NULL,
> +V4L2_CID_UNIT_CELL_SIZE,
> +p_def);
>   ret = imx214->ctrls.error;
>   if (ret) {
>   dev_err(&client->dev, "%s control init failed (%d)\n",

Would something like this scale? I'm a bit bothered by the need to
declare a v4l2_ctrl_ptr structure in drivers just to set a single field
there.

diff --git a/drivers/media/i2c/imx214.c b/drivers/media/i2c/imx214.c
index 57562e20c4ca..a00d8fa481c2 100644
--- a/drivers/media/i2c/imx214.c
+++ b/drivers/media/i2c/imx214.c
@@ -953,9 +953,6 @@ static int imx214_probe(struct i2c_client *client)
.width = 1120,
.height = 1120,
};
-   union v4l2_ctrl_ptr p_def = {
-   .p_area = &unit_size,
-   };
int ret;

ret = imx214_parse_fwnode(dev);
@@ -1040,7 +1037,7 @@ static int imx214_probe(struct i2c_client *client)
imx214->unit_size = v4l2_ctrl_new_std_compound(&imx214->ctrls,
   NULL,
   V4L2_CID_UNIT_CELL_SIZE,
-  p_def);
+  &unit_size);
ret = imx214->ctrls.error;
if (ret) {
dev_err(&client->dev, "%s control init failed (%d)\n",
diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
b/drivers/media/v4l2-core/v4l2-ctrls.c
index f626f9983408..4a2648bee6f5 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -2681,18 +2681,26 @@ EXPORT_SYMBOL(v4l2_ctrl_new_std_menu_items);
 /* Helper function for standard compound controls */
 struct v4l2_ctrl *v4l2_ctrl_new_std_compound(struct v4l2_ctrl_handler *hdl,
const struct v4l2_ctrl_ops *ops, u32 id,
-   const union v4l2_ctrl_ptr p_def)
+   void *defval)
 {
const char *name;
enum v4l2_ctrl_type type;
u32 flags;
s64 min, max, step, def;
+   union v4l2_ctrl_ptr p_def;

v4l2_ctrl_fill(id, &name, &type, &min, &max, &step, &def, &flags);
if (type < V4L2_CTRL_COMPOUND_TYPES) {
handler_set_err(hdl, -EINVAL);
return NULL;
}
+
+   switch ((u32)type) {
+   case V4L2_CTRL_TYPE_AREA:
+   p_def.p_area = defval;
+   break;
+   }
+
return v4l2_ctrl_new(hdl, ops, NULL, id, name, type,
 min, max, step, def, NULL, 0,
 flags, NULL, NULL, p_def, NULL);

However, due to my limited understanding of the control framework, I
cannot tell how many cases the newly introduced switch should
handle...

> --
> 2.23.0
>


signature.asc
Description: PGP signature


Re: [PATCH 1/3] drm: Add some new format DRM_FORMAT_NVXX_10

2019-09-25 Thread Maarten Lankhorst
Op 25-09-2019 om 10:32 schreef sandy.huang:
>
> 在 2019/9/25 下午4:17, Maarten Lankhorst 写道:
>> Op 25-09-2019 om 10:06 schreef Sandy Huang:
>>> These new format is supported by some rockchip socs:
>>>
>>> DRM_FORMAT_NV12_10/DRM_FORMAT_NV21_10
>>> DRM_FORMAT_NV16_10/DRM_FORMAT_NV61_10
>>> DRM_FORMAT_NV24_10/DRM_FORMAT_NV42_10
>>>
>>> Signed-off-by: Sandy Huang 
>>> ---
>>>   drivers/gpu/drm/drm_fourcc.c  | 18 ++
>>>   include/uapi/drm/drm_fourcc.h | 14 ++
>>>   2 files changed, 32 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
>>> index c630064..f25fa81 100644
>>> --- a/drivers/gpu/drm/drm_fourcc.c
>>> +++ b/drivers/gpu/drm/drm_fourcc.c
>>> @@ -274,6 +274,24 @@ const struct drm_format_info *__drm_format_info(u32 
>>> format)
>>>   { .format = DRM_FORMAT_YUV420_10BIT,    .depth = 0,
>>>     .num_planes = 1, .cpp = { 0, 0, 0 }, .hsub = 2, .vsub = 2,
>>>     .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV12_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 2, .vsub = 2,
>>> +  .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV21_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 2, .vsub = 2,
>>> +  .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV16_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 2, .vsub = 1,
>>> +  .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV61_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 2, .vsub = 1,
>>> +  .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV24_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 1, .vsub = 1,
>>> +  .is_yuv = true },
>>> +    { .format = DRM_FORMAT_NV42_10,    .depth = 0,
>>> +  .num_planes = 2, .cpp = { 0, 0, 0 }, .hsub = 1, .vsub = 1,
>>> +  .is_yuv = true },
>>>   };
>>>     unsigned int i;
>>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>>> index 3feeaa3..0479f47 100644
>>> --- a/include/uapi/drm/drm_fourcc.h
>>> +++ b/include/uapi/drm/drm_fourcc.h
>>> @@ -238,6 +238,20 @@ extern "C" {
>>>   #define DRM_FORMAT_NV42    fourcc_code('N', 'V', '4', '2') /* 
>>> non-subsampled Cb:Cr plane */
>>>     /*
>>> + * 2 plane YCbCr 10bit
>>> + * index 0 = Y plane, [9:0] Y
>>> + * index 1 = Cr:Cb plane, [19:0]
>>> + * or
>>> + * index 1 = Cb:Cr plane, [19:0]
>>> + */
>>> +#define DRM_FORMAT_NV12_10    fourcc_code('N', 'A', '1', '2') /* 2x2 
>>> subsampled Cr:Cb plane */
>>> +#define DRM_FORMAT_NV21_10    fourcc_code('N', 'A', '2', '1') /* 2x2 
>>> subsampled Cb:Cr plane */
>>> +#define DRM_FORMAT_NV16_10    fourcc_code('N', 'A', '1', '6') /* 2x1 
>>> subsampled Cr:Cb plane */
>>> +#define DRM_FORMAT_NV61_10    fourcc_code('N', 'A', '6', '1') /* 2x1 
>>> subsampled Cb:Cr plane */
>>> +#define DRM_FORMAT_NV24_10    fourcc_code('N', 'A', '2', '4') /* 
>>> non-subsampled Cr:Cb plane */
>>> +#define DRM_FORMAT_NV42_10    fourcc_code('N', 'A', '4', '2') /* 
>>> non-subsampled Cb:Cr plane */
>>> +
>>> +/*
>>>    * 2 plane YCbCr MSB aligned
>>>    * index 0 = Y plane, [15:0] Y:x [10:6] little endian
>>>    * index 1 = Cr:Cb plane, [31:0] Cr:x:Cb:x [10:6:10:6] little endian
>> What are the other bits, they are not mentioned?
>
> It's a compact layout
>
> Yplane:
>
>     Y0[9:0]Y1[9:0]Y2[9:0]Y3[9:0]...
>
> UVplane:
>
>     U0[9:0]V0[9:0]U1[9:0]V1[9:0]... 

This should be put in the comment then, for clarity. :) It probably needs a 
4-pixel example to describe how they fit in 5 (or 10 for CbCr) bytes.

Cheers,

Maarten



[GIT PULL] fuse update for 5.4

2019-09-25 Thread Miklos Szeredi
Hi Linus,

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git 
tags/fuse-update-5.4

 - Continue separating the transport (user/kernel communication) and the
   filesystem layers of fuse.  Getting rid of most layering violations will
   allow for easier cleanup and optimization later on.

 - Prepare for the addition of the virtio-fs filesystem.  The actual
   filesystem will be introduced by a separate pull request.

 - Convert to new mount API.

 - Various fixes, optimizations and cleanups.

Thanks,
Miklos

---
Arnd Bergmann (1):
  fuse: unexport fuse_put_request

David Howells (2):
  fuse: convert to use the new mount API
  vfs: subtype handling moved to fuse

Eric Biggers (1):
  fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock

Khazhismel Kumykov (2):
  fuse: on 64-bit store time in d_fsdata directly
  fuse: kmemcg account fs data

Kirill Smelkov (1):
  fuse: require /dev/fuse reads to have enough buffer capacity (take 2)

Maxim Patlasov (1):
  fuse: cleanup fuse_wait_on_page_writeback

Michael S. Tsirkin (1):
  fuse: reserve byteswapped init opcodes

Miklos Szeredi (33):
  cuse: fix broken release
  fuse: flatten 'struct fuse_args'
  fuse: rearrange and resize fuse_args fields
  fuse: simplify 'nofail' request
  fuse: convert flush to simple api
  fuse: add noreply to fuse_args
  fuse: convert fuse_force_forget() to simple api
  fuse: add nocreds to fuse_args
  fuse: convert destroy to simple api
  fuse: add pages to fuse_args
  fuse: convert readlink to simple api
  fuse: move page alloc
  fuse: convert ioctl to simple api
  fuse: fuse_short_read(): don't take fuse_req as argument
  fuse: convert readpage to simple api
  fuse: convert sync write to simple api
  fuse: add simple background helper
  fuse: convert direct_io to simple api
  fuse: convert readpages to simple api
  fuse: convert readdir to simple api
  fuse: convert writepages to simple api
  fuse: convert init to simple api
  cuse: convert init to simple api
  fuse: convert release to simple api
  fuse: convert retrieve to simple api
  fuse: unexport request ops
  fuse: simplify request allocation
  fuse: clean up fuse_req
  fuse: stop copying args to fuse_req
  fuse: stop copying pages to fuse_req
  fuse: fix request limit
  fuse: delete dentry if timeout is zero
  fuse: dissociate DESTROY from fuseblk

Stefan Hajnoczi (5):
  fuse: export fuse_end_request()
  fuse: export fuse_len_args()
  fuse: export fuse_get_unique()
  fuse: extract fuse_fill_super_common()
  fuse: add fuse_iqueue_ops callbacks

Tejun Heo (1):
  fuse: fix beyond-end-of-page access in fuse_parse_cache()

Vasily Averin (1):
  fuse: fix missing unlock_page in fuse_writepage()

Vivek Goyal (4):
  fuse: export fuse_send_init_request()
  fuse: export fuse_dequeue_forget() function
  fuse: separate fuse device allocation and installation in fuse_conn
  fuse: allow skipping control interface and forced unmount

YueHaibing (1):
  fuse: Make fuse_args_to_req static

zhengbin (1):
  fuse: fix memleak in cuse_channel_open

---
 fs/fs_context.c|   14 -
 fs/fuse/cuse.c |  101 ++--
 fs/fuse/dev.c  |  654 ++-
 fs/fuse/dir.c  |  283 +-
 fs/fuse/file.c | 1227 
 fs/fuse/fuse_i.h   |  350 ++---
 fs/fuse/inode.c|  553 +++-
 fs/fuse/readdir.c  |   72 ++-
 fs/fuse/xattr.c|   76 +--
 fs/namespace.c |2 -
 fs/proc_namespace.c|2 +-
 fs/super.c |5 -
 include/linux/fs_context.h |1 -
 include/uapi/linux/fuse.h  |4 +
 14 files changed, 1730 insertions(+), 1614 deletions(-)


Re: [PATCH xfstests v3] overlay: Enable character device to be the base fs partition

2019-09-25 Thread Eryu Guan
On Wed, Sep 25, 2019 at 10:18:39AM +0300, Amir Goldstein wrote:
> On Wed, Sep 25, 2019 at 9:29 AM Zhihao Cheng  wrote:
> >
> > When running overlay tests using character devices as base fs partitions,
> > all overlay test cases end up as 'notrun'. The function
> > '_overlay_config_override' (common/config) detects that the current base
> > fs partition is not a block device and sets FSTYP to the base fs. Each
> > overlay test case then checks the current FSTYP, and if it is neither
> > 'overlay' nor 'generic', it skips execution.
> >
> > For example, using UBIFS as base fs skips all overlay usecases:
> >
> >   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
> >   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
> >   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> >
> >   overlay/001   [not run] not suitable for this filesystem type: ubifs
> >   overlay/002   [not run] not suitable for this filesystem type: ubifs
> >   overlay/003   [not run] not suitable for this filesystem type: ubifs
> >
> > When checking that the base fs partition is a block/character device,
> > FSTYP is overwritten as 'overlay'. This patch allows the base fs
> > partition to be a character device that can also execute overlay
> > usecases (such as ubifs).
> >
> > Signed-off-by: Zhihao Cheng 
> > Signed-off-by: Amir Goldstein 
> 
> Looks fine.
> Eryu, you may change this to Reviewed-by

Sure, thanks for the review!

Eryu
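The check itself boils down to accepting either device type as a valid base fs partition. A rough shell sketch of the idea (the function name is illustrative — the real logic lives in `_overlay_config_override` in common/config):

```shell
# Accept both block (-b) and character (-c) devices as overlay base fs
# partitions, so UBI volumes like /dev/ubi0_1 qualify too.
is_overlay_base() {
	[ -b "$1" ] || [ -c "$1" ]
}

# /dev/null is a character device, so it now qualifies:
if is_overlay_base /dev/null; then echo overlay; else echo base; fi
```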


Re: [PATCH] mm/slub: fix a deadlock in shuffle_freelist()

2019-09-25 Thread Peter Zijlstra
On Fri, Sep 13, 2019 at 12:27:44PM -0400, Qian Cai wrote:
> The commit b7d5dc21072c ("random: add a spinlock_t to struct
> batched_entropy") insists on acquiring "batched_entropy_u32.lock" in
> get_random_u32() which introduced the lock chain,
> 
lock -->">
> "&rq->lock --> batched_entropy_u32.lock"
> 
> even after crng init. As the result, it could result in deadlock below.
> Fix it by using get_random_bytes() in shuffle_freelist() which does not
> need to take on the batched_entropy locks.
> 
> WARNING: possible circular locking dependency detected
> 5.3.0-rc7-mm1+ #3 Tainted: G L
> --
> make/7937 is trying to acquire lock:
> 900012f225f8 (random_write_wait.lock){}, at:
> __wake_up_common_lock+0xa8/0x11c
> 
> but task is already holding lock:
> 0096b9429c00 (batched_entropy_u32.lock){-.-.}, at:
> get_random_u32+0x6c/0x1dc
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (batched_entropy_u32.lock){-.-.}:
>lock_acquire+0x31c/0x360
>_raw_spin_lock_irqsave+0x7c/0x9c
>get_random_u32+0x6c/0x1dc
>new_slab+0x234/0x6c0
>___slab_alloc+0x3c8/0x650
>kmem_cache_alloc+0x4b0/0x590
>__debug_object_init+0x778/0x8b4
>debug_object_init+0x40/0x50
>debug_init+0x30/0x29c
>hrtimer_init+0x30/0x50
>init_dl_task_timer+0x24/0x44
>__sched_fork+0xc0/0x168
>init_idle+0x78/0x26c
>fork_idle+0x12c/0x178
>idle_threads_init+0x108/0x178
>smp_init+0x20/0x1bc
>kernel_init_freeable+0x198/0x26c
>kernel_init+0x18/0x334
>ret_from_fork+0x10/0x18
> 
> -> #2 (>lock){-.-.}:

This relation is silly..

I suspect the below 'works'...

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 63900ca029e0..ec1d72f18b34 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6027,10 +6027,11 @@ void init_idle(struct task_struct *idle, int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
 
+   __sched_fork(0, idle);
+
raw_spin_lock_irqsave(>pi_lock, flags);
raw_spin_lock(>lock);
 
-   __sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
idle->flags |= PF_IDLE;
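
The principle behind the move can be illustrated with a hypothetical
userspace sketch (plain pthreads, not kernel code; lock names are only
stand-ins for rq->lock and the batched_entropy lock): do the fork-time
initialization that may take other locks *before* acquiring the runqueue
lock, so no "rq->lock --> entropy lock" dependency is ever created.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t entropy_lock = PTHREAD_MUTEX_INITIALIZER;

struct task { int seed; int state; };

/* Stand-in for get_random_u32(): takes entropy_lock internally. */
static int get_seed(void)
{
	pthread_mutex_lock(&entropy_lock);
	int s = 42;	/* placeholder for real entropy */
	pthread_mutex_unlock(&entropy_lock);
	return s;
}

static void init_idle_fixed(struct task *t)
{
	/* Fork-time init first: entropy_lock is taken while rq_lock is
	 * NOT held, so no rq_lock --> entropy_lock ordering is recorded. */
	t->seed = get_seed();

	pthread_mutex_lock(&rq_lock);
	t->state = 1;	/* TASK_RUNNING equivalent */
	pthread_mutex_unlock(&rq_lock);
}
```

With the old ordering (get_seed() called with rq_lock held), a lock
validator would see rq_lock-before-entropy_lock here and
entropy_lock-before-other-locks elsewhere, closing the reported cycle.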


Re: WireGuard to port to existing Crypto API

2019-09-25 Thread Bruno Wolff III
Are there going to be two branches, one for using the current API and one 
using Zinc?


Re: [PATCH] PCI: aardvark: Don't rely on jiffies while holding spinlock

2019-09-25 Thread Thomas Petazzoni
Hello Remi,

Thanks for the patch, I have a few comments/questions below.

On Sun,  1 Sep 2019 16:23:03 +0200
Remi Pommarel  wrote:

> diff --git a/drivers/pci/controller/pci-aardvark.c 
> b/drivers/pci/controller/pci-aardvark.c
> index fc0fe4d4de49..1fa6d04ad7aa 100644
> --- a/drivers/pci/controller/pci-aardvark.c
> +++ b/drivers/pci/controller/pci-aardvark.c
> @@ -175,7 +175,8 @@
>   (PCIE_CONF_BUS(bus) | PCIE_CONF_DEV(PCI_SLOT(devfn))| \
>PCIE_CONF_FUNC(PCI_FUNC(devfn)) | PCIE_CONF_REG(where))
>  
> -#define PIO_TIMEOUT_MS   1
> +#define PIO_RETRY_CNT10
> +#define PIO_RETRY_DELAY  100 /* 100 us*/
>  
>  #define LINK_WAIT_MAX_RETRIES10
>  #define LINK_WAIT_USLEEP_MIN 9
> @@ -383,17 +384,16 @@ static void advk_pcie_check_pio_status(struct advk_pcie 
> *pcie)
>  static int advk_pcie_wait_pio(struct advk_pcie *pcie)
>  {
>   struct device *dev = >pdev->dev;
> - unsigned long timeout;
> + size_t i;

Is it common to use a size_t for a loop counter?

>  
> - timeout = jiffies + msecs_to_jiffies(PIO_TIMEOUT_MS);
> -
> - while (time_before(jiffies, timeout)) {
> + for (i = 0; i < PIO_RETRY_CNT; ++i) {

I find it more common to use post-increment for loop counters rather
than pre-increment, but that's really a nitpick and I don't care much.

>   u32 start, isr;
>  
>   start = advk_readl(pcie, PIO_START);
>   isr = advk_readl(pcie, PIO_ISR);
>   if (!start && isr)
>   return 0;
> + udelay(PIO_RETRY_DELAY);

But the bigger issue is that this change causes a 100us delay at
*every* single PIO read or write operation.

Indeed, at the first iteration of the loop, the PIO operation has not
completed, so you will always hit the udelay(100) a first time, and
it's only at the second iteration of the loop that the PIO operation
has completed (for successful PIO operations of course, which don't hit
the timeout).

I took a measurement around wait_pio() with sched_clock before and
after the patch. Before the patch, I have measurements like this (in
nanoseconds):

[1.562801] time = 6000
[1.565310] time = 6000
[1.567809] time = 6080
[1.570327] time = 6080
[1.572836] time = 6080
[1.575339] time = 6080
[1.577858] time = 2720
[1.580366] time = 2720
[1.582862] time = 6000
[1.585377] time = 2720
[1.587890] time = 2720
[1.590393] time = 2720

So it takes a few microseconds for each PIO operation.

With your patch applied:

[2.267291] time = 101680
[2.270002] time = 100880
[2.272852] time = 100800
[2.275573] time = 100880
[2.278285] time = 100800
[2.281005] time = 100880
[2.283722] time = 100800
[2.286444] time = 100880
[2.289264] time = 100880
[2.291981] time = 100800
[2.294690] time = 100800
[2.297405] time = 100800

We're jumping to 100us for every PIO read/write operation. To be
honest, I don't know if this is very important; there are not that many
PIO operations, and they are not used in any performance hot path. But
I thought it was worth pointing out the additional delay caused by this
implementation change.
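One way to keep the same timeout budget without a mandatory 100us hit is
a finer polling step. A hypothetical, hardware-free sketch of the
trade-off (the status check and udelay() are simulated; the constants
are illustrative, not the driver's):

```c
#include <assert.h>

#define RETRY_DELAY_US	2		/* finer step than 100us */
#define TIMEOUT_US	(10 * 100)	/* same ~1ms budget as 10 * 100us */

static int completed_after_us;	/* simulated hardware completion time */
static int elapsed_us;		/* simulated clock */

static int pio_done(void) { return elapsed_us >= completed_after_us; }
static void delay_us(int us) { elapsed_us += us; }	/* stand-in for udelay() */

static int wait_pio(void)
{
	int waited;

	/* Check completion first, then delay: a fast operation only pays
	 * for the steps it actually needed, not one full 100us step. */
	for (waited = 0; waited < TIMEOUT_US; waited += RETRY_DELAY_US) {
		if (pio_done())
			return 0;
		delay_us(RETRY_DELAY_US);
	}
	return -1;	/* timeout */
}
```

With a ~6us operation (as in the measurements above), the 2us step
completes in ~6us of delay instead of a full 100us; the timeout case
still burns the same overall budget.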

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



[PATCH] arm64: dts: meson: g12a: add audio devices resets

2019-09-25 Thread Jerome Brunet
Provide the reset lines coming from the audio clock controller to
the audio devices of the g12 family

Signed-off-by: Jerome Brunet 
---
 arch/arm64/boot/dts/amlogic/meson-g12.dtsi | 28 +-
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-g12.dtsi
index 0d9df29994f3..3cf74fc96434 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12.dtsi
@@ -103,7 +103,9 @@
sound-name-prefix = "TODDR_A";
interrupts = ;
clocks = <_audio AUD_CLKID_TODDR_A>;
-   resets = < AXG_ARB_TODDR_A>;
+   resets = < AXG_ARB_TODDR_A>,
+<_audio AUD_RESET_TODDR_A>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -115,7 +117,9 @@
sound-name-prefix = "TODDR_B";
interrupts = ;
clocks = <_audio AUD_CLKID_TODDR_B>;
-   resets = < AXG_ARB_TODDR_B>;
+   resets = < AXG_ARB_TODDR_B>,
+<_audio AUD_RESET_TODDR_B>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -127,7 +131,9 @@
sound-name-prefix = "TODDR_C";
interrupts = ;
clocks = <_audio AUD_CLKID_TODDR_C>;
-   resets = < AXG_ARB_TODDR_C>;
+   resets = < AXG_ARB_TODDR_C>,
+<_audio AUD_RESET_TODDR_C>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -139,7 +145,9 @@
sound-name-prefix = "FRDDR_A";
interrupts = ;
clocks = <_audio AUD_CLKID_FRDDR_A>;
-   resets = < AXG_ARB_FRDDR_A>;
+   resets = < AXG_ARB_FRDDR_A>,
+<_audio AUD_RESET_FRDDR_A>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -151,7 +159,9 @@
sound-name-prefix = "FRDDR_B";
interrupts = ;
clocks = <_audio AUD_CLKID_FRDDR_B>;
-   resets = < AXG_ARB_FRDDR_B>;
+   resets = < AXG_ARB_FRDDR_B>,
+<_audio AUD_RESET_FRDDR_B>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -163,7 +173,9 @@
sound-name-prefix = "FRDDR_C";
interrupts = ;
clocks = <_audio AUD_CLKID_FRDDR_C>;
-   resets = < AXG_ARB_FRDDR_C>;
+   resets = < AXG_ARB_FRDDR_C>,
+<_audio AUD_RESET_FRDDR_C>;
+   reset-names = "arb", "rst";
status = "disabled";
};
 
@@ -249,6 +261,7 @@
clocks = <_audio AUD_CLKID_SPDIFIN>,
 <_audio AUD_CLKID_SPDIFIN_CLK>;
clock-names = "pclk", "refclk";
+   resets = <_audio AUD_RESET_SPDIFIN>;
status = "disabled";
};
 
@@ -261,6 +274,7 @@
clocks = <_audio AUD_CLKID_SPDIFOUT>,
 <_audio AUD_CLKID_SPDIFOUT_CLK>;
clock-names = "pclk", "mclk";
+   resets = <_audio AUD_RESET_SPDIFOUT>;
status = "disabled";
};
 
@@ -318,6 +332,7 @@
clocks = <_audio AUD_CLKID_SPDIFOUT_B>,
 <_audio AUD_CLKID_SPDIFOUT_B_CLK>;
clock-names = "pclk", "mclk";
+   resets = <_audio AUD_RESET_SPDIFOUT_B>;
status = "disabled";
};
 
@@ -326,6 +341,7 @@
reg = <0x0 0x744 0x0 0x4>;
#sound-dai-cells = <1>;
sound-name-prefix = "TOHDMITX";
+   resets = <_audio AUD_RESET_TOHDMITX>;
status = "disabled";
};
};
-- 
2.21.0



[PATCH] bpf: clean up indentation issue

2019-09-25 Thread Colin King
From: Colin Ian King 

There is a statement that is indented one level too deeply,
remove the extraneous tab.

Signed-off-by: Colin Ian King 
---
 kernel/bpf/btf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 722d38e543e9..29c7c06c6bd6 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -2332,7 +2332,7 @@ static int btf_enum_check_kflag_member(struct 
btf_verifier_env *env,
if (BITS_PER_BYTE_MASKED(struct_bits_off)) {
btf_verifier_log_member(env, struct_type, member,
"Member is not byte aligned");
-   return -EINVAL;
+   return -EINVAL;
}
 
nr_bits = int_bitsize;
-- 
2.20.1



Re: WireGuard to port to existing Crypto API

2019-09-25 Thread David Miller
From: "Jason A. Donenfeld" 
Date: Wed, 25 Sep 2019 10:29:45 +0200

> His viewpoint has recently solidified: in order to go upstream,
> WireGuard must port to the existing crypto API, and handle the Zinc
> project separately.

I didn't say "must" anything, I suggested this as a smoother and
more efficient way forward.

I'm also a bit disappointed that you felt the need to so quickly
make such an explosive posting to the mailing list when we've
just spoken about this amongst ourselves only 20 minutes ago.

Please proceed in a smoother and more considerate manner for all
parties involved.

Thank you.


Re: WireGuard to port to existing Crypto API

2019-09-25 Thread David Miller
From: Bruno Wolff III 
Date: Wed, 25 Sep 2019 04:17:00 -0500

> Are there going to be two branches, one for using the current API and
> one using Zinc?

This is inappropriate to even discuss at this point.

Let's see what the crypto based stuff looks like, evaluate it,
and then decide how to proceed forward.

Thank you.


Re: [PATCH] mm, debug, kasan: save and dump freeing stack trace for kasan

2019-09-25 Thread Andrey Ryabinin



On 9/23/19 11:20 AM, Vlastimil Babka wrote:
> On 9/16/19 5:57 PM, Andrey Ryabinin wrote:
>> I'd rather keep all logic in one place, i.e. "if (!page_owner_disabled && 
>> (IS_ENABLED(CONFIG_KASAN) || debug_pagealloc_enabled())"
>> With this no changes in early_debug_pagealloc() required and 
>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y should also work correctly.
> 
> OK.
> 
> 8<
> 
> From 7437c43f02682fdde5680fa83e87029f7529e222 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka 
> Date: Mon, 16 Sep 2019 11:28:19 +0200
> Subject: [PATCH] mm, debug, kasan: save and dump freeing stack trace for kasan
> 
> The commit "mm, page_owner, debug_pagealloc: save and dump freeing stack 
> trace"
> enhanced page_owner to also store freeing stack trace, when debug_pagealloc is
> also enabled. KASAN would also like to do this [1] to improve error reports to
> debug e.g. UAF issues. This patch therefore introduces a helper config option
> PAGE_OWNER_FREE_STACK, which is enabled when PAGE_OWNER and either of
> DEBUG_PAGEALLOC or KASAN is enabled. Boot-time, the free stack saving is
> enabled when booting a KASAN kernel with page_owner=on, or non-KASAN kernel
> with debug_pagealloc=on and page_owner=on.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967
> 
> Suggested-by: Dmitry Vyukov 
> Suggested-by: Walter Wu 
> Suggested-by: Andrey Ryabinin 
> Signed-off-by: Vlastimil Babka 
> ---

Reviewed-by: Andrey Ryabinin 


[PATCH 1/2] HID: i2c-hid: allow delay after SET_POWER

2019-09-25 Thread You-Sheng Yang
According to HID over I2C specification v1.0 section 7.2.8, a device is
allowed to take at most 1 second to make the transition to the specified
power state. On some touchpad devices implements Microsoft Precision
Touchpad, it may fail to execute following set PTP mode command without
the delay and leaves the device in an unsupported Mouse mode.

This change adds a post-setpower-delay-ms device property that allows
specifying the delay after a SET_POWER command is issued.

References: https://bugzilla.kernel.org/show_bug.cgi?id=204991
Signed-off-by: You-Sheng Yang 
---
 .../bindings/input/hid-over-i2c.txt   |  2 +
 drivers/hid/i2c-hid/i2c-hid-core.c| 46 +++
 include/linux/platform_data/i2c-hid.h |  3 ++
 3 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/Documentation/devicetree/bindings/input/hid-over-i2c.txt 
b/Documentation/devicetree/bindings/input/hid-over-i2c.txt
index c76bafaf98d2f..d82faae335da0 100644
--- a/Documentation/devicetree/bindings/input/hid-over-i2c.txt
+++ b/Documentation/devicetree/bindings/input/hid-over-i2c.txt
@@ -32,6 +32,8 @@ device-specific compatible properties, which should be used 
in addition to the
 - vdd-supply: phandle of the regulator that provides the supply voltage.
 - post-power-on-delay-ms: time required by the device after enabling its 
regulators
   or powering it on, before it is ready for communication.
+- post-setpower-delay-ms: time required by the device after a SET_POWER command
+  before it finished the state transition.
 
 Example:
 
diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c 
b/drivers/hid/i2c-hid/i2c-hid-core.c
index 2a7c6e33bb1c4..a5bc2786dc440 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -168,6 +168,7 @@ static const struct i2c_hid_quirks {
__u16 idVendor;
__u16 idProduct;
__u32 quirks;
+   __u32 post_setpower_delay_ms;
 } i2c_hid_quirks[] = {
{ USB_VENDOR_ID_WEIDA, HID_ANY_ID,
I2C_HID_QUIRK_SET_PWR_WAKEUP_DEV },
@@ -189,21 +190,20 @@ static const struct i2c_hid_quirks {
  * i2c_hid_lookup_quirk: return any quirks associated with a I2C HID device
  * @idVendor: the 16-bit vendor ID
  * @idProduct: the 16-bit product ID
- *
- * Returns: a u32 quirks value.
  */
-static u32 i2c_hid_lookup_quirk(const u16 idVendor, const u16 idProduct)
+static void i2c_hid_set_quirk(struct i2c_hid *ihid,
+   const u16 idVendor, const u16 idProduct)
 {
-   u32 quirks = 0;
int n;
 
for (n = 0; i2c_hid_quirks[n].idVendor; n++)
if (i2c_hid_quirks[n].idVendor == idVendor &&
(i2c_hid_quirks[n].idProduct == (__u16)HID_ANY_ID ||
-i2c_hid_quirks[n].idProduct == idProduct))
-   quirks = i2c_hid_quirks[n].quirks;
-
-   return quirks;
+i2c_hid_quirks[n].idProduct == idProduct)) {
+   ihid->quirks = i2c_hid_quirks[n].quirks;
+   ihid->pdata.post_setpower_delay_ms =
+   i2c_hid_quirks[n].post_setpower_delay_ms;
+   }
 }
 
 static int __i2c_hid_command(struct i2c_client *client,
@@ -431,8 +431,22 @@ static int i2c_hid_set_power(struct i2c_client *client, 
int power_state)
power_state == I2C_HID_PWR_SLEEP)
ihid->sleep_delay = jiffies + msecs_to_jiffies(20);
 
-   if (ret)
+   if (ret) {
dev_err(>dev, "failed to change power setting.\n");
+   goto set_pwr_exit;
+   }
+
+   /*
+* The HID over I2C specification states that if a DEVICE needs time
+* after the PWR_ON request, it should utilise CLOCK stretching.
+* However, it has been observered that the Windows driver provides a
+* 1ms sleep between the PWR_ON and RESET requests and that some devices
+* rely on this.
+*/
+   if (ihid->pdata.post_setpower_delay_ms)
+   msleep(ihid->pdata.post_setpower_delay_ms);
+   else
+   usleep_range(1000, 5000);
 
 set_pwr_exit:
return ret;
@@ -456,15 +470,6 @@ static int i2c_hid_hwreset(struct i2c_client *client)
if (ret)
goto out_unlock;
 
-   /*
-* The HID over I2C specification states that if a DEVICE needs time
-* after the PWR_ON request, it should utilise CLOCK stretching.
-* However, it has been observered that the Windows driver provides a
-* 1ms sleep between the PWR_ON and RESET requests and that some devices
-* rely on this.
-*/
-   usleep_range(1000, 5000);
-
i2c_hid_dbg(ihid, "resetting...\n");
 
ret = i2c_hid_command(client, _reset_cmd, NULL, 0);
@@ -1023,6 +1028,9 @@ static void i2c_hid_fwnode_probe(struct i2c_client 
*client,
if (!device_property_read_u32(>dev, "post-power-on-delay-ms",
  ))

[PATCH 2/2] HID: i2c-hid: add 60ms SET_POWER delay for Goodix touchpad

2019-09-25 Thread You-Sheng Yang
Goodix touchpad 27C6:01F0 fails to switch to PTP mode when resumed from
suspend. The traffic after resumed looks like:

  [ 275.312190] i2c_hid i2c-DELL096E:00: i2c_hid_set_power
  [ 275.312191] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 01 08
  [ 283.926905] i2c_hid i2c-DELL096E:00: i2c_hid_set_power
  [ 283.926910] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 00 08
  [ 283.927146] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.927149] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 37 03 06 
00 05 00 07 00 00
  [ 283.927872] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.927874] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 33 03 06 
00 05 00 03 03 00
  [ 283.929148] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.929151] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 35 03 06 
00 05 00 05 03 00
  [ 289.262675] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 00 00 00 00 00 00 00
  [ 289.270314] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fe 00 00 00 00 00 00
  [ 289.276806] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fd 00 00 00 00 00 00
  ...

The time delay between i2c_hid_set_power and i2c_hid_set_or_send_report
is less than the vendor-recommended 60ms, so the device fails to complete
its power state transition, ignores i2c_hid_set_or_send_report, and keeps
operating in legacy mouse mode, therefore giving unsupported input
reports.

This change updates the quirk for the device to specify a 60ms
post-setpower-delay-ms.

References: https://bugzilla.kernel.org/show_bug.cgi?id=204991
Signed-off-by: You-Sheng Yang 
---
 drivers/hid/i2c-hid/i2c-hid-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hid/i2c-hid/i2c-hid-core.c 
b/drivers/hid/i2c-hid/i2c-hid-core.c
index a5bc2786dc440..8c01ce33f1c61 100644
--- a/drivers/hid/i2c-hid/i2c-hid-core.c
+++ b/drivers/hid/i2c-hid/i2c-hid-core.c
@@ -180,7 +180,7 @@ static const struct i2c_hid_quirks {
{ USB_VENDOR_ID_LG, I2C_DEVICE_ID_LG_8001,
I2C_HID_QUIRK_NO_RUNTIME_PM },
{ I2C_VENDOR_ID_GOODIX, I2C_DEVICE_ID_GOODIX_01F0,
-   I2C_HID_QUIRK_NO_RUNTIME_PM },
+   I2C_HID_QUIRK_NO_RUNTIME_PM, 60 },
{ USB_VENDOR_ID_ELAN, HID_ANY_ID,
 I2C_HID_QUIRK_BOGUS_IRQ },
{ 0, 0 }
-- 
2.23.0



[PATCH 0/2] HID: i2c-hid: add 60ms SET_POWER delay for Goodix touchpad

2019-09-25 Thread You-Sheng Yang
I2C traffic after resumed from s2idle:

  [ 275.312190] i2c_hid i2c-DELL096E:00: i2c_hid_set_power
  [ 275.312191] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 01 08
  [ 283.926905] i2c_hid i2c-DELL096E:00: i2c_hid_set_power
  [ 283.926910] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 00 08
  [ 283.927146] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.927149] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 37 03 06 
00 05 00 07 00 00
  [ 283.927872] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.927874] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 33 03 06 
00 05 00 03 03 00
  [ 283.929148] i2c_hid i2c-DELL096E:00: i2c_hid_set_or_send_report
  [ 283.929151] i2c_hid i2c-DELL096E:00: __i2c_hid_command: cmd=05 00 35 03 06 
00 05 00 05 03 00
  # when touching touchpad:
  [ 289.262675] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 00 00 00 00 00 00 00
  [ 289.270314] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fe 00 00 00 00 00 00
  [ 289.276806] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fd 00 00 00 00 00 00
  [ 289.283863] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fb 01 00 00 00 00 00
  [ 289.291213] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 fa 02 00 00 00 00 00
  [ 289.297932] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 f9 03 00 00 00 00 00
  [ 289.304775] i2c_hid i2c-DELL096E:00: input: 0b 00 01 00 f9 03 00 00 00 00 00

On this device (Goodix touchpad 27C6:01F0), both Precision Touchpad
mode and a legacy Mouse mode are supported. When I2C SET_POWER ON is
issued, it can take as long as 60ms to complete the state transition and
is then able to receive and execute further commands. On boot the
device is running under Mouse mode for legacy platform support, and
will be put to PTP mode with a SET_OR_SEND_REPORT command. If somehow
the time gap between the first SET_POWER ON and further
SET_OR_SEND_REPORT commands is less than 60ms, the device will still
be operating in Mouse mode and give reports like
"0b 00 01 00 fe 00 00 00 00 00 00".

On Linux system boot, it may take 500ms from i2c probe (SET_POWER ON)
to hid initialization (SET_OR_SEND_REPORT), so it will always be put
to PTP mode successfully. But when the device is put under s2idle
suspend and resumed, the hid reset resume process will be right after
i2c resume, so the device won't have enough time to complete its
state transition, which causes the following SET_OR_SEND_REPORT to fail,
and therefore it operates in Mouse mode instead.

Currently Linux sleeps for at most 5ms in i2c_hid_hwreset(), but
according to HID over I2C Specification[1] section 7.2.8 "SET_POWER":

> The DEVICE must ensure that it transitions to the HOST
> specified Power State in under 1 second.

it can take as long as 1 second.

This changeset adds a device property post-setpower-delay-ms and uses it
to specify the delay after a SET_POWER command is issued for this
device.
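
The race described above can be modeled as a toy state machine
(hypothetical illustration, not driver code; names and the 60ms figure
follow the description above): the PTP mode switch only succeeds if the
delay after SET_POWER ON covers the device's transition time.

```c
#include <assert.h>

struct dev_model {
	int transition_ms;	/* e.g. 60 for this Goodix touchpad */
	int powered_for_ms;	/* time elapsed since SET_POWER ON */
	int ptp_mode;		/* 0 = legacy Mouse mode, 1 = PTP mode */
};

static void set_power_on(struct dev_model *d) { d->powered_for_ms = 0; }
static void sleep_ms(struct dev_model *d, int ms) { d->powered_for_ms += ms; }

/* Stand-in for the SET_OR_SEND_REPORT that selects PTP mode. */
static int set_report_ptp(struct dev_model *d)
{
	if (d->powered_for_ms < d->transition_ms)
		return -1;	/* still transitioning: command is ignored */
	d->ptp_mode = 1;
	return 0;
}

static int resume(struct dev_model *d, int post_setpower_delay_ms)
{
	set_power_on(d);
	sleep_ms(d, post_setpower_delay_ms);
	return set_report_ptp(d);
}
```

The boot path succeeds only because ~500ms happens to elapse between the
two commands; making the delay an explicit per-device property makes the
fast resume path reliable too.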

References: https://bugzilla.kernel.org/show_bug.cgi?id=204991

You-Sheng Yang (2):
  HID: i2c-hid: allow delay after SET_POWER
  HID: i2c-hid: add 60ms SET_POWER delay for Goodix touchpad

 .../bindings/input/hid-over-i2c.txt   |  2 +
 drivers/hid/i2c-hid/i2c-hid-core.c| 48 +++
 include/linux/platform_data/i2c-hid.h |  3 ++
 3 files changed, 33 insertions(+), 20 deletions(-)

-- 
2.23.0


