[PATCH v7 3/4] docs: Add documentation for userspace client interface

2020-10-16 Thread Hemant Kumar
MHI userspace client driver creates a device file node
for user applications to perform file operations. File
operations are handled by the MHI core driver. Currently
the LOOPBACK MHI channel is supported by this driver.

Signed-off-by: Hemant Kumar 
---
 Documentation/mhi/index.rst |  1 +
 Documentation/mhi/uci.rst   | 83 +
 2 files changed, 84 insertions(+)
 create mode 100644 Documentation/mhi/uci.rst

diff --git a/Documentation/mhi/index.rst b/Documentation/mhi/index.rst
index 1d8dec3..c75a371 100644
--- a/Documentation/mhi/index.rst
+++ b/Documentation/mhi/index.rst
@@ -9,6 +9,7 @@ MHI
 
mhi
topology
+   uci
 
 .. only::  subproject and html
 
diff --git a/Documentation/mhi/uci.rst b/Documentation/mhi/uci.rst
new file mode 100644
index 000..fe901c4
--- /dev/null
+++ b/Documentation/mhi/uci.rst
@@ -0,0 +1,83 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Userspace Client Interface (UCI)
+================================
+
+UCI driver enables userspace clients to communicate with external MHI devices
+such as modem and WLAN. UCI driver probe creates standard character device file
+nodes for userspace clients to perform open, read, write, poll and release file
+operations.
+
+Operations
+==========
+
+open
+----
+
+Instantiates the UCI channel object and starts the MHI channels to move them to
+the running state. Inbound buffers are queued to the downlink channel transfer
+ring. Every subsequent open() increments the UCI device reference count as well
+as the UCI channel reference count.
+
+read
+----
+
+When a data transfer is completed on the downlink channel, the TRE buffer is
+copied to the pending list. The reader is unblocked and the data is copied to
+the userspace buffer. The TRE buffer is then queued back to the downlink
+channel transfer ring.
+
+write
+-----
+
+Write buffer is queued to the uplink channel transfer ring if the ring is not
+full. Upon uplink transfer completion the buffer is freed.
+
+poll
+----
+
+Returns EPOLLIN | EPOLLRDNORM mask if pending list has buffers to be read by
+userspace. Returns EPOLLOUT | EPOLLWRNORM mask if MHI uplink channel transfer
+ring is not empty. Returns EPOLLERR when UCI driver is removed. MHI channels
+move to disabled state upon driver remove.
+
+release
+-------
+
+Decrements UCI device reference count and UCI channel reference count upon last
+release(). UCI channel clean up is performed. MHI channel moves to disabled
+state and inbound buffers are freed.
+
+Usage
+=====
+
+Device file node is created with the format:
+
+/dev/mhi_<controller_name>_<mhi_device_name>
+
+controller_name is the name of the underlying bus used to transfer data.
+mhi_device_name is the name of the MHI channel used by the userspace MHI client
+to send or receive data using the MHI protocol.
+
+A separate character device file node is created for each channel specified in
+the mhi_device_id table. MHI channels are statically defined by the MHI
+specification. The list of supported channels is in the channel list variable
+of the mhi_device_id table in the UCI driver.
+
+LOOPBACK Channel
+----------------
+
+A userspace MHI client using the LOOPBACK channel opens the device file node.
+As part of the open operation, TREs are queued to the transfer ring of LOOPBACK
+channel 1 and the channel doorbell is rung. When the userspace MHI client
+performs a write operation on the device node, the data buffer gets queued as a
+TRE to the transfer ring of LOOPBACK channel 0. The MHI core driver rings the
+channel doorbell for the MHI device to move the data over the underlying bus.
+When the userspace MHI client performs a read operation, the same data gets
+looped back to the MHI host using LOOPBACK channel 1. The LOOPBACK channel is
+used to verify the data path and data integrity between the MHI host and the
+MHI device.
+
+Other Use Cases
+---------------
+
+Getting MHI device specific diagnostics information to a userspace MHI diag
+client using DIAG channel 4 (Host to device) and 5 (Device to Host).
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
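
For illustration, a minimal userspace client exercising the LOOPBACK channel
through this interface might look like the sketch below. The device node name
is an assumption based on the naming scheme documented above; this program is
not part of the patch.

/* Hypothetical userspace example: loop a buffer through the MHI LOOPBACK
 * channel via the UCI character device.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Device node name assumed; the real name depends on the controller. */
	const char *node = "/dev/mhi_0000:01:00.0_LOOPBACK";
	char tx[] = "hello mhi", rx[32] = { 0 };
	struct pollfd pfd;
	int fd = open(node, O_RDWR);

	if (fd < 0)
		return 1;

	if (write(fd, tx, sizeof(tx)) < 0)
		goto out;

	pfd.fd = fd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, 5000) > 0 && (pfd.revents & POLLIN))
		if (read(fd, rx, sizeof(rx)) > 0)
			printf("looped back: %s\n", rx);
out:
	close(fd);
	return 0;
}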



[PATCH v7 1/4] bus: mhi: core: Add helper API to return number of free TREs

2020-10-16 Thread Hemant Kumar
Introduce the mhi_get_free_desc_count() API to return the number
of TREs available to queue buffers. MHI clients can use this
API to know beforehand if the ring is full without calling the
queue API.

Signed-off-by: Hemant Kumar 
Reviewed-by: Jeffrey Hugo 
---
 drivers/bus/mhi/core/main.c | 12 
 include/linux/mhi.h |  9 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 2cff5dd..3950792 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -258,6 +258,18 @@ int mhi_destroy_device(struct device *dev, void *data)
return 0;
 }
 
+int mhi_get_free_desc_count(struct mhi_device *mhi_dev,
+   enum dma_data_direction dir)
+{
+   struct mhi_controller *mhi_cntrl = mhi_dev->mhi_cntrl;
+   struct mhi_chan *mhi_chan = (dir == DMA_TO_DEVICE) ?
+   mhi_dev->ul_chan : mhi_dev->dl_chan;
+   struct mhi_ring *tre_ring = &mhi_chan->tre_ring;
+
+   return get_nr_avail_ring_elements(mhi_cntrl, tre_ring);
+}
+EXPORT_SYMBOL_GPL(mhi_get_free_desc_count);
+
 void mhi_notify(struct mhi_device *mhi_dev, enum mhi_callback cb_reason)
 {
struct mhi_driver *mhi_drv;
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index d4841e5..7829b1d 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -597,6 +597,15 @@ void mhi_set_mhi_state(struct mhi_controller *mhi_cntrl,
 void mhi_notify(struct mhi_device *mhi_dev, enum mhi_callback cb_reason);
 
 /**
+ * mhi_get_free_desc_count - Get transfer ring length
+ * Get # of TD available to queue buffers
+ * @mhi_dev: Device associated with the channels
+ * @dir: Direction of the channel
+ */
+int mhi_get_free_desc_count(struct mhi_device *mhi_dev,
+   enum dma_data_direction dir);
+
+/**
  * mhi_prepare_for_power_up - Do pre-initialization before power up.
  *This is optional, call this before power up if
  *the controller does not want bus framework to
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
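
As a usage sketch (not part of this series), a client driver's transmit path
could use the new helper to back off before attempting to queue a buffer,
roughly along these lines; the function name is illustrative, while
mhi_queue_buf() and MHI_EOT are existing MHI client APIs:

/* Illustrative client fragment only. */
#include <linux/dma-direction.h>
#include <linux/errno.h>
#include <linux/mhi.h>

static int client_xmit(struct mhi_device *mhi_dev, void *buf, size_t len)
{
	/* Back-pressure: bail out before calling the queue API on a full ring. */
	if (!mhi_get_free_desc_count(mhi_dev, DMA_TO_DEVICE))
		return -EAGAIN;

	return mhi_queue_buf(mhi_dev, DMA_TO_DEVICE, buf, len, MHI_EOT);
}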



[PATCH v7 2/4] bus: mhi: core: Move MHI_MAX_MTU to external header file

2020-10-16 Thread Hemant Kumar
Currently this macro is defined in internal MHI header as
a TRE length mask. Moving it to external header allows MHI
client drivers to set this upper bound for the transmit
buffer size.

Signed-off-by: Hemant Kumar 
Reviewed-by: Jeffrey Hugo 
---
 drivers/bus/mhi/core/internal.h | 1 -
 include/linux/mhi.h | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
index 7989269..4abf0cf 100644
--- a/drivers/bus/mhi/core/internal.h
+++ b/drivers/bus/mhi/core/internal.h
@@ -453,7 +453,6 @@ enum mhi_pm_state {
 #define CMD_EL_PER_RING128
 #define PRIMARY_CMD_RING   0
 #define MHI_DEV_WAKE_DB127
-#define MHI_MAX_MTU0xffff
 #define MHI_RANDOM_U32_NONZERO(bmsk)   (prandom_u32_max(bmsk) + 1)
 
 enum mhi_er_type {
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index 7829b1d..6e1122c 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -15,6 +15,9 @@
 #include 
 #include 
 
+/* MHI client drivers to set this upper bound for tx buffer */
+#define MHI_MAX_MTU 0xffff
+
 #define MHI_MAX_OEM_PK_HASH_SEGMENTS 16
 
 struct mhi_chan;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
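
As a usage sketch (not part of the patch), a client driver could clamp its
uplink buffer allocation to this bound, for example:

/* Illustrative only: bound a tx buffer allocation by MHI_MAX_MTU. */
#include <linux/kernel.h>
#include <linux/mhi.h>
#include <linux/slab.h>

static void *client_alloc_tx_buf(size_t requested, size_t *actual)
{
	size_t len = min_t(size_t, requested, MHI_MAX_MTU);

	*actual = len;
	return kmalloc(len, GFP_KERNEL);
}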



[PATCH V2] mm: fix potential pte_unmap_unlock pte error

2020-10-16 Thread Shijie Luo
When flags don't have the MPOL_MF_MOVE or MPOL_MF_MOVE_ALL bits set, the code
breaks out of the loop early, and passing "pte - 1" rather than the originally
mapped pte to pte_unmap_unlock() is not a good idea.

Signed-off-by: Shijie Luo 
Signed-off-by: Michal Hocko 
Signed-off-by: Miaohe Lin 
---
 mm/mempolicy.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3fde772ef5ef..3ca4898f3f24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long 
addr,
unsigned long flags = qp->flags;
int ret;
bool has_unmovable = false;
-   pte_t *pte;
+   pte_t *pte, *mapped_pte;
spinlock_t *ptl;
 
ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long 
addr,
if (pmd_trans_unstable(pmd))
return 0;
 
-   pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+   mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
if (!pte_present(*pte))
continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long 
addr,
} else
break;
}
-   pte_unmap_unlock(pte - 1, ptl);
+   pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
 
if (has_unmovable)
-- 
2.19.1
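
For reference, the locking pattern the fix establishes is sketched below
(function name and loop body are illustrative, not from the patch): remember
the pte returned by pte_offset_map_lock() and pass exactly that pointer to
pte_unmap_unlock(), even when the loop exits early.

#include <linux/mm.h>

static void walk_pte_range(struct mm_struct *mm, pmd_t *pmd,
			   unsigned long addr, unsigned long end)
{
	pte_t *pte, *mapped_pte;
	spinlock_t *ptl;

	mapped_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	for (; addr != end; pte++, addr += PAGE_SIZE) {
		if (!pte_present(*pte))
			continue;
		/* ... processing that may break out of the loop early ... */
	}
	/* Unmap/unlock with the pte that was originally mapped. */
	pte_unmap_unlock(mapped_pte, ptl);
}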



[PATCH 2/2] arm64: allow hotpluggable sections to be offlined

2020-10-16 Thread Sudarshan Rajagopalan
On receiving the MEM_GOING_OFFLINE notification, we disallow offlining of
any boot memory by checking whether the section is an early section. With the
introduction of SECTION_MARK_HOTPLUGGABLE, allow boot memory sections that are
marked as hotpluggable with this bit set to be offlined and removed. This now
allows certain boot memory sections to be offlined.

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Gavin Shan 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
Cc: Suren Baghdasaryan 
---
 arch/arm64/mm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..fb8878698672 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1487,7 +1487,7 @@ static int prevent_bootmem_remove_notifier(struct 
notifier_block *nb,
 
for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
ms = __pfn_to_section(pfn);
-   if (early_section(ms))
+   if (early_section(ms) && !removable_section(ms))
return NOTIFY_BAD;
}
return NOTIFY_OK;
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 1/2] mm/memory_hotplug: allow marking of memory sections as hotpluggable

2020-10-16 Thread Sudarshan Rajagopalan
Certain architectures such as arm64 don't allow boot memory to be
offlined and removed. Distinguish certain memory sections as
"hotpluggable", which module drivers can mark to tell the memory
hotplug layer that these sections can be offlined and then removed.
This is done by using a separate section memory map bit and setting it,
rather than clearing the existing SECTION_IS_EARLY bit.
This patch introduces the SECTION_MARK_HOTPLUGGABLE bit into the section
mem map. Only sections which are in the movable zone and do not have
unmovable pages are allowed to be marked with this new bit.

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mike Rapoport 
Cc: Anshuman Khandual 
Cc: David Hildenbrand 
Cc: Mark Rutland 
Cc: Steven Price 
Cc: Logan Gunthorpe 
Cc: Suren Baghdasaryan 
---
 include/linux/memory_hotplug.h |  1 +
 include/linux/mmzone.h |  9 -
 mm/memory_hotplug.c| 20 
 mm/sparse.c| 31 +++
 4 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 375515803cd8..81df45b582c8 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -319,6 +319,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned 
long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
 extern void __remove_memory(int nid, u64 start, u64 size);
 extern int offline_and_remove_memory(int nid, u64 start, u64 size);
+extern int mark_memory_hotpluggable(unsigned long start, unsigned long end);
 
 #else
 static inline void try_offline_node(int nid) {}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8379432f4f2f..3df3a4975236 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1247,7 +1247,8 @@ extern size_t mem_section_usage_size(void);
 #define SECTION_HAS_MEM_MAP(1UL<<1)
 #define SECTION_IS_ONLINE  (1UL<<2)
 #define SECTION_IS_EARLY   (1UL<<3)
-#define SECTION_MAP_LAST_BIT   (1UL<<4)
+#define SECTION_MARK_HOTPLUGGABLE  (1UL<<4)
+#define SECTION_MAP_LAST_BIT   (1UL<<5)
 #define SECTION_MAP_MASK   (~(SECTION_MAP_LAST_BIT-1))
 #define SECTION_NID_SHIFT  3
 
@@ -1278,6 +1279,11 @@ static inline int early_section(struct mem_section 
*section)
return (section && (section->section_mem_map & SECTION_IS_EARLY));
 }
 
+static inline int removable_section(struct mem_section *section)
+{
+   return (section && (section->section_mem_map & 
SECTION_MARK_HOTPLUGGABLE));
+}
+
 static inline int valid_section_nr(unsigned long nr)
 {
return valid_section(__nr_to_section(nr));
@@ -1297,6 +1303,7 @@ static inline int online_section_nr(unsigned long nr)
 void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
+int section_mark_hotpluggable(struct mem_section *ms);
 #endif
 #endif
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e9d5ab5d3ca0..503b0de489a0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1860,4 +1860,24 @@ int offline_and_remove_memory(int nid, u64 start, u64 
size)
return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+int mark_memory_hotpluggable(unsigned long start_pfn, unsigned long end_pfn)
+{
+   struct mem_section *ms;
+   unsigned long nr;
+   int rc = -EINVAL;
+
+   if (end_pfn < start_pfn)
+   return rc;
+
+   for (nr = start_pfn; nr <= end_pfn; nr++) {
+   ms = __pfn_to_section(nr);
+   rc = section_mark_hotpluggable(ms);
+   if (!rc)
+   break;
+   }
+
+   return rc;
+}
+EXPORT_SYMBOL_GPL(mark_memory_hotpluggable);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
diff --git a/mm/sparse.c b/mm/sparse.c
index fcc3d176f1ea..cc21c23e2f1d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include 
@@ -644,6 +645,36 @@ void offline_mem_sections(unsigned long start_pfn, 
unsigned long end_pfn)
ms->section_mem_map &= ~SECTION_IS_ONLINE;
}
 }
+
+int section_mark_hotpluggable(struct mem_section *ms)
+{
+   unsigned long section_nr, pfn;
+   bool unmovable;
+   struct page *page;
+
+   /* section needs to be both valid and present to be marked */
+   if (WARN_ON(!valid_section(ms)) || !present_section(ms))
+   return -EINVAL;
+
+   /*
+* now check if this section is removable. This can be done by checking
+* if section has unmovable pages or not.
+*/
+   section_nr = __section_nr(ms);
+   pfn = section_nr_to_pfn(section_nr);
+   page = pfn_to_page(pfn);
+   unmovable = has_unmovable_pages(page_zone(page), page,
+   MIGRATE_MOVABLE, MEMORY_OFFLINE | 

[PATCH 0/2] mm/memory_hotplug, arm64: allow certain bootmem sections to be offlinable

2020-10-16 Thread Sudarshan Rajagopalan
In the patch that enables memory hot-remove (commit bbd6ec605c0f ("arm64/mm: 
Enable memory hot remove")) for arm64, there’s a notifier put in place that 
prevents boot memory from being offlined and removed. The commit text mentions 
that boot memory on arm64 cannot be removed. But x86 and other archs don't
seem to have this prevention.

The current logic is that only "new" memory blocks which are hot-added can
later be offlined and removed. The memory that the system booted up with cannot
be offlined and removed. But there could be many use cases such as inter-VM
memory sharing, where a primary VM could offline and hot-remove a block/section
of memory and lend it to a secondary VM, where it could hot-add it. And after
the use case is done, the reverse happens: the secondary VM hot-removes it and
gives it back to the primary, which can hot-add it back. In such cases, the
present logic for arm64 doesn't allow this hot-remove in the primary to happen.

Also, a movable zone more or less guarantees that pages can be migrated and
isolated so that blocks can be offlined, so this logic also defeats the purpose
of having a movable zone that the system can rely on for memory hot-plugging,
which, for example, virtio-mem also relies on for fully plugged memory blocks.

This patch series tries to solve this by introducing a new section mem map bit
'SECTION_MARK_HOTPLUGGABLE', which allows the concerned module drivers to mark
required sections as "hotpluggable" by setting this bit. This marking is only
allowed for sections which are in the movable zone and do not have unmovable
pages. On receiving the MEM_GOING_OFFLINE notification, the arm64 mmu code
currently disallows offlining of any boot memory by checking whether the
section is an early section. With the introduction of
SECTION_MARK_HOTPLUGGABLE, we allow boot memory sections that are marked as
hotpluggable with this bit set to be offlined and removed, thereby allowing the
required bootmem sections to be offlinable.

Sudarshan Rajagopalan (2):
  mm/memory_hotplug: allow marking of memory sections as hotpluggable
  arm64: allow hotpluggable sections to be offlined

 arch/arm64/mm/mmu.c|  2 +-
 include/linux/memory_hotplug.h |  1 +
 include/linux/mmzone.h |  9 -
 mm/memory_hotplug.c| 20 
 mm/sparse.c| 31 +++
 5 files changed, 61 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
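
For illustration (not part of the series), a module driver that wants a range
it manages to remain offlinable could mark it roughly as below, using the
interface added in patch 1/2; the helper name and the way the range is derived
are assumptions.

#include <linux/memory_hotplug.h>
#include <linux/pfn.h>

/* Sketch only: mark a pfn range as hotpluggable so it can later be
 * offlined and removed.
 */
static int mark_lendable_range(phys_addr_t base, resource_size_t size)
{
	unsigned long start_pfn = PHYS_PFN(base);
	unsigned long end_pfn = PHYS_PFN(base + size) - 1;

	return mark_memory_hotpluggable(start_pfn, end_pfn);
}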



[PATCH v4 0/3] iommu/arm-smmu-qcom: Support maintaining bootloader mappings

2020-10-16 Thread Bjorn Andersson
This is the fourth attempt of inheriting the stream mapping for the framebuffer
on many Qualcomm platforms, in order to not hit catastrophic faults during
arm-smmu initialization.

The new approach, based on Robin's suggestion, is much more direct: allocate a
context bank for bypass emulation and keep the use of this context bank pretty
much isolated to the Qualcomm specific implementation.

As before the patchset has been tested to boot DB845c (with splash screen) and
Lenovo Yoga C630 (with EFI framebuffer).

Bjorn Andersson (3):
  iommu/arm-smmu: Allow implementation specific write_s2cr
  iommu/arm-smmu-qcom: Read back stream mappings
  iommu/arm-smmu-qcom: Implement S2CR quirk

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 92 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 22 --
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 +
 3 files changed, 107 insertions(+), 8 deletions(-)

-- 
2.28.0



[PATCH v4 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-16 Thread Bjorn Andersson
The Qualcomm boot loader configures stream mapping for the peripherals
that it accesses and in particular it sets up the stream mapping for the
display controller to be allowed to scan out a splash screen or EFI
framebuffer.

Read back the stream mappings during initialization and make the
arm-smmu driver maintain the streams in bypass mode.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Extracted from different patch in v3.
- Now configures the stream as BYPASS, rather than translate, which should work
  for platforms with working S2CR handling as well.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..0089048342dd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -23,6 +23,29 @@ static const struct of_device_id qcom_smmu_client_of_match[] 
__maybe_unused = {
{ }
 };
 
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
+{
+   u32 smr;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   smmu->s2crs[i].count++;
+   }
+   }
+
+   return 0;
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -61,6 +84,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }
 
 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
 };
-- 
2.28.0



[PATCH v4 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-16 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to S2CR
in order to replace bypass type streams with fault, and ignores S2CR
updates of type fault.

Detect this behavior and implement a custom write_s2cr function in order
to trick the firmware into supporting bypass streams by the means of
configuring the stream for translation using a reserved and disabled
context bank.

Also circumvent the problem of configuring faulting streams by
configuring the stream as bypass.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Move the reservation of the "identity context bank" to the Qualcomm specific
  implementation.
- Implement the S2CR quirk with the newly introduced write_s2cr callback.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 68 ++
 1 file changed, 68 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 0089048342dd..c0f42d6a6e01 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,8 +10,14 @@
 
 struct qcom_smmu {
struct arm_smmu_device smmu;
+   u32 bypass_cbndx;
 };
 
+static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct qcom_smmu, smmu);
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -25,9 +31,32 @@ static const struct of_device_id qcom_smmu_client_of_match[] 
__maybe_unused = {
 
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 reg;
u32 smr;
int i;
 
+   /*
+* With some firmware versions writes to S2CR of type FAULT are
+* ignored, and writing BYPASS will end up written as FAULT in the
+* register. Perform a write to S2CR to detect if this is the case and
+* if so reserve a context bank to emulate bypass streams.
+*/
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
+   arm_smmu_gr0_write(smmu, last_s2cr, reg);
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
+   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;
+
+   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
+
+   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
+   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+   }
+
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
 
@@ -46,6 +75,44 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
+static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
+{
+   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 cbndx = s2cr->cbndx;
+   u32 type = s2cr->type;
+   u32 reg;
+
+   if (qsmmu->bypass_cbndx) {
+   if (type == S2CR_TYPE_BYPASS) {
+   /*
+* Firmware with quirky S2CR handling will substitute
+* BYPASS writes with FAULT, so point the stream to the
+* reserved context bank and ask for translation on the
+* stream
+*/
+   type = S2CR_TYPE_TRANS;
+   cbndx = qsmmu->bypass_cbndx;
+   } else if (type == S2CR_TYPE_FAULT) {
+   /*
+* Firmware with quirky S2CR handling will ignore FAULT
+* writes, so trick it to write FAULT by asking for a
+* BYPASS.
+*/
+   type = S2CR_TYPE_BYPASS;
+   cbndx = 0xff;
+   }
+   }
+
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+
+   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs && smmu->smrs[idx].valid)
+   reg |= ARM_SMMU_S2CR_EXIDVALID;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -87,6 +154,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
+   .write_s2cr = qcom_smmu_write_s2cr,
 };
 
 struct 

[PATCH v4 1/3] iommu/arm-smmu: Allow implementation specific write_s2cr

2020-10-16 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to the
S2CR register in order to replace the BYPASS type with FAULT. Furthermore,
it treats faults at this level as catastrophic and restarts the
device.

Add support for providing implementation specific versions of the S2CR
write function, to allow the Qualcomm driver to work around this
behavior.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- New patch

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 22 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..ed3f0428c110 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -929,14 +929,20 @@ static void arm_smmu_write_smr(struct arm_smmu_device 
*smmu, int idx)
 static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
 {
struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
-   u32 reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
- FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
- FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
-
-   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
-   smmu->smrs[idx].valid)
-   reg |= ARM_SMMU_S2CR_EXIDVALID;
-   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   u32 reg;
+
+   if (smmu->impl && smmu->impl->write_s2cr) {
+   smmu->impl->write_s2cr(smmu, idx);
+   } else {
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+
+   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
+   smmu->smrs[idx].valid)
+   reg |= ARM_SMMU_S2CR_EXIDVALID;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   }
 }
 
 static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 1a746476927c..b71647eaa319 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -436,6 +436,7 @@ struct arm_smmu_impl {
int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
  struct arm_smmu_device *smmu,
  struct device *dev, int start);
+   void (*write_s2cr)(struct arm_smmu_device *smmu, int idx);
 };
 
 #define INVALID_SMENDX -1
-- 
2.28.0



Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn
On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau  wrote:
> On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> > Microsoft's documentation
> > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> > Generation ID that we get after a fork "is a 128-bit,
> > cryptographically random integer value". If multiple people use the
> > same image, it guarantees that each use of the image gets its own,
> > fresh ID:
>
> No. It cannot be more unique than the source that feeds that cryptographic
> transformation. All it guarantees is that the entropy source is protected
> from being guessed based on the output. Applying cryptography on a simple
> counter provides apparently random numbers that will be unique for a long
> period for the same source, but as soon as you duplicate that code between
> users and they start from the same counter they'll get the same IDs.
>
> This is why I think that using a counter is better if you really need 
> something
> unique. Randoms only reduce predictability which helps avoiding collisions.

Microsoft's spec tells us that they're giving us cryptographically
random numbers. Where they're getting those from is not our problem.
(And if even the hypervisor is not able to collect enough entropy to
securely generate random numbers, worrying about RNG reseeding in the
guest would be kinda pointless, we'd be fairly screwed anyway.)

Also note that we don't actually need to *always* reinitialize RNG
state on forks for functional correctness; it is fine if that fails
with a probability of 2^-128, because functionally everything will be
fine, and an attacker who is that lucky could also just guess an AES
key (which has the same probability of being successful). (And also
2^-128 is such a tiny number that it doesn't matter anyway.)

> And I'm saying this as someone who had on his external gateway the same SSH
> host key as 89 other hosts on the net, each of them using randoms to provide
> a universally unique one...

If your SSH host key was shared with 89 other hosts, it evidently
wasn't generated from cryptographically random numbers. :P Either
because the key generator was not properly hooked up to the system's
entropy pool (if you're talking about the Debian fiasco), or because
the system simply did not have enough entropy available. (Or because
the key generator is broken, but I don't think that ever happened with
OpenSSH?)


drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:932:2: warning: 'strncpy' specified bound 128 equals destination size

2020-10-16 Thread kernel test robot
Hi Jacopo,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   8119c4332d253660e0a6b8748fe0749961cfbc97
commit: b18ee53ad297264a79cf4ea53f20786b6455 staging: bcm2835: Break MMAL 
support out from camera
date:   4 months ago
config: nds32-randconfig-r026-20201017 (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b18ee53ad297264a79cf4ea53f20786b6455
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout b18ee53ad297264a79cf4ea53f20786b6455
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=nds32 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:11,
from 
drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:19:
   include/linux/dma-mapping.h: In function 'dma_map_resource':
   arch/nds32/include/asm/memory.h:82:32: warning: comparison of unsigned 
expression >= 0 is always true [-Wtype-limits]
  82 | #define pfn_valid(pfn)  ((pfn) >= PHYS_PFN_OFFSET && (pfn) < 
(PHYS_PFN_OFFSET + max_mapnr))
 |^~
   include/linux/compiler.h:58:52: note: in definition of macro '__trace_if_var'
  58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
 |^~~~
   include/linux/dma-mapping.h:352:2: note: in expansion of macro 'if'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~
   include/linux/dma-mapping.h:352:6: note: in expansion of macro 'WARN_ON_ONCE'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~~~
   include/linux/dma-mapping.h:352:19: note: in expansion of macro 'pfn_valid'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |   ^
   arch/nds32/include/asm/memory.h:82:32: warning: comparison of unsigned 
expression >= 0 is always true [-Wtype-limits]
  82 | #define pfn_valid(pfn)  ((pfn) >= PHYS_PFN_OFFSET && (pfn) < 
(PHYS_PFN_OFFSET + max_mapnr))
 |^~
   include/linux/compiler.h:58:61: note: in definition of macro '__trace_if_var'
  58 | #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
 | ^~~~
   include/linux/dma-mapping.h:352:2: note: in expansion of macro 'if'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~
   include/linux/dma-mapping.h:352:6: note: in expansion of macro 'WARN_ON_ONCE'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~~~
   include/linux/dma-mapping.h:352:19: note: in expansion of macro 'pfn_valid'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |   ^
   arch/nds32/include/asm/memory.h:82:32: warning: comparison of unsigned 
expression >= 0 is always true [-Wtype-limits]
  82 | #define pfn_valid(pfn)  ((pfn) >= PHYS_PFN_OFFSET && (pfn) < 
(PHYS_PFN_OFFSET + max_mapnr))
 |^~
   include/linux/compiler.h:69:3: note: in definition of macro 
'__trace_if_value'
  69 |  (cond) ? \
 |   ^~~~
   include/linux/compiler.h:56:28: note: in expansion of macro '__trace_if_var'
  56 | #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) 
) )
 |^~
   include/linux/dma-mapping.h:352:2: note: in expansion of macro 'if'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~
   include/linux/dma-mapping.h:352:6: note: in expansion of macro 'WARN_ON_ONCE'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |  ^~~~
   include/linux/dma-mapping.h:352:19: note: in expansion of macro 'pfn_valid'
 352 |  if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
 |   ^
   drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c: In function 
'create_component':
>> drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:932:2: warning: 
>> 'strncpy' specified bound 128 equals destination size [-Wstringop-truncation]
 932 |  strncpy(m.u.component_create.name, name,
 |  ^~~~
 933 |   sizeof(m.u.component_create.name));
 |   
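
Warnings of this kind are commonly addressed by switching to a bounded copy
that always NUL-terminates. A hedged sketch of one possible fix for the quoted
line (not a posted or tested patch):

	/* Possible fix sketch: strscpy() bounds the copy, guarantees
	 * NUL-termination and avoids -Wstringop-truncation when the bound
	 * equals the destination size.
	 */
	strscpy(m.u.component_create.name, name,
		sizeof(m.u.component_create.name));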

Re: [PATCH v7 6/6] rcu/segcblist: Add additional comments to explain smp_mb()

2020-10-16 Thread joel
On Fri, Oct 16, 2020 at 09:27:53PM -0400, j...@joelfernandes.org wrote:
[..]
> > > + *
> > > + * Memory barrier is needed after adding to length for the case
> > > + * where length transitions from 0 -> 1. This is because rcu_barrier()
> > > + * should never miss an update to the length. So the update to length
> > > + * has to be seen *before* any modifications to the segmented list. 
> > > Otherwise a
> > > + * race can happen.
> > > + * P0 (what P1 sees) P1
> > > + * queue to list
> > > + *  rcu_barrier sees len as 0
> > > + * set len = 1.
> > > + *  rcu_barrier does nothing.
> > 
> > So that would be:
> > 
> >   call_rcu()rcu_barrier()
> >   ----
> >   WRITE(len, len + 1)   l = READ(len)
> >   smp_mb()  if (!l)
> >   queuecheck next CPU...
> > 
> > 
> > But I still don't see against what it pairs in rcu_barrier.
> 
> Actually, for the second case maybe a similar reasoning can be applied
> (control dependency) but I'm unable to come up with a litmus test.
> In fact, now I'm wondering how is it possible that call_rcu() races with
> rcu_barrier(). The module should ensure that no more call_rcu() should happen
> before rcu_barrier() is called.
> 
> confused

So I made a litmus test to show that smp_mb() is needed also after the update
to length. Basically, otherwise it is possible the callback will see garbage
that the module cleanup/unload did.

C rcubarrier+ctrldep

(*
 * Result: Never
 *
 * This litmus test shows that rcu_barrier (P1) prematurely
 * returning by reading len 0 can cause issues if P0 does
 * NOT have a smb_mb() after WRITE_ONCE(len, 1).
 * mod_data == 2 means module was unloaded (so data is garbage).
 *)

{ int len = 0; int enq = 0; }

P0(int *len, int *mod_data, int *enq)
{
int r0;

WRITE_ONCE(*len, 1);
smp_mb();   /* Needed! */
WRITE_ONCE(*enq, 1);

r0 = READ_ONCE(*mod_data);
}

P1(int *len, int *mod_data, int *enq)
{
int r0;
int r1;

r1 = READ_ONCE(*enq);

// barrier Just for test purpose ("exists" clause) to force the..
// ..rcu_barrier() to see enq before len
smp_mb();   
r0 = READ_ONCE(*len);

// implicit memory barrier due to conditional */
if (r0 == 0)
WRITE_ONCE(*mod_data, 2);
}

// Did P0 read garbage?
exists (0:r0=2 /\ 1:r0=0 /\ 1:r1=1)



Re: [PATCH v9 12/15] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR

2020-10-16 Thread Kuppuswamy, Sathyanarayanan



On 10/16/20 3:29 PM, Bjorn Helgaas wrote:

[+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
begin with since you're looking at this code too.  Particularly
interested in your thoughts about whether we should be touching
PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]

This part is not very clear in ACPI spec or PCI firmware spec.

IMO, since APEI notifies the OS about the error, I guess
we are allowed to clear the PCI_ERR_ROOT_STATUS register
after handling the error (similar to the EDR case).


On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:

[+to Jonathan]

On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:

From: Qiuxu Zhuo 

When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC.  In
some non-native cases in which there is no OS-visible device associated
with the RCiEP, there is nothing to act upon as the firmware is acting
before the OS.

Add handling for the linked RCEC in AER/ERR while taking into account
non-native cases.

Co-developed-by: Sean V Kelley 
Link: 
https://lore.kernel.org/r/20201002184735.1229220-12-seanvk@oregontracks.org
Signed-off-by: Sean V Kelley 
Signed-off-by: Qiuxu Zhuo 
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Jonathan Cameron 
---
  drivers/pci/pcie/aer.c | 53 ++
  drivers/pci/pcie/err.c | 20 
  2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..083f69b67bfd 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
   */
  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  {
-   int aer = dev->aer_cap;
+   int type = pci_pcie_type(dev);
+   struct pci_dev *root;
+   int aer = 0;
+   int rc = 0;
u32 reg32;
-   int rc;
  
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)

"type == PCI_EXP_TYPE_RC_END"


+   /*
+* The reset should only clear the Root Error Status
+* of the RCEC. Only perform this for the
+* native case, i.e., an RCEC is present.
+*/
+   root = dev->rcec;
+   else
+   root = dev;
  
-	/* Disable Root's interrupt in response to error messages */

-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   if (root)
+   aer = dev->aer_cap;
  
-	rc = pci_bus_error_reset(dev);

-   pci_info(dev, "Root Port link has been reset\n");
+   if (aer) {
+   /* Disable Root's interrupt in response to error messages */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);

Not directly related to *this* patch, but my assumption was that in
the APEI case, the firmware should retain ownership of the AER
Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
PCI_ERR_ROOT_STATUS.

But this code appears to ignore that ownership.  Jonathan, you must
have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
Device Status errors only if OS owns AER").  Do you have any insight
about this?


-   /* Clear Root Error Status */
-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
+   /* Clear Root Error Status */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
  
-	/* Enable Root Port's interrupt in response to error messages */

-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   /* Enable Root Port's interrupt in response to error messages */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   }
+
+   if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
+   if (pcie_has_flr(root)) {
+   rc = pcie_flr(root);
+   pci_info(dev, "has been reset (%d)\n", rc);
+   }
+   } else {
+   rc = pci_bus_error_reset(root);

Don't we want "dev" for both the FLR and pci_bus_error_reset()?  I
think "root == dev" except when dev is an RCiEP.  When dev is an
RCiEP, "root" is 

Re: [PATCH v2] checkpatch: add new exception to repeated word check

2020-10-16 Thread Dwaipayan Ray
On Sat, Oct 17, 2020 at 8:26 AM Joe Perches  wrote:
>
> On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> > On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches  wrote:
> > > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> > > > > moved the repeated word test to check for more file types. But after
> > > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > > new warnings of the type:
> > > >
> > > > Perhaps instead of adding more content checks so that
> > > > word boundaries are not something like \S but also
> > > > not punctuation so that content like
> > > >
> > > > git git://
> > > > @size size
> > > >
> > > > does not match?
> > > >
> > > >
> > > Hi,
> > > So currently the words are trimmed of non alphabets before the check:
> > >
> > > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > my $first = $1;
> > > my $second = $2;
> > >
> > > where, the word_pattern is:
> > > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> >
> > I'm familiar.
> >
> > > So do you perhaps recommend modifying this word pattern to
> > > include the punctuation as well rather than trimming them off?
> >
> > Not really, perhaps use the capture group position
> > markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> > substr could be used to see what characters are
> > before and after the word matches.
>
> Perhaps something like:
> ---
>  scripts/checkpatch.pl | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..a65eb40a5539 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3054,15 +3054,25 @@ sub process {
>
> my $first = $1;
> my $second = $2;
> +   my $start_pos = $-[1];
> +   my $end_pos = $+[2];
>
> if ($first =~ /(?:struct|union|enum)/) {
> pos($rawline) += length($first) + length($second) + 1;
> next;
> }
>
> -   next if ($first ne $second);
> +   next if (lc($first) ne lc($second));
> next if ($first eq 'long');
>
> +   my $start_char = "";
> +   my $end_char = "";
> +   $start_char = substr($rawline, $start_pos - 1, 1) if ($start_pos > 0);
> +   $end_char = substr($rawline, $end_pos, 1) if (length($rawline) > $end_pos);
> +
> +   next if ($start_char =~ /^\S$/);
> +   next if ($end_char !~ /^[\.\,\s]?$/);
> +
> if (WARN("REPEATED_WORD",
>  "Possible repeated word: '$first'\n" . $herecurr) &&
> $fix) {
>
>

Hi Joe,
Thank you for the insight. I was also doing something similar:

---
 scripts/checkpatch.pl | 16 
 1 file changed, 16 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..82497a71ac96 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -595,6 +595,7 @@ our @mode_permission_funcs = (
 );

 my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
+my $punctuation_chars = '[,:;@\.\-]';

 #Create a search pattern for all these functions to speed up a loop below
 our $mode_perms_search = "";
@@ -3065,6 +3066,21 @@ sub process {
  next if ($first ne $second);
  next if ($first eq 'long');

+ # check for character before and after the word matches
+ my $ca_first = substr($rawline, $-[1]-1, 1);
+ my $cb_first = substr($rawline, $+[1], 1);
+ my $ca_second = substr($rawline, $-[2]-1, 1);
+ my $cb_second = substr($rawline, $+[2], 1);
+
+ if ($ca_first ne $ca_second || $cb_first ne $cb_second) {
+ if ($ca_first =~ /$punctuation_chars/ ||
+ $ca_second =~ /$punctuation_chars/ ||
+ $cb_first =~ /$punctuation_chars/ ||
+ $cb_second =~ /$punctuation_chars/) {
+ next;
+ }
+ }
+
  if (WARN("REPEATED_WORD",
  "Possible repeated word: '$first'\n" . $herecurr) &&
  $fix) {

Does it look okay to you??

Thanks,
Dwaipayan.


drivers/media/i2c/mt9t112.c:670:12: warning: stack frame size of 8344 bytes in function 'mt9t112_init_camera'

2020-10-16 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   071a0578b0ce0b0e543d1e38ee6926b9cc21c198
commit: f0fe00d4972a8cd4b98cc2c29758615e4d51cdfe security: allow using Clang's 
zero initialization for stack variables
date:   4 months ago
config: x86_64-randconfig-a012-20201017 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
efd02c1548ee458d59063f6393e94e972b5c3d50)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f0fe00d4972a8cd4b98cc2c29758615e4d51cdfe
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout f0fe00d4972a8cd4b98cc2c29758615e4d51cdfe
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/media/i2c/mt9t112.c:670:12: warning: stack frame size of 8344 bytes 
>> in function 'mt9t112_init_camera' [-Wframe-larger-than=]
   static int mt9t112_init_camera(const struct i2c_client *client)
  ^
   1 warning generated.

vim +/mt9t112_init_camera +670 drivers/media/i2c/mt9t112.c

7641b04421954ea Jacopo Mondi 2018-03-12  669  
7641b04421954ea Jacopo Mondi 2018-03-12 @670  static int 
mt9t112_init_camera(const struct i2c_client *client)
7641b04421954ea Jacopo Mondi 2018-03-12  671  {
7641b04421954ea Jacopo Mondi 2018-03-12  672int ret;
7641b04421954ea Jacopo Mondi 2018-03-12  673  
7641b04421954ea Jacopo Mondi 2018-03-12  674ECHECKER(ret, 
mt9t112_reset(client));
7641b04421954ea Jacopo Mondi 2018-03-12  675ECHECKER(ret, 
mt9t112_init_pll(client));
7641b04421954ea Jacopo Mondi 2018-03-12  676ECHECKER(ret, 
mt9t112_init_setting(client));
7641b04421954ea Jacopo Mondi 2018-03-12  677ECHECKER(ret, 
mt9t112_auto_focus_setting(client));
7641b04421954ea Jacopo Mondi 2018-03-12  678  
7641b04421954ea Jacopo Mondi 2018-03-12  679mt9t112_reg_mask_set(ret, 
client, 0x0018, 0x0004, 0);
7641b04421954ea Jacopo Mondi 2018-03-12  680  
6a26f141bf6200a Jacopo Mondi 2018-03-12  681/* Analog setting B.*/
7641b04421954ea Jacopo Mondi 2018-03-12  682mt9t112_reg_write(ret, client, 
0x3084, 0x2409);
7641b04421954ea Jacopo Mondi 2018-03-12  683mt9t112_reg_write(ret, client, 
0x3092, 0x0A49);
7641b04421954ea Jacopo Mondi 2018-03-12  684mt9t112_reg_write(ret, client, 
0x3094, 0x4949);
7641b04421954ea Jacopo Mondi 2018-03-12  685mt9t112_reg_write(ret, client, 
0x3096, 0x4950);
7641b04421954ea Jacopo Mondi 2018-03-12  686  
7641b04421954ea Jacopo Mondi 2018-03-12  687/*
6a26f141bf6200a Jacopo Mondi 2018-03-12  688 * Disable adaptive clock.
7641b04421954ea Jacopo Mondi 2018-03-12  689 * 
PRI_A_CONFIG_JPEG_OB_TX_CONTROL_VAR
7641b04421954ea Jacopo Mondi 2018-03-12  690 * 
PRI_B_CONFIG_JPEG_OB_TX_CONTROL_VAR
7641b04421954ea Jacopo Mondi 2018-03-12  691 */
7641b04421954ea Jacopo Mondi 2018-03-12  692mt9t112_mcu_write(ret, client, 
VAR(26, 160), 0x0A2E);
7641b04421954ea Jacopo Mondi 2018-03-12  693mt9t112_mcu_write(ret, client, 
VAR(27, 160), 0x0A2E);
7641b04421954ea Jacopo Mondi 2018-03-12  694  
6a26f141bf6200a Jacopo Mondi 2018-03-12  695/*
6a26f141bf6200a Jacopo Mondi 2018-03-12  696 * Configure Status in 
Status_before_length Format and enable header.
6a26f141bf6200a Jacopo Mondi 2018-03-12  697 * 
PRI_B_CONFIG_JPEG_OB_TX_CONTROL_VAR
6a26f141bf6200a Jacopo Mondi 2018-03-12  698 */
7641b04421954ea Jacopo Mondi 2018-03-12  699mt9t112_mcu_write(ret, client, 
VAR(27, 144), 0x0CB4);
7641b04421954ea Jacopo Mondi 2018-03-12  700  
6a26f141bf6200a Jacopo Mondi 2018-03-12  701/*
6a26f141bf6200a Jacopo Mondi 2018-03-12  702 * Enable JPEG in context B.
6a26f141bf6200a Jacopo Mondi 2018-03-12  703 * 
PRI_B_CONFIG_JPEG_OB_TX_CONTROL_VAR
6a26f141bf6200a Jacopo Mondi 2018-03-12  704 */
7641b04421954ea Jacopo Mondi 2018-03-12  705mt9t112_mcu_write(ret, client, 
VAR8(27, 142), 0x01);
7641b04421954ea Jacopo Mondi 2018-03-12  706  
6a26f141bf6200a Jacopo Mondi 2018-03-12  707/* Disable Dac_TXLO. */
7641b04421954ea Jacopo Mondi 2018-03-12  708mt9t112_reg_write(ret, client, 
0x316C, 0x350F);
7641b04421954ea Jacopo Mondi 2018-03-12  709  
6a26f141bf6200a Jacopo Mondi 2018-03-12  710/* Set max slew rates. */
7641b04421954ea Jacopo Mondi 2018-03-12  711mt9t112_reg_write(ret, client, 
0x1E, 0x777);
7641b04421954ea Jacopo Mondi 2018-03-12  712  
7641b04421954ea Jacopo Mondi 2018-03-12  713

Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-16 Thread Alex Belits

On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> > > On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> > > > External Email
> > > > 
> > > > -
> > > > --
> > > > ---
> > > > On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
> > > > > +/*
> > > > > + * Description of the last two tasks that ran isolated on a
> > > > > given
> > > > > CPU.
> > > > > + * This is intended only for messages about isolation
> > > > > breaking. We
> > > > > + * don't want any references to actual task while accessing
> > > > > this
> > > > > from
> > > > > + * CPU that caused isolation breaking -- we know nothing
> > > > > about
> > > > > timing
> > > > > + * and don't want to use locking or RCU.
> > > > > + */
> > > > > +struct isol_task_desc {
> > > > > + atomic_t curr_index;
> > > > > + atomic_t curr_index_wr;
> > > > > + boolwarned[2];
> > > > > + pid_t   pid[2];
> > > > > + pid_t   tgid[2];
> > > > > + charcomm[2][TASK_COMM_LEN];
> > > > > +};
> > > > > +static DEFINE_PER_CPU(struct isol_task_desc,
> > > > > isol_task_descs);
> > > > So that's quite a huge patch that would have needed to be split
> > > > up.
> > > > Especially this tracing engine.
> > > > 
> > > > Speaking of which, I agree with Thomas that it's unnecessary.
> > > > It's
> > > > too much
> > > > code and complexity. We can use the existing trace events and
> > > > perform
> > > > the
> > > > analysis from userspace to find the source of the disturbance.
> > > The idea behind this is that isolation breaking events are
> > > supposed to
> > > be known to the applications while applications run normally, and
> > > they
> > > should not require any analysis or human intervention to be
> > > handled.
> > Sure but you can use trace events for that. Just trace interrupts,
> > workqueues,
> > timers, syscalls, exceptions and scheduler events and you get all
> > the local
> > disturbance. You might want to tune a few filters but that's pretty
> > much it.
> > 
> > As for the source of the disturbances, if you really need that
> > information,
> > you can trace the workqueue and timer queue events and just filter
> > those that
> > target your isolated CPUs.
> > 
> 
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the
> source of
> violation may help in reducing the manual effort.
> 
> Although, I am not sure how easy will it be to maintain such an
> interface
> over time.

I think that the goal of "finding source of disturbance" interface is
different from what can be accomplished by tracing in two ways:

1. "Source of disturbance" should provide some useful information about
category of event and it cause as opposed to determining all precise
details about things being called that resulted or could result in
disturbance. It should not depend on the user's knowledge about details
of implementations, it should provide some definite answer of what
happened (with whatever amount of details can be given in a generic
mechanism) even if the user has no idea how those things happen and
what part of kernel is responsible for either causing or processing
them. Then if the user needs further details, they can be obtained with
tracing.

2. It should be usable as a runtime error handling mechanism, so the
information it provides should be suitable for application use and
logging. It should be usable when applications are running on a system
in production, and no specific tracing or monitoring mechanism can be
in use. If, say, thousands of devices are controlling neutrino
detectors on an ocean floor, and in a month of work one of them got one
isolation breaking event, it should be able to report that isolation
was broken by an interrupt from a network interface, so the users will
be able to track it down to some userspace application reconfiguring
those interrupts.

It will be a good idea to make such mechanism optional and suitable for
tracking things on conditions other than "always enabled" and "enabled
with task isolation". However in my opinion, there should be something
in kernel entry procedure that, if enabled, prepared something to be
filled by the cause data, and we know at least one such situation when
this kernel entry procedure should be triggered -- when task isolation
is on.

-- 
Alex


Re: [PATCH v4 3/3] clk: qcom: lpasscc-sc7180: Re-configure the PLL in case lost

2020-10-16 Thread Doug Anderson
Hi,

On Fri, Oct 16, 2020 at 7:01 PM Stephen Boyd  wrote:
>
> Quoting Stephen Boyd (2020-10-15 20:16:27)
> > Quoting Douglas Anderson (2020-10-14 17:13:29)
> > > From: Taniya Das 
> > >
> > > In the case where the PLL configuration is lost, then the pm runtime
> > > resume will reconfigure before usage.
> >
> > Taniya, this commit needs a lot more describing than one sentence. I see
> > that the PLL's L value is reset at boot, but only once. That seems to be
> > because the bootloader I have doesn't set bit 11 for the RETAIN_FF bit
> > on the lpass_core_hm_gdsc. Once the gdsc is turned off the first time,
> > the PLL settings are lost and the L val is reset to 0. That makes sense
> > because RETAIN_FF isn't set. This also means the other register writes
> > during probe are lost during the first suspend of the lpass core clk
> > controller. Then when the GDSC is turned on the next time for this clk
> > controller  being runtime resumed we will set the retain bit and then
> > configure the PLL again. BTW, I see that runtime PM is called for this
> > clk controller for all the clk operations. Maybe there should be some
> > auto suspend timeout so that we're not toggling the gdsc constantly?
> >
> > I hacked up the GDSC code to set the bit at gdsc registration time and
> > it seems to fix the problem I'm seeing (i.e. that the PLL is stuck,
> > which should also be in the commit text here). When I try to set the bit
> > in the bootloader though my kernel won't boot. I guess something is
> > hanging the system if I enable the retain bit in the GDSC?
> >
>
> After hacking on this for some time it looks like we can apply this
> patch instead and things are good. The first two patches in this series
> look mostly good to me minus some nitpicks so please resend.

By this you mean the two newlines you mentioned on
, right?  I think all the rest of your
comments were on patch #3 (this patch) which I think we're dropping.

I'm happy to repost a v5 of just patches #1 and #2 with the newlines
fixed next week, or I'm happy if you want to fix them when applying as
you alluded to on the Chrome OS gerrit.  Just let me know.  I just
want to make sure I'm not missing some other nits before I post the
v5.  ;-)

-Doug


Re: [PATCH 1/1] net: ftgmac100: add handling of mdio/phy nodes for ast2400/2500

2020-10-16 Thread Andrew Lunn
> + if (priv->is_aspeed &&
> + phy_intf != PHY_INTERFACE_MODE_RMII &&
> + phy_intf != PHY_INTERFACE_MODE_RGMII &&
> + phy_intf != PHY_INTERFACE_MODE_RGMII_ID &&
> + phy_intf != PHY_INTERFACE_MODE_RGMII_RXID &&
> + phy_intf != PHY_INTERFACE_MODE_RGMII_TXID) {


phy_interface_mode_is_rgmii()
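
For reference, that helper collapses the whole RGMII family into one
test; a minimal sketch of the rewritten condition, reusing the names
from the hunk quoted above (untested):

	if (priv->is_aspeed &&
	    phy_intf != PHY_INTERFACE_MODE_RMII &&
	    !phy_interface_mode_is_rgmii(phy_intf)) {
		/* reject the interface mode as before */
	}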

Andrew


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-16 Thread Alex Belits

On Tue, 2020-10-06 at 12:35 +0200, Frederic Weisbecker wrote:
> On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
> > On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > > On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> > > 
> > > > The idea behind this is that isolation breaking events are
> > > > supposed to
> > > > be known to the applications while applications run normally,
> > > > and they
> > > > should not require any analysis or human intervention to be
> > > > handled.
> > > Sure but you can use trace events for that. Just trace
> > > interrupts, workqueues,
> > > timers, syscalls, exceptions and scheduler events and you get all
> > > the local
> > > disturbance. You might want to tune a few filters but that's
> > > pretty much it.
> 
> > > [...] if you need that information,
> > > you can trace the workqueue and timer queue events and just
> > > filter those that
> > > target your isolated CPUs.
> > > 
> > 
> > I agree that we can do all those things with tracing.
> > However, IMHO having a simplified logging mechanism to gather the
> > source of
> > violation may help in reducing the manual effort.
> > 
> > Although, I am not sure how easy will it be to maintain such an
> > interface
> > over time.
> 
> The thing is: tracing is your simplified logging mechanism here. You
> can achieve
> the same in userspace with _way_ less code, no race, and you can do
> it in
> bash.

The idea is that this mechanism should be usable when no one is there
to run things in bash, or when there is no information about what might
happen. It should be able to report rare events in production that
users may not be able to reproduce.

-- 
Alex


Re: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API

2020-10-16 Thread Ira Weiny
On Fri, Oct 16, 2020 at 01:07:47PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 12:42:54PM -0700, ira.we...@intel.com wrote:
> > +static inline void pks_update_protection(int pkey, unsigned long 
> > protection)
> > +{
> > +   current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
> > +pkey, protection);
> > +   preempt_disable();
> > +   write_pkrs(current->thread.saved_pkrs);
> > +   preempt_enable();
> > +}
> 
> write_pkrs() already disables preemption itself. Wrapping it in yet
> another layer is useless.

I was thinking the update to saved_pkrs needed this protection as well and that
was to be included in the preemption disable.  But that too is incorrect.

I've removed this preemption disable.
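
For illustration, with that removed the helper would look roughly like
this (just a sketch based on the hunk quoted above, not the final
patch):

static inline void pks_update_protection(int pkey, unsigned long protection)
{
	/*
	 * write_pkrs() does its own get/put_cpu_ptr(), so no extra
	 * preempt_disable()/preempt_enable() pair is needed here.
	 */
	current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
						     pkey, protection);
	write_pkrs(current->thread.saved_pkrs);
}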

Thanks,
Ira


Re: [PATCH] mm: fix potential pte_unmap_unlock pte error

2020-10-16 Thread Shijie Luo

On 2020/10/16 22:05, osalva...@suse.de wrote:

On 2020-10-16 15:42, Michal Hocko wrote:

OK, I finally managed to convince my friday brain to think and grasped
what the code is intended to do. The loop is hairy and we want to
prevent from spurious EIO when all the pages are on a proper node. So
the check has to be done inside the loop. Anyway I would find the
following fix less error prone and easier to follow
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..8cc1fc9c4d13 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *pmd,
unsigned long addr,
 unsigned long flags = qp->flags;
 int ret;
 bool has_unmovable = false;
-    pte_t *pte;
+    pte_t *pte, *mapped_pte;
 spinlock_t *ptl;

 ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *pmd,
unsigned long addr,
 if (pmd_trans_unstable(pmd))
 return 0;

-    pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+    mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 for (; addr != end; pte++, addr += PAGE_SIZE) {
 if (!pte_present(*pte))
 continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *pmd,
unsigned long addr,
 } else
 break;
 }
-    pte_unmap_unlock(pte - 1, ptl);
+    pte_unmap_unlock(mapped_pte, ptl);
 cond_resched();

 if (has_unmovable)


It is more clear to grasp, definitely.

Yeah, this one is more comprehensible. I'll send a v2 patch, thank you.


Re: [PATCH v3] checkpatch: add new exception to repeated word check

2020-10-16 Thread Joe Perches
On Sat, 2020-10-17 at 10:52 +0530, Dwaipayan Ray wrote:
> Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
> moved the repeated word test to check for more file types. But after
> this, if checkpatch.pl is run on MAINTAINERS, it generates several
> new warnings of the type:
> 
> WARNING: Possible repeated word: 'git'
> 
> For example:
> WARNING: Possible repeated word: 'git'
> +T:   git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git
> 
> So, the pattern "git git://..." is a false positive in this case.
> 
> There are several other combinations which may produce a wrong
> warning message, such as "@size size", ":Begin begin", etc.
> 
> Extend repeated word check to compare the characters before and
> after the word matches. If the preceding or succeeding character
> belongs to the exception list, the warning is avoided.
> 
> Link: 
> https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.ca...@perches.com/
> Suggested-by: Joe Perches 
> Suggested-by: Lukas Bulwahn 
> Signed-off-by: Dwaipayan Ray 
> ---
>  scripts/checkpatch.pl | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index f1a4e61917eb..89430dfd6652 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -595,6 +595,7 @@ our @mode_permission_funcs = (
>  );
>  
>  my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> +my $exclude_chars = '[^\.\,\+\s]';

Why include a + character here?

>  #Create a search pattern for all these functions to speed up a loop below
>  our $mode_perms_search = "";
> @@ -3056,15 +3057,27 @@ sub process {
>  
>   my $first = $1;
>   my $second = $2;
> -
> + my $start_pos = $-[1];
> + my $end_pos = $+[2];
>   if ($first =~ /(?:struct|union|enum)/) {
>   pos($rawline) += length($first) + 
> length($second) + 1;
>   next;
>   }
>  
> - next if ($first ne $second);
> + next if (lc($first) ne lc($second));
>   next if ($first eq 'long');
>  
> + # check for character before and after the word 
> matches
> + my $start_char = '';
> + my $end_char = '';
> + $start_char = substr($rawline, $start_pos - 1, 
> 1) if ($start_pos > 0);
> + $end_char = substr($rawline, $end_pos, 1) if 
> ($end_pos <= length($rawline));


substr uses index 0, so I believe the if should be < 

> +
> + if ($start_char =~ /^$exclude_chars$/ ||
> + $end_char =~ /^$exclude_chars$/) {
> + next;
> + }
 
Please use "next if (test);" to be similar to the other uses above.

And this doesn't work on end of phrase or sentence.

ie: "my sentence is is, a duplicate word word."

so $end_char could be a comma or a period.

so likely the $end_char test should be !~

What is the reason to add and use $exclude_chars?




Re: 5.9.0-next-20201015: autofs oops in update-binfmts

2020-10-16 Thread Ian Kent
On Fri, 2020-10-16 at 14:35 +0200, Pavel Machek wrote:
> Hi!
> 
> I'm getting this during boot: 32-bit thinkpad x60.

This is very odd.

The change in -next is essentially a revert of a change; maybe I'm
missing something and the revert isn't quite a revert, although
there was one difference.

I'll check for other revert differences too.

Are you in a position to check a kernel without the 5.9 change
if I send you a patch?

And we should check whether that difference from what was originally
there is the source of the problem, so there are probably two things
to follow up on; reverting that small difference first would be
the way to go.

Are you able to reliably reproduce it?

> 
> [   10.718377] BUG: kernel NULL pointer dereference, address:
> 
> [   10.721848] #PF: supervisor read access in kernel mode
> [   10.722763] #PF: error_code(0x) - not-present page
> [   10.726759] *pdpt = 0339e001 *pde =  
> [   10.730793] Oops:  [#1] PREEMPT SMP PTI
> [   10.736201] CPU: 1 PID: 2762 Comm: update-binfmts Not tainted
> 5.9.0-next-20201015+ #152
> [   10.738769] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW
> (2.19 ) 03/31/2011
> [   10.742769] EIP: __kernel_write+0xd4/0x230
> [   10.746769] Code: 89 d6 64 8b 15 b4 77 4c c5 8b 8a 38 0b 00 00 31
> d2 85 c9 74 04 0f b7 51 30 66 89 75 e8 8b 75 ac 8d 4d b0 89 45 e4 66
> 89 55 ea <8b> 06 8b 56 04 57 6a 01 89 45 d4 8d 45 b8 89 55 d8 ba 01
> 00 00 00
> [   10.758762] EAX: 0002 EBX: c1922a40 ECX: c33cdad0 EDX:
> 
> [   10.762791] ESI:  EDI: 012c EBP: c33cdb20 ESP:
> c33cdacc
> [   10.766766] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS:
> 00010286
> [   10.770762] CR0: 80050033 CR2:  CR3: 033d CR4:
> 06b0
> [   10.770762] Call Trace:
> [   10.770762]  ? __mutex_unlock_slowpath+0x2b/0x2c0
> [   10.770762]  ? dma_direct_map_sg+0x13a/0x320
> [   10.770762]  autofs_notify_daemon+0x14d/0x2b0
> [   10.770762]  autofs_wait+0x4cd/0x770
> [   10.793051]  ? autofs_d_automount+0xd6/0x1e0
> [   10.793051]  autofs_mount_wait+0x43/0xe0
> [   10.797808]  autofs_d_automount+0xdf/0x1e0
> [   10.797808]  __traverse_mounts+0x85/0x200
> [   10.797808]  step_into+0x368/0x620
> [   10.797808]  ? proc_setup_thread_self+0x110/0x110
> [   10.797808]  walk_component+0x58/0x190
> [   10.811838]  link_path_walk.part.0+0x245/0x360
> [   10.811838]  path_lookupat.isra.0+0x31/0x130
> [   10.811838]  filename_lookup+0x8d/0x130
> [   10.818749]  ? cache_alloc_debugcheck_after+0x151/0x180
> [   10.818749]  ? getname_flags+0x1f/0x160
> [   10.818749]  ? kmem_cache_alloc+0x75/0x100
> [   10.818749]  user_path_at_empty+0x25/0x30
> [   10.818749]  vfs_statx+0x63/0x100
> [   10.831022]  ? _raw_spin_unlock+0x18/0x30
> [   10.831022]  ? replace_page_cache_page+0x160/0x160
> [   10.831022]  __do_sys_stat64+0x36/0x60
> [   10.831022]  ? exit_to_user_mode_prepare+0x35/0xe0
> [   10.831022]  ? irqentry_exit_to_user_mode+0x8/0x20
> [   10.838773]  ? irqentry_exit+0x55/0x70
> [   10.838773]  ? exc_page_fault+0x228/0x3c0
> [   10.838773]  __ia32_sys_stat64+0xd/0x10
> [   10.838773]  do_int80_syscall_32+0x2c/0x40
> [   10.848561]  entry_INT80_32+0x111/0x111
> [   10.848561] EIP: 0xb7ee2092
> [   10.848561] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30
> 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00
> 00 cd 80  8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d
> b4 26 00
> [   10.848561] EAX: ffda EBX: 00468490 ECX: bfbce6ec EDX:
> 00467348
> [   10.848561] ESI:  EDI: 00468490 EBP: bfbce6ec ESP:
> bfbce6c4
> [   10.848561] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS:
> 0292
> [   10.848561] Modules linked in:
> [   10.848561] CR2: 
> [   10.851552] ---[ end trace d01bd7323c2317a5 ]---
> [   10.851558] EIP: __kernel_write+0xd4/0x230
> [   10.851561] Code: 89 d6 64 8b 15 b4 77 4c c5 8b 8a 38 0b 00 00 31
> d2 85 c9 74 04 0f b7 51 30 66 89 75 e8 8b 75 ac 8d 4d b0 89 45 e4 66
> 89 55 ea <8b> 06 8b 56 04 57 6a 01 89 45 d4 8d 45 b8 89 55 d8 ba 01
> 00 00 00
> [   10.851563] EAX: 0002 EBX: c1922a40 ECX: c33cdad0 EDX:
> 
> [   10.851565] ESI:  EDI: 012c EBP: c33cdb20 ESP:
> c33cdacc
> [   10.851568] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS:
> 00010286
> [   10.851570] CR0: 80050033 CR2: 004a700e CR3: 033d CR4:
> 06b0
> [   11.803128] systemd-journald[2514]: Received request to flush
> runtime journal from PID 1
> [   26.113941] iwl3945 :03:00.0: loaded firmware version
> 15.32.2.9
> [   59.809322] traps: clock-applet[3636] trap int3 ip:b724ffc0
> sp:bf879b90 error:0 in libglib-2.0.so.0.5000.3[b7203000+12a000]
> [   59.812036] traps: mateweather-app[3638] trap int3 ip:b7283fc0
> sp:bfb65760 error:0 in libglib-2.0.so.0.5000.3[b7237000+12a000]
> [   64.628401] wlan0: authenticate with 5c:f4:ab:10:d2:bb
> 
-- 
Ian Kent 



drivers/dma/ppc4xx/adma.c:1188:34: sparse: sparse: incorrect type in argument 1 (different address spaces)

2020-10-16 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   071a0578b0ce0b0e543d1e38ee6926b9cc21c198
commit: 8f28ca6bd8211214faf717677bbffe375c2a6072 iomap: constify ioreadX() 
iomem argument (as in generic implementation)
date:   9 weeks ago
config: powerpc-randconfig-s032-20201017 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-rc1-2-g368fd9ce-dirty
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f28ca6bd8211214faf717677bbffe375c2a6072
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 8f28ca6bd8211214faf717677bbffe375c2a6072
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


"sparse warnings: (new ones prefixed by >>)"
   drivers/dma/ppc4xx/adma.c:73:1: sparse: sparse: symbol 
'ppc440spe_adma_chan_list' was not declared. Should it be static?
   drivers/dma/ppc4xx/adma.c:140:17: sparse: sparse: cast to restricted __le32
   [ the same warning at drivers/dma/ppc4xx/adma.c:140:17 is repeated; 41 more identical lines elided ]
   drivers/dma/ppc4xx/adma.c:543:35: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [usertype] sg1l @@ got restricted __le32 [usertype] @@
   drivers/dma/ppc4xx/adma.c:543:35: sparse: expected unsigned int

Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-16 Thread Nicolin Chen
On Fri, Oct 16, 2020 at 03:10:26PM +0100, Robin Murphy wrote:
> On 2020-10-16 04:53, Nicolin Chen wrote:
> > On Thu, Oct 15, 2020 at 10:55:52AM +0100, Robin Murphy wrote:
> > > On 2020-10-15 05:13, Nicolin Chen wrote:
> > > > On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:
> > > > > On 2020-10-09 17:19, Nicolin Chen wrote:
> > > > > > This patch simply adds support for PCI devices.
> > > > > > 
> > > > > > Reviewed-by: Dmitry Osipenko 
> > > > > > Tested-by: Dmitry Osipenko 
> > > > > > Signed-off-by: Nicolin Chen 
> > > > > > ---
> > > > > > 
> > > > > > Changelog
> > > > > > v6->v7
> > > > > > * Renamed goto labels, suggested by Thierry.
> > > > > > v5->v6
> > > > > > * Added Dmitry's Reviewed-by and Tested-by.
> > > > > > v4->v5
> > > > > > * Added Dmitry's Reviewed-by
> > > > > > v3->v4
> > > > > > * Dropped !iommu_present() check
> > > > > > * Added CONFIG_PCI check in the exit path
> > > > > > v2->v3
> > > > > > * Replaced ternary conditional operator with if-else in 
> > > > > > .device_group()
> > > > > > * Dropped change in tegra_smmu_remove()
> > > > > > v1->v2
> > > > > > * Added error-out labels in tegra_smmu_probe()
> > > > > > * Dropped pci_request_acs() since IOMMU core would call it.
> > > > > > 
> > > > > > drivers/iommu/tegra-smmu.c | 35 
> > > > > > +--
> > > > > > 1 file changed, 25 insertions(+), 10 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > > > > > index be29f5977145..2941d6459076 100644
> > > > > > --- a/drivers/iommu/tegra-smmu.c
> > > > > > +++ b/drivers/iommu/tegra-smmu.c
> > > > > > @@ -10,6 +10,7 @@
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > +#include 
> > > > > > #include 
> > > > > > #include 
> > > > > > #include 
> > > > > > @@ -865,7 +866,11 @@ static struct iommu_group 
> > > > > > *tegra_smmu_device_group(struct device *dev)
> > > > > > group->smmu = smmu;
> > > > > > group->soc = soc;
> > > > > > -   group->group = iommu_group_alloc();
> > > > > > +   if (dev_is_pci(dev))
> > > > > > +   group->group = pci_device_group(dev);
> > > > > 
> > > > > Just to check, is it OK to have two or more swgroups "owning" the same
> > > > > iommu_group if an existing one gets returned here? It looks like that 
> > > > > might
> > > > > not play nice with the use of iommu_group_set_iommudata().
> > > > 
> > > > Do you mean by "gets returned here" the "IS_ERR" check below?
> > > 
> > > I mean that unlike iommu_group_alloc()/generic_device_group(),
> > > pci_device_group() may give you back a group that already contains another
> > > device and has already been set up from that device's perspective. This 
> > > can
> > > happen for topological reasons like requester ID aliasing through a 
> > > PCI-PCIe
> > > bridge or lack of isolation between functions.
> > 
> > Okay..but we don't really have two swgroups owning the same groups
> > in case of PCI devices. For Tegra210, all PCI devices inherit the
> > same swgroup from the PCI controller. And I'd think previous chips
> > do the same. The only use case currently of 2+ swgroups owning the
> > same iommu_group is for display controller.
> > 
> > Or do you suggest we need an additional check for pci_device_group?
> 
> Ah, OK - I still don't have the best comprehension of what exactly swgroups

The "swgroup" stands for "software group", which associates with
an ASID (Address Space Identifier) for SMMU to determine whether
this client is going through SMMU translation or not.

> are, and the path through .of_xlate looked like you might be using the PCI
> requester ID as the swgroup identifier, but I guess that ultimately depends
> on what your "iommu-map" is supposed to look like. If pci_device_group()

This is copied from pcie node in our downstream dtsi file:

iommus = < TEGRA_SWGROUP_AFI>;
iommu-map = <0x0  TEGRA_SWGROUP_AFI 0x1000>;
iommu-map-mask = <0x0>;

> will effectively only ever get called once regardless of how many endpoints
> exist, then indeed this won't be a concern (although if that's *guaranteed*
> to be the case then you may as well just stick with calling
> iommu_group_alloc() directly). Thanks for clarifying.

All PCI devices are supposed to get this swgroup of the pcie node
above. So the function will return the existing group of the pcie
controller, before giving a chance to call iommu_group_alloc().

Thanks for the review and input.


[PATCH] skd_main: remove unused including

2020-10-16 Thread Tian Tao
Remove including  that don't need it.

Signed-off-by: Tian Tao 
---
 drivers/block/skd_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index ae6454c..a962b45 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.7.4



Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn
On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh  wrote:
> On 16 Oct 2020, at 21:02, Jann Horn wrote:
> > On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau  wrote:
> > But in userspace, we just need a simple counter. There's no need for
> > us to worry about anything else, like timestamps or whatever. If we
> > repeatedly fork a paused VM, the forked VMs will see the same counter
> > value, but that's totally fine, because the only thing that matters to
> > userspace is that the counter changes when the VM is forked.
>
> For user-space, even a single bit would do. We added MADVISE_WIPEONFORK
> so that userspace libraries can detect fork()/clone() robustly, for the
> same reasons. It just wipes a page as the indicator, which is
> effectively a single-bit signal, and it works well. On the user-space
> side of this, I’m keen to find a solution like that that we can use
> fairly easily inside of portable libraries and applications. The “have
> I forked” checks do end up in hot paths, so it’s nice if they can be
> CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my
> favorite.

I'm pretty sure a single bit is not enough if you want to have a
single page, shared across the entire system, that stores the VM
forking state; you need a counter for that.
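
Purely as an illustration of why a counter (rather than a single bit or
the raw 128-bit ID) works for a page shared across the whole system,
the userspace-side check could be as small as the sketch below. The
mapping and the vm_gen_counter symbol are made up for this example; no
such ABI exists in this thread:

#include <stdint.h>

/* assumed to point into a page the kernel bumps on every VM fork */
extern volatile const uint32_t *vm_gen_counter;

static uint32_t cached_gen;

static inline int vm_has_forked(void)
{
	uint32_t now = *vm_gen_counter;	/* cheap, cache-friendly load */

	if (now == cached_gen)
		return 0;
	cached_gen = now;		/* record the new generation */
	return 1;
}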

> > And actually, since the value is a cryptographically random 128-bit
> > value, I think that we should definitely use it to help reseed the
> > kernel's RNG, and keep it secret from userspace. That way, even if the
> > VM image is public, we can ensure that going forward, the kernel RNG
> > will return securely random data.
>
> If the image is public, you need some extra new raw entropy from
> somewhere. The gen-id could be mixed in, that can’t do any harm as
> long as rigorous cryptographic mixing with the prior state is used, but
> if that’s all you do then the final state is still deterministic and
> non-secret.

Microsoft's documentation
(http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
Generation ID that we get after a fork "is a 128-bit,
cryptographically random integer value". If multiple people use the
same image, it guarantees that each use of the image gets its own,
fresh ID: The table in section "How to implement virtual machine
generation ID support in a virtualization platform" says that (among
other things) "Virtual machine is imported, copied, or cloned"
generates a new generation ID.

So the RNG state after mixing in the new VM Generation ID would
contain 128 bits of secret entropy not known to anyone else, including
people with access to the VM image.

Now, 128 bits of cryptographically random data aren't _optimal_; I
think something on the order of 256 bits would be nicer from a
theoretical standpoint. But in practice I think we'll be good with the
128 bits we're getting (since the number of users who fork a VM image
is probably not going to be so large that worst-case collision
probabilities matter).

> The kernel would need to use the change as a trigger to
> measure some entropy (e.g. interrupts and RDRAND, or whatever). Our just
> define the machine contract as “this has to be unique random data and
> if it’s not unique, or if it’s pubic, you’re toast”.

As far as I can tell from Microsoft's spec, that is a guarantee we're
already getting.


Re: [PATCH 1/2] tools/include: Update if_link.h and netlink.h

2020-10-16 Thread Jesse Brandeburg
Jakub Kicinski wrote:

> On Fri, 16 Oct 2020 14:23:48 -0700 Jesse Brandeburg wrote:
> > > These are tested to be the latest as part of the tools/lib/bpf build.  
> > 
> > But you didn't mention why you're making these changes, and you're
> > removing a lot of comments without explaining why/where there might be
> > a replacement or why the comments are useless. I now see that you're
> > adding actual kdoc which is good, except for the part where
> > you don't put kdoc on all the structures.
> 
> Note that he's just syncing the uAPI headers to tools/
> 
> The source of the change is here:
> 
> 78a3ea555713 ("net: remove comments on struct rtnl_link_stats")
> 0db0c34cfbc9 ("net: tighten the definition of interface statistics")


Thanks Kuba, I'm not trying to be a hard ass, but the commit message
didn't say why he's making the change, and if I bisect back to this
and see "sync" as the commit message, I think I'd be stuck chasing
"sync to what?"

I guess that his changelog could just say what you said?

Proposed:
Sync the uAPI headers so that userspace and the kernel match. These
changes match the updates to the files in the tools directory that were
already updated by commits:
78a3ea555713 ("net: remove comments on struct rtnl_link_stats")
0db0c34cfbc9 ("net: tighten the definition of interface statistics")


Re: [PATCH RFC V3 4/9] x86/pks: Preserve the PKRS MSR on context switch

2020-10-16 Thread Ira Weiny
On Fri, Oct 16, 2020 at 01:06:36PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 12:42:53PM -0700, ira.we...@intel.com wrote:
> 
> > @@ -644,6 +663,8 @@ void __switch_to_xtra(struct task_struct *prev_p, 
> > struct task_struct *next_p)
> >  
> > if ((tifp ^ tifn) & _TIF_SLD)
> > switch_to_sld(tifn);
> > +
> > +   pks_sched_in();
> >  }
> >  
> 
> You seem to have lost the comment proposed here:
> 
>   
> https://lkml.kernel.org/r/20200717083140.gw10...@hirez.programming.kicks-ass.net
> 
> It is useful and important information that the wrmsr normally doesn't
> happen.

Added back in here.

> 
> > diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
> > index 3cf8f775f36d..30f65dd3d0c5 100644
> > --- a/arch/x86/mm/pkeys.c
> > +++ b/arch/x86/mm/pkeys.c
> > @@ -229,3 +229,31 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int 
> > flags)
> >  
> > return pk_reg;
> >  }
> > +
> > +DEFINE_PER_CPU(u32, pkrs_cache);
> > +
> > +/**
> > + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
> > + * serializing but still maintains ordering properties similar to WRPKRU.
> > + * The current SDM section on PKRS needs updating but should be the same as
> > + * that of WRPKRU.  So to quote from the WRPKRU text:
> > + *
> > + * WRPKRU will never execute transiently. Memory accesses
> > + * affected by PKRU register will not execute (even transiently)
> > + * until all prior executions of WRPKRU have completed execution
> > + * and updated the PKRU register.
> 
> (whitespace damage; space followed by tabstop)

Fixed thanks.

> 
> > + */
> > +void write_pkrs(u32 new_pkrs)
> > +{
> > +   u32 *pkrs;
> > +
> > +   if (!static_cpu_has(X86_FEATURE_PKS))
> > +   return;
> > +
> > +   pkrs = get_cpu_ptr(&pkrs_cache);
> > +   if (*pkrs != new_pkrs) {
> > +   *pkrs = new_pkrs;
> > +   wrmsrl(MSR_IA32_PKRS, new_pkrs);
> > +   }
> > +   put_cpu_ptr(pkrs);
> > +}
> 
> looks familiar that... :-)

Added you as a co-developer if that is ok?

Ira


Re: [PATCH v4 3/3] clk: qcom: lpasscc-sc7180: Re-configure the PLL in case lost

2020-10-16 Thread Stephen Boyd
Quoting Stephen Boyd (2020-10-15 20:16:27)
> Quoting Douglas Anderson (2020-10-14 17:13:29)
> > From: Taniya Das 
> > 
> > In the case where the PLL configuration is lost, then the pm runtime
> > resume will reconfigure before usage.
> 
> Taniya, this commit needs a lot more describing than one sentence. I see
> that the PLL's L value is reset at boot, but only once. That seems to be
> because the bootloader I have doesn't set bit 11 for the RETAIN_FF bit
> on the lpass_core_hm_gdsc. Once the gdsc is turned off the first time,
> the PLL settings are lost and the L val is reset to 0. That makes sense
> because RETAIN_FF isn't set. This also means the other register writes
> during probe are lost during the first suspend of the lpass core clk
> controller. Then when the GDSC is turned on the next time for this clk
> controller  being runtime resumed we will set the retain bit and then
> configure the PLL again. BTW, I see that runtime PM is called for this
> clk controller for all the clk operations. Maybe there should be some
> auto suspend timeout so that we're not toggling the gdsc constantly?
> 
> I hacked up the GDSC code to set the bit at gdsc registration time and
> it seems to fix the problem I'm seeing (i.e. that the PLL is stuck,
> which should also be in the commit text here). When I try to set the bit
> in the bootloader though my kernel won't boot. I guess something is
> hanging the system if I enable the retain bit in the GDSC?
> 

After hacking on this for some time it looks like we can apply this
patch instead and things are good. The first two patches in this series
look mostly good to me minus some nitpicks so please resend.

---8<---
diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index 99834564bcc2..508c2901abfa 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -343,6 +343,14 @@ static int gdsc_init(struct gdsc *sc)
if ((sc->flags & VOTABLE) && on)
gdsc_enable(&sc->pd);
 
+   /*
+* Make sure the retain bit is set if the GDSC is already on, otherwise
+* we end up turning off the GDSC and destroying all the register
+* contents that we thought we were saving.
+*/
+   if ((sc->flags & RETAIN_FF_ENABLE) && on)
+   gdsc_retain_ff_on(sc);
+
/* If ALWAYS_ON GDSCs are not ON, turn them ON */
if (sc->flags & ALWAYS_ON) {
if (!on)


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Willy Tarreau
On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> Microsoft's documentation
> (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> Generation ID that we get after a fork "is a 128-bit,
> cryptographically random integer value". If multiple people use the
> same image, it guarantees that each use of the image gets its own,
> fresh ID:

No. It cannot be more unique than the source that feeds that cryptographic
transformation. All it guarantees is that the entropy source is protected
from being guessed based on the output. Applying cryptography on a simple
counter provides apparently random numbers that will be unique for a long
period for the same source, but as soon as you duplicate that code between
users and they start from the same counter they'll get the same IDs.

This is why I think that using a counter is better if you really need
something unique. Randoms only reduce predictability, which helps avoid
collisions.
And I'm saying this as someone who had on his external gateway the same SSH
host key as 89 other hosts on the net, each of them using randoms to provide
a universally unique one...

Willy


[PATCH] clk: qcom: gdsc: Keep RETAIN_FF bit set if gdsc is already on

2020-10-16 Thread Stephen Boyd
If the GDSC is enabled out of boot but doesn't have the retain ff bit
set we will get confusing results where the registers that are powered
by the GDSC lose their contents on the first power off of the GDSC but
thereafter they retain their contents. This is because gdsc_init() fails
to make sure the RETAIN_FF bit is set when it probes the GDSC the first
time and thus powering off the GDSC causes the register contents to be
reset. We do set the RETAIN_FF bit the next time we power on the GDSC,
see gdsc_enable(), so that subsequent GDSC power off's don't lose
register contents state.

Forcibly set the bit at device probe time so that the kernel's assumed
view of the GDSC is consistent with the state of the hardware. This
fixes a problem where the audio PLL doesn't work on sc7180 when the
bootloader leaves the lpass_core_hm GDSC enabled at boot (e.g. to make a
noise) but critically doesn't set the RETAIN_FF bit.

Cc: Douglas Anderson 
Cc: Taniya Das 
Cc: Rajendra Nayak 
Fixes: 173722995cdb ("clk: qcom: gdsc: Add support to enable retention of 
GSDCR")
Signed-off-by: Stephen Boyd 
---
 drivers/clk/qcom/gdsc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index bfc4ac02f9ea..af26e0695b86 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -358,6 +358,14 @@ static int gdsc_init(struct gdsc *sc)
if ((sc->flags & VOTABLE) && on)
gdsc_enable(&sc->pd);
 
+   /*
+* Make sure the retain bit is set if the GDSC is already on, otherwise
+* we end up turning off the GDSC and destroying all the register
+* contents that we thought we were saving.
+*/
+   if ((sc->flags & RETAIN_FF_ENABLE) && on)
+   gdsc_retain_ff_on(sc);
+
/* If ALWAYS_ON GDSCs are not ON, turn them ON */
if (sc->flags & ALWAYS_ON) {
if (!on)

base-commit: 9ff9b0d392ea08090cd1780fb196f36dbb586529
-- 
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/



Re: [PATCH RFC V3 4/9] x86/pks: Preserve the PKRS MSR on context switch

2020-10-16 Thread Ira Weiny
On Fri, Oct 16, 2020 at 01:12:26PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2020 at 11:31:45AM -0700, Dave Hansen wrote:
> > > +/**
> > > + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is 
> > > not
> > > + * serializing but still maintains ordering properties similar to WRPKRU.
> > > + * The current SDM section on PKRS needs updating but should be the same 
> > > as
> > > + * that of WRPKRU.  So to quote from the WRPKRU text:
> > > + *
> > > + *   WRPKRU will never execute transiently. Memory accesses
> > > + *   affected by PKRU register will not execute (even transiently)
> > > + *   until all prior executions of WRPKRU have completed execution
> > > + *   and updated the PKRU register.
> > > + */
> > > +void write_pkrs(u32 new_pkrs)
> > > +{
> > > + u32 *pkrs;
> > > +
> > > + if (!static_cpu_has(X86_FEATURE_PKS))
> > > + return;
> > > +
> > > + pkrs = get_cpu_ptr(&pkrs_cache);
> > > + if (*pkrs != new_pkrs) {
> > > + *pkrs = new_pkrs;
> > > + wrmsrl(MSR_IA32_PKRS, new_pkrs);
> > > + }
> > > + put_cpu_ptr(pkrs);
> > > +}
> > > 
> > 
> > It bugs me a *bit* that this is being called in a preempt-disabled
> > region, but we still bother with the get/put_cpu jazz.  Are there other
> > future call-sites for this that aren't in preempt-disabled regions?
> 
> So the previous version had a useful comment that got lost.

Ok Looking back I see what happened...  This comment...

 /*
  * PKRS is only temporarily changed during specific code paths.
  * Only a preemption during these windows away from the default
  * value would require updating the MSR.
  */

... was added to pks_sched_in() but that got simplified down because cleaning
up write_pkrs() made the code there obsolete.

> This stuff
> needs to fundamentally be preempt disabled,

I agree, the update to the percpu cache value and MSR can not be torn.

> so it either needs to
> explicitly do so, or have an assertion that preemption is indeed
> disabled.

However, I don't think I understand clearly.  Doesn't [get|put]_cpu_ptr()
handle the preempt_disable() for us?  Is it not sufficient to rely on that?

Dave's comment seems to be the opposite where we need to eliminate preempt
disable before calling write_pkrs().

FWIW I think I'm mistaken in my response to Dave regarding the
preempt_disable() in pks_update_protection().
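
For what it's worth, the "assert instead of nest" variant could look
something like the sketch below; WARN_ON_ONCE(preemptible()) is only a
stand-in for whichever assertion the final patch settles on:

void write_pkrs(u32 new_pkrs)
{
	u32 *pkrs;

	if (!static_cpu_has(X86_FEATURE_PKS))
		return;

	/* callers are expected to have preemption disabled already */
	WARN_ON_ONCE(preemptible());

	pkrs = this_cpu_ptr(&pkrs_cache);
	if (*pkrs != new_pkrs) {
		*pkrs = new_pkrs;
		wrmsrl(MSR_IA32_PKRS, new_pkrs);
	}
}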

Ira


[PATCH] nds32: Fix a broken copyright header in gen_vdso_offsets.sh

2020-10-16 Thread Palmer Dabbelt
From: Palmer Dabbelt 

I was going to copy this but I didn't want to chase around the build
system stuff so I did it a different way.

Signed-off-by: Palmer Dabbelt 
---
 arch/nds32/kernel/vdso/gen_vdso_offsets.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/nds32/kernel/vdso/gen_vdso_offsets.sh 
b/arch/nds32/kernel/vdso/gen_vdso_offsets.sh
index 01924ff071ad..5b329aed3501 100755
--- a/arch/nds32/kernel/vdso/gen_vdso_offsets.sh
+++ b/arch/nds32/kernel/vdso/gen_vdso_offsets.sh
@@ -7,7 +7,7 @@
 # Doing this inside the Makefile will break the $(filter-out) function,
 # causing Kbuild to rebuild the vdso-offsets header file every time.
 #
-# Author: Will Deacon 
 #
 
 LC_ALL=C
-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH RFC V3 2/9] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support

2020-10-16 Thread Ira Weiny
On Fri, Oct 16, 2020 at 12:57:43PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 12:42:51PM -0700, ira.we...@intel.com wrote:
> > From: Fenghua Yu 
> > 
> > Define a helper, update_pkey_val(), which will be used to support both
> > Protection Key User (PKU) and the new Protection Key for Supervisor
> > (PKS) in subsequent patches.
> > 
> > Co-developed-by: Ira Weiny 
> > Signed-off-by: Ira Weiny 
> > Signed-off-by: Fenghua Yu 
> > ---
> >  arch/x86/include/asm/pkeys.h |  2 ++
> >  arch/x86/kernel/fpu/xstate.c | 22 --
> >  arch/x86/mm/pkeys.c  | 21 +
> >  3 files changed, 27 insertions(+), 18 deletions(-)
> 
> This is not from Fenghua.
> 
>   
> https://lkml.kernel.org/r/20200717085442.gx10...@hirez.programming.kicks-ass.net
> 
> This is your patch based on the code I wrote.

Ok, I apologize.  Yes the code below was all yours.

Is it ok to add?

Co-developed-by: Peter Zijlstra 
Signed-off-by: Peter Zijlstra 

?

Thanks,
Ira

> 
> > diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
> > index f5efb4007e74..3cf8f775f36d 100644
> > --- a/arch/x86/mm/pkeys.c
> > +++ b/arch/x86/mm/pkeys.c
> > @@ -208,3 +208,24 @@ static __init int setup_init_pkru(char *opt)
> > return 1;
> >  }
> >  __setup("init_pkru=", setup_init_pkru);
> > +
> > +/*
> > + * Update the pk_reg value and return it.
> > + *
> > + * Kernel users use the same flags as user space:
> > + * PKEY_DISABLE_ACCESS
> > + * PKEY_DISABLE_WRITE
> > + */
> > +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
> > +{
> > +   int pkey_shift = pkey * PKR_BITS_PER_PKEY;
> > +
> > +   pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
> > +
> > +   if (flags & PKEY_DISABLE_ACCESS)
> > +   pk_reg |= PKR_AD_BIT << pkey_shift;
> > +   if (flags & PKEY_DISABLE_WRITE)
> > +   pk_reg |= PKR_WD_BIT << pkey_shift;
> > +
> > +   return pk_reg;
> > +}
> > -- 
> > 2.28.0.rc0.12.gb6a658bd00c9
> > 


Re: [PATCH net-next 1/3] net: dsa: don't pass cloned skb's to drivers xmit function

2020-10-16 Thread Vladimir Oltean
On Fri, Oct 16, 2020 at 10:02:24PM +0200, Christian Eggers wrote:
> Ensure that the skb is not cloned and has enough tail room for the tail
> tag. This code will be removed from the drivers in the next commits.
> 
> Signed-off-by: Christian Eggers 
> ---

Does 1588 work for you using this change, or you haven't finished
implementing it yet? If you haven't, I would suggest finishing that
part first.

The post-reallocation skb looks nothing like the one before.

Before:
skb len=68 headroom=2 headlen=68 tailroom=186
mac=(2,14) net=(16,-1) trans=-1
shinfo(txflags=1 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x9d6927ec sw=1 l4=0) proto=0x88f7 pkttype=0 iif=0
dev name=swp2 feat=0x0x00025020
sk family=17 type=3 proto=0

After:
skb len=68 headroom=2 headlen=68 tailroom=186
mac=(2,16) net=(18,-17) trans=1
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x pkttype=0 iif=0

Notice how you've changed shinfo(txflags), among other things.

Which proves that you can't just copy whatever you found in
tag_trailer.c.

I am not yet sure whether there is any helper that can be used instead
of this crazy open-coding. Right now, not having tested anything yet, my
candidates of choice would be pskb_expand_head or __pskb_pull_tail. You
should probably also try to cater here for the potential reallocation
done in the skb_cow_head() of non-tail taggers. Which would lean the
balance towards pskb_expand_head(), I believe.

Also, if the result is going to be longer than ~20 lines of code, I
strongly suggest moving the reallocation to a separate function so you
don't clutter dsa_slave_xmit.
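
To make that concrete, a rough (untested) sketch of such a helper built
around pskb_expand_head(); the function name and the way the needed
tailroom is passed in are placeholders only:

static struct sk_buff *dsa_realloc_skb(struct sk_buff *skb,
				       int needed_tailroom)
{
	/* nothing to do if the data is private and the tag fits */
	if (!skb_cloned(skb) && skb_tailroom(skb) >= needed_tailroom)
		return skb;

	/* un-clones and grows the tailroom in one go */
	if (pskb_expand_head(skb, 0,
			     max(needed_tailroom - skb_tailroom(skb), 0),
			     GFP_ATOMIC))
		return NULL;

	return skb;
}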

Also, please don't redeclare struct sk_buff *nskb, you don't need to.


Re: [PATCH 3/5] gpio: msc313: MStar MSC313 GPIO driver

2020-10-16 Thread Daniel Palmer
Hi Linus

On Sat, 17 Oct 2020 at 01:56, Linus Walleij  wrote:
> (...)
>
> > +config GPIO_MSC313
> > +   bool "MStar MSC313 GPIO support"
> > +   default y if ARCH_MSTARV7
> > +   depends on ARCH_MSTARV7
> > +   select GPIO_GENERIC
>
> Selecting GPIO_GENERIC, that is good.
> But you're not using it, because you can't.
> This chip does not have the bits lined up nicely
> in one register, instead there seems to be something
> like one register per line, right?
> So skip GPIO_GENERIC.

Well spotted. Copy/paste fail on my side :).

> > +#define MSC313_GPIO_IN  BIT(0)
> > +#define MSC313_GPIO_OUT BIT(4)
> > +#define MSC313_GPIO_OEN BIT(5)
> > +
> > +#define MSC313_GPIO_BITSTOSAVE (MSC313_GPIO_OUT | MSC313_GPIO_OEN)
>
> Some comment here telling us why these need saving and
> not others.

There is a comment near the save function that explains it, I think.
When the hardware goes into low power mode with the CPU turned off,
the register contents are lost, and those two bits are the only ones
that are writable, from what I can tell. I'll add an extra comment
above that line.

> > +#define FUART_NAMES\
> > +   MSC313_PINNAME_FUART_RX,\
> > +   MSC313_PINNAME_FUART_TX,\
> > +   MSC313_PINNAME_FUART_CTS,   \
> > +   MSC313_PINNAME_FUART_RTS
> > +
> > +#define OFF_FUART_RX   0x50
> > +#define OFF_FUART_TX   0x54
> > +#define OFF_FUART_CTS  0x58
> > +#define OFF_FUART_RTS  0x5c
> > +
> > +#define FUART_OFFSETS  \
> > +   OFF_FUART_RX,   \
> > +   OFF_FUART_TX,   \
> > +   OFF_FUART_CTS,  \
> > +   OFF_FUART_RTS
>
> This looks a bit strange. The GPIO driver should not really
> have to know about any other use cases for pins than
> GPIO. But I guess it is intuitive for the driver.
>

>
> Same with all these. I suppose it is the offsets of stuff
> that would be there unless we were using it for GPIO.

The pad FUART_RX can't move, but the function FUART_RX can.
If the function FUART_RX (or another function) isn't on the pad/pin
FUART_RX, the pad is connected to the GPIO block.
Even more confusingly, some of the other chips (SSD201/SSD202)
have pads called GPIO1, GPIO2 etc. that only have GPIO functionality,
but the offsets of the registers that control the GPIO on those pads
might bear no relation to the name:
GPIO1 isn't at gpio_base + (1 * 4) but at some seemingly random address.

Basically using the pad name as the name of the GPIO made sense
because it's fixed and the pad name and offset are the same with all
of the chips I've seen so far.

> > +static int msc313_gpio_to_irq(struct gpio_chip *chip, unsigned int offset)
> > +{
> > +   struct msc313_gpio *gpio = gpiochip_get_data(chip);
> > +
>
> > +   return gpio->irqs[offset];
> > +}
>
> Please do not use custom IRQ handling like this.
> As there seems to be one IRQ per line, look into using
>
> select GPIOLIB_IRQCHIP
> select IRQ_DOMAIN_HIERARCHY
>
> See for example in gpio-ixp4xx.c how we deal with
> hiearchical GPIO IRQs.



> Use hierarchical generic GPIO IRQs for these.
>
> Assign ->fwnode, ->parent_domain, ->child_to_parent_hwirq,
> and probably also ->handler on the struct gpio_irq_chip *.
>
> Skip assigning gpiochip->to_irq, the generic code will
> handle that.
>
> Again see gpio-ixp4xx.c for an example.
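
Roughly, the fields listed above get wired up like this (only a sketch;
the irq_chip, the parent domain lookup and the translate callback names
below are invented -- see gpio-ixp4xx.c for the real thing):

	struct gpio_irq_chip *girq = &gpio->gpiochip.irq;

	girq->chip = &msc313_gpio_irqchip;	/* assumed irq_chip */
	girq->fwnode = of_node_to_fwnode(dev->of_node);
	girq->parent_domain = parent_domain;	/* from the parent intc */
	girq->child_to_parent_hwirq = msc313_gpio_child_to_parent_hwirq;
	girq->handler = handle_bad_irq;
	girq->default_type = IRQ_TYPE_NONE;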

I'll look into this.
I don't have datasheets so I'm working from some crusty header
files from the vendor kernel but there isn't one irq per line from
what I can tell.
There seems to have been 4 spare lines on an interrupt controller
so they wired GPIOs to them.

Thank you for the comments. I'll send a v2 in a few days.

Thanks,

Daniel


[PATCH] arm64: Fix a broken copyright header in gen_vdso_offsets.sh

2020-10-16 Thread Palmer Dabbelt
From: Palmer Dabbelt 

I was going to copy this but I didn't want to chase around the build
system stuff so I did it a different way.

Signed-off-by: Palmer Dabbelt 
---
 arch/arm64/kernel/vdso/gen_vdso_offsets.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/gen_vdso_offsets.sh 
b/arch/arm64/kernel/vdso/gen_vdso_offsets.sh
index 0664acaf61ff..8b806eacd0a6 100755
--- a/arch/arm64/kernel/vdso/gen_vdso_offsets.sh
+++ b/arch/arm64/kernel/vdso/gen_vdso_offsets.sh
@@ -8,7 +8,7 @@
 # Doing this inside the Makefile will break the $(filter-out) function,
 # causing Kbuild to rebuild the vdso-offsets header file every time.
 #
-# Author: Will Deacon 
 #
 
 LC_ALL=C
-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH v2] HID: i2c-hid: add polling mode based on connected GPIO chip's pin status

2020-10-16 Thread Coiby Xu

Hi,

Thank you for examine this patch in such a careful way!

On Fri, Oct 16, 2020 at 03:00:49PM +, Barnabás Pőcze wrote:

Hi,

I still think that `i2c_hid_resume()` and `i2c_hid_suspend()` are asymmetric and
that might lead to issues.



Do you think this commit message is relevant to your concern?

$ git show d1c48038b849e9df0475621a52193a62424a4e87
commit d1c48038b849e9df0475621a52193a62424a4e87
HID: i2c-hid: Only disable irq wake if it was successfully enabled during 
suspend

Enabling irq wake could potentially fail and calling disable_irq_wake
after a failed call to enable_irq_wake could result in an unbalanced irq
warning. This patch warns if enable_irq_wake fails and avoids other
potential issues caused by calling disable_irq_wake on resume after
enable_irq_wake failed during suspend.

So I think all cases about irq have been handled. But for the regulator
part, you are right. I made a mistake.



[...]
When polling mode is enabled, an I2C device can't wake up the suspended
system since enable/disable_irq_wake is invalid for polling mode.



Excuse my ignorance, but could you elaborate this because I am not sure I 
understand.
Aren't the two things orthogonal (polling and waking up the system)?


Waking up the system depends on irq. Since we use polling, we don't set
up irq.



[...]
 #define I2C_HID_PWR_ON 0x00
 #define I2C_HID_PWR_SLEEP  0x01

+/* polling mode */
+#define I2C_POLLING_DISABLED 0
+#define I2C_POLLING_GPIO_PIN 1


This is a very small detail, but I personally think that these defines should be
called I2C_HID_ since they are only used here.


Thank you! This is absolutely a good suggestion.



+#define POLLING_INTERVAL 10
+
+static u8 polling_mode;
+module_param(polling_mode, byte, 0444);
+MODULE_PARM_DESC(polling_mode, "How to poll - 0 disabled; 1 based on GPIO pin's 
status");
+
+static unsigned int polling_interval_active_us = 4000;
+module_param(polling_interval_active_us, uint, 0644);
+MODULE_PARM_DESC(polling_interval_active_us,
+"Poll every {polling_interval_active_us} us when the touchpad is 
active. Default to 4000 us");
+
+static unsigned int polling_interval_idle_ms = 10;


There is a define for this value, I don't really see why you don't use it here.
And if there is a define for one value, I don't really see why there isn't one
for the other. (As far as I see `POLLING_INTERVAL` is not even used anywhere.)


Thank you for spotting this leftover issue after introducing two
parameters to control the polling interval.

Another issue is that "MODULE_PARM_DESC(polling_interval_ms" should be
"MODULE_PARM_DESC(polling_interval_idle_ms", although this won't cause
a real problem.



+module_param(polling_interval_idle_ms, uint, 0644);
+MODULE_PARM_DESC(polling_interval_ms,
+"Poll every {polling_interval_idle_ms} ms when the touchpad is 
idle. Default to 10 ms");
 /* debug option */
 static bool debug;
 module_param(debug, bool, 0444);
@@ -158,6 +178,8 @@ struct i2c_hid {

struct i2c_hid_platform_data pdata;

+   struct task_struct *polling_thread;
+
boolirq_wake_enabled;
struct mutexreset_lock;
 };
@@ -772,7 +794,9 @@ static int i2c_hid_start(struct hid_device *hid)
i2c_hid_free_buffers(ihid);

ret = i2c_hid_alloc_buffers(ihid, bufsize);
-   enable_irq(client->irq);
+
+   if (polling_mode == I2C_POLLING_DISABLED)
+   enable_irq(client->irq);

if (ret)
return ret;
@@ -814,6 +838,86 @@ struct hid_ll_driver i2c_hid_ll_driver = {
 };
 EXPORT_SYMBOL_GPL(i2c_hid_ll_driver);

+static int get_gpio_pin_state(struct irq_desc *irq_desc)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(&irq_desc->irq_data);
+
+   return gc->get(gc, irq_desc->irq_data.hwirq);
+}
+
+static bool interrupt_line_active(struct i2c_client *client)
+{
+   unsigned long trigger_type = irq_get_trigger_type(client->irq);
+   struct irq_desc *irq_desc = irq_to_desc(client->irq);
+
+   /*
+* According to Windows Precision Touchpad's specs
+* 
https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/windows-precision-touchpad-device-bus-connectivity,
+* GPIO Interrupt Assertion Level could be either ActiveLow or
+* ActiveHigh.
+*/
+   if (trigger_type & IRQF_TRIGGER_LOW)
+   return !get_gpio_pin_state(irq_desc);
+
+   return get_gpio_pin_state(irq_desc);
+}


Excuse my ignorance, but I think some kind of error handling regarding the 
return
value of `get_gpio_pin_state()` should be present here.


What kind of errors would you expect? It seems (struct gpio_chip *)->get
only returns 0 or 1.
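
(Some gpio_chip .get() implementations can return a negative errno, so
a defensive variant of the helper quoted earlier could pass that
through instead of treating it as a pin state -- a sketch only, reusing
the names from the patch:)

static int get_gpio_pin_state(struct irq_desc *irq_desc)
{
	struct gpio_chip *gc = irq_data_get_irq_chip_data(&irq_desc->irq_data);
	int state = gc->get(gc, irq_desc->irq_data.hwirq);

	/* pass errors through, clamp everything else to 0/1 */
	return state < 0 ? state : !!state;
}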



+
+static int i2c_hid_polling_thread(void *i2c_hid)
+{
+   struct i2c_hid *ihid = i2c_hid;
+   struct i2c_client *client = ihid->client;
+   unsigned int polling_interval_idle;

Re: fw_devlink on will break all snps,dw-apb-gpio users

2020-10-16 Thread Saravana Kannan
On Thu, Oct 15, 2020 at 8:39 PM Jisheng Zhang
 wrote:
>
> On Thu, 15 Oct 2020 15:08:33 +0100
> Robin Murphy  wrote:
>
> >
> >
> > On 2020-10-15 10:52, Jisheng Zhang wrote:
> > > On Thu, 15 Oct 2020 01:48:13 -0700
> > > Saravana Kannan  wrote:
> > >
> > >> On Thu, Oct 15, 2020 at 1:15 AM Jisheng Zhang
> > >>  wrote:
> > >>>
> > >>> On Wed, 14 Oct 2020 22:04:24 -0700 Saravana Kannan wrote:
> > >>>
> > 
> > 
> >  On Wed, Oct 14, 2020 at 9:02 PM Jisheng Zhang
> >   wrote:
> > >
> > > On Wed, 14 Oct 2020 10:29:36 -0700
> > > Saravana Kannan  wrote:
> > >
> > >>
> > >>
> > >> On Wed, Oct 14, 2020 at 4:12 AM Jisheng Zhang
> > >>  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> If set fw_devlink as on, any consumers of dw apb gpio won't probe.
> > >>>
> > >>> The related dts looks like:
> > >>>
> > >>> gpio0: gpio@2400 {
> > >>> compatible = "snps,dw-apb-gpio";
> > >>> #address-cells = <1>;
> > >>> #size-cells = <0>;
> > >>>
> > >>> porta: gpio-port@0 {
> > >>>compatible = "snps,dw-apb-gpio-port";
> > >>>gpio-controller;
> > >>>#gpio-cells = <2>;
> > >>>ngpios = <32>;
> > >>>reg = <0>;
> > >>> };
> > >>> };
> > >>>
> > >>> device_foo {
> > >>>  status = "okay"
> > >>>  ...;
> > >>>  reset-gpio = <, 0, GPIO_ACTIVE_HIGH>;
> > >>> };
> > >>>
> > >>> If I change the reset-gpio property to use another kind of gpio 
> > >>> phandle,
> > >>> e.g gpio expander, then device_foo can be probed successfully.
> > >>>
> > >>> The gpio expander dt node looks like:
> > >>>
> > >>>  expander3: gpio@44 {
> > >>>  compatible = "fcs,fxl6408";
> > >>>  pinctrl-names = "default";
> > >>>  pinctrl-0 = <_pmux>;
> > >>>  reg = <0x44>;
> > >>>  gpio-controller;
> > >>>  #gpio-cells = <2>;
> > >>>  interrupt-parent = <>;
> > >>>  interrupts = <23 IRQ_TYPE_NONE>;
> > >>>  interrupt-controller;
> > >>>  #interrupt-cells = <2>;
> > >>>  };
> > >>>
> > >>> The common pattern looks like the devlink can't cope with suppliers 
> > >>> from
> > >>> child dt node.
> > >>
> > >> fw_devlink doesn't have any problem dealing with child devices being
> > >> suppliers. The problem with your case is that the
> > >> drivers/gpio/gpio-dwapb.c driver directly parses the child nodes and
> > >> never creates struct devices for them. If you have a node with
> > >> compatible string, fw_devlink expects you to create and probe a 
> > >> struct
> > >> device for it. So change your driver to add the child devices as
> > >> devices instead of just parsing the node directly and doing stuff 
> > >> with
> > >> it.
> > >>
> > >> Either that, or stop putting "compatible" string in a node if you
> > >> don't plan to actually treat it as a device -- but that's too late 
> > >> for
> > >> this driver (it needs to be backward compatible). So change the 
> > >> driver
> > >> to add of_platform_populate() and write a driver that probes
> > >> "snps,dw-apb-gpio-port".
> > >>
> > >
> > > Thanks for the information. The "snps,dw-apb-gpio-port" is never used,
> > > so I just sent out a series to remove it.
> > 
> >  I'd actually prefer that you fix the kernel code to actually use it.
> >  So that fw_devlink can be backward compatible (Older DT + new kernel).
> >  The change is pretty trivial (I just have time to do it for you).
> > 
> > >>>
> > >>> I agree the change is trivial, but it will add some useless LoCs like 
> > >>> below.
> > >>
> > >> It's not useless if it preserves backward compatibility with DT.
> > >>
> > >>> I'm not sure whether this is acceptable.So add GPIO and DT maintainers 
> > >>> to comment.
> > >>>
> > >>> Hi Linus, Rob,
> > >>>
> > >>> Could you please comment? A simple introduction of the problem:
> > >>>
> > >>> As pointed out by Saravana, "gpio-dwapb.c driver directly parses the 
> > >>> child
> > >>> nodes and never creates struct devices for them. If you have a node with
> > >>> compatible string, fw_devlink expects you to create and probe a struct
> > >>> device for it", so once we set fw_devlink=on, then any users of 
> > >>> gpio-dwapb
> > >>> as below won't be probed.
> > >>>
> > >>> device_foo {
> > >>>   status = "okay"
> > >>>   ...;
> > >>>   reset-gpio = <, 0, GPIO_ACTIVE_HIGH>;
> > >>> };
> > >>>
> > >>> The compatible string "snps,dw-apb-gpio-port" is never used, but it's in
> > >>> the dt-binding since the dw gpio mainlined. I believe the every dw apb
> > 

Re: [EXT] Re: [PATCH v4 10/13] task_isolation: don't interrupt CPUs with tick_nohz_full_kick_cpu()

2020-10-16 Thread Alex Belits

On Tue, 2020-10-06 at 23:41 +0200, Frederic Weisbecker wrote:
> On Sun, Oct 04, 2020 at 03:22:09PM +, Alex Belits wrote:
> > On Thu, 2020-10-01 at 16:44 +0200, Frederic Weisbecker wrote:
> > > > @@ -268,7 +269,8 @@ static void tick_nohz_full_kick(void)
> > > >   */
> > > >  void tick_nohz_full_kick_cpu(int cpu)
> > > >  {
> > > > -   if (!tick_nohz_full_cpu(cpu))
> > > > +   smp_rmb();
> > > 
> > > What is it ordering?
> > 
> > ll_isol_flags will be read in task_isolation_on_cpu(), that accrss
> > should be ordered against writing in
> > task_isolation_kernel_enter(), fast_task_isolation_cpu_cleanup()
> > and task_isolation_start().
> > 
> > Since task_isolation_on_cpu() is often called for multiple CPUs in
> > a
> > sequence, it would be wasteful to include a barrier inside it.
> 
> Then I think you meant a full barrier: smp_mb()

For read-only operation? task_isolation_on_cpu() is the only place
where per-cpu ll_isol_flags is accessed, read-only, from multiple CPUs.
All other access to ll_isol_flags is done from the local CPU, and
writes are followed by smp_mb(). There are no other dependencies here,
except operations that depend on the value returned from
task_isolation_on_cpu().
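
As a rough illustration of the intended pairing (simplified, with invented
helper names, not the actual patch): the local CPU publishes its flags with
smp_mb(), and a remote caller issues a single smp_rmb() before a batch of
read-only checks.

#include <linux/percpu.h>
#include <linux/smp.h>

/* Sketch only: simplified from the description above. */
DEFINE_PER_CPU(unsigned long, ll_isol_flags);

/* Local CPU only: update the isolation state, then publish it. */
static void ll_isol_flags_update(unsigned long flags)
{
	this_cpu_write(ll_isol_flags, flags);
	smp_mb();	/* pairs with the caller-side smp_rmb() */
}

/* Any CPU, read-only: callers that test several CPUs in a row issue a
 * single smp_rmb() before the whole sequence instead of one per call. */
static inline bool task_isolation_on_cpu(int cpu)
{
	return per_cpu(ll_isol_flags, cpu) != 0;
}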

If/when more flags are added, those rules will still be followed,
because the intention is to store the state of isolation and the phases
of entering/breaking/reporting it, which can only be updated from the
local CPU.

> 
> > > > +   if (!tick_nohz_full_cpu(cpu) ||
> > > > task_isolation_on_cpu(cpu))
> > > > return;
> > > 
> > > You can't simply ignore an IPI. There is always a reason for a
> > > nohz_full CPU
> > > to be kicked. Something triggered a tick dependency. It can be
> > > posix
> > > cpu timers
> > > for example, or anything.

This was added some time ago, when timers appeared and CPUs were kicked
seemingly out of nowhere. At that point, breaking posix timers for tasks
that are not supposed to rely on posix timers was the least problematic
solution. From the user's point of view, entering isolation in this case
affected the timer much like the task exiting while the timer is running.

Right now there are still sources of superfluous calls to this, when
tick_nohz_full_kick_all() is used. If I can confirm that this is the
only problematic place, I would rather fix its callers and make this
condition produce a warning.

This gives me an idea: if there is going to be a mechanism specifically
for reporting kernel entry and isolation breaking, it should be possible
to distinguish between:

1. isolation breaking that already happened upon kernel entry;
2. performing an operation that will immediately and synchronously cause
isolation breaking;
3. operations or conditions that will eventually or asynchronously
cause isolation breaking (having timers running, and possibly sending
signals, should fall in the same category).

This will be (2).
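
Purely illustrative (no such enum exists in this series), the three
categories could be expressed along these lines:

/* Illustrative only; not part of the task_isolation patches. */
enum isol_break_class {
	ISOL_BREAK_ON_ENTRY,	/* 1: already broken upon kernel entry */
	ISOL_BREAK_SYNC,	/* 2: operation breaks isolation immediately */
	ISOL_BREAK_ASYNC,	/* 3: timers, signals, etc. will break it later */
};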

I assume that once reporting of isolation breaking is separated from
the isolation implementation, it will be implemented as a runtime
error condition reporting mechanism. It can then focus on providing
information about the category of events and their sources, with
internal logic designed for that purpose, as opposed to being designed
entirely for debugging, for flexibility and for obtaining maximum
detail about the internals involved.

> > 
> > I realize that this is unusual, however the idea is that while the
> > task
> > is running in isolated mode in userspace, we assume that from this
> > CPUs
> > point of view whatever is happening in kernel, can wait until CPU
> > is
> > back in kernel and when it first enters kernel from this mode, it
> > should "catch up" with everything that happened in its absence.
> > task_isolation_kernel_enter() is supposed to do that, so by the
> > time
> > anything should be done involving the rest of the kernel, CPU is
> > back
> > to normal.
> 
> You can't assume that. If something needs the tick, this can't wait.
> If the user did something wrong, such as setting a posix cpu timer
> to an isolated task, that's his fault and the kernel has to stick
> with
> correctness and kick that task out of isolation mode.

That would be true if not for the multiple "let's just tell all other
CPUs that they should check if they have to update something" situations
like the one above.

In the case of timers it's possible that I will be able to eliminate all
the specific instances where this is done. However, as a general approach
I think we have to establish some distinction between things that must
cause an IPI (and break isolation) and things that may be delayed until
the isolated userspace task allows it or some other unavoidable
isolation-breaking event happens.

> 
> > It is application's responsibility to avoid triggering things that
> > break its isolation
> 
> Precisely.

Right. However there are things like tick_nohz_full_kick_all() and
similar procedures that result in mass-sending of IPIs without
determining 

[rcu:master] BUILD SUCCESS 78b9b11ec17709681e5fc7b50287354f9b0f7728

2020-10-16 Thread kernel test robot
allyesconfig
sparc   defconfig
i386   defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a005-20201016
i386 randconfig-a006-20201016
i386 randconfig-a001-20201016
i386 randconfig-a003-20201016
i386 randconfig-a004-20201016
i386 randconfig-a002-20201016
x86_64   randconfig-a016-20201016
x86_64   randconfig-a012-20201016
x86_64   randconfig-a015-20201016
x86_64   randconfig-a013-20201016
x86_64   randconfig-a014-20201016
x86_64   randconfig-a011-20201016
i386 randconfig-a016-20201016
i386 randconfig-a013-20201016
i386 randconfig-a015-20201016
i386 randconfig-a011-20201016
i386 randconfig-a012-20201016
i386 randconfig-a014-20201016
riscv nommu_k210_defconfig
riscv allyesconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscv allmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64 rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a004-20201016
x86_64   randconfig-a002-20201016
x86_64   randconfig-a006-20201016
x86_64   randconfig-a001-20201016
x86_64   randconfig-a005-20201016
x86_64   randconfig-a003-20201016
x86_64   randconfig-a016-20201017
x86_64   randconfig-a012-20201017
x86_64   randconfig-a015-20201017
x86_64   randconfig-a013-20201017
x86_64   randconfig-a014-20201017
x86_64   randconfig-a011-20201017

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH v9 12/15] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR

2020-10-16 Thread Kuppuswamy, Sathyanarayanan



On 10/16/20 3:29 PM, Bjorn Helgaas wrote:

[+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
begin with since you're looking at this code too.  Particularly
interested in your thoughts about whether we should be touching
PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]

This part is not very clear in the ACPI spec or the PCI Firmware spec.

IMO, since APEI notifies the OS about the error, I guess
we are allowed to clear the PCI_ERR_ROOT_STATUS register
after handling the error.



On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:

[+to Jonathan]

On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:

From: Qiuxu Zhuo 

When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC.  In
some non-native cases in which there is no OS-visible device associated
with the RCiEP, there is nothing to act upon as the firmware is acting
before the OS.

Add handling for the linked RCEC in AER/ERR while taking into account
non-native cases.

Co-developed-by: Sean V Kelley 
Link: 
https://lore.kernel.org/r/20201002184735.1229220-12-seanvk@oregontracks.org
Signed-off-by: Sean V Kelley 
Signed-off-by: Qiuxu Zhuo 
Signed-off-by: Bjorn Helgaas 
Reviewed-by: Jonathan Cameron 
---
  drivers/pci/pcie/aer.c | 53 ++
  drivers/pci/pcie/err.c | 20 
  2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..083f69b67bfd 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
   */
  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
  {
-   int aer = dev->aer_cap;
+   int type = pci_pcie_type(dev);
+   struct pci_dev *root;
+   int aer = 0;
+   int rc = 0;
u32 reg32;
-   int rc;
  
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)

"type == PCI_EXP_TYPE_RC_END"


+   /*
+* The reset should only clear the Root Error Status
+* of the RCEC. Only perform this for the
+* native case, i.e., an RCEC is present.
+*/
+   root = dev->rcec;
+   else
+   root = dev;
  
-	/* Disable Root's interrupt in response to error messages */

-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   if (root)
+   aer = dev->aer_cap;
  
-	rc = pci_bus_error_reset(dev);

-   pci_info(dev, "Root Port link has been reset\n");
+   if (aer) {
+   /* Disable Root's interrupt in response to error messages */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);

Not directly related to *this* patch, but my assumption was that in
the APEI case, the firmware should retain ownership of the AER
Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
PCI_ERR_ROOT_STATUS.

But this code appears to ignore that ownership.  Jonathan, you must
have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
Device Status errors only if OS owns AER").  Do you have any insight
about this?
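
For illustration only (not part of this patch), the kind of ownership guard
in question could reuse the existing pcie_aer_is_native() helper, in the
spirit of 068c29a248b6:

	/*
	 * Sketch, not from the patch: skip the AER register writes when
	 * firmware owns the AER Capability; pcie_aer_is_native() already
	 * encodes that ownership test.
	 */
	if (aer && pcie_aer_is_native(root)) {
		/* Disable Root's interrupt in response to error messages */
		pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
		reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
		pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
		/* ... clear PCI_ERR_ROOT_STATUS and re-enable interrupts ... */
	}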


-   /* Clear Root Error Status */
-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
+   /* Clear Root Error Status */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
  
-	/* Enable Root Port's interrupt in response to error messages */

-   pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
-   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
-   pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   /* Enable Root Port's interrupt in response to error messages */
+   pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+   reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
+   pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+   }
+
+   if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
+   if (pcie_has_flr(root)) {
+   rc = pcie_flr(root);
+   pci_info(dev, "has been reset (%d)\n", rc);
+   }
+   } else {
+   rc = pci_bus_error_reset(root);

Don't we want "dev" for both the FLR and pci_bus_error_reset()?  I
think "root == dev" except when dev is an RCiEP.  When dev is an
RCiEP, "root" is the RCEC (if 

[PATCH v3] checkpatch: add new exception to repeated word check

2020-10-16 Thread Dwaipayan Ray
Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word test")
moved the repeated word test to check for more file types. But after
this, if checkpatch.pl is run on MAINTAINERS, it generates several
new warnings of the type:

WARNING: Possible repeated word: 'git'

For example:
WARNING: Possible repeated word: 'git'
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml.git

So, the pattern "git git://..." is a false positive in this case.

There are several other combinations which may produce a wrong
warning message, such as "@size size", ":Begin begin", etc.

Extend repeated word check to compare the characters before and
after the word matches. If the preceding or succeeding character
belongs to the exception list, the warning is avoided.

Link: 
https://lore.kernel.org/linux-kernel-mentees/81b6a0bb2c7b9256361573f7a13201ebcd4876f1.ca...@perches.com/
Suggested-by: Joe Perches 
Suggested-by: Lukas Bulwahn 
Signed-off-by: Dwaipayan Ray 
---
 scripts/checkpatch.pl | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index f1a4e61917eb..89430dfd6652 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -595,6 +595,7 @@ our @mode_permission_funcs = (
 );
 
 my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
+my $exclude_chars = '[^\.\,\+\s]';
 
 #Create a search pattern for all these functions to speed up a loop below
 our $mode_perms_search = "";
@@ -3056,15 +3057,27 @@ sub process {
 
my $first = $1;
my $second = $2;
-
+   my $start_pos = $-[1];
+   my $end_pos = $+[2];
if ($first =~ /(?:struct|union|enum)/) {
pos($rawline) += length($first) + 
length($second) + 1;
next;
}
 
-   next if ($first ne $second);
+   next if (lc($first) ne lc($second));
next if ($first eq 'long');
 
+   # check for character before and after the word 
matches
+   my $start_char = '';
+   my $end_char = '';
+   $start_char = substr($rawline, $start_pos - 1, 
1) if ($start_pos > 0);
+   $end_char = substr($rawline, $end_pos, 1) if 
($end_pos <= length($rawline));
+
+   if ($start_char =~ /^$exclude_chars$/ ||
+   $end_char =~ /^$exclude_chars$/) {
+   next;
+   }
+
if (WARN("REPEATED_WORD",
 "Possible repeated word: '$first'\n" . 
$herecurr) &&
$fix) {
-- 
2.27.0



[PATCH v5 0/4] Qualcomm Light Pulse Generator

2020-10-16 Thread Bjorn Andersson
This series introduces a generic pattern interface in the LED class and
a driver for the Qualcomm Light Pulse Generator.

It seems like it's been almost 3 years since I posted v3, which was hung
up on the lack of conclusion on the hw_pattern and multicolor support.
Now that those are concluded I hope we can make some progress on the LPG
support again.

The dts patches are included in the series as "examples", ultimately my
expectation is that the dt binding and driver patches are picked up
through the leds tree, while Andy or myself take the dts patches.

Bjorn Andersson (4):
  dt-bindings: leds: Add Qualcomm Light Pulse Generator binding
  leds: Add driver for Qualcomm LPG
  arm64: dts: qcom: pm(i)8994: Add mpp and lpg blocks
  arm64: dts: qcom: Add user LEDs on db820c

 .../bindings/leds/leds-qcom-lpg.yaml  |  170 +++
 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi  |   49 +
 arch/arm64/boot/dts/qcom/pm8994.dtsi  |9 +
 arch/arm64/boot/dts/qcom/pmi8994.dtsi |   20 +
 drivers/leds/Kconfig  |9 +
 drivers/leds/Makefile |1 +
 drivers/leds/leds-qcom-lpg.c  | 1206 +
 7 files changed, 1464 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-qcom-lpg.yaml
 create mode 100644 drivers/leds/leds-qcom-lpg.c

-- 
2.28.0



[PATCH net-next] drivers/net/wan/hdlc_fr: Improve fr_rx and add support for any Ethertype

2020-10-16 Thread Xie He
1. Change the fr_rx function to make this driver support any Ethertype
when receiving. (This driver is already able to handle any Ethertype
when sending.)

Originally in the fr_rx function, the code that parses the long (10-byte)
header only recognizes a few Ethertype values and drops frames with other
Ethertype values. This patch replaces this code to make fr_rx support
any Ethertype. This patch also creates a new function fr_snap_parse as
part of the new code.

2. Change the use of the "dev" variable in fr_rx. Originally we do
"dev = something", and then at the end do "if (dev) skb->dev = dev".
Now we do "if (something) skb->dev = something", then at the end do
"dev = skb->dev".

This is to make the logic of our code consistent with eth_type_trans
(which we call). The eth_type_trans function expects a non-NULL pointer
as a parameter and assigns it directly to skb->dev.

3. Change the initial skb->len check in fr_rx from "<= 4" to "< 4".
At first we only need to ensure a 4-byte header is present. We indeed
normally need the 5th byte, too, but it'd be more logical to check its
existence when we actually need it.

Also add an fh->ea2 check to the initial checks in fr_rx. fh->ea2 == 1
means the second address byte is the final address byte. We only support
the case where the address length is 2 bytes.

4. Use "goto rx_drop" whenever we need to drop a valid frame.

Cc: Krzysztof Halasa 
Signed-off-by: Xie He 
---
 drivers/net/wan/hdlc_fr.c | 119 +++---
 1 file changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index 409e5a7ad8e2..e95efc14bc97 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -871,6 +871,45 @@ static int fr_lmi_recv(struct net_device *dev, struct 
sk_buff *skb)
return 0;
 }
 
+static int fr_snap_parse(struct sk_buff *skb, struct pvc_device *pvc)
+{
+   /* OUI 00-00-00 indicates an Ethertype follows */
+   if (skb->data[0] == 0x00 &&
+   skb->data[1] == 0x00 &&
+   skb->data[2] == 0x00) {
+   if (!pvc->main)
+   return -1;
+   skb->dev = pvc->main;
+   skb->protocol = *(__be16 *)(skb->data + 3); /* Ethertype */
+   skb_pull(skb, 5);
+   skb_reset_mac_header(skb);
+   return 0;
+
+   /* OUI 00-80-C2 stands for the 802.1 organization */
+   } else if (skb->data[0] == 0x00 &&
+  skb->data[1] == 0x80 &&
+  skb->data[2] == 0xC2) {
+   /* PID 00-07 stands for Ethernet frames without FCS */
+   if (skb->data[3] == 0x00 &&
+   skb->data[4] == 0x07) {
+   if (!pvc->ether)
+   return -1;
+   skb_pull(skb, 5);
+   if (skb->len < ETH_HLEN)
+   return -1;
+   skb->protocol = eth_type_trans(skb, pvc->ether);
+   return 0;
+
+   /* PID unsupported */
+   } else {
+   return -1;
+   }
+
+   /* OUI unsupported */
+   } else {
+   return -1;
+   }
+}
 
 static int fr_rx(struct sk_buff *skb)
 {
@@ -880,9 +919,9 @@ static int fr_rx(struct sk_buff *skb)
u8 *data = skb->data;
u16 dlci;
struct pvc_device *pvc;
-   struct net_device *dev = NULL;
+   struct net_device *dev;
 
-   if (skb->len <= 4 || fh->ea1 || data[2] != FR_UI)
+   if (skb->len < 4 || fh->ea1 || !fh->ea2 || data[2] != FR_UI)
goto rx_error;
 
dlci = q922_to_dlci(skb->data);
@@ -904,8 +943,7 @@ static int fr_rx(struct sk_buff *skb)
netdev_info(frad, "No PVC for received frame's DLCI %d\n",
dlci);
 #endif
-   dev_kfree_skb_any(skb);
-   return NET_RX_DROP;
+   goto rx_drop;
}
 
if (pvc->state.fecn != fh->fecn) {
@@ -931,63 +969,52 @@ static int fr_rx(struct sk_buff *skb)
}
 
if (data[3] == NLPID_IP) {
+   if (!pvc->main)
+   goto rx_drop;
skb_pull(skb, 4); /* Remove 4-byte header (hdr, UI, NLPID) */
-   dev = pvc->main;
+   skb->dev = pvc->main;
skb->protocol = htons(ETH_P_IP);
+   skb_reset_mac_header(skb);
 
} else if (data[3] == NLPID_IPV6) {
+   if (!pvc->main)
+   goto rx_drop;
skb_pull(skb, 4); /* Remove 4-byte header (hdr, UI, NLPID) */
-   dev = pvc->main;
+   skb->dev = pvc->main;
skb->protocol = htons(ETH_P_IPV6);
+   skb_reset_mac_header(skb);
 
-   } else if (skb->len > 10 && data[3] == FR_PAD &&
-  data[4] == NLPID_SNAP && data[5] == FR_PAD) {
-   u16 oui = ntohs(*(__be16*)(data + 6));
- 

[PATCH v5 3/4] arm64: dts: qcom: pm(i)8994: Add mpp and lpg blocks

2020-10-16 Thread Bjorn Andersson
The pm8994 contains 6 LPG channels and the pmi8994 contains 4 MPP
channels and a 4 channel LPG, with TRILED and LUT blocks.

Add nodes for these blocks.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Replaced msm8996 with pm(i)8994 in subject

 arch/arm64/boot/dts/qcom/pm8994.dtsi  |  9 +
 arch/arm64/boot/dts/qcom/pmi8994.dtsi | 20 
 2 files changed, 29 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm8994.dtsi 
b/arch/arm64/boot/dts/qcom/pm8994.dtsi
index 7e4f46cb..b5bef687aa3c 100644
--- a/arch/arm64/boot/dts/qcom/pm8994.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8994.dtsi
@@ -86,6 +86,15 @@ pmic@1 {
#address-cells = <1>;
#size-cells = <0>;
 
+   pm8994_lpg: lpg {
+   compatible = "qcom,pm8994-lpg";
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   status = "disabled";
+   };
+
pm8994_spmi_regulators: regulators {
compatible = "qcom,pm8994-regulators";
};
diff --git a/arch/arm64/boot/dts/qcom/pmi8994.dtsi 
b/arch/arm64/boot/dts/qcom/pmi8994.dtsi
index e5ed28ab9b2d..23f41328d191 100644
--- a/arch/arm64/boot/dts/qcom/pmi8994.dtsi
+++ b/arch/arm64/boot/dts/qcom/pmi8994.dtsi
@@ -19,6 +19,17 @@ pmi8994_gpios: gpios@c000 {
interrupt-controller;
#interrupt-cells = <2>;
};
+
+   pmi8994_mpps: mpps@a000 {
+   compatible = "qcom,pm8994-mpp";
+   reg = <0xa000>;
+   gpio-controller;
+   #gpio-cells = <2>;
+   interrupts = <0 0xa0 0 IRQ_TYPE_NONE>,
+<0 0xa1 0 IRQ_TYPE_NONE>,
+<0 0xa2 0 IRQ_TYPE_NONE>,
+<0 0xa3 0 IRQ_TYPE_NONE>;
+   };
};
 
pmic@3 {
@@ -27,6 +38,15 @@ pmic@3 {
#address-cells = <1>;
#size-cells = <0>;
 
+   pmi8994_lpg: lpg@b100 {
+   compatible = "qcom,pmi8994-lpg";
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   status = "disabled";
+   };
+
pmi8994_spmi_regulators: regulators {
compatible = "qcom,pmi8994-regulators";
#address-cells = <1>;
-- 
2.28.0



[PATCH v5 4/4] arm64: dts: qcom: Add user LEDs on db820c

2020-10-16 Thread Bjorn Andersson
The db820c has 4 "user LEDs", all connected to the PMI8994. The first
three are connected to the three current sinks provided by the TRILED
and the fourth is connected to MPP2.

By utilizing the DTEST bus the MPP is fed the control signal from the
fourth LPG block, providing a consistent interface to the user.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- None

 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 49 
 1 file changed, 49 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi 
b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
index defcbd15edf9..7e51677d256e 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
@@ -8,6 +8,7 @@
 #include "pmi8994.dtsi"
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -682,6 +683,54 @@ pinconf {
};
 };
 
+&pmi8994_mpps {
+   pmi8994_mpp2_userled4: mpp2-userled4 {
+   pins = "mpp2";
+   function = "sink";
+
+   output-low;
+   qcom,dtest = <4>;
+   };
+};
+
+&pmi8994_lpg {
+   qcom,power-source = <1>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <&pmi8994_mpp2_userled4>;
+
+   status = "okay";
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   led@1 {
+   reg = <1>;
+   label = "green:user1";
+
+   linux,default-trigger = "heartbeat";
+   default-state = "on";
+   };
+
+   led@2 {
+   reg = <2>;
+   label = "green:user0";
+   default-state = "on";
+   };
+
+   led@3 {
+   reg = <3>;
+   label = "green:user2";
+   };
+
+   led@4 {
+   reg = <4>;
+   label = "green:user3";
+
+   qcom,dtest = <4 1>;
+   };
+};
+
 &pmi8994_spmi_regulators {
vdd_gfx: s2@1700 {
reg = <0x1700 0x100>;
-- 
2.28.0



Re: [PATCH v2] checkpatch: add new exception to repeated word check

2020-10-16 Thread Dwaipayan Ray
On Sat, Oct 17, 2020 at 10:12 AM Joe Perches  wrote:
>
> On Sat, 2020-10-17 at 10:02 +0530, Dwaipayan Ray wrote:
> > On Sat, Oct 17, 2020 at 8:26 AM Joe Perches  wrote:
> > > On Wed, 2020-10-14 at 11:35 -0700, Joe Perches wrote:
> > > > On Wed, 2020-10-14 at 23:42 +0530, Dwaipayan Ray wrote:
> > > > > On Wed, Oct 14, 2020 at 11:33 PM Joe Perches  wrote:
> > > > > > On Wed, 2020-10-14 at 22:07 +0530, Dwaipayan Ray wrote:
> > > > > > > Recently, commit 4f6ad8aa1eac ("checkpatch: move repeated word 
> > > > > > > test")
> > > > > > > moved the repeated word test to check for more file types. But 
> > > > > > > after
> > > > > > > this, if checkpatch.pl is run on MAINTAINERS, it generates several
> > > > > > > new warnings of the type:
> > > > > >
> > > > > > Perhaps instead of adding more content checks so that
> > > > > > word boundaries are not something like \S but also
> > > > > > not punctuation so that content like
> > > > > >
> > > > > > git git://
> > > > > > @size size
> > > > > >
> > > > > > does not match?
> > > > > >
> > > > > >
> > > > > Hi,
> > > > > So currently the words are trimmed of non alphabets before the check:
> > > > >
> > > > > while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> > > > > my $first = $1;
> > > > > my $second = $2;
> > > > >
> > > > > where, the word_pattern is:
> > > > > my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> > > >
> > > > I'm familiar.
> > > >
> > > > > So do you perhaps recommend modifying this word pattern to
> > > > > include the punctuation as well rather than trimming them off?
> > > >
> > > > Not really, perhaps use the capture group position
> > > > markers @- @+ or $-[1] $+[1] and $-[2] $+[2] with the
> > > > substr could be used to see what characters are
> > > > before and after the word matches.
> > >
> > > Perhaps something like:
> > > ---
> > >  scripts/checkpatch.pl | 12 +++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index fab38b493cef..a65eb40a5539 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -3054,15 +3054,25 @@ sub process {
> > >
> > > my $first = $1;
> > > my $second = $2;
> > > +   my $start_pos = $-[1];
> > > +   my $end_pos = $+[2];
> > >
> > > if ($first =~ /(?:struct|union|enum)/) {
> > > pos($rawline) += length($first) + 
> > > length($second) + 1;
> > > next;
> > > }
> > >
> > > -   next if ($first ne $second);
> > > +   next if (lc($first) ne lc($second));
> > > next if ($first eq 'long');
> > >
> > > +   my $start_char = "";
> > > +   my $end_char = "";
> > > +   $start_char = substr($rawline, $start_pos 
> > > - 1, 1) if ($start_pos > 0);
> > > +   $end_char = substr($rawline, $end_pos, 1) 
> > > if (length($rawline) > $end_pos);
> > > +
> > > +   next if ($start_char =~ /^\S$/);
> > > +   next if ($end_char !~ /^[\.\,\s]?$/);
> > > +
> > > if (WARN("REPEATED_WORD",
> > >  "Possible repeated word: 
> > > '$first'\n" . $herecurr) &&
> > > $fix) {
> > >
> > >
> >
> > Hi Joe,
> > Thank you for the insight. I was also doing something similar:
> >
> > ---
> >  scripts/checkpatch.pl | 16 
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index f1a4e61917eb..82497a71ac96 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -595,6 +595,7 @@ our @mode_permission_funcs = (
> >  );
> >
> >  my $word_pattern = '\b[A-Z]?[a-z]{2,}\b';
> > +my $punctuation_chars = '[,:;@\.\-]';
> >
> >  #Create a search pattern for all these functions to speed up a loop below
> >  our $mode_perms_search = "";
> > @@ -3065,6 +3066,21 @@ sub process {
> >   next if ($first ne $second);
> >   next if ($first eq 'long');
> >
> > + # check for character before and after the word matches
> > + my $ca_first = substr($rawline, $-[1]-1, 1);
> > + my $cb_first = substr($rawline, $+[1], 1);
> > + my $ca_second = substr($rawline, $-[2]-1, 1);
> > + my $cb_second = substr($rawline, $+[2], 1);
> > +
> > + if ($ca_first ne $ca_second || $cb_first ne $cb_second) {
> > + if ($ca_first =~ /$punctuation_chars/ ||
> > + $ca_second =~ /$punctuation_chars/ ||
> > + $cb_first =~ /$punctuation_chars/ ||
> > + $cb_second =~ /$punctuation_chars/) {
> > + next;
> > + }
> > + }
> > +
> >   if 

[PATCH v5 1/4] dt-bindings: leds: Add Qualcomm Light Pulse Generator binding

2020-10-16 Thread Bjorn Andersson
This adds the binding document describing the three hardware blocks
related to the Light Pulse Generator found in a wide range of Qualcomm
PMICs.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Dropped quotes around power-source
- Moved "multi-led" to properties
- Corrected tab-indented line in example

 .../bindings/leds/leds-qcom-lpg.yaml  | 170 ++
 1 file changed, 170 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-qcom-lpg.yaml

diff --git a/Documentation/devicetree/bindings/leds/leds-qcom-lpg.yaml 
b/Documentation/devicetree/bindings/leds/leds-qcom-lpg.yaml
new file mode 100644
index ..5ccf0f3d8f1b
--- /dev/null
+++ b/Documentation/devicetree/bindings/leds/leds-qcom-lpg.yaml
@@ -0,0 +1,170 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/leds/leds-qcom-lpg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm Light Pulse Generator
+
+maintainers:
+  - Bjorn Andersson 
+
+description: >
+  The Qualcomm Light Pulse Generator consists of three different hardware 
blocks;
+  a ramp generator with lookup table, the light pulse generator and a three
+  channel current sink. These blocks are found in a wide range of Qualcomm 
PMICs.
+
+properties:
+  compatible:
+enum:
+  - qcom,pm8916-pwm
+  - qcom,pm8941-lpg
+  - qcom,pm8994-lpg
+  - qcom,pmi8994-lpg
+  - qcom,pmi8998-lpg
+
+  "#pwm-cells":
+const: 2
+
+  "#address-cells":
+const: 1
+
+  "#size-cells":
+const: 0
+
+  qcom,power-source:
+$ref: /schemas/types.yaml#definitions/uint32
+description: >
+  power-source used to drive the output, as defined in the datasheet.
+  Should be specified if the TRILED block is present
+enum:
+  - 0
+  - 1
+  - 3
+
+  multi-led:
+type: object
+$ref: leds-class-multicolor.yaml#
+properties:
+  "#address-cells":
+const: 1
+
+  "#size-cells":
+const: 0
+
+  "^led@[0-9a-f]$":
+type: object
+$ref: common.yaml#
+
+properties:
+  "qcom,dtest":
+$ref: /schemas/types.yaml#definitions/uint32-array
+description: >
+  configures the output into an internal test line of the pmic. 
Specified
+  by a list of u32 pairs, one pair per channel, where each pair 
denotes the
+  test line to drive and the second configures how the value 
should be
+  output, as defined in the datasheet
+minItems: 2
+maxItems: 2
+
+required:
+  - reg
+
+patternProperties:
+  "^led@[0-9a-f]$":
+type: object
+$ref: common.yaml#
+properties:
+  "qcom,dtest":
+$ref: /schemas/types.yaml#definitions/uint32-array
+description: >
+  configures the output into an internal test line of the pmic. 
Specified
+  by a list of u32 pairs, one pair per channel, where each pair 
denotes the
+  test line to drive and the second configures how the value should be
+  output, as defined in the datasheet
+minItems: 2
+maxItems: 2
+
+required:
+  - reg
+
+required:
+  - compatible
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+
+lpg {
+  compatible = "qcom,pmi8994-lpg";
+
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  qcom,power-source = <1>;
+
+  led@1 {
+reg = <1>;
+label = "green:user1";
+  };
+
+  led@2 {
+reg = <2>;
+label = "green:user0";
+default-state = "on";
+  };
+
+  led@3 {
+reg = <3>;
+label = "green:user2";
+  };
+
+  led@4 {
+reg = <4>;
+label = "green:user3";
+
+qcom,dtest = <4 1>;
+  };
+};
+  - |
+#include 
+
+lpg {
+  compatible = "qcom,pmi8994-lpg";
+
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  qcom,power-source = <1>;
+
+  multi-led {
+color = ;
+label = "rgb:notification";
+
+#address-cells = <1>;
+#size-cells = <0>;
+
+led@1 {
+  reg = <1>;
+  color = ;
+};
+
+led@2 {
+  reg = <2>;
+  color = ;
+};
+
+led@3 {
+  reg = <3>;
+  color = ;
+};
+  };
+};
+  - |
+lpg {
+  compatible = "qcom,pm8916-pwm";
+  #pwm-cells = <2>;
+};
+...
-- 
2.28.0



[PATCH v5 2/4] leds: Add driver for Qualcomm LPG

2020-10-16 Thread Bjorn Andersson
The Light Pulse Generator (LPG) is a PWM-block found in a wide range of
PMICs from Qualcomm. It can operate on fixed parameters or based on a
lookup-table, altering the duty cycle over time - which provides the
means for e.g. hardware assisted transitions of LED brightness.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Introduced div_u64() to avoid compilation errors on 32 bit machines
- Fixed SPDX license header and a couple of spacing issues pointed out by 
checkpatch

 drivers/leds/Kconfig |9 +
 drivers/leds/Makefile|1 +
 drivers/leds/leds-qcom-lpg.c | 1206 ++
 3 files changed, 1216 insertions(+)
 create mode 100644 drivers/leds/leds-qcom-lpg.c

diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 849d3c5f908e..d500648c586f 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -803,6 +803,15 @@ config LEDS_POWERNV
  To compile this driver as a module, choose 'm' here: the module
  will be called leds-powernv.
 
+config LEDS_QCOM_LPG
+   tristate "LED support for Qualcomm LPG"
+   depends on LEDS_CLASS_MULTICOLOR
+   depends on OF
+   depends on SPMI
+   help
+ This option enables support for the Light Pulse Generator found in a
+ wide variety of Qualcomm PMICs.
+
 config LEDS_SYSCON
bool "LED support for LEDs on system controllers"
depends on LEDS_CLASS=y
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index 73e603e1727e..52d0ea26fc35 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -79,6 +79,7 @@ obj-$(CONFIG_LEDS_PCA963X)+= leds-pca963x.o
 obj-$(CONFIG_LEDS_PM8058)  += leds-pm8058.o
 obj-$(CONFIG_LEDS_POWERNV) += leds-powernv.o
 obj-$(CONFIG_LEDS_PWM) += leds-pwm.o
+obj-$(CONFIG_LEDS_QCOM_LPG)+= leds-qcom-lpg.o
 obj-$(CONFIG_LEDS_REGULATOR)   += leds-regulator.o
 obj-$(CONFIG_LEDS_S3C24XX) += leds-s3c24xx.o
 obj-$(CONFIG_LEDS_SC27XX_BLTC) += leds-sc27xx-bltc.o
diff --git a/drivers/leds/leds-qcom-lpg.c b/drivers/leds/leds-qcom-lpg.c
new file mode 100644
index ..49282cedeb6e
--- /dev/null
+++ b/drivers/leds/leds-qcom-lpg.c
@@ -0,0 +1,1206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2017-2020 Linaro Ltd
+ * Copyright (c) 2010-2012, The Linux Foundation. All rights reserved.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define LPG_PATTERN_CONFIG_REG 0x40
+#define LPG_SIZE_CLK_REG   0x41
+#define LPG_PREDIV_CLK_REG 0x42
+#define PWM_TYPE_CONFIG_REG0x43
+#define PWM_VALUE_REG  0x44
+#define PWM_ENABLE_CONTROL_REG 0x46
+#define PWM_SYNC_REG   0x47
+#define LPG_RAMP_DURATION_REG  0x50
+#define LPG_HI_PAUSE_REG   0x52
+#define LPG_LO_PAUSE_REG   0x54
+#define LPG_HI_IDX_REG 0x56
+#define LPG_LO_IDX_REG 0x57
+#define PWM_SEC_ACCESS_REG 0xd0
+#define PWM_DTEST_REG(x)   (0xe2 + (x) - 1)
+
+#define TRI_LED_SRC_SEL0x45
+#define TRI_LED_EN_CTL 0x46
+#define TRI_LED_ATC_CTL0x47
+
+#define LPG_LUT_REG(x) (0x40 + (x) * 2)
+#define RAMP_CONTROL_REG   0xc8
+
+struct lpg_channel;
+struct lpg_data;
+
+/**
+ * struct lpg - LPG device context
+ * @dev:   struct device for LPG device
+ * @map:   regmap for register access
+ * @pwm:   PWM-chip object, if operating in PWM mode
+ * @lut_base:  base address of the LUT block (optional)
+ * @lut_size:  number of entries in the LUT block
+ * @lut_bitmap:allocation bitmap for LUT entries
+ * @triled_base: base address of the TRILED block (optional)
+ * @triled_src:power-source for the TRILED
+ * @channels:  list of PWM channels
+ * @num_channels: number of @channels
+ */
+struct lpg {
+   struct device *dev;
+   struct regmap *map;
+
+   struct pwm_chip pwm;
+
+   const struct lpg_data *data;
+
+   u32 lut_base;
+   u32 lut_size;
+   unsigned long *lut_bitmap;
+
+   u32 triled_base;
+   u32 triled_src;
+
+   struct lpg_channel *channels;
+   unsigned int num_channels;
+};
+
+/**
+ * struct lpg_channel - per channel data
+ * @lpg:   reference to parent lpg
+ * @base:  base address of the PWM channel
+ * @triled_mask: mask in TRILED to enable this channel
+ * @lut_mask:  mask in LUT to start pattern generator for this channel
+ * @in_use:channel is exposed to LED framework
+ * @color: color of the LED attached to this channel
+ * @dtest_line:DTEST line for output, or 0 if disabled
+ * @dtest_value: DTEST line configuration
+ * @pwm_value: duty (in microseconds) of the generated pulses, overridden by 
LUT
+ * @enabled:   output enabled?
+ * @period_us: period (in microseconds) of the generated pulses
+ * @pwm_size:  resolution of the @pwm_value, 6 or 9 bits
+ * @clk:   base frequency of the clock generator
+ * @pre_div:   

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Colm MacCarthaigh




On 16 Oct 2020, at 22:01, Jann Horn wrote:


On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh 
 wrote:
For user-space, even a single bit would do. We added MADVISE_WIPEONFORK
so that userspace libraries can detect fork()/clone() robustly, for the
same reasons. It just wipes a page as the indicator, which is
effectively a single-bit signal, and it works well. On the user-space
side of this, I’m keen to find a solution like that that we can use
fairly easily inside of portable libraries and applications. The “have
I forked” checks do end up in hot paths, so it’s nice if they can be
CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my
favorite.


I'm pretty sure a single bit is not enough if you want to have a
single page, shared across the entire system, that stores the VM
forking state; you need a counter for that.


You’re right. WIPEONFORK is more like a single-bit per use. If it’s 
something system wide then a counter is better.
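
A minimal userspace sketch of that system-wide counter idea (the device
path, the mmap layout and the 64-bit width are illustrative assumptions,
not the proposed ABI):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/vmgenid", O_RDONLY);	/* hypothetical node */
	if (fd < 0)
		return 1;

	void *page = mmap(NULL, sizeof(uint64_t), PROT_READ, MAP_SHARED, fd, 0);
	if (page == MAP_FAILED)
		return 1;
	const volatile uint64_t *generation = page;

	uint64_t seen = *generation;
	for (;;) {
		if (*generation != seen) {	/* VM was cloned since last check */
			seen = *generation;
			puts("generation bumped: reseed userspace RNG state");
		}
		/* ... hot-path work that must notice forks cheaply ... */
		sleep(1);
	}
}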



So the RNG state after mixing in the new VM Generation ID would
contain 128 bits of secret entropy not known to anyone else, including
people with access to the VM image.

Now, 128 bits of cryptographically random data aren't _optimal_; I
think something on the order of 256 bits would be nicer from a
theoretical standpoint. But in practice I think we'll be good with the
128 bits we're getting (since the number of users who fork a VM image
is probably not going to be so large that worst-case collision
probabilities matter).


This reminds me of key/IV usage limits for AES encryption, where the
same birthday bounds apply, and even though 256 bits would be better, we
routinely make 128-bit birthday bounds work for massively scalable
systems.
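
(For a concrete sense of those bounds: with n random 128-bit values, the
collision probability is roughly n^2 / 2^129, so even 2^40 forked images
collide with probability on the order of 2^-49.)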



The kernel would need to use the change as a trigger to measure some
entropy (e.g. interrupts and RDRAND, or whatever). Or just define the
machine contract as “this has to be unique random data and if it’s not
unique, or if it’s public, you’re toast”.


As far as I can tell from Microsoft's spec, that is a guarantee we're
already getting.


Neat.

-
Colm


[PATCH] ipmi_si: replace spin_lock_irqsave by spin_lock in hard IRQ

2020-10-16 Thread Tian Tao
It is redundant to do irqsave and irqrestore in hard IRQ context, where
interrupts are already disabled.

Signed-off-by: Tian Tao 
---
 drivers/char/ipmi/ipmi_si_intf.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index 45546ac..97452a8 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -1116,7 +1116,6 @@ static void smi_timeout(struct timer_list *t)
 irqreturn_t ipmi_si_irq_handler(int irq, void *data)
 {
struct smi_info *smi_info = data;
-   unsigned long   flags;
 
if (smi_info->io.si_type == SI_BT)
/* We need to clear the IRQ flag for the BT interface. */
@@ -1124,14 +1123,14 @@ irqreturn_t ipmi_si_irq_handler(int irq, void *data)
 IPMI_BT_INTMASK_CLEAR_IRQ_BIT
 | IPMI_BT_INTMASK_ENABLE_IRQ_BIT);
 
-   spin_lock_irqsave(&(smi_info->si_lock), flags);
+   spin_lock(&(smi_info->si_lock));
 
smi_inc_stat(smi_info, interrupts);
 
debug_timestamp("Interrupt");
 
smi_event_handler(smi_info, 0);
-   spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+   spin_unlock(&(smi_info->si_lock));
return IRQ_HANDLED;
 }
 
-- 
2.7.4



Re: [PATCH v6 07/25] treewide: remove DISABLE_LTO

2020-10-16 Thread Masahiro Yamada
On Thu, Oct 15, 2020 at 7:43 AM Kees Cook  wrote:
>
> On Mon, Oct 12, 2020 at 05:31:45PM -0700, Sami Tolvanen wrote:
> > This change removes all instances of DISABLE_LTO from
> > Makefiles, as they are currently unused, and the preferred
> > method of disabling LTO is to filter out the flags instead.
> >
> > Suggested-by: Kees Cook 
> > Signed-off-by: Sami Tolvanen 
> > Reviewed-by: Kees Cook 
>
> Hi Masahiro,
>
> Since this is independent of anything else and could be seen as a
> general cleanup, can this patch be taken into your tree, just to
> separate it from the list of dependencies for this series?
>
> -Kees
>
> --
> Kees Cook



Yes, this is stale code because GCC LTO was not pulled.

Applied to linux-kbuild.

I added the following historical background.



Note added by Masahiro Yamada:
DISABLE_LTO was added as preparation for GCC LTO, but GCC LTO was
not pulled into the mainline. (https://lkml.org/lkml/2014/4/8/272)




-- 
Best Regards
Masahiro Yamada


Re: [PATCH 3/3] MIPS: Loongson64: Add /proc/boardinfo

2020-10-16 Thread Tiezhu Yang

On 10/16/2020 09:11 PM, Pavel Machek wrote:

Hi!


Add /proc/boardinfo to get mainboard and BIOS info easily on the Loongson
platform. This is useful for identifying the mainboard type and BIOS version
in use when there are problems related to hardware or firmware.

E.g. with this patch:

[loongson@linux ~]$ cat /proc/boardinfo
Board Info
Manufacturer: LEMOTE
Board Name  : LEMOTE-LS3A4000-7A1000-1w-V01-pc
Family  : LOONGSON3

BIOS Info
Vendor  : Kunlun
Version : Kunlun-A1901-V4.1.3-20200414093938
ROM Size: 4 KB
Release Date: 2020-04-14

Please put this into /sys somewhere, with usual rules. This is hard to
extend/parse.


Hi Pavel,

Thanks for your suggestion. I submitted a new version some days ago.

[1/2] MIPS: Loongson64: Add /sys/firmware/lefi/boardinfo
https://lore.kernel.org/patchwork/patch/1320574/

[2/2] Documentation: ABI: Add /sys/firmware/lefi/boardinfo description 
for Loongson64

https://lore.kernel.org/patchwork/patch/1320573/

Thanks,
Tiezhu



Pavel





Re: [PATCH v9 09/10] fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile

2020-10-16 Thread kernel test robot
Hi Konstantin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.9 next-20201016]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Konstantin-Komarov/NTFS-read-write-driver-GPL-implementation-by-Paragon-Software/20201016-233309
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
9ff9b0d392ea08090cd1780fb196f36dbb586529
config: x86_64-randconfig-a011-20201017 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
efd02c1548ee458d59063f6393e94e972b5c3d50)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/3339f0d0890cfe6ed760dc24916de15e74c4f67d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Konstantin-Komarov/NTFS-read-write-driver-GPL-implementation-by-Paragon-Software/20201016-233309
git checkout 3339f0d0890cfe6ed760dc24916de15e74c4f67d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> fs/ntfs3/attrib.c:1256:7: warning: variable 'hint' is used uninitialized 
>> whenever '&&' condition is false [-Wsometimes-uninitialized]
   if (vcn + clst_data &&
   ^~~
   fs/ntfs3/attrib.c:1263:11: note: uninitialized use occurs here
hint + 1, len - clst_data, 
NULL, 0,
^~~~
   fs/ntfs3/attrib.c:1256:7: note: remove the '&&' if its condition is always 
true
   if (vcn + clst_data &&
   ^~
   fs/ntfs3/attrib.c:1254:18: note: initialize the variable 'hint' to silence 
this warning
   CLST alen, hint;
  ^
   = 0
   fs/ntfs3/attrib.c:72:20: warning: unused function 'attr_must_be_resident' 
[-Wunused-function]
   static inline bool attr_must_be_resident(struct ntfs_sb_info *sbi,
  ^
   2 warnings generated.

vim +1256 fs/ntfs3/attrib.c

dc58d89d2835db2 Konstantin Komarov 2020-10-16  1171  
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1172  /*
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1173   * attr_allocate_frame
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1174   *
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1175   * allocate/free clusters 
for 'frame'
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1176   */
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1177  int 
attr_allocate_frame(struct ntfs_inode *ni, CLST frame, size_t compr_size,
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1178 u64 
new_valid)
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1179  {
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1180 int err = 0;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1181 struct runs_tree *run = 
&ni->file.run;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1182 struct ntfs_sb_info 
*sbi = ni->mi.sbi;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1183 struct ATTRIB *attr, 
*attr_b;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1184 struct ATTR_LIST_ENTRY 
*le, *le_b;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1185 struct mft_inode *mi, 
*mi_b;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1186 CLST svcn, evcn1, 
next_evcn1, next_svcn, lcn, len;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1187 CLST vcn, end, 
clst_data;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1188 u64 total_size, 
valid_size, data_size;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1189 bool is_compr;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1190  
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1191 le_b = NULL;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1192 attr_b = 
ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b);
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1193 if (!attr_b)
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1194 return -ENOENT;
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1195  
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1196 if 
(!is_attr_ext(attr_b))
dc58d89d2835db2 Konstantin Komarov 2020-10-16  1197 return -EINVAL;
dc58d89d2835db2 Konstant

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Willy Tarreau
On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote:
> [adding some more people who are interested in RNG stuff: Andy, Jason,
> Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this
> concerns some pretty fundamental API stuff related to RNG usage]
> 
> On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin
>  wrote:
> > This patch is a driver which exposes the Virtual Machine Generation ID
> > via a char-dev FS interface that provides ID update sync and async
> > notification, retrieval and confirmation mechanisms:
> >
> > When the device is 'open()'ed a copy of the current vm UUID is
> > associated with the file handle. 'read()' operations block until the
> > associated UUID is no longer up to date - until HW vm gen id changes -
> > at which point the new UUID is provided/returned. Nonblocking 'read()'
> > uses EWOULDBLOCK to signal that there is no _new_ UUID available.
> >
> > 'poll()' is implemented to allow polling for UUID updates. Such
> > updates result in 'EPOLLIN' events.
> >
> > Subsequent read()s following a UUID update no longer block, but return
> > the updated UUID. The application needs to acknowledge the UUID update
> > by confirming it through a 'write()'.
> > Only on writing back to the driver the right/latest UUID, will the
> > driver mark this "watcher" as up to date and remove EPOLLIN status.
> >
> > 'mmap()' support allows mapping a single read-only shared page which
> > will always contain the latest UUID value at offset 0.
> 
> It would be nicer if that page just contained an incrementing counter,
> instead of a UUID. It's not like the application cares *what* the UUID
> changed to, just that it *did* change and all RNGs state now needs to
> be reseeded from the kernel, right? And an application can't reliably
> read the entire UUID from the memory mapping anyway, because the VM
> might be forked in the middle.
> 
> So I think your kernel driver should detect UUID changes and then turn
> those into a monotonically incrementing counter. (Probably 64 bits
> wide?) (That's probably also a little bit faster than comparing an
> entire UUID.)

I agree with this. Further, I'm observing there is a very common
confusion between "universally unique" and "random". Randoms are
needed when seeking unpredictability. A random number generator
*must* be able to return the same value multiple times in a row
(though this is rare), otherwise it's not random.

To illustrate this, a die has less than 3 bits of randomness and
is sufficient to play games with friends where a counter would allow
everyone to cheat. Conversely if you want to assign IDs to members
of your family you'd rather use a counter than a die for this or
you risk collisions and/or long series of retries to pick unique
IDs.

RFC4122 explains in great length how to produce guaranteed unique
IDs, and this only involves space, time and counters. There's
indeed a lazy variant that probably everyone uses nowadays,
consisting in picking random numbers, but this is not guaranteed
unique anymore.

If the UUIDs used there are real UUIDs, it could be as simple as
updating them according to their format, i.e. updating the timestamp,
and if the timestamp is already the same, just increase the seq counter.
Doing this doesn't require entropy, doesn't need to block and doesn't
needlessly leak randoms that sometimes make people feel nervous.

Just my two cents,
Willy
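
For illustration, a rough sketch of the RFC 4122 style update described
above (version-1 field layout per RFC 4122 section 4.1.2, operating on an
in-memory struct and ignoring wire byte order; the struct and helper names
are invented, not part of the proposed driver):

#include <stdint.h>

struct uuid_v1 {
	uint32_t time_low;
	uint16_t time_mid;
	uint16_t time_hi_and_version;		/* top 4 bits = version (1) */
	uint8_t  clock_seq_hi_and_reserved;	/* top 2 bits = variant */
	uint8_t  clock_seq_low;
	uint8_t  node[6];
};

/* Refresh the 60-bit timestamp; if it did not advance, bump the
 * 14-bit clock sequence instead.  No entropy required. */
void uuid_v1_bump(struct uuid_v1 *u, uint64_t now_100ns)
{
	uint64_t old = ((uint64_t)(u->time_hi_and_version & 0x0fff) << 48) |
		       ((uint64_t)u->time_mid << 32) | u->time_low;

	if (now_100ns > old) {
		u->time_low = (uint32_t)now_100ns;
		u->time_mid = (uint16_t)(now_100ns >> 32);
		u->time_hi_and_version =
			(uint16_t)((now_100ns >> 48) & 0x0fff) | (1 << 12);
	} else {
		uint16_t seq = ((u->clock_seq_hi_and_reserved & 0x3f) << 8) |
			       u->clock_seq_low;
		seq = (seq + 1) & 0x3fff;
		u->clock_seq_low = seq & 0xff;
		u->clock_seq_hi_and_reserved = 0x80 | (seq >> 8);
	}
}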


Re: [PATCH] Kbuild.include: remove leftover comment for filechk utility

2020-10-16 Thread Masahiro Yamada
On Fri, Oct 16, 2020 at 9:59 PM Rasmus Villemoes
 wrote:
>
> After commit 43fee2b23895 ("kbuild: do not redirect the first
> prerequisite for filechk"), the rule is no longer automatically passed
> $< as stdin, so remove the stale comment.
>
> Fixes: 43fee2b23895 ("kbuild: do not redirect the first prerequisite for 
> filechk")
> Signed-off-by: Rasmus Villemoes 
> ---

Applied to linux-kbuild. Thanks.


>  scripts/Kbuild.include | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index 83a1637417e5..08e011175b4c 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -56,8 +56,6 @@ kecho := $($(quiet)kecho)
>  # - If no file exist it is created
>  # - If the content differ the new file is used
>  # - If they are equal no change, and no timestamp update
> -# - stdin is piped in from the first prerequisite ($<) so one has
> -#   to specify a valid file as first prerequisite (often the kbuild file)
>  define filechk
> $(Q)set -e; \
> mkdir -p $(dir $@); \
> --
> 2.23.0
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 1/1] pci: pciehp: Handle MRL interrupts to enable slot for hotplug.

2020-10-16 Thread Raj, Ashok
Hi Bjorn


On Fri, Sep 25, 2020 at 04:01:38PM -0700, Ashok Raj wrote:
> When Mechanical Retention Lock (MRL) is present, Linux doesn't process
> those change events.
> 
> The following changes need to be enabled when MRL is present.
> 
> 1. Subscribe to MRL change events in SlotControl.
> 2. When MRL is closed,
>- If there is no ATTN button, then POWER on the slot.
>- If there is ATTN button, and an MRL event pending, ignore
>  Presence Detect. Since we want ATTN button to drive the
>  hotplug event.
> 

Did you get a chance to review this? Thought I'd just check with you.

Seems like there was a lot happening in the error handling and hotplug
side, I thought you might have wanted to wait for the dust to settle :-)

> 
> Signed-off-by: Ashok Raj 
> Co-developed-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>  drivers/pci/hotplug/pciehp.h  |  1 +
>  drivers/pci/hotplug/pciehp_ctrl.c | 69 
> +++
>  drivers/pci/hotplug/pciehp_hpc.c  | 27 ++-
>  3 files changed, 96 insertions(+), 1 deletion(-)
> 


Re: [PATCH] task_work: cleanup notification modes

2020-10-16 Thread Jens Axboe
On 10/16/20 5:38 PM, Thomas Gleixner wrote:
> On Fri, Oct 16 2020 at 17:13, Jens Axboe wrote:
>> /**
>>  * task_work_add - ask the @task to execute @work->func()
>>  * @task: the task which should run the callback
>>  * @work: the callback to run
>>  * @notify: how to notify the targeted task
>>  *
>>  * Queue @work for task_work_run() below and notify the @task if @notify
>>  * is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL work like signals, in that the
> 
> s/the//

Thanks, good catch.

>>  * it will interrupt the targeted task and run the task_work. @TWA_RESUME
>>  * work is run only when the task exits the kernel and returns to user mode.
>>  * Fails if the @task is exiting/exited and thus it can't process this @work.
>>  * Otherwise @work->func() will be called when the @task returns from kernel
>>  * mode or exits.
> 
> Yes, that makes a lot more sense.
> 
> What's still lacking is a description of the return value and how to act
> upon it.

That's really up to the caller. But we should add some explanation of
that. Most callers use some alternative if the task is exiting, like
using a work queue for example.
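
A minimal sketch of that pattern (the request layout and handlers are
invented for illustration; TWA_SIGNAL/TWA_RESUME are the notify values
this patch documents):

#include <linux/task_work.h>
#include <linux/workqueue.h>

struct my_req {
	struct callback_head	twork;
	struct work_struct	fallback;
};

static void my_twork_fn(struct callback_head *cb)
{
	/* runs in the target task's context */
}

static void my_fallback_fn(struct work_struct *work)
{
	/* target task was exiting; finish or cancel the request here */
}

static void my_queue(struct task_struct *task, struct my_req *req)
{
	init_task_work(&req->twork, my_twork_fn);
	if (task_work_add(task, &req->twork, TWA_SIGNAL)) {
		/* task is exiting and can't take the work: defer it */
		INIT_WORK(&req->fallback, my_fallback_fn);
		queue_work(system_wq, &req->fallback);
	}
}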

> Most of the call sites ignore it, some are acting upon it but I can't

If you know the task isn't exiting, then yeah you can ignore it. But
seems a bit dicey...

> make any sense of these actions:
> 
> fs/io_uring.c-notify = 0;
> fs/io_uring.c-if (!(ctx->flags & IORING_SETUP_SQPOLL) && 
> twa_signal_ok)
> fs/io_uring.c-notify = TWA_SIGNAL;
> fs/io_uring.c-
> fs/io_uring.c:ret = task_work_add(tsk, &req->task_work, notify);
> fs/io_uring.c-if (!ret)
> fs/io_uring.c-wake_up_process(tsk);
> 
> ???
> 
> fs/io_uring.c-if (unlikely(ret)) {
> fs/io_uring.c-struct task_struct *tsk;
> fs/io_uring.c-
> fs/io_uring.c-init_task_work(&req->task_work, 
> io_req_task_cancel);
> fs/io_uring.c-tsk = io_wq_get_task(req->ctx->io_wq);
> fs/io_uring.c:task_work_add(tsk, &req->task_work, 0);
> fs/io_uring.c-wake_up_process(tsk);
> 
> yet more magic wakeup.

It's not magic, but probably needs a comment... If we fail, that task is
exiting. But we know we have our io-wq threads, so we use that as a
fallback. Not really expected in the fast path.

> fs/io_uring.c-
> fs/io_uring.c-init_task_work(&req->task_work, io_req_task_submit);
> fs/io_uring.c-percpu_ref_get(&req->ctx->refs);
> fs/io_uring.c-
> fs/io_uring.c-/* submit ref gets dropped, acquire a new one */
> fs/io_uring.c-refcount_inc(&req->refs);
> fs/io_uring.c:ret = io_req_task_work_add(req, true);
> fs/io_uring.c-if (unlikely(ret)) {
> fs/io_uring.c-struct task_struct *tsk;
> fs/io_uring.c-
> fs/io_uring.c-/* queue just for cancelation */
> fs/io_uring.c-init_task_work(&req->task_work, 
> io_req_task_cancel);
> fs/io_uring.c-tsk = io_wq_get_task(req->ctx->io_wq);
> fs/io_uring.c:task_work_add(tsk, &req->task_work, 0);
> fs/io_uring.c-wake_up_process(tsk);
> 
> Ditto. Why the heck is this wakeup making any sense? The initial
> task_work_add() within io_req_task_work_add() failed already ...

Right, but we're using a new task for this. And that task is a kthread
that we manage, hence no notification is needed outside of just waking
it up.

-- 
Jens Axboe



drivers/staging/wfx/data_tx.c:34:19: warning: variable 'band' is uninitialized when used here

2020-10-16 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   8119c4332d253660e0a6b8748fe0749961cfbc97
commit: 868fd970e187d39c586565c875837e530c6f7e1b staging: wfx: improve 
robustness of wfx_get_hw_rate()
date:   7 days ago
config: powerpc64-randconfig-r001-20201016 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
5fbab4025eb57b12f2842ab188ff07a110708e1d)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install powerpc64 cross compiling tool for clang build
# apt-get install binutils-powerpc64-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=868fd970e187d39c586565c875837e530c6f7e1b
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout 868fd970e187d39c586565c875837e530c6f7e1b
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/staging/wfx/data_tx.c:34:19: warning: variable 'band' is 
>> uninitialized when used here [-Wuninitialized]
   if (rate->idx >= band->n_bitrates) {
^~~~
   include/linux/compiler.h:56:47: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
 ^~~~
   include/linux/compiler.h:58:52: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))
  ^~~~
   drivers/staging/wfx/data_tx.c:23:39: note: initialize the variable 'band' to 
silence this warning
   struct ieee80211_supported_band *band;
^
 = NULL
   1 warning generated.

vim +/band +34 drivers/staging/wfx/data_tx.c

19  
20  static int wfx_get_hw_rate(struct wfx_dev *wdev,
21 const struct ieee80211_tx_rate *rate)
22  {
23  struct ieee80211_supported_band *band;
24  
25  if (rate->idx < 0)
26  return -1;
27  if (rate->flags & IEEE80211_TX_RC_MCS) {
28  if (rate->idx > 7) {
29  WARN(1, "wrong rate->idx value: %d", rate->idx);
30  return -1;
31  }
32  return rate->idx + 14;
33  }
  > 34  if (rate->idx >= band->n_bitrates) {
35  WARN(1, "wrong rate->idx value: %d", rate->idx);
36  return -1;
37  }
38  // WFx only support 2GHz, else band information should be 
retrieved
39  // from ieee80211_tx_info
40  band = wdev->hw->wiphy->bands[NL80211_BAND_2GHZ];
41  return band->bitrates[rate->idx].hw_value;
42  }
43  
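
One possible shape of a fix, shown for illustration only (whether the
driver takes this route is up to its maintainers): assign 'band' before
the bitrate-range check so it is initialized at its first use.

static int wfx_get_hw_rate(struct wfx_dev *wdev,
			   const struct ieee80211_tx_rate *rate)
{
	struct ieee80211_supported_band *band;

	if (rate->idx < 0)
		return -1;
	if (rate->flags & IEEE80211_TX_RC_MCS) {
		if (rate->idx > 7) {
			WARN(1, "wrong rate->idx value: %d", rate->idx);
			return -1;
		}
		return rate->idx + 14;
	}
	// WFx only supports 2GHz; else the band information should be
	// retrieved from ieee80211_tx_info
	band = wdev->hw->wiphy->bands[NL80211_BAND_2GHZ];
	if (rate->idx >= band->n_bitrates) {
		WARN(1, "wrong rate->idx value: %d", rate->idx);
		return -1;
	}
	return band->bitrates[rate->idx].hw_value;
}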

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v2] gpiolib: cdev: document that line eflags are shared

2020-10-16 Thread Kent Gibson
On Fri, Oct 16, 2020 at 05:24:14PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 14, 2020 at 12:21 PM Kent Gibson  wrote:
> >
> > The line.eflags field is shared so document this fact and highlight it
> > throughout using READ_ONCE() and WRITE_ONCE() accessors.
> >
> > Also use a local copy of the eflags in edge_irq_thread() to ensure
> > consistent control flow even if eflags changes.  This is only a defensive
> > measure as edge_irq_thread() is currently disabled when the eflags are
> > changed.
> 
> > -   if (line->eflags == (GPIO_V2_LINE_FLAG_EDGE_RISING |
> > -GPIO_V2_LINE_FLAG_EDGE_FALLING)) {
> > +   eflags = READ_ONCE(line->eflags);
> > +   if (eflags == (GPIO_V2_LINE_FLAG_EDGE_RISING |
> > +  GPIO_V2_LINE_FLAG_EDGE_FALLING)) {
> 
> Hmm... side note: perhaps at some point
> 
> #define GPIO_V2_LINE_FLAG_EDGE_BOTH  \
> (GPIO_V2_LINE_FLAG_EDGE_RISING | GPIO_V2_LINE_FLAG_EDGE_FALLING)
> 
>if (eflags == GPIO_V2_LINE_FLAG_EDGE_BOTH) {
> 
> ?

Yeah, that would make sense.  I think I used GPIO_V2_LINE_EDGE_FLAGS,
which is defined the same as your GPIO_V2_LINE_FLAG_EDGE_BOTH, here at
some point, but that just looked wrong.

The GPIO_V2_LINE_FLAG_EDGE_BOTH does read better.  I'll add it to the
todo list.

Cheers,
Kent.


Re: [PATCH] task_work: cleanup notification modes

2020-10-16 Thread Thomas Gleixner
On Fri, Oct 16 2020 at 17:13, Jens Axboe wrote:
> /**
>  * task_work_add - ask the @task to execute @work->func()
>  * @task: the task which should run the callback
>  * @work: the callback to run
>  * @notify: how to notify the targeted task
>  *
>  * Queue @work for task_work_run() below and notify the @task if @notify
>  * is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL work like signals, in that the

s/the//

>  * it will interrupt the targeted task and run the task_work. @TWA_RESUME
>  * work is run only when the task exits the kernel and returns to user mode.
>  * Fails if the @task is exiting/exited and thus it can't process this @work.
>  * Otherwise @work->func() will be called when the @task returns from kernel
>  * mode or exits.

Yes, that makes a lot more sense.

What's still lacking is a description of the return value and how to act
upon it.

Most of the call sites ignore it, some are acting upon it but I can't
make any sense of these actions:

fs/io_uring.c-  notify = 0;
fs/io_uring.c-  if (!(ctx->flags & IORING_SETUP_SQPOLL) && twa_signal_ok)
fs/io_uring.c-  notify = TWA_SIGNAL;
fs/io_uring.c-
fs/io_uring.c:  ret = task_work_add(tsk, &req->task_work, notify);
fs/io_uring.c-  if (!ret)
fs/io_uring.c-  wake_up_process(tsk);

???

fs/io_uring.c-  if (unlikely(ret)) {
fs/io_uring.c-  struct task_struct *tsk;
fs/io_uring.c-
fs/io_uring.c-  init_task_work(&req->task_work, io_req_task_cancel);
fs/io_uring.c-  tsk = io_wq_get_task(req->ctx->io_wq);
fs/io_uring.c:  task_work_add(tsk, &req->task_work, 0);
fs/io_uring.c-  wake_up_process(tsk);

yet more magic wakeup.

fs/io_uring.c-
fs/io_uring.c-  init_task_work(&req->task_work, io_req_task_submit);
fs/io_uring.c-  percpu_ref_get(&req->ctx->refs);
fs/io_uring.c-
fs/io_uring.c-  /* submit ref gets dropped, acquire a new one */
fs/io_uring.c-  refcount_inc(&req->refs);
fs/io_uring.c:  ret = io_req_task_work_add(req, true);
fs/io_uring.c-  if (unlikely(ret)) {
fs/io_uring.c-  struct task_struct *tsk;
fs/io_uring.c-
fs/io_uring.c-  /* queue just for cancelation */
fs/io_uring.c-  init_task_work(&req->task_work, io_req_task_cancel);
fs/io_uring.c-  tsk = io_wq_get_task(req->ctx->io_wq);
fs/io_uring.c:  task_work_add(tsk, &req->task_work, 0);
fs/io_uring.c-  wake_up_process(tsk);

Ditto. Why the heck is this wakeup making any sense? The initial
task_work_add() within io_req_task_work_add() failed already ...

fs/io_uring.c:  ret = io_req_task_work_add(req, twa_signal_ok);
fs/io_uring.c-  if (unlikely(ret)) {
fs/io_uring.c-  struct task_struct *tsk;
fs/io_uring.c-
fs/io_uring.c-  WRITE_ONCE(poll->canceled, true);
fs/io_uring.c-  tsk = io_wq_get_task(req->ctx->io_wq);
fs/io_uring.c:  task_work_add(tsk, &req->task_work, 0);
fs/io_uring.c-  wake_up_process(tsk);

...

Thanks,

tglx


Re: [PATCHv4] selftests: rtnetlink: load fou module for kci_test_encap_fou() test

2020-10-16 Thread Jakub Kicinski
On Fri, 16 Oct 2020 12:12:11 +0800 Po-Hsu Lin wrote:
> The kci_test_encap_fou() test from kci_test_encap() in rtnetlink.sh
> needs the fou module to work. Otherwise it will fail with:
> 
>   $ ip netns exec "$testns" ip fou add port  ipproto 47
>   RTNETLINK answers: No such file or directory
>   Error talking to the kernel
> 
> Add CONFIG_NET_FOU to the config file as well; it needs to be set at
> least as a loadable module.
> 
> Signed-off-by: Po-Hsu Lin 

Doesn't apply :( Could you rebase on top of:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/


Re: [PATCH v2 1/3] gpiolib: cdev: allow edge event timestamps to be configured as REALTIME

2020-10-16 Thread Kent Gibson
On Fri, Oct 16, 2020 at 05:13:22PM +0300, Andy Shevchenko wrote:
> On Thu, Oct 15, 2020 at 6:53 AM Kent Gibson  wrote:
> >
> > Using CLOCK_REALTIME as the source for event timestamps is crucial for
> > some specific applications, particularly those requiring timetamps
> > relative to a PTP clock, so provide an option to switch the event
> > timestamp source from the default CLOCK_MONOTONIC to CLOCK_REALTIME.
> >
> > Note that CLOCK_REALTIME was the default source clock for GPIO until
> > Linux 5.7 when it was changed to CLOCK_MONOTONIC due to issues with the
> > shifting of the realtime clock.
> > Providing this option maintains the CLOCK_MONOTONIC as the default,
> > while also providing a path forward for those dependent on the pre-5.7
> > behaviour.
> 
> ...
> 
> >  GPIO_V2_LINE_DIRECTION_FLAGS | \
> >  GPIO_V2_LINE_DRIVE_FLAGS | \
> >  GPIO_V2_LINE_EDGE_FLAGS | \
> > +GPIO_V2_LINE_FLAG_EVENT_CLOCK_REALTIME | \
> 
> Wondering if we would have something like
> 
>   GPIO_V2_LINE_CLOCK_FLAGS | \
> 
> here for the sake of consistency.
> 

I considered it, but thought the chance of there ever being another
CLOCK flag to be near zero - the remaining clocks relate to CPU or
thread time which don't have any relevance for external events like
GPIO.  If there ever is one we can split it out then.

And it is consistent with the GPIO_V2_LINE_FLAG_ACTIVE_LOW flag that you
pruned out in your response. i.e. lone flags don't get grouped.

> >  GPIO_V2_LINE_BIAS_FLAGS)
> 
> ...
> 
> > +static u64 line_event_timestamp(struct line *line)
> > +{
> 
> > +   if (test_bit(FLAG_EVENT_CLOCK_REALTIME, &line->desc->flags))
> 
> I dunno if we can actually drop the word EVENT from these definitions.
> I don't think we would have in the near future something similar for
> the non-event data.
> 

I considered this too, as another clock flag seems unlikely.
But if we ever add a non-event clock flag then this one would become
confusing, and the overhead of the long name seemed minor compared to
the clarity it brings with it.
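
For the record, a rough userspace sketch of opting into the realtime event
clock with the flag proposed in this series (error handling trimmed; the
flag name is the one under discussion, not yet in a released uapi header):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/gpio.h>

/* Request line 5 with both-edge events timestamped by CLOCK_REALTIME. */
static int request_realtime_edge_events(int chip_fd)
{
        struct gpio_v2_line_request req;

        memset(&req, 0, sizeof(req));
        req.offsets[0] = 5;
        req.num_lines = 1;
        req.config.flags = GPIO_V2_LINE_FLAG_INPUT |
                           GPIO_V2_LINE_FLAG_EDGE_RISING |
                           GPIO_V2_LINE_FLAG_EDGE_FALLING |
                           GPIO_V2_LINE_FLAG_EVENT_CLOCK_REALTIME;
        strcpy(req.consumer, "realtime-ts-demo");

        if (ioctl(chip_fd, GPIO_V2_GET_LINE_IOCTL, &req) < 0)
                return -1;
        return req.fd; /* gpio_v2_line_event reads carry realtime stamps */
}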

Cheers,
Kent.


Re: [RFC PATCH resend 3/6] mm: Add refcount for preserving mm_struct without pgd

2020-10-16 Thread Jason Gunthorpe
On Sat, Oct 17, 2020 at 01:09:12AM +0200, Jann Horn wrote:
> Currently, mm_struct has two refcounts:
> 
>  - mm_users: preserves everything - the mm_struct, the page tables, the
>memory mappings, and so on
>  - mm_count: preserves the mm_struct and pgd
> 
> However, there are three types of users of mm_struct:
> 
> 1. users that want page tables, memory mappings and so on
> 2. users that want to preserve the pgd (for lazy TLB)
> 3. users that just want to keep the mm_struct itself around (e.g. for
>mmget_not_zero() or __ptrace_may_access())
> 
> Dropping mm_count references can be messy because dropping mm_count to
> zero deletes the pgd, which takes the pgd_lock on x86, meaning it doesn't
> work from RCU callbacks (which run in IRQ context). In those cases,
> mmdrop_async() must be used to punt the invocation of __mmdrop() to
> workqueue context.
> 
> That's fine when mmdrop_async() is a rare case, but the preceding patch
> "ptrace: Keep mm around after exit_mm() for __ptrace_may_access()" makes it
> the common case; we should probably avoid punting freeing to workqueue
> context all the time if we can avoid it?
> 
> To resolve this, add a third refcount that just protects the mm_struct and
> the user_ns it points to, and which can be dropped with synchronous freeing
> from (almost) any context.
> 
> Signed-off-by: Jann Horn 
> ---
>  arch/x86/kernel/tboot.c|  2 ++
>  drivers/firmware/efi/efi.c |  2 ++
>  include/linux/mm_types.h   | 13 +++--
>  include/linux/sched/mm.h   | 13 +
>  kernel/fork.c  | 14 ++
>  mm/init-mm.c   |  2 ++
>  6 files changed, 40 insertions(+), 6 deletions(-)

I think mmu notifiers and the stuff in drivers/infiniband/core/ can be
converted to this as well..

Actually I kind of wonder if you should go the reverse and find the
few callers that care about the pgd and give them a new api with a
better name (mmget_pgd?).

Jason


Re: [PATCH v1 1/6] staging: qlge: Initialize devlink health dump framework for the dlge driver

2020-10-16 Thread Coiby Xu

On Thu, Oct 15, 2020 at 08:06:06PM +0900, Benjamin Poirier wrote:

On 2020-10-15 11:37 +0800, Coiby Xu wrote:

On Tue, Oct 13, 2020 at 09:37:04AM +0900, Benjamin Poirier wrote:
> On 2020-10-12 19:24 +0800, Coiby Xu wrote:
> [...]
> > > I think, but didn't check in depth, that in those drivers, the devlink
> > > device is tied to the pci device and can exist independently of the
> > > netdev, at least in principle.
> > >
> > You are right. Take drivers/net/ethernet/mellanox/mlxsw as an example,
> > devlink reload would first unregister_netdev and then
> > register_netdev but struct devlink stays put. But I have yet to
> > understand when unregister/register_netdev is needed.
>
> Maybe it can be useful to manually recover if the hardware or driver
> gets in an erroneous state. I've used `modprobe -r qlge && modprobe
> qlge` for the same in the past.

Thank you for providing this user case!
>
> > Do we need to
> > add "devlink reload" for qlge?
>
> Not for this patchset. That would be a new feature.

To implement this feature, it seems I need to understand how qlge works
under the hood. For example, what's the difference between
qlge_soft_reset_mpi_risc and qlge_hard_reset_mpi_risc? Or should we take
a brute-force approach, like doing the tasks in qlge_remove and then
re-doing the tasks in qlge_probe?


I don't know. Like I've said before, I'd recommend testing on actual
hardware. I don't have access to it anymore.


Yeah, as I'm changing more code, it's more and more important to test
it on actual hardware. Have you heard of anyone installing a qle8142 on
a Raspberry Pi, which has a PCIe bus?



Is there a hardware reference manual for the qlge device?


I've never gotten access to one.


My experience of wrestling with an AMD GPIO chip [1] shows it would
be a bit annoying to deal with a device without a reference manual.
I had to treat it like a black box and try different kinds of input
to see what would happen.

Btw, it seems resetting the device is a kind of panacea. For example,
according to the specs of my touchpad (Synaptics RMI4 Specification),
it even has a spontaneous reset feature. devlink health [2] also has
so-called auto-recovery. So resetting is a common pattern. I wonder if
there are common guidelines for doing resets that would also apply to
the qlge8*** devices.


The only noteworthy thing from Qlogic that I know of is the firmware
update:
http://driverdownloads.qlogic.com/QLogicDriverDownloads_UI/SearchByProduct.aspx?ProductCategory=322=1104=190

It did fix some weird behavior when I applied it so I'd recommend doing
the same if you get an adapter.


Thank you for sharing the info!


[1] https://www.spinics.net/lists/linux-gpio/msg53901.html
[2] 
https://www.kernel.org/doc/html/latest/networking/devlink/devlink-health.html

--
Best regards,
Coiby


Re: [PATCH v2 1/7] staging: qlge: replace ql_* with qlge_* to avoid namespace clashes with other qlogic drivers

2020-10-16 Thread Coiby Xu

On Thu, Oct 15, 2020 at 10:01:36AM +0900, Benjamin Poirier wrote:

On 2020-10-14 18:43 +0800, Coiby Xu wrote:

To avoid namespace clashes with other qlogic drivers and also for the
sake of naming consistency, use the "qlge_" prefix as suggested in
drivers/staging/qlge/TODO.

Suggested-by: Benjamin Poirier 
Signed-off-by: Coiby Xu 
---
 drivers/staging/qlge/TODO   |4 -
 drivers/staging/qlge/qlge.h |  190 ++--
 drivers/staging/qlge/qlge_dbg.c | 1073 ---
 drivers/staging/qlge/qlge_ethtool.c |  231 ++---
 drivers/staging/qlge/qlge_main.c| 1257 +--
 drivers/staging/qlge/qlge_mpi.c |  352 
 6 files changed, 1551 insertions(+), 1556 deletions(-)

diff --git a/drivers/staging/qlge/TODO b/drivers/staging/qlge/TODO
index f93f7428f5d5..5ac55664c3e2 100644
--- a/drivers/staging/qlge/TODO
+++ b/drivers/staging/qlge/TODO
@@ -28,10 +28,6 @@
 * the driver has a habit of using runtime checks where compile time checks are
   possible (ex. ql_free_rx_buffers(), ql_alloc_rx_buffers())
 * reorder struct members to avoid holes if it doesn't impact performance
-* in terms of namespace, the driver uses either qlge_, ql_ (used by
-  other qlogic drivers, with clashes, ex: ql_sem_spinlock) or nothing (with
-  clashes, ex: struct ob_mac_iocb_req). Rename everything to use the "qlge_"
-  prefix.


You only renamed ql -> qlge. The prefix needs to be added where there is
currently none like the second example of that text.


On second thought, these structs like ob_mac_iocb_req are defined in
local headers and there is no mixed usage. So even if we want to
build this driver and other qlogic drivers into the kernel instead of
as separate modules, it won't lead to real problems, right?


Besides, the next patch reintroduces the name struct ql_adapter.


--
Best regards,
Coiby


Re: [PATCH v3 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap

2020-10-16 Thread John Stultz
On Fri, Oct 16, 2020 at 12:03 PM John Stultz  wrote:
> On Thu, Oct 8, 2020 at 4:51 AM Brian Starkey  wrote:
> > On Sat, Oct 03, 2020 at 04:02:57AM +, John Stultz wrote:
> > > @@ -426,6 +487,16 @@ static int system_heap_create(void)
> > >   if (IS_ERR(sys_heap))
> > >   return PTR_ERR(sys_heap);
> > >
> > > + exp_info.name = "system-uncached";
> > > + exp_info.ops = _uncached_heap_ops;
> > > + exp_info.priv = NULL;
> > > +
> > > + sys_uncached_heap = dma_heap_add(&exp_info);
> > > + if (IS_ERR(sys_uncached_heap))
> > > + return PTR_ERR(sys_heap);
> > > +
> >
> > In principle, there's a race here between the heap getting registered
> > to sysfs and the dma_mask getting updated.
> >
> > I don't think it would cause a problem in practice, but with the API
> > as it is, there's no way to avoid it - so I wonder if the
> > dma_heap_get_dev() accessor isn't the right API design.
>
> Hrm.  I guess to address your concern we would need split
> dma_heap_add() into something like:
>   dma_heap_create()
>   dma_heap_add()
>
> Which breaks the creation of the heap with the registering it to the
> subsystem, so some attributes can be tweaked inbetween?

Looking at this some more, this approach isn't going to work. We
create the device and then we call dma_coerce_mask_and_coherent() on
it, but as soon as the device is created it seems possible for
userland to directly access it. Again, though, as you mentioned this
isn't terribly likely in practice.

The best thing I can think of for now is to have the uncached heap's
allocate pointer initially point to a dummy function that returns
EBUSY, and then, after we update the dma mask, set it to the real
alloc.  I'll go with that for now, but let me know if you have
other suggestions.
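
Roughly, that could look like the following sketch (assuming the ops struct
stays writable and the real allocator is called system_uncached_heap_allocate;
this is an illustration, not the final patch):

static struct dma_buf *dummy_uncached_allocate(struct dma_heap *heap,
                                               unsigned long len,
                                               unsigned long fd_flags,
                                               unsigned long heap_flags)
{
        /* visible in sysfs, but unusable until the dma mask is configured */
        return ERR_PTR(-EBUSY);
}

static struct dma_heap_ops system_uncached_heap_ops = {
        .allocate = dummy_uncached_allocate,
};

/* later, in system_heap_create(), after dma_coerce_mask_and_coherent(): */
/*      system_uncached_heap_ops.allocate = system_uncached_heap_allocate; */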

thanks
-john


Re: [RFC v2 1/8] drm/i915/dp: Program source OUI on eDP panels

2020-10-16 Thread Vasily Khoruzhick
On Wed, Sep 16, 2020 at 10:19 AM Lyude Paul  wrote:
>
> Since we're about to start adding support for Intel's magic HDR
> backlight interface over DPCD, we need to ensure we're properly
> programming this field so that Intel specific sink services are exposed.
> Otherwise, 0x300-0x3ff will just read zeroes.
>
> We also take care not to reprogram the source OUI if it already matches
> what we expect. This is just to be careful so that we don't accidentally
> take the panel out of any backlight control modes we found it in.
>
> v2:
> * Add careful parameter to intel_edp_init_source_oui() to avoid
>   re-writing the source OUI if it's already been set during driver
>   initialization
>
> Signed-off-by: Lyude Paul 
> Cc: thay...@noraisin.net
> Cc: Vasily Khoruzhick 

Hi Lyude,

Sorry for a late reply.

Whole series is:

Tested-by: Vasily Khoruzhick 

Regards,
Vasily

> ---
>  drivers/gpu/drm/i915/display/intel_dp.c | 33 +
>  1 file changed, 33 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 4bd10456ad188..7db2b6a3cd52e 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -3424,6 +3424,29 @@ void intel_dp_sink_set_decompression_state(struct 
> intel_dp *intel_dp,
> enable ? "enable" : "disable");
>  }
>
> +static void
> +intel_edp_init_source_oui(struct intel_dp *intel_dp, bool careful)
> +{
> +   struct drm_i915_private *i915 = dp_to_i915(intel_dp);
> +   u8 oui[] = { 0x00, 0xaa, 0x01 };
> +   u8 buf[3] = { 0 };
> +
> +   /*
> +* During driver init, we want to be careful and avoid changing the 
> source OUI if it's
> +* already set to what we want, so as to avoid clearing any state by 
> accident
> +*/
> +   if (careful) {
> +   if (drm_dp_dpcd_read(&intel_dp->aux, DP_SOURCE_OUI, buf, 
> sizeof(buf)) < 0)
> +   drm_err(&i915->drm, "Failed to read source OUI\n");
> +
> +   if (memcmp(oui, buf, sizeof(oui)) == 0)
> +   return;
> +   }
> +
> +   if (drm_dp_dpcd_write(&intel_dp->aux, DP_SOURCE_OUI, oui, 
> sizeof(oui)) < 0)
> +   drm_err(&i915->drm, "Failed to write source OUI\n");
> +}
> +
>  /* If the sink supports it, try to set the power state appropriately */
>  void intel_dp_sink_dpms(struct intel_dp *intel_dp, int mode)
>  {
> @@ -3443,6 +3466,10 @@ void intel_dp_sink_dpms(struct intel_dp *intel_dp, int 
> mode)
> } else {
> struct intel_lspcon *lspcon = dp_to_lspcon(intel_dp);
>
> +   /* Write the source OUI as early as possible */
> +   if (intel_dp_is_edp(intel_dp))
> +   intel_edp_init_source_oui(intel_dp, false);
> +
> /*
>  * When turning on, we need to retry for 1ms to give the sink
>  * time to wake up.
> @@ -4607,6 +4634,12 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
> if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
> intel_dp_get_dsc_sink_cap(intel_dp);
>
> +   /*
> +* If needed, program our source OUI so we can make various 
> Intel-specific AUX services
> +* available (such as HDR backlight controls)
> +*/
> +   intel_edp_init_source_oui(intel_dp, true);
> +
> return true;
>  }
>
> --
> 2.26.2
>


Re: [PATCH] task_work: cleanup notification modes

2020-10-16 Thread Jens Axboe
On 10/16/20 5:09 PM, Thomas Gleixner wrote:
> On Fri, Oct 16 2020 at 16:39, Jens Axboe wrote:
>> On 10/16/20 3:44 PM, Thomas Gleixner wrote:
 - * @notify: send the notification if true
 + * @notify: send chosen notification, if any
>>>
>>> Is that really all you found to be wrong in that comment?
>>
>> There really is nothing wrong, but it's not very descriptive (wasn't
>> before either).
> 
>  * This is like the signal handler which runs in kernel mode, but it doesn't
>  * try to wake up the @task.
> 
> I find a lot of wrongs in that sentence in the context of TWA_SIGNAL.
> 
> Agreed, it was hard to understand before that, but with TWA_SIGNAL it
> does not make sense at all.

This is what I currently have:

/**
 * task_work_add - ask the @task to execute @work->func()
 * @task: the task which should run the callback
 * @work: the callback to run
 * @notify: how to notify the targeted task
 *
 * Queue @work for task_work_run() below and notify the @task if @notify
 * is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL work like signals, in that the
 * it will interrupt the targeted task and run the task_work. @TWA_RESUME
 * work is run only when the task exits the kernel and returns to user mode.
 * Fails if the @task is exiting/exited and thus it can't process this @work.
 * Otherwise @work->func() will be called when the @task returns from kernel
 * mode or exits.
 *
 * Note: there is no ordering guarantee on works queued here.
 *
 * RETURNS:
 * 0 if succeeds or -ESRCH.
 */
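
To illustrate acting on that return value, a minimal sketch (the helpers
here are hypothetical and not taken from any in-tree caller):

static void my_work_fn(struct callback_head *cb)
{
        /* deferred work; runs in the target task's context */
}

static void my_cancel_fn(struct callback_head *cb)
{
        /* complete or cancel the request locally */
}

static void queue_on_task(struct task_struct *task, struct callback_head *cb)
{
        init_task_work(cb, my_work_fn);
        if (task_work_add(task, cb, TWA_SIGNAL))
                my_cancel_fn(cb); /* -ESRCH: task is exiting, work never runs */
}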

-- 
Jens Axboe



arm64: dropping prevent_bootmem_remove_notifier

2020-10-16 Thread Sudarshan Rajagopalan



Hello Anshuman,

In the patch that enables memory hot-remove (commit bbd6ec605c0f 
("arm64/mm: Enable memory hot remove")) for arm64, there’s a notifier 
put in place that prevents boot memory from being offlined and removed. 
Also, the commit text mentions that boot memory on arm64 cannot be removed. 
We wanted to understand more about the reasoning for this. x86 and other 
archs don’t seem to have this prevention. There’s also a comment in the 
code that this notifier could be dropped in the future if and when boot 
memory can be removed.


The current logic is that only “new” memory blocks which are hot-added 
can later be offlined and removed. The memory that the system booted up 
with cannot be offlined and removed. But there could be many use cases, 
such as inter-VM memory sharing, where a primary VM could offline and 
hot-remove a block/section of memory and lend it to a secondary VM, which 
could hot-add it. After the use case is done, the reverse happens: the 
secondary VM hot-removes the memory and gives it back to the primary, 
which can hot-add it back. In such cases, the present logic for arm64 
doesn’t allow this hot-remove in the primary to happen.


Also, on systems with a movable zone, which more or less guarantees that 
pages can be migrated and isolated so that blocks can be offlined, this 
logic defeats the purpose of having a movable zone that the system can 
rely on for memory hot-plugging; virtio-mem, for example, also relies on 
this for fully plugged memory blocks.


I understand that some regions of boot RAM shouldn’t be allowed to be 
removed, but such regions won’t be allowed to be offlined in the first 
place since their pages cannot be migrated and isolated, for example 
reserved pages.


So we’re trying to understand the reasoning for such a prevention being 
put in place for the arm64 arch alone.


One possible way to solve this is by marking the required sections as 
“non-early” by clearing the SECTION_IS_EARLY bit in their section_mem_map. 
This puts these sections in the “memory hotpluggable” category, so they 
can be offlined/removed and added/onlined even though they are part of 
boot RAM itself and don’t need any extra blocks to be hot-added. This way 
of marking certain sections as “non-early” could be exported so that 
driver modules can mark the required number of sections as “memory 
hotpluggable”. Certain checks could be put in place to decide which 
sections are allowed, for example only movable zone sections could be 
marked as “non-early”. A rough sketch of the idea is shown below.
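
Something along these lines, as a hypothetical helper (ignoring locking and 
the validity checks mentioned above):

/* Hypothetical: treat a boot-RAM section like hot-added memory by clearing
 * SECTION_IS_EARLY, so the hot-remove path no longer rejects it. */
static void mark_section_hotpluggable(unsigned long pfn)
{
        struct mem_section *ms = __pfn_to_section(pfn);

        ms->section_mem_map &= ~SECTION_IS_EARLY;
}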


Your thoughts on this? We are also looking for different ways to solve 
the problem without having to completely drop this notifier; we are just 
raising the concern here that the notifier logic breaks our use case, 
which is a generic memory sharing use case built on the memory hotplug 
feature.



Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[RFC PATCH resend 6/6] mm: remove now-unused mmdrop_async()

2020-10-16 Thread Jann Horn
The preceding patches have removed all users of mmdrop_async(); get rid of
it.

Note that on MMU, we still need async_put_work because mmput_async() uses
it, which in turn is used by binder's shrinker callback. We could claw back
those 4 words per mm if we made mmput_async() depend on
CONFIG_ANDROID_BINDER_IPC.

Signed-off-by: Jann Horn 
---
 include/linux/mm_types.h |  2 ++
 kernel/fork.c| 16 
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 764d251966c7..8fde2068bde1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -560,7 +560,9 @@ struct mm_struct {
 #ifdef CONFIG_HUGETLB_PAGE
atomic_long_t hugetlb_usage;
 #endif
+#ifdef CONFIG_MMU
struct work_struct async_put_work;
+#endif
} __randomize_layout;
 
/*
diff --git a/kernel/fork.c b/kernel/fork.c
index 4383bf055b40..c5f2ec544933 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -666,22 +666,6 @@ void __mmdrop(struct mm_struct *mm)
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
 
-static void mmdrop_async_fn(struct work_struct *work)
-{
-   struct mm_struct *mm;
-
-   mm = container_of(work, struct mm_struct, async_put_work);
-   __mmdrop(mm);
-}
-
-static void mmdrop_async(struct mm_struct *mm)
-{
-   if (unlikely(atomic_dec_and_test(&mm->mm_count))) {
-   INIT_WORK(&mm->async_put_work, mmdrop_async_fn);
-   schedule_work(&mm->async_put_work);
-   }
-}
-
 static inline void free_signal_struct(struct signal_struct *sig)
 {
taskstats_tgid_free(sig);
-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH] task_work: cleanup notification modes

2020-10-16 Thread Thomas Gleixner
On Fri, Oct 16 2020 at 16:39, Jens Axboe wrote:
> On 10/16/20 3:44 PM, Thomas Gleixner wrote:
>>> - * @notify: send the notification if true
>>> + * @notify: send chosen notification, if any
>> 
>> Is that really all you found to be wrong in that comment?
>
> There really is nothing wrong, but it's not very descriptive (wasn't
> before either).

 * This is like the signal handler which runs in kernel mode, but it doesn't
 * try to wake up the @task.

I find a lot of wrongs in that sentence in the context of TWA_SIGNAL.

Agreed, it was hard to understand before that, but with TWA_SIGNAL it
does not make sense at all.

Thanks,

tglx


[RFC PATCH resend 4/6] mm, oom: Use mm_ref()/mm_unref() and avoid mmdrop_async()

2020-10-16 Thread Jann Horn
The OOM killer uses MMF_OOM_SKIP to avoid running on an mm that has started
__mmput(); it only uses the mmgrab() reference to ensure that the mm_struct
itself stays alive.

This means that we don't need a full mmgrab() reference, which will keep
the pgd (and potentially also some pmd pages) alive and can't be cleaned up
from RCU callback context; we can use an mm_ref() reference instead.

Signed-off-by: Jann Horn 
---
 kernel/fork.c | 6 +-
 mm/oom_kill.c | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index fcdd1ace79e4..59c119b03351 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -686,12 +686,8 @@ static inline void free_signal_struct(struct signal_struct 
*sig)
 {
taskstats_tgid_free(sig);
sched_autogroup_exit(sig);
-   /*
-* __mmdrop is not safe to call from softirq context on x86 due to
-* pgd_dtor so postpone it to the async context
-*/
if (sig->oom_mm)
-   mmdrop_async(sig->oom_mm);
+   mm_unref(sig->oom_mm);
kmem_cache_free(signal_cachep, sig);
 }
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index e90f25d6385d..12967f54fbcf 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -704,7 +704,7 @@ static void mark_oom_victim(struct task_struct *tsk)
 
/* oom_mm is bound to the signal struct life time. */
if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
-   mmgrab(tsk->signal->oom_mm);
+   mm_ref(tsk->signal->oom_mm);
set_bit(MMF_OOM_VICTIM, &mm->flags);
}
 
-- 
2.29.0.rc1.297.gfa9743e501-goog



[RFC PATCH resend 5/6] ptrace: Use mm_ref() for ->exit_mm

2020-10-16 Thread Jann Horn
We only use ->exit_mm to look up dumpability and the ->user_mm; we don't
need to keep the PGD alive for this.
mmgrab() is also inconvenient here, because it means that we need to use
mmdrop_async() when dropping the reference to the mm from an RCU callback.
Use mm_ref() instead of mmgrab() to make things neater.

Signed-off-by: Jann Horn 
---
 kernel/exit.c | 2 +-
 kernel/fork.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 97253ef33486..03ba6d13ef1e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -476,7 +476,7 @@ static void exit_mm(void)
/* more a memory barrier than a real lock */
task_lock(current);
current->mm = NULL;
-   mmgrab(mm); /* for current->exit_mm */
+   mm_ref(mm); /* for current->exit_mm */
current->exit_mm = mm;
mmap_read_unlock(mm);
enter_lazy_tlb(mm, current);
diff --git a/kernel/fork.c b/kernel/fork.c
index 59c119b03351..4383bf055b40 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -720,7 +720,7 @@ void free_task(struct task_struct *tsk)
if (tsk->flags & PF_KTHREAD)
free_kthread_struct(tsk);
if (tsk->exit_mm)
-   mmdrop_async(tsk->exit_mm);
+   mm_unref(tsk->exit_mm);
free_task_struct(tsk);
 }
 EXPORT_SYMBOL(free_task);
-- 
2.29.0.rc1.297.gfa9743e501-goog



[RFC PATCH resend 1/6] ptrace: Keep mm around after exit_mm() for __ptrace_may_access()

2020-10-16 Thread Jann Horn
__ptrace_may_access() checks can happen on target tasks that are in the
middle of do_exit(), past exit_mm(). At that point, the ->mm pointer has
been NULLed out, and the mm_struct has been mmput().

Unfortunately, the mm_struct contains the dumpability and the user_ns in
which the task last went through execve(), and we need those for
__ptrace_may_access(). Currently, that problem is handled by failing open:
If the ->mm is gone, we assume that the task was dumpable. In some edge
cases, this could potentially expose access to things like
/proc/$pid/fd/$fd of originally non-dumpable processes.
(exit_files() comes after exit_mm(), so the file descriptor table is still
there when we've gone through exit_mm().)

One way to fix this would be to move mm->user_ns and the dumpability state
over into the task_struct. However, that gets quite ugly if we want to
preserve existing semantics because e.g. PR_SET_DUMPABLE and commit_creds()
would then have to scan through all tasks sharing the mm_struct and keep
them in sync manually - that'd be a bit error-prone and overcomplicated.

(Moving these things into the signal_struct is not an option because that
is kept across executions, and pre-execve co-threads will share the
signal_struct that is also used by the task that has gone through
execve().)

I believe that this patch may be the least bad option to fix this - keep
the mm_struct (but not process memory) around with an mmgrab() reference
from exit_mm() until the task goes away completely.

Note that this moves free_task() down in order to make mmdrop_async()
available without a forward declaration.

Cc: sta...@vger.kernel.org
Fixes: bfedb589252c ("mm: Add a user_ns owner to mm_struct and fix ptrace 
permission checks")
Signed-off-by: Jann Horn 
---
 include/linux/sched.h |  8 +++
 kernel/exit.c |  2 ++
 kernel/fork.c | 54 ++-
 kernel/ptrace.c   | 10 
 4 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e232935..55bec6ff5626 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -747,6 +747,14 @@ struct task_struct {
 
struct mm_struct*mm;
struct mm_struct*active_mm;
+   /*
+* When we exit and ->mm (the reference pinning ->mm's address space)
+* goes away, we stash a reference to the mm_struct itself (counted via
+* exit_mm->mm_count) in this member.
+* This allows us to continue using the mm_struct for security checks
+* and such even after the task has started exiting.
+*/
+   struct mm_struct*exit_mm;
 
/* Per-thread vma caching: */
struct vmacache vmacache;
diff --git a/kernel/exit.c b/kernel/exit.c
index 733e80f334e7..97253ef33486 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -476,6 +476,8 @@ static void exit_mm(void)
/* more a memory barrier than a real lock */
task_lock(current);
current->mm = NULL;
+   mmgrab(mm); /* for current->exit_mm */
+   current->exit_mm = mm;
mmap_read_unlock(mm);
enter_lazy_tlb(mm, current);
task_unlock(current);
diff --git a/kernel/fork.c b/kernel/fork.c
index da8d360fb032..4942428a217c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -438,32 +438,6 @@ void put_task_stack(struct task_struct *tsk)
 }
 #endif
 
-void free_task(struct task_struct *tsk)
-{
-   scs_release(tsk);
-
-#ifndef CONFIG_THREAD_INFO_IN_TASK
-   /*
-* The task is finally done with both the stack and thread_info,
-* so free both.
-*/
-   release_task_stack(tsk);
-#else
-   /*
-* If the task had a separate stack allocation, it should be gone
-* by now.
-*/
-   WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
-#endif
-   rt_mutex_debug_task_free(tsk);
-   ftrace_graph_exit_task(tsk);
-   arch_release_task_struct(tsk);
-   if (tsk->flags & PF_KTHREAD)
-   free_kthread_struct(tsk);
-   free_task_struct(tsk);
-}
-EXPORT_SYMBOL(free_task);
-
 #ifdef CONFIG_MMU
 static __latent_entropy int dup_mmap(struct mm_struct *mm,
struct mm_struct *oldmm)
@@ -722,6 +696,34 @@ static inline void put_signal_struct(struct signal_struct 
*sig)
free_signal_struct(sig);
 }
 
+void free_task(struct task_struct *tsk)
+{
+   scs_release(tsk);
+
+#ifndef CONFIG_THREAD_INFO_IN_TASK
+   /*
+* The task is finally done with both the stack and thread_info,
+* so free both.
+*/
+   release_task_stack(tsk);
+#else
+   /*
+* If the task had a separate stack allocation, it should be gone
+* by now.
+*/
+   WARN_ON_ONCE(refcount_read(&tsk->stack_refcount) != 0);
+#endif
+   rt_mutex_debug_task_free(tsk);
+   ftrace_graph_exit_task(tsk);
+   

[RFC PATCH resend 2/6] refcount: Move refcount_t definition into linux/types.h

2020-10-16 Thread Jann Horn
I want to use refcount_t in mm_struct, but if refcount_t is defined in
linux/refcount.h, that header would have to be included in
linux/mm_types.h; that would be wasteful.

Let's move refcount_t over into linux/types.h so that includes can be
written less wastefully.

Signed-off-by: Jann Horn 
---
 include/linux/refcount.h | 13 +
 include/linux/types.h| 12 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/refcount.h b/include/linux/refcount.h
index 0e3ee25eb156..fd8cf65e4e2f 100644
--- a/include/linux/refcount.h
+++ b/include/linux/refcount.h
@@ -96,22 +96,11 @@
 #include 
 #include 
 #include 
+#include <linux/types.h> /* refcount_t is defined here */
 #include 
 
 struct mutex;
 
-/**
- * struct refcount_t - variant of atomic_t specialized for reference counts
- * @refs: atomic_t counter field
- *
- * The counter saturates at REFCOUNT_SATURATED and will not move once
- * there. This avoids wrapping the counter and causing 'spurious'
- * use-after-free bugs.
- */
-typedef struct refcount_struct {
-   atomic_t refs;
-} refcount_t;
-
 #define REFCOUNT_INIT(n)   { .refs = ATOMIC_INIT(n), }
 #define REFCOUNT_MAX   INT_MAX
 #define REFCOUNT_SATURATED (INT_MIN / 2)
diff --git a/include/linux/types.h b/include/linux/types.h
index a147977602b5..34e4e779e767 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -175,6 +175,18 @@ typedef struct {
 } atomic64_t;
 #endif
 
+/**
+ * struct refcount_t - variant of atomic_t specialized for reference counts
+ * @refs: atomic_t counter field
+ *
+ * The counter saturates at REFCOUNT_SATURATED and will not move once
+ * there. This avoids wrapping the counter and causing 'spurious'
+ * use-after-free bugs.
+ */
+typedef struct refcount_struct {
+   atomic_t refs;
+} refcount_t;
+
 struct list_head {
struct list_head *next, *prev;
 };
-- 
2.29.0.rc1.297.gfa9743e501-goog



[RFC PATCH resend 3/6] mm: Add refcount for preserving mm_struct without pgd

2020-10-16 Thread Jann Horn
Currently, mm_struct has two refcounts:

 - mm_users: preserves everything - the mm_struct, the page tables, the
   memory mappings, and so on
 - mm_count: preserves the mm_struct and pgd

However, there are three types of users of mm_struct:

1. users that want page tables, memory mappings and so on
2. users that want to preserve the pgd (for lazy TLB)
3. users that just want to keep the mm_struct itself around (e.g. for
   mmget_not_zero() or __ptrace_may_access())

Dropping mm_count references can be messy because dropping mm_count to
zero deletes the pgd, which takes the pgd_lock on x86, meaning it doesn't
work from RCU callbacks (which run in IRQ context). In those cases,
mmdrop_async() must be used to punt the invocation of __mmdrop() to
workqueue context.

That's fine when mmdrop_async() is a rare case, but the preceding patch
"ptrace: Keep mm around after exit_mm() for __ptrace_may_access()" makes it
the common case; we should probably avoid punting freeing to workqueue
context all the time if we can avoid it?

To resolve this, add a third refcount that just protects the mm_struct and
the user_ns it points to, and which can be dropped with synchronous freeing
from (almost) any context.

Signed-off-by: Jann Horn 
---
 arch/x86/kernel/tboot.c|  2 ++
 drivers/firmware/efi/efi.c |  2 ++
 include/linux/mm_types.h   | 13 +++--
 include/linux/sched/mm.h   | 13 +
 kernel/fork.c  | 14 ++
 mm/init-mm.c   |  2 ++
 6 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 992fb1415c0f..b92ea1bb3bb9 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -93,6 +94,7 @@ static struct mm_struct tboot_mm = {
.pgd= swapper_pg_dir,
.mm_users   = ATOMIC_INIT(2),
.mm_count   = ATOMIC_INIT(1),
+   .mm_bare_refs   = REFCOUNT_INIT(1),
MMAP_LOCK_INITIALIZER(init_mm)
.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3aa07c3b5136..3b73a0717c6e 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -54,6 +55,7 @@ struct mm_struct efi_mm = {
.mm_rb  = RB_ROOT,
.mm_users   = ATOMIC_INIT(2),
.mm_count   = ATOMIC_INIT(1),
+   .mm_bare_refs   = REFCOUNT_INIT(1),
MMAP_LOCK_INITIALIZER(efi_mm)
.page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ed028af3cb19..764d251966c7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -429,13 +429,22 @@ struct mm_struct {
 
/**
 * @mm_count: The number of references to  mm_struct
-* (@mm_users count as 1).
+* including its pgd (@mm_users count as 1).
 *
 * Use mmgrab()/mmdrop() to modify. When this drops to 0, the
-*  mm_struct is freed.
+* pgd is freed.
 */
atomic_t mm_count;
 
+   /**
+* @mm_bare_refs: The number of references to  mm_struct
+* that preserve no page table state whatsoever (@mm_count
+* counts as 1).
+*
+* When this drops to 0, the  mm_struct is freed.
+*/
+   refcount_t mm_bare_refs;
+
/**
 * @has_pinned: Whether this mm has pinned any pages.  This can
 * be either replaced in the future by @pinned_vm when it
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f889e332912f..e5b236e15757 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -109,6 +109,19 @@ extern void mmput(struct mm_struct *);
 void mmput_async(struct mm_struct *);
 #endif
 
+static inline void mm_ref(struct mm_struct *mm)
+{
+   refcount_inc(&mm->mm_bare_refs);
+}
+
+void __mm_unref(struct mm_struct *mm);
+
+static inline void mm_unref(struct mm_struct *mm)
+{
+   if (refcount_dec_and_test(&mm->mm_bare_refs))
+   __mm_unref(mm);
+}
+
 /* Grab a reference to a task's mm, if it is not already going away */
 extern struct mm_struct *get_task_mm(struct task_struct *task);
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 4942428a217c..fcdd1ace79e4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -642,10 +642,16 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()  (kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define 

[RFC PATCH resend 0/6] mm and ptrace: Track dumpability until task is freed

2020-10-16 Thread Jann Horn
[sorry, had to resend - it was pointed out to me that when I sent
this series the first time, DKIM got broken by the kvack list
rewriting 8-bit into quoted-printable]

At the moment, there is a lifetime issue (no, not the UAF kind) around
__ptrace_may_access():

__ptrace_may_access() wants to check mm->flags and mm->user_ns to figure
out whether the caller should be allowed to access some target task.
__ptrace_may_access() can be called as long as __put_task_struct()
hasn't happened yet; but __put_task_struct() happens when the task is
about to be freed, which is much later than exit_mm() (which happens
pretty early during task exit).
So we can have a situation where we need to consult the mm for a
security check, but we don't have an mm anymore.

At the moment, this is solved by failing open: If the mm is gone, we
pretend that it was dumpable. That's dubious from a security
perspective - as one example, we drop the mm_struct before the file
descriptor table, so someone might be able to steal file descriptors
from an exiting tasks when dumpability was supposed to prevent that.

The easy fix would be to let __ptrace_may_access() instead always refuse
access to tasks that have lost their mm; but then that would e.g. mean
that the ability to inspect dead tasks in procfs would be restricted.
So while that might work in practice, it'd be a bit ugly, too.

Another option would be to move the dumpability information elsewhere -
but that would have to be the task_struct (the signal_struct can be
shared with dead pre-execve threads, so we can't use it here). So we'd
have to keep dumpability information in sync across threads - that'd
probably be pretty ugly.


So I think the proper fix is to let the task_struct hold a reference on
the mm_struct until the task goes away completely. This is implemented
in patch 1/6, which is also the only patch in this series that I
actually care about (and the only one with a stable backport marking);
the rest of the series are some tweaks in case people dislike the idea
of constantly freeing mm_structs from workqueue context.
Those tweaks should also reduce the memory usage of dead tasks, by
ensuring that they don't keep their PGDs alive.


Patch 1/6 is not particularly pretty, but I can't think of any better
way to do it.

So: Does this series (and in particular patch 1/6) look vaguely sane?
And if not, does anyone have a better approach?


Jann Horn (6):
  ptrace: Keep mm around after exit_mm() for __ptrace_may_access()
  refcount: Move refcount_t definition into linux/types.h
  mm: Add refcount for preserving mm_struct without pgd
  mm, oom: Use mm_ref()/mm_unref() and avoid mmdrop_async()
  ptrace: Use mm_ref() for ->exit_mm
  mm: remove now-unused mmdrop_async()

 arch/x86/kernel/tboot.c|  2 +
 drivers/firmware/efi/efi.c |  2 +
 include/linux/mm_types.h   | 15 ++-
 include/linux/refcount.h   | 13 +-
 include/linux/sched.h  |  8 
 include/linux/sched/mm.h   | 13 ++
 include/linux/types.h  | 12 +
 kernel/exit.c  |  2 +
 kernel/fork.c  | 90 +-
 kernel/ptrace.c| 10 +
 mm/init-mm.c   |  2 +
 mm/oom_kill.c  |  2 +-
 12 files changed, 105 insertions(+), 66 deletions(-)


base-commit: bbf5c979011a099af5dc76498918ed7df445635b
-- 
2.29.0.rc1.297.gfa9743e501-goog



Re: [PATCH 1/2] arm64: dts: qcom: sc7180-lite: Tweak DDR/L3 scaling on SC7180-lite

2020-10-16 Thread Doug Anderson
Hi,

On Thu, Oct 15, 2020 at 10:53 AM Sibi Sankar  wrote:
>
> Tweak the DDR/L3 bandwidth votes on the lite variant of the SC7180 SoC
> since the gold cores only support frequencies upto 2.1 GHz.
>
> Signed-off-by: Sibi Sankar 
> ---
>  arch/arm64/boot/dts/qcom/sc7180-lite.dtsi | 14 ++
>  1 file changed, 14 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/qcom/sc7180-lite.dtsi
>
> diff --git a/arch/arm64/boot/dts/qcom/sc7180-lite.dtsi 
> b/arch/arm64/boot/dts/qcom/sc7180-lite.dtsi
> new file mode 100644
> index ..cff50275cfe1
> --- /dev/null
> +++ b/arch/arm64/boot/dts/qcom/sc7180-lite.dtsi
> @@ -0,0 +1,14 @@
> +// SPDX-License-Identifier: BSD-3-Clause
> +/*
> + * SC7180 lite device tree source
> + *
> + * Copyright (c) 2020, The Linux Foundation. All rights reserved.
> + */
> +
> +_opp11 {
> +   opp-peak-kBps = <8532000 22425600>;
> +};
> +
> +_opp12 {
> +   opp-peak-kBps = <8532000 23347200>;
> +};

I guess this is OK, but something about it smells just a little
strange...  I guess:

a) There's suddenly a big jump from opp10 to opp11.  You don't use
7216000 at all anymore.

b) The fact that we need to do this at all feels like a sign that
somehow this wasn't designed quite right.

Just brainstorming a bit: If the higher memory rate wasn't useful for
OPP11/12 on the non-lite version of the chip, why are they useful for
that OPP on the lite version?  I guess you're just trying to eke out
the last little bits of performance once the cpufreq is maxed out?  It
almost feels like a better way to do this (though it wouldn't be
monotonically increasing anymore so it wouldn't actually work) would
be to have a few "OPP" points at the top where the cpufreq stops
increasing and all you do is increase the memory frequency.

c) In theory we're supposed to be able to probe whether we're on the
normal, lite, or pro version, right?  Anyway we could tweak this in
code so we don't have to know to include the right dtsi file?


-Doug


[PATCH resend v3 0/2] Broad write-locking of nascent mm in execve

2020-10-16 Thread Jann Horn
(resending because DKIM got mangled on the first try by the kvack
list, hopefully setting sendemail.transferEncoding to
quoted-printable helps...)

v3:
 - add note about how this also fixes arch/um/ locking in patch 1
   (Johannes Berg)
 - use IS_DEFINED() instead of #ifdef in patch 2 (Jason Gunthorpe)
v2:
 - fix commit message of patch 1/2 and be more verbose about where
   the old mmap lock is taken (Michel, Jason)
 - resending without mangling the diffs :/ (Michel, Jason)

These two patches replace "mmap locking API: don't check locking
if the mm isn't live yet"[1], which is currently in the mmotm tree,
and should be placed in the same spot where the old patch was.

While I originally said that this would be an alternative
patch (meaning that the existing patch would have worked just
as well), the new patches actually address an additional issue
that the old patch missed (bprm->vma is used after the switch
to the new mm).

I have boot-tested these patches on x64-64 (with lockdep) and
!MMU arm (the latter with both FLAT and ELF).

[1] 
https://lkml.kernel.org/r/cag48ez03yjg9ju_6tgimcavjutyre_o4leq7901b5zocnna...@mail.gmail.com

Jann Horn (2):
  mmap locking API: Order lock of nascent mm outside lock of live mm
  exec: Broadly lock nascent mm until setup_arg_pages()

 arch/um/include/asm/mmu_context.h |  3 +-
 fs/exec.c | 64 ---
 include/linux/binfmts.h   |  2 +-
 include/linux/mmap_lock.h | 23 ++-
 kernel/fork.c |  7 +---
 5 files changed, 59 insertions(+), 40 deletions(-)


base-commit: fb0155a09b0224a7147cb07a4ce6034c8d29667f
prerequisite-patch-id: 08f97130a51898a5f6efddeeb5b42638577398c7
prerequisite-patch-id: 577664d761cd23fe9031ffdb1d3c9ac313572c67
prerequisite-patch-id: dc29a39716aa8689f80ba2767803d9df3709beaa
prerequisite-patch-id: 42b1b546d33391ead2753621f541bcc408af1769
prerequisite-patch-id: 2cbb839f57006f32e21f4229e099ae1bd782be24
prerequisite-patch-id: 1b4daf01cf61654a5ec54b5c3f7c7508be7244ee
prerequisite-patch-id: f46cc8c99f1909fe2a65fbc3cf1f6bc57489a086
prerequisite-patch-id: 2b0caed97223241d5008898dde995d02fda544e4
prerequisite-patch-id: 6b7adcb54989e1ec3370f256ff2c35d19cf785aa
-- 
2.29.0.rc1.297.gfa9743e501-goog



[PATCH resend v3 1/2] mmap locking API: Order lock of nascent mm outside lock of live mm

2020-10-16 Thread Jann Horn
Until now, the mmap lock of the nascent mm was ordered inside the mmap lock
of the old mm (in dup_mmap() and in UML's activate_mm()).
A following patch will change the exec path to very broadly lock the
nascent mm, but fine-grained locking should still work at the same time for
the old mm.

In particular, mmap locking calls are hidden behind the copy_from_user()
calls and such that are reached through functions like copy_strings() -
when a page fault occurs on a userspace memory access, the mmap lock
will be taken.

To do this in a way that lockdep is happy about, let's turn around the lock
ordering in both places that currently nest the locks.
Since SINGLE_DEPTH_NESTING is normally used for the inner nesting layer,
make up our own lock subclass MMAP_LOCK_SUBCLASS_NASCENT and use that
instead.
The added locking calls in exec_mmap() are temporary; the following patch
will move the locking out of exec_mmap().

As Johannes Berg pointed out[1][2], moving the mmap locking of
arch/um/'s activate_mm() up into the execve code also fixes an issue that
would've caused a scheduling-in-atomic bug due to mmap_write_lock_nested()
while holding a spinlock if UM had support for voluntary preemption.
(Even when a semaphore is uncontended, locking it can still trigger
rescheduling via might_sleep().)

[1] 
https://lore.kernel.org/linux-mm/115d17aa221b73a479e26ffee52899ddb18b1f53.ca...@sipsolutions.net/
[2] 
https://lore.kernel.org/linux-mm/7b7d6954b74e109e653539d880173fa9cb5c5ddf.ca...@sipsolutions.net/

Signed-off-by: Jann Horn 
---
 arch/um/include/asm/mmu_context.h |  3 +--
 fs/exec.c |  4 
 include/linux/mmap_lock.h | 23 +--
 kernel/fork.c |  7 ++-
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/mmu_context.h 
b/arch/um/include/asm/mmu_context.h
index 17ddd4edf875..c13bc5150607 100644
--- a/arch/um/include/asm/mmu_context.h
+++ b/arch/um/include/asm/mmu_context.h
@@ -48,9 +48,8 @@ static inline void activate_mm(struct mm_struct *old, struct 
mm_struct *new)
 * when the new ->mm is used for the first time.
 */
__switch_mm(&new->context.id);
-   mmap_write_lock_nested(new, SINGLE_DEPTH_NESTING);
+   mmap_assert_write_locked(new);
uml_setup_stubs(new);
-   mmap_write_unlock(new);
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
diff --git a/fs/exec.c b/fs/exec.c
index a91003e28eaa..229dbc7aa61a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1114,6 +1114,8 @@ static int exec_mmap(struct mm_struct *mm)
if (ret)
return ret;
 
+   mmap_write_lock_nascent(mm);
+
if (old_mm) {
/*
 * Make sure that if there is a core dump in progress
@@ -1125,6 +1127,7 @@ static int exec_mmap(struct mm_struct *mm)
if (unlikely(old_mm->core_state)) {
mmap_read_unlock(old_mm);
mutex_unlock(&tsk->signal->exec_update_mutex);
+   mmap_write_unlock(mm);
return -EINTR;
}
}
@@ -1138,6 +1141,7 @@ static int exec_mmap(struct mm_struct *mm)
tsk->mm->vmacache_seqnum = 0;
vmacache_flush(tsk);
task_unlock(tsk);
+   mmap_write_unlock(mm);
if (old_mm) {
mmap_read_unlock(old_mm);
BUG_ON(active_mm != old_mm);
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0707671851a8..24de1fe99ee4 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -3,6 +3,18 @@
 
 #include 
 
+/*
+ * Lock subclasses for the mmap_lock.
+ *
+ * MMAP_LOCK_SUBCLASS_NASCENT is for core kernel code that wants to lock an mm
+ * that is still being constructed and wants to be able to access the active mm
+ * normally at the same time. It nests outside MMAP_LOCK_SUBCLASS_NORMAL.
+ */
+enum {
+   MMAP_LOCK_SUBCLASS_NORMAL = 0,
+   MMAP_LOCK_SUBCLASS_NASCENT
+};
+
 #define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
 
@@ -16,9 +28,16 @@ static inline void mmap_write_lock(struct mm_struct *mm)
down_write(&mm->mmap_lock);
 }
 
-static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
+/*
+ * Lock an mm_struct that is still being set up (during fork or exec).
+ * This nests outside the mmap locks of live mm_struct instances.
+ * No interruptible/killable versions exist because at the points where you're
+ * supposed to use this helper, the mm isn't visible to anything else, so we
+ * expect the mmap_lock to be uncontended.
+ */
+static inline void mmap_write_lock_nascent(struct mm_struct *mm)
 {
-   down_write_nested(&mm->mmap_lock, subclass);
+   down_write_nested(&mm->mmap_lock, MMAP_LOCK_SUBCLASS_NASCENT);
 }
 
 static inline int mmap_write_lock_killable(struct mm_struct *mm)
diff --git a/kernel/fork.c b/kernel/fork.c
index 

[PATCH resend v3 2/2] exec: Broadly lock nascent mm until setup_arg_pages()

2020-10-16 Thread Jann Horn
While AFAIK there currently is nothing that can modify the VMA tree of a
new mm until userspace has started running under the mm, we should properly
lock the mm here anyway, both to keep lockdep happy when adding locking
assertions and to be safe in the future in case someone e.g. decides to
permit VMA-tree-mutating operations in process_madvise_behavior_valid().

The goal of this patch is to broadly lock the nascent mm in the exec path,
from around the time it is created all the way to the end of
setup_arg_pages() (because setup_arg_pages() accesses bprm->vma).
As long as the mm is write-locked, keep it around in bprm->mm, even after
it has been installed on the task (with an extra reference on the mm, to
reduce complexity in free_bprm()).
After setup_arg_pages(), we have to unlock the mm so that APIs such as
copy_to_user() will work in the following binfmt-specific setup code.

Suggested-by: Jason Gunthorpe 
Suggested-by: Michel Lespinasse 
Signed-off-by: Jann Horn 
---
 fs/exec.c   | 68 -
 include/linux/binfmts.h |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 229dbc7aa61a..00edf833781f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
return -ENOMEM;
vma_set_anonymous(vma);
 
-   if (mmap_write_lock_killable(mm)) {
-   err = -EINTR;
-   goto err_free;
-   }
-
/*
 * Place the stack at the largest stack address the architecture
 * supports. Later, we'll move this to an appropriate place. We don't
@@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
goto err;
 
mm->stack_vm = mm->total_vm = 1;
-   mmap_write_unlock(mm);
bprm->p = vma->vm_end - sizeof(void *);
return 0;
 err:
-   mmap_write_unlock(mm);
-err_free:
bprm->vma = NULL;
vm_area_free(vma);
return err;
@@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm)
struct mm_struct *mm = NULL;
 
bprm->mm = mm = mm_alloc();
-   err = -ENOMEM;
if (!mm)
-   goto err;
+   return -ENOMEM;
+   mmap_write_lock_nascent(mm);
 
/* Save current stack limit for all calculations made during exec. */
task_lock(current->group_leader);
@@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm)
task_unlock(current->group_leader);
 
err = __bprm_mm_init(bprm);
-   if (err)
-   goto err;
-
-   return 0;
-
-err:
-   if (mm) {
-   bprm->mm = NULL;
-   mmdrop(mm);
-   }
+   if (!err)
+   return 0;
 
+   bprm->mm = NULL;
+   mmap_write_unlock(mm);
+   mmdrop(mm);
return err;
 }
 
@@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, 
unsigned long shift)
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
  * the stack is optionally relocated, and some extra space is added.
+ * At the end of this, the mm_struct will be unlocked on success.
  */
 int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_top,
@@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm,
bprm->loader -= stack_shift;
bprm->exec -= stack_shift;
 
-   if (mmap_write_lock_killable(mm))
-   return -EINTR;
-
vm_flags = VM_STACK_FLAGS;
 
/*
@@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
if (ret)
-   goto out_unlock;
+   return ret;
BUG_ON(prev != vma);
 
if (unlikely(vm_flags & VM_EXEC)) {
@@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
if (stack_shift) {
ret = shift_arg_pages(vma, stack_shift);
if (ret)
-   goto out_unlock;
+   return ret;
}
 
/* mprotect_fixup is overkill to remove the temporary stack flags */
@@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm,
current->mm->start_stack = bprm->p;
ret = expand_stack(vma, stack_base);
if (ret)
-   ret = -EFAULT;
+   return -EFAULT;
 
-out_unlock:
+   /*
+* From this point on, anything that wants to poke around in the
+* mm_struct must lock it by itself.
+*/
+   bprm->vma = NULL;
mmap_write_unlock(mm);
-   return ret;
+   mmput(mm);
+   bprm->mm = NULL;
+   return 0;
 }
 EXPORT_SYMBOL(setup_arg_pages);
 
@@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm)
if (ret)
return ret;
 
-   mmap_write_lock_nascent(mm);
-
if (old_mm) {
  

Re: [PATCH net] net: dsa: point out the tail taggers

2020-10-16 Thread Jakub Kicinski
On Sat, 17 Oct 2020 00:33:02 +0300 Vladimir Oltean wrote:
> On Sat, Oct 17, 2020 at 12:16:28AM +0300, Vladimir Oltean wrote:
> > On Fri, Oct 16, 2020 at 11:03:18PM +0200, Andrew Lunn wrote:  
> > > 2ecbc1f684482b4ed52447a39903bd9b0f222898 does not have net-next, as
> > > far as i see,  
> > 
> > Not sure what you mean by that.  
> 
> Ah, I do understand what you mean now. In git, that is what I see as
> well. But in my cgit link, why would tail_tag be there?
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/include/net/dsa.h#n93?id=2ecbc1f684482b4ed52447a39903bd9b0f222898
> I think either cgit is plainly dumb at showing the kernel tree at a
> particular commit, or I'm plainly incapable of using it.

The link is bamboozled.

The #n93 needs to be after the ? parameters.

Like this:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/include/net/dsa.h?id=2ecbc1f684482b4ed52447a39903bd9b0f222898#n86


Re: simplify pstore-blk

2020-10-16 Thread Kees Cook
On Fri, Oct 16, 2020 at 03:20:38PM +0200, Christoph Hellwig wrote:
> this series cleans up and massively simplifies the pstore-blk code,
> please take a look.

Cool! Thanks for doing this work. I have a few things I'd like to see
done differently, and while I'm not a huge fan of the general reduction
in utility, I can live with it as long as it doesn't make other things
worse. :) I'll get this reviewed with specific feedback soon, but I'm
about to be EOW. ;)

-- 
Kees Cook


[PATCH rfc 2/2] mm: hugetlb: don't drop hugetlb_lock around cma_release() call

2020-10-16 Thread Roman Gushchin
cma_release() is now a non-blocking function, so there is no longer any
need to drop hugetlb_lock to call it.

Signed-off-by: Roman Gushchin 
---
 mm/hugetlb.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe76f8fd5a73..c8c892b1cabc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1312,14 +1312,8 @@ static void update_and_free_page(struct hstate *h, struct page *page)
set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
set_page_refcounted(page);
if (hstate_is_gigantic(h)) {
-   /*
-* Temporarily drop the hugetlb_lock, because
-* we might block in free_gigantic_page().
-*/
-   spin_unlock(_lock);
destroy_compound_gigantic_page(page, huge_page_order(h));
free_gigantic_page(page, huge_page_order(h));
-   spin_lock(_lock);
} else {
__free_pages(page, huge_page_order(h));
}
-- 
2.26.2



[PATCH rfc 1/2] mm: cma: make cma_release() non-blocking

2020-10-16 Thread Roman Gushchin
cma_release() has to lock the cma_lock mutex to clear the cma bitmap.
It makes it a blocking function, which complicates its usage from
non-blocking contexts. For instance, hugetlbfs code is temporarily
dropping the hugetlb_lock spinlock to call cma_release().

This patch makes cma_release non-blocking by postponing the cma
bitmap clearance. It's done later from a work context. The first page
in the cma allocation is used to store the work struct.

To make sure that a subsequent cma_alloc() call will succeed, cma_alloc()
flushes the corresponding workqueue.

Because CMA allocations and de-allocations are usually not that
frequent, a single global workqueue is used.

Signed-off-by: Roman Gushchin 
---
 mm/cma.c | 51 +--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7cda9f..523cd9356bc7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -36,10 +36,19 @@
 
 #include "cma.h"
 
+struct cma_clear_bitmap_work {
+   struct work_struct work;
+   struct cma *cma;
+   unsigned long pfn;
+   unsigned int count;
+};
+
 struct cma cma_areas[MAX_CMA_AREAS];
 unsigned cma_area_count;
 static DEFINE_MUTEX(cma_mutex);
 
+struct workqueue_struct *cma_release_wq;
+
 phys_addr_t cma_get_base(const struct cma *cma)
 {
return PFN_PHYS(cma->base_pfn);
@@ -148,6 +157,10 @@ static int __init cma_init_reserved_areas(void)
for (i = 0; i < cma_area_count; i++)
cma_activate_area(_areas[i]);
 
+   cma_release_wq = create_workqueue("cma_release");
+   if (!cma_release_wq)
+   return -ENOMEM;
+
return 0;
 }
 core_initcall(cma_init_reserved_areas);
@@ -437,6 +450,13 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
return NULL;
 
for (;;) {
+   /*
+* CMA bitmaps are cleared asynchronously from work items
+* scheduled by cma_release(). To make sure the allocation
+* will succeed, the cma release workqueue is flushed here.
+*/
+   flush_workqueue(cma_release_wq);
+
mutex_lock(>lock);
bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
bitmap_maxno, start, bitmap_count, mask,
@@ -495,6 +515,17 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
return page;
 }
 
+static void cma_clear_bitmap_fn(struct work_struct *work)
+{
+   struct cma_clear_bitmap_work *w;
+
+   w = container_of(work, struct cma_clear_bitmap_work, work);
+
+   cma_clear_bitmap(w->cma, w->pfn, w->count);
+
+   __free_page(pfn_to_page(w->pfn));
+}
+
 /**
  * cma_release() - release allocated pages
  * @cma:   Contiguous memory region for which the allocation is performed.
@@ -507,6 +538,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  */
 bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
 {
+   struct cma_clear_bitmap_work *work;
unsigned long pfn;
 
if (!cma || !pages)
@@ -521,8 +553,23 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
 
VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);
 
-   free_contig_range(pfn, count);
-   cma_clear_bitmap(cma, pfn, count);
+   /*
+* To make cma_release() non-blocking, the cma bitmap is cleared from
+* a work context (see cma_clear_bitmap_fn()). The first page
+* in the cma allocation is used to store the work structure,
+* so it is released only after the cma bitmap has been cleared.
+* Other pages are released immediately, as before.
+*/
+   if (count > 1)
+   free_contig_range(pfn + 1, count - 1);
+
+   work = (struct cma_clear_bitmap_work *)page_to_virt(pages);
+   INIT_WORK(>work, cma_clear_bitmap_fn);
+   work->cma = cma;
+   work->pfn = pfn;
+   work->count = count;
+   queue_work(cma_release_wq, >work);
+
trace_cma_release(pfn, pages, count);
 
return true;
-- 
2.26.2



[PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-16 Thread Roman Gushchin
This small patchset makes cma_release() non-blocking and simplifies
the code in hugetlbfs, where previously we had to temporarily drop
hugetlb_lock around the cma_release() call.

It should help Zi Yan with his work on 1 GB THPs: splitting a gigantic
THP under memory pressure requires a cma_release() call. If it's
a blocking function, it complicates the already complicated code.
Because there are at least two use cases like this (hugetlbfs is
another example), I believe it's just better to make cma_release()
non-blocking.

It also makes it more consistent with other memory releasing functions
in the kernel: most of them are non-blocking.
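
Concretely, this lets a caller release a CMA-backed range while holding a
spinlock, along the lines of the following sketch (illustrative only;
'some_lock', 'page' and 'order' stand in for the caller's own state):

	spin_lock(&some_lock);
	...
	/* safe after this series: cma_release() no longer sleeps */
	cma_release(cma, page, 1 << order);
	...
	spin_unlock(&some_lock);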


Roman Gushchin (2):
  mm: cma: make cma_release() non-blocking
  mm: hugetlb: don't drop hugetlb_lock around cma_release() call

 mm/cma.c | 51 +--
 mm/hugetlb.c |  6 --
 2 files changed, 49 insertions(+), 8 deletions(-)

-- 
2.26.2



Re: [PATCH v3 00/11] Introduce Simple atomic counters

2020-10-16 Thread Kees Cook
On Fri, Oct 16, 2020 at 12:53:13PM +0200, Peter Zijlstra wrote:
> That's like saying: "I'm too lazy to track what I've looked at already".
> You're basically proposing to graffiti "Kees was here -- 16/10/2020" all
> over the kernel. Just so you can see where you still need to go.
> 
> It says the code was (assuming your audit was correct) good at that
> date, but has no guarantees for any moment after that.

That kind of bit-rot marking is exactly what I would like to avoid: just
putting a comment in is pointless. Making the expectations of the usage
become _enforced_ is the goal. And having it enforced by the _compiler_
is key. Just adding a meaningless attribute that a static checker
will notice some time and hope people fix them doesn't scale either
(just look at how many sparse warnings there are). So with C's limits,
the API and type enforcement become the principal way to get this done.

So, if there are behavioral patterns we CAN carve away from atomic_t
cleanly (and I think there are), those are the ones I want to work
towards. The "corner case" uses of atomic_t are much less common than
the "big" code patterns like lifetime management (now delegated to and
enforced by refcount_t). My estimation was that _statistics_ (and not
"serial identifiers") was the next biggest code pattern using atomic_t
that could benefit from having its usage isolated. It is not itself a
dangerous code pattern, but it can mask finding them.

Then, at the end of the day, only the corner cases remain, and those can
be seen clearly as they change over time. Since we can never have a
one-time audit be anything other than advisory, we need to make it EASY
to do those kinds of audits so they can be done regularly.
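
To make the intent concrete, the kind of wrapper being discussed can be
sketched like this (hypothetical names, not the API proposed in this
series): a counter that only supports increment and read, so the compiler
rejects any attempt to use it for lifetime management or locking.

	struct stat_counter {
		atomic_t cnt;
	};

	static inline void stat_counter_inc(struct stat_counter *c)
	{
		atomic_inc(&c->cnt);	/* wraparound is acceptable for stats */
	}

	static inline int stat_counter_read(const struct stat_counter *c)
	{
		return atomic_read(&c->cnt);
	}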

-- 
Kees Cook


[PATCH] ALSA: hda/ca0132: make some const arrays static, makes object smaller

2020-10-16 Thread Colin King
From: Colin Ian King 

Don't populate const arrays on the stack but instead make them
static. Makes the object code smaller by 57 bytes.

Before:
   textdata bss dec hex filename
 173256   38016 192  211464   33a08 sound/pci/hda/patch_ca0132.o

After:
   textdata bss dec hex filename
 172879   38336 192  211407   339cf sound/pci/hda/patch_ca0132.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King 
---
 sound/pci/hda/patch_ca0132.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c
index 2b38b2a716a1..e0c38f2735c6 100644
--- a/sound/pci/hda/patch_ca0132.c
+++ b/sound/pci/hda/patch_ca0132.c
@@ -7910,8 +7910,12 @@ static void ae7_post_dsp_asi_stream_setup(struct hda_codec *codec)
 
 static void ae7_post_dsp_pll_setup(struct hda_codec *codec)
 {
-   const unsigned int addr[] = { 0x41, 0x45, 0x40, 0x43, 0x51 };
-   const unsigned int data[] = { 0xc8, 0xcc, 0xcb, 0xc7, 0x8d };
+   static const unsigned int addr[] = {
+   0x41, 0x45, 0x40, 0x43, 0x51
+   };
+   static const unsigned int data[] = {
+   0xc8, 0xcc, 0xcb, 0xc7, 0x8d
+   };
unsigned int i;
 
for (i = 0; i < ARRAY_SIZE(addr); i++) {
@@ -7925,10 +7929,12 @@ static void ae7_post_dsp_pll_setup(struct hda_codec *codec)
 static void ae7_post_dsp_asi_setup_ports(struct hda_codec *codec)
 {
struct ca0132_spec *spec = codec->spec;
-   const unsigned int target[] = { 0x0b, 0x04, 0x06, 0x0a, 0x0c, 0x11,
-   0x12, 0x13, 0x14 };
-   const unsigned int data[]   = { 0x12, 0x00, 0x48, 0x05, 0x5f, 0xff,
-   0xff, 0xff, 0x7f };
+   static const unsigned int target[] = {
+   0x0b, 0x04, 0x06, 0x0a, 0x0c, 0x11, 0x12, 0x13, 0x14
+   };
+   static const unsigned int data[] = {
+   0x12, 0x00, 0x48, 0x05, 0x5f, 0xff, 0xff, 0xff, 0x7f
+   };
unsigned int i;
 
mutex_lock(>chipio_mutex);
-- 
2.27.0



Re: [PATCH] task_work: cleanup notification modes

2020-10-16 Thread Jens Axboe
On 10/16/20 3:44 PM, Thomas Gleixner wrote:
> On Fri, Oct 16 2020 at 09:16, Jens Axboe wrote:
>> A previous commit changed the notification mode from 0/1 to allowing
> 
> No. It changed it from boolean to an int.
> 
> There is a fundamental difference between 0/1 and false/true simply
> because it's a compiler implementation detail how to represent a boolean
> value.
> 
> Assume the following:
> 
> #define BAZ   0x08
> 
>task_work_add(tsk, , foo & BAZ);
> 
> So with a function signature of
> 
>task_work_add(*tsk, *work, bool signal);
> 
> the above 'foo & BAZ' becomes either true or false.
> 
> With the changed function signature of
> 
>task_work_add(*tsk, *work, int signal);
> 
> the above becomes the result of 'foo & BAZ' which means that this
> construct will no longer do the right thing.
> 
> It's pure luck that none of the usage sites relied on the boolean
> property of that argument.

It wasn't pure luck, that was checked before that change was made. No
users did anything funky, it was all false/true or 0/1.

> Please spell it out correctly that converting a boolean argument to an
> integer argument is not equivalent.

Fixed up the commit message to be more descriptive.

>>  switch (notify) {
>> +case TWA_NONE:
>> +break;
>>  case TWA_RESUME:
>>  set_notify_resume(task);
>>  break;
> 
> The enum will not prevent that either and what you really want to do is
> to have some form of debug warning if 'notify' is out of range, which
> would have been the right thing to do in the first place.

I added a WARN_ON_ONCE() in the default case for that one.
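
So the switch ends up shaped roughly like this (sketch from memory, not
the exact committed code; the TWA_SIGNAL details are elided):

	switch (notify) {
	case TWA_NONE:
		break;
	case TWA_RESUME:
		set_notify_resume(task);
		break;
	case TWA_SIGNAL:
		/* wake the task via the signal path (elided) */
		break;
	default:
		WARN_ON_ONCE(1);	/* catch out-of-range notify values */
		break;
	}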

>> - * @notify: send the notification if true
>> + * @notify: send chosen notification, if any
> 
> Is that really all you found to be wrong in that comment?

There really is nothing wrong, but it's not very descriptive (wasn't
before either). I've added a fuller description of the various TWA_*
notification types now.

-- 
Jens Axboe



Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory

2020-10-16 Thread Wei Yang
On Mon, Oct 12, 2020 at 02:52:59PM +0200, David Hildenbrand wrote:
>Let's check by traversing busy system RAM resources instead, to avoid
>relying on memory block states.
>
>Don't use walk_system_ram_range(), as that works on pages and we want to
>use the bare addresses we have easily at hand.
>
>Cc: "Michael S. Tsirkin" 
>Cc: Jason Wang 
>Cc: Pankaj Gupta 
>Signed-off-by: David Hildenbrand 

Reviewed-by: Wei Yang 

>---
> drivers/virtio/virtio_mem.c | 19 +++
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>index b3eebac7191f..6bbd1cfd10d3 100644
>--- a/drivers/virtio/virtio_mem.c
>+++ b/drivers/virtio/virtio_mem.c
>@@ -1749,6 +1749,20 @@ static void virtio_mem_delete_resource(struct virtio_mem *vm)
>   vm->parent_resource = NULL;
> }
> 
>+static int virtio_mem_range_has_system_ram(struct resource *res, void *arg)
>+{
>+  return 1;
>+}
>+
>+static bool virtio_mem_has_memory_added(struct virtio_mem *vm)
>+{
>+  const unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>+
>+  return walk_iomem_res_desc(IORES_DESC_NONE, flags, vm->addr,
>+ vm->addr + vm->region_size, NULL,
>+ virtio_mem_range_has_system_ram) == 1;
>+}
>+
> static int virtio_mem_probe(struct virtio_device *vdev)
> {
>   struct virtio_mem *vm;
>@@ -1870,10 +1884,7 @@ static void virtio_mem_remove(struct virtio_device *vdev)
>* the system. And there is no way to stop the driver/device from going
>* away. Warn at least.
>*/
>-  if (vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE] ||
>-  vm->nb_mb_state[VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL] ||
>-  vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE] ||
>-  vm->nb_mb_state[VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL]) {
>+  if (virtio_mem_has_memory_added(vm)) {
>   dev_warn(>dev, "device still has system memory added\n");
>   } else {
>   virtio_mem_delete_resource(vm);
>-- 
>2.26.2

-- 
Wei Yang
Help you, Help me


RE: [PATCH v2] soc: fsl: dpio: Change 'cpumask_t mask' to the driver's private data

2020-10-16 Thread Leo Li


> -Original Message-
> From: Leo Li
> Sent: Friday, October 16, 2020 4:20 PM
> To: 'Yi Wang' ; Roy Pledge ;
> Laurentiu Tudor 
> Cc: linux-kernel@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-arm-
> ker...@lists.infradead.org; xue.zhih...@zte.com.cn;
> jiang.xue...@zte.com.cn; Hao Si ; Lin Chen
> 
> Subject: RE: [PATCH v2] soc: fsl: dpio: Change 'cpumask_t mask' to the
> driver's private data
> 
> 
> 
> > -Original Message-
> > From: Yi Wang 
> > Sent: Friday, October 16, 2020 1:49 AM
> > To: Roy Pledge ; Laurentiu Tudor
> > 
> > Cc: Leo Li ; linux-kernel@vger.kernel.org;
> > linuxppc- d...@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> > xue.zhih...@zte.com.cn; wang.y...@zte.com.cn;
> jiang.xue...@zte.com.cn;
> > Hao Si ; Lin Chen 
> > Subject: [PATCH v2] soc: fsl: dpio: Change 'cpumask_t mask' to the
> > driver's private data
> >
> > From: Hao Si 
> >
> > The local variable 'cpumask_t mask' lives on the stack, and its
> > address is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
> > But the memory area where this variable is located is at risk of being
> > modified after the function returns.
> >
> > During LTP testing, the following error was generated:
> >
> > Unable to handle kernel paging request at virtual address 12e9b790
> > Mem abort info:
> >   ESR = 0x9607
> >   Exception class = DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> > Data abort info:
> >   ISV = 0, ISS = 0x0007
> >   CM = 0, WnR = 0
> > swapper pgtable: 4k pages, 48-bit VAs, pgdp = 75ac5e07
> > [12e9b790] pgd=0027dbffe003, pud=0027dbffd003, pmd=0027b6d61003, pte=
> > Internal error: Oops: 9607 [#1] PREEMPT SMP
> > Modules linked in: xt_conntrack
> > Process read_all (pid: 20171, stack limit = 0x44ea4095)
> > CPU: 14 PID: 20171 Comm: read_all Tainted: GB   W
> > Hardware name: NXP Layerscape LX2160ARDB (DT)
> > pstate: 8085 (Nzcv daIf -PAN -UAO)
> > pc : irq_affinity_hint_proc_show+0x54/0xb0
> > lr : irq_affinity_hint_proc_show+0x4c/0xb0
> > sp : 1138bc10
> > x29: 1138bc10 x28: d131d1e0
> > x27: 007000c0 x26: 8025b9480dc0
> > x25: 8025b9480da8 x24: 03ff
> > x23: 8027334f8300 x22: 80272e97d000
> > x21: 80272e97d0b0 x20: 8025b9480d80
> > x19: 09a49000 x18: 
> > x17:  x16: 
> > x15:  x14: 
> > x13:  x12: 0040
> > x11:  x10: 802735b79b88
> > x9 :  x8 : 
> > x7 : 09a49848 x6 : 0003
> > x5 :  x4 : 08157d6c
> > x3 : 1138bc10 x2 : 12e9b790
> > x1 :  x0 :  Call trace:
> >  irq_affinity_hint_proc_show+0x54/0xb0
> >  seq_read+0x1b0/0x440
> >  proc_reg_read+0x80/0xd8
> >  __vfs_read+0x60/0x178
> >  vfs_read+0x94/0x150
> >  ksys_read+0x74/0xf0
> >  __arm64_sys_read+0x24/0x30
> >  el0_svc_common.constprop.0+0xd8/0x1a0
> >  el0_svc_handler+0x34/0x88
> >  el0_svc+0x10/0x14
> > Code: f9001bbf 943e0732 f94066c2 b462 (f9400041)
> > ---[ end trace b495bdcb0b3b732b ]---
> > Kernel panic - not syncing: Fatal exception
> > SMP: stopping secondary CPUs
> > SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
> > Kernel Offset: disabled
> > CPU features: 0x0,21006008
> > Memory Limit: none
> > ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> > Fix it by changing 'cpumask_t mask' to the driver's private data.
> 
> Thanks for the patch.  Sorry to chime in late.
> 
> Since we are only setting a single bit in the cpumask, it is actually not
> necessary to keep the cpumask in private data, as we already keep the cpu
> number in desc.cpu.  The better and easier approach is to use the
> get_cpu_mask(cpu) API to get the pre-defined cpumask from the static
> cpu_bit_bitmap array.  We don't even need to declare the mask/cpu_mask
> in register_dpio_irq_handlers().

Btw, cpumask_of(cpu) is more commonly used than get_cpu_mask(cpu), although 
they are essentially the same.
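
I.e. the call site can presumably be reduced to something like this
(untested sketch; the irq lookup follows the existing driver code and is
assumed here):

	/*
	 * cpumask_of(cpu) returns a pointer to a statically allocated,
	 * read-only mask with only 'cpu' set, so no stack memory is
	 * handed to irq_set_affinity_hint().
	 */
	error = irq_set_affinity_hint(irq->msi_desc->irq, cpumask_of(cpu));
	if (error)
		dev_err(&dpio_dev->dev, "irq_set_affinity_hint() failed\n");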

> 
> >
> > Signed-off-by: Hao Si 
> > Signed-off-by: Lin Chen 
> > Signed-off-by: Yi Wang 
> > ---
> > v2: Place 'cpumask_t mask' in the driver's private data and while at
> > it, rename it to cpu_mask.
> >
> >  drivers/soc/fsl/dpio/dpio-driver.c | 9 +
> >  include/linux/fsl/mc.h | 2 ++
> >  2 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/soc/fsl/dpio/dpio-driver.c
> > b/drivers/soc/fsl/dpio/dpio- driver.c index 7b642c3..e9d820d 100644
> > --- a/drivers/soc/fsl/dpio/dpio-driver.c
> > +++ b/drivers/soc/fsl/dpio/dpio-driver.c
> > @@ -95,7 +95,7 @@ static int register_dpio_irq_handlers(struct
> > fsl_mc_device *dpio_dev, int cpu)  {
> > int error;
> > struct fsl_mc_device_irq *irq;
> > -   cpumask_t mask;
> > +   cpumask_t *cpu_mask;
> >
> > irq = dpio_dev->irqs[0];
> > 

Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory

2020-10-16 Thread Wei Yang
On Fri, Oct 16, 2020 at 12:32:50PM +0200, David Hildenbrand wrote:
 Ok, I seem to understand the logic now.

 But how do we prevent an ONLINE_PARTIAL memory block from getting offlined?
 There are three calls in virtio_mem_set_fake_offline(), and all of them just
 adjust the page's flags. How do they hold a reference to the struct page?
>>>
>>> Sorry, I should have given you the right pointer. (similar to my other
>>> reply)
>>>
>>> We hold a reference either via
>>>
>>> 1. alloc_contig_range()
>> 
>> I am not familiar with this one; I need to spend some time looking into it.
>
>Each individual page will have a pagecount of 1.
>
>> 
>>> 2. memmap init code, when not calling generic_online_page().
>> 
>> I may be missing some code here. Before pages are onlined, memmaps are
>> allocated in section_activate(). They are supposed to be zeroed. (I can't
>> find the exact code line.) I am not sure when we grab a refcount here.
>
>Best to refer to __init_single_page() -> init_page_count().
>
>Each page that wasn't onlined via generic_online_page() has a refcount
>of 1 and looks like allocated.
>

Thanks, I see the logic.

online_pages()
  move_pfn_range_to_zone()  --- 1)
  online_pages_range()      --- 2)

At 1), __init_single_page() sets the page count to 1. At 2),
generic_online_page() clears the page count, while the callback does not.

Then I tried to find the place where a non-zero page count prevents
offlining. scan_movable_pages() would fail, since such a page is
PageOffline() and has a page count of 1.

So the guard that prevents offlining partially-onlined pages is

(PageOffline && page_count)

And your commit aa218795cb5fd583c94f

mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

was introduced to handle this case.
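
Roughly, the check that commit adds to scan_movable_pages() is (paraphrased
from memory, not copied from the tree):

	/*
	 * A PageOffline() page that still holds a reference (here, the
	 * fake-offline reference taken by virtio-mem) is treated as
	 * unmovable, so the memory block cannot be offlined behind the
	 * driver's back.
	 */
	if (PageOffline(page) && page_count(page))
		return -EBUSY;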

That's pretty clear now.

>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue

2020-10-16 Thread Jakub Kicinski
On Wed, 14 Oct 2020 14:06:32 +0800 Dylan Hung wrote:
> The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> hang when handling scatter-gather DMA.  Disable the problematic feature
> by setting MAC register 0x58 bit28 and bit27.
> 
> Signed-off-by: Dylan Hung 

Applied, thank you.


[PATCH] staging: wfx: make a const array static, makes object smaller

2020-10-16 Thread Colin King
From: Colin Ian King 

Don't populate const array filter_ies on the stack but instead
make it static. Makes the object code smaller by 261 bytes.

Before:
   textdata bss dec hex filename
  216743166 448   2528862c8 drivers/staging/wfx/sta.o

After:
   textdata bss dec hex filename
  213493230 448   2502761c3 drivers/staging/wfx/sta.o

(gcc version 10.2.0)

Signed-off-by: Colin Ian King 
---
 drivers/staging/wfx/sta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/wfx/sta.c b/drivers/staging/wfx/sta.c
index 2320a81eae0b..196779a1b89a 100644
--- a/drivers/staging/wfx/sta.c
+++ b/drivers/staging/wfx/sta.c
@@ -63,7 +63,7 @@ void wfx_suspend_hot_dev(struct wfx_dev *wdev, enum sta_notify_cmd cmd)
 
 static void wfx_filter_beacon(struct wfx_vif *wvif, bool filter_beacon)
 {
-   const struct hif_ie_table_entry filter_ies[] = {
+   static const struct hif_ie_table_entry filter_ies[] = {
{
.ie_id= WLAN_EID_VENDOR_SPECIFIC,
.has_changed  = 1,
-- 
2.27.0



Re: [GIT PULL] overlayfs update for 5.10

2020-10-16 Thread pr-tracker-bot
The pull request you sent on Fri, 16 Oct 2020 22:34:53 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git tags/ovl-update-5.10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/071a0578b0ce0b0e543d1e38ee6926b9cc21c198

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

