Re: [PATCH net-next v3 0/7] bcm63xx_enet: major makeover of driver

2021-01-07 Thread Jakub Kicinski
On Wed,  6 Jan 2021 22:42:01 +0800 Sieng Piaw Liew wrote:
> This patch series aim to improve the bcm63xx_enet driver by integrating the
> latest networking features, i.e. batched rx processing, BQL, build_skb,
> etc.
> 
> The newer enetsw SoCs are found to be able to do unaligned rx DMA by adding
> NET_IP_ALIGN padding which, combined with these patches, improved packet
> processing performance by ~50% on BCM6328.
> 
> Older non-enetsw SoCs still benefit mainly from rx batching. Performance
> improvement of ~30% is observed on BCM6333.
> 
> The BCM63xx SoCs are designed for routers. As such, having BQL is
> beneficial as well as trivial to add.
> 
> v3:
> * Simplify xmit_more patch by not moving around the code needlessly.
> * Fix indentation in xmit_more patch.
> * Fix indentation in build_skb patch.
> * Split rx ring cleanup patch from build_skb patch and precede build_skb
>   patch for better understanding, as suggested by Florian Fainelli.
> 
> v2:
> * Add xmit_more support and rx loop improvisation patches.
> * Moved BQL netdev_reset_queue() to bcm_enet_stop()/bcm_enetsw_stop()
>   functions as suggested by Florian Fainelli.
> * Improved commit messages.

Applied, thanks!


Re: [PATCH] ubsan: disable unsigned-integer-overflow sanitizer with clang

2021-01-07 Thread Arnd Bergmann
On Thu, Jan 7, 2021 at 7:15 PM Nathan Chancellor
 wrote:
> On Thu, Jan 07, 2021 at 09:22:00AM -0800, Kees Cook wrote:

> > I think this is:
> > https://github.com/ClangBuiltLinux/linux/issues/256
> > and that bug needs re-opening? Or maybe there's a new bug I can't find?
>
> The problem is that applying the fix for that issue does not work, nor
> does converting p4d_index to a macro like mips and s390. I am not sure
> what exactly is going on there, it appears that clang cannot reason
> about ptrs_per_p4d because it is an extern variable with no assigned
> value in its translation unit?

Right, I tried the __always_inline trick already and concluded the same.

   Arnd


Re: [PATCH] RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp

2021-01-07 Thread Jason Gunthorpe
On Sat, Dec 26, 2020 at 03:42:48PM +0800, Dinghao Liu wrote:
> If usnic_ib_qp_grp_create() fails at the first call, dev_list
> will not be freed on error, which leads to memleak.
> 
> Signed-off-by: Dinghao Liu 
> Reviewed-by: Leon Romanovsky 
> ---
>  drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 3 +++
>  1 file changed, 3 insertions(+)

Applied to for-rc with

Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver")

Thanks,
Jason


RE: [PATCH v2] genirq: add IRQF_NO_AUTOEN for request_irq

2021-01-07 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Friday, January 8, 2021 7:58 AM
> To: Song Bao Hua (Barry Song) 
> Cc: t...@linutronix.de; m...@kernel.org; linux-in...@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux...@openeuler.org; Dmitry Torokhov
> 
> Subject: Re: [PATCH v2] genirq: add IRQF_NO_AUTOEN for request_irq
> 
> On Tue, Jan 05, 2021 at 03:14:11PM +1300, Barry Song wrote:
> > Many drivers don't want interrupts enabled automatically due to
> > request_irq(). So they are handling this issue by either way of
> > the below two:
> > (1)
> > irq_set_status_flags(irq, IRQ_NOAUTOEN);
> > request_irq(dev, irq...);
> > (2)
> > request_irq(dev, irq...);
> > disable_irq(irq);
> >
> > The code in the second way is silly and unsafe. In the small time
> > gap between request_irq() and disable_irq(), interrupts can still
> > come.
> > The code in the first way is safe though we might be able to do it
> > in the generic irq code.
> >
> > With this patch, drivers can request_irq with IRQF_NO_AUTOEN flag.
> > They will need neither irq_set_status_flags() nor disable_irq().
> > Hundreds of drivers with this problem will be handled afterwards.
> >
> > Cc: Dmitry Torokhov 
> > Signed-off-by: Barry Song 
> 
> Can you also convert some in-kernel drivers to this new api so that we
> can see how this works?

Sure. As the discussion got started from input, so I'll take some
input drivers as examples before moving to other folders.

> 
> thanks,
> 
> greg k-h

Thanks
Barry



Re: [PATCH] arm64: PCI: Enable SMC conduit

2021-01-07 Thread Rob Herring
On Thu, Jan 7, 2021 at 12:45 PM Jeremy Linton  wrote:
>
> Hi,
>
> On 1/7/21 11:36 AM, Rob Herring wrote:
> > On Thu, Jan 7, 2021 at 9:24 AM Jeremy Linton  wrote:
> >>
> >> Hi,
> >>
> >>
> >> On 1/7/21 9:28 AM, Rob Herring wrote:
> >>> On Mon, Jan 4, 2021 at 9:57 PM Jeremy Linton  
> >>> wrote:
> 
>  Given that most arm64 platform's PCI implementations needs quirks
>  to deal with problematic config accesses, this is a good place to
>  apply a firmware abstraction. The ARM PCI SMMCCC spec details a
>  standard SMC conduit designed to provide a simple PCI config
>  accessor. This specification enhances the existing ACPI/PCI
>  abstraction and expects power, config, etc functionality is handled
>
> (trimming)
>
> 
>  +static int smccc_pcie_check_conduit(u16 seg)
> >>>
> >>> check what? Based on how you use this, perhaps _has_conduit() instead.
> >>
> >> Sure.
> >>
> >>>
>  +{
>  +   struct arm_smccc_res res;
>  +
>  +   if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
>  +   return -EOPNOTSUPP;
>  +
>  +   arm_smccc_smc(SMCCC_PCI_VERSION, 0, 0, 0, 0, 0, 0, 0, );
>  +   if ((int)res.a0 < 0)
>  +   return -EOPNOTSUPP;
>  +
>  +   arm_smccc_smc(SMCCC_PCI_SEG_INFO, seg, 0, 0, 0, 0, 0, 0, );
>  +   if ((int)res.a0 < 0)
>  +   return -EOPNOTSUPP;
> >>>
> >>> Don't you need to check that read and write functions are supported?
> >>
> >> In theory no, the first version of the specification makes them
> >> mandatory for all implementations. There isn't a partial access method,
> >> so nothing works if only read or write were implemented.
> >
> > Then the spec should change:
> >
> > 2.3.3 Caller responsibilities
> > The caller has the following responsibilities:
> > • The caller must ensure that this function is implemented before
> > issuing a call. This function is discoverable
> > by calling PCI_FEATURES with pci_func_id set to 0x8400_0132.
> >
> >
> > I guess knowing the version is ensuring, but the 2nd sentence makes it
> > seem that is how one should ensure.
>
> Ok, yes i understand, I will add the check.
>
> >
> > Related, are there any sort of tests for the interface? I generally
> > don't think the kernel's job is validating firmware (a frequent topic
> > for DT), but we should have something. Maybe an SMC unittest module?
> > If nothing else, seems like we should have at least one PCI_FEATURES
> > call to make sure it works. We don't want to add it later only to find
> > that it breaks on some firmware implementations. Though we can just
> > add firmware quirks. ;)
>
> I'm not aware of any unit tests at the moment. My testing so far has
> been against these patches:
> https://review.trustedfirmware.org/q/topic:"Arm_PCI_Config_Space_Interface;
>
> But given the next version does the PCI_FEATURES calls, that will
> satisfy your concern, right?

Somewhat, but that doesn't replace the need for unittests. We have
PSCI checker, maybe the same thing should be required here. Otherwise,
implementations will only be as good as what Linux currently expects.

Rob


[PATCH 1/2] drm/msm/dp: postpone irq_hpd event during connection pending state

2021-01-07 Thread Kuogee Hsieh
irq_hpd event can only be executed at connected state. Therefore
irq_hpd event should be postponed if it happened at connection
pending state. This patch also make sure both link rate and lane
are valid before start link training.

Signed-off-by: Kuogee Hsieh 
---
 drivers/gpu/drm/msm/dp/dp_display.c |  7 +++
 drivers/gpu/drm/msm/dp/dp_panel.c   | 12 +---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index 6e971d5..3bc7ed2 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -693,6 +693,13 @@ static int dp_irq_hpd_handle(struct dp_display_private 
*dp, u32 data)
return 0;
}
 
+   if (state == ST_CONNECT_PENDING) {
+   /* wait until ST_CONNECTED */
+   dp_add_event(dp, EV_IRQ_HPD_INT, 0, 1); /* delay = 1 */
+   mutex_unlock(>event_mutex);
+   return 0;
+   }
+
ret = dp_display_usbpd_attention_cb(>pdev->dev);
if (ret == -ECONNRESET) { /* cable unplugged */
dp->core_initialized = false;
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 97dca3e..d1780bc 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -167,12 +167,18 @@ int dp_panel_read_sink_caps(struct dp_panel *dp_panel,
panel = container_of(dp_panel, struct dp_panel_private, dp_panel);
 
rc = dp_panel_read_dpcd(dp_panel);
+   if (rc) {
+   DRM_ERROR("read dpcd failed %d\n", rc);
+   return rc;
+   }
+
bw_code = drm_dp_link_rate_to_bw_code(dp_panel->link_info.rate);
-   if (rc || !is_link_rate_valid(bw_code) ||
+   if (!is_link_rate_valid(bw_code) ||
!is_lane_count_valid(dp_panel->link_info.num_lanes) ||
(bw_code > dp_panel->max_bw_code)) {
-   DRM_ERROR("read dpcd failed %d\n", rc);
-   return rc;
+   DRM_ERROR("Illegal link rate=%d lane=%d\n", 
dp_panel->link_info.rate,
+   dp_panel->link_info.num_lanes);
+   return -EINVAL;
}
 
if (dp_panel->dfp_present) {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 2/2] drm/msm/dp: unplug interrupt missed after irq_hpd handler

2021-01-07 Thread Kuogee Hsieh
There is HPD unplug interrupts missed at scenario of an irq_hpd
followed by unplug interrupts with around 10 ms in between.
Since both AUX_SW_RESET and DP_SW_RESET clear pending HPD interrupts,
irq_hpd handler should not issues either aux or sw reset to avoid
following unplug interrupt be cleared accidentally.

Signed-off-by: Kuogee Hsieh 
---
 drivers/gpu/drm/msm/dp/dp_aux.c |  7 ---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 24 
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 15 ++-
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
index 19b35ae..1c6e1d2 100644
--- a/drivers/gpu/drm/msm/dp/dp_aux.c
+++ b/drivers/gpu/drm/msm/dp/dp_aux.c
@@ -336,7 +336,6 @@ static ssize_t dp_aux_transfer(struct drm_dp_aux *dp_aux,
ssize_t ret;
int const aux_cmd_native_max = 16;
int const aux_cmd_i2c_max = 128;
-   int const retry_count = 5;
struct dp_aux_private *aux = container_of(dp_aux,
struct dp_aux_private, dp_aux);
 
@@ -378,12 +377,6 @@ static ssize_t dp_aux_transfer(struct drm_dp_aux *dp_aux,
ret = dp_aux_cmd_fifo_tx(aux, msg);
 
if (ret < 0) {
-   if (aux->native) {
-   aux->retry_cnt++;
-   if (!(aux->retry_cnt % retry_count))
-   dp_catalog_aux_update_cfg(aux->catalog);
-   dp_catalog_aux_reset(aux->catalog);
-   }
usleep_range(400, 500); /* at least 400us to next try */
goto unlock_exit;
}
diff --git a/drivers/gpu/drm/msm/dp/dp_catalog.c 
b/drivers/gpu/drm/msm/dp/dp_catalog.c
index 44f0c57..9c0ce98 100644
--- a/drivers/gpu/drm/msm/dp/dp_catalog.c
+++ b/drivers/gpu/drm/msm/dp/dp_catalog.c
@@ -190,6 +190,18 @@ int dp_catalog_aux_clear_hw_interrupts(struct dp_catalog 
*dp_catalog)
return 0;
 }
 
+/**
+ * dp_catalog_aux_reset() - reset AUX controller
+ *
+ * @aux: DP catalog structure
+ *
+ * return: void
+ *
+ * This function reset AUX controller
+ *
+ * NOTE: reset AUX controller will also clear any pending HPD related 
interrupts
+ * 
+ */
 void dp_catalog_aux_reset(struct dp_catalog *dp_catalog)
 {
u32 aux_ctrl;
@@ -483,6 +495,18 @@ int dp_catalog_ctrl_set_pattern(struct dp_catalog 
*dp_catalog,
return 0;
 }
 
+/**
+ * dp_catalog_ctrl_reset() - reset DP controller
+ *
+ * @aux: DP catalog structure
+ *
+ * return: void
+ *
+ * This function reset DP controller
+ *
+ * NOTE: reset DP controller will also clear any pending HPD related interrupts
+ * 
+ */
 void dp_catalog_ctrl_reset(struct dp_catalog *dp_catalog)
 {
u32 sw_reset;
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index e3462f5..f96c415 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -1296,7 +1296,8 @@ static int dp_ctrl_setup_main_link(struct dp_ctrl_private 
*ctrl,
 * transitioned to PUSH_IDLE. In order to start transmitting
 * a link training pattern, we have to first do soft reset.
 */
-   dp_catalog_ctrl_reset(ctrl->catalog);
+   if (*training_step != DP_TRAINING_NONE)
+   dp_catalog_ctrl_reset(ctrl->catalog);
 
ret = dp_ctrl_link_train(ctrl, cr, training_step);
 
@@ -1491,15 +1492,18 @@ static int dp_ctrl_deinitialize_mainlink(struct 
dp_ctrl_private *ctrl)
return 0;
 }
 
+static void dp_ctrl_link_idle_reset(struct dp_ctrl_private *ctrl)
+{
+   dp_ctrl_push_idle(>dp_ctrl);
+   dp_catalog_ctrl_reset(ctrl->catalog);
+}
+
 static int dp_ctrl_link_maintenance(struct dp_ctrl_private *ctrl)
 {
int ret = 0;
struct dp_cr_status cr;
int training_step = DP_TRAINING_NONE;
 
-   dp_ctrl_push_idle(>dp_ctrl);
-   dp_catalog_ctrl_reset(ctrl->catalog);
-
ctrl->dp_ctrl.pixel_rate = ctrl->panel->dp_mode.drm_mode.clock;
 
ret = dp_ctrl_setup_main_link(ctrl, , _step);
@@ -1626,6 +1630,7 @@ void dp_ctrl_handle_sink_request(struct dp_ctrl *dp_ctrl)
 
if (sink_request & DP_TEST_LINK_TRAINING) {
dp_link_send_test_response(ctrl->link);
+   dp_ctrl_link_idle_reset(ctrl);
if (dp_ctrl_link_maintenance(ctrl)) {
DRM_ERROR("LM failed: TEST_LINK_TRAINING\n");
return;
@@ -1679,7 +1684,7 @@ int dp_ctrl_on_link(struct dp_ctrl *dp_ctrl)
break;
}
 
-   training_step = DP_TRAINING_NONE;
+   training_step = DP_TRAINING_1;
rc = dp_ctrl_setup_main_link(ctrl, , _step);
if (rc == 0) {
/* training completed successfully */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 12:25 PM Jason Gunthorpe  wrote:
>
> Lots of places are relying on pin_user_pages long term pins of memory,
> and cannot be converted to notifiers.
>
> I don't think it is reasonable to just declare that insecure and
> requires privileges, it is a huge ABI break.

Also, I think GUP (and pin_user_pages() as a special case) is a lot
more important and more commonly used than UFFD.

Which is really why I think this needs to be fixed by just fixing UFFD
to take the write lock.

I think Andrea is blinded by his own love for UFFDIO: when I do a
debian codesearch for UFFDIO_WRITEPROTECT, all it finds is the kernel
and strace (and the qemu copies of the kernel headers).

Does the debian code search cover everything? Obviously not. But if
you cannot find A SINGLE USE of that thing in the Debian code search,
then that is sure a sign of _something_.

Linus


[PATCH 0/2] *** fix missing unplug interrupt problem ***

2021-01-07 Thread Kuogee Hsieh


Both AUX_SW_RESET and DP_SW_RESET clear pending HPD interrupts.
Therefore irq_hpd handler should not issues either aux or sw reset
to avoid following unplug interrupt be cleared accidentally.

Kuogee Hsieh (2):
  drm/msm/dp: postpone irq_hpd event during connection pending state
  drm/msm/dp: unplug interrupt missed after irq_hpd handler

 drivers/gpu/drm/msm/dp/dp_aux.c |  7 ---
 drivers/gpu/drm/msm/dp/dp_catalog.c | 24 
 drivers/gpu/drm/msm/dp/dp_ctrl.c| 15 ++-
 drivers/gpu/drm/msm/dp/dp_display.c |  7 +++
 drivers/gpu/drm/msm/dp/dp_panel.c   | 12 +---
 5 files changed, 50 insertions(+), 15 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



RE: [EXTERNAL] Re: [PATCH] Hyper-V: pci: x64: Generalize irq/msi set-up and handling

2021-01-07 Thread Sunil Muthuswamy
> Do you see "Co-developed-by" in the title of that section? This is how
> you express co-authorship.
Yes, I do now.

> 
> As it is now:
> 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Boqun Feng (Microsoft) 
> 
> it says that you authored the patch and Boqun handled it further, i.e.,
> he's in the patch's delivery path. But he isn't - you're sending it.

thanks, I will send out a new patch with this updated.


[PATCH] dma-buf: cma_heap: Fix memory leak in CMA heap

2021-01-07 Thread John Stultz
Bing Song noticed the CMA heap was leaking memory due to a flub
I made in commit a5d2d29e24be ("dma-buf: heaps: Move heap-helper
logic into the cma_heap implementation"), and provided this fix
which ensures the pagelist is also freed on release.

Cc: Bing Song 
Cc: Sumit Semwal 
Cc: Liam Mark 
Cc: Laura Abbott 
Cc: Brian Starkey 
Cc: Hridya Valsaraju 
Cc: Suren Baghdasaryan 
Cc: Sandeep Patil 
Cc: Daniel Mentz 
Cc: Chris Goldsworthy 
Cc: Ørjan Eide 
Cc: Robin Murphy 
Cc: Ezequiel Garcia 
Cc: Simon Ser 
Cc: James Jones 
Cc: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Reported-by: Bing Song 
Fixes: a5d2d29e24be ("dma-buf: heaps: Move heap-helper logic into the cma_heap 
implementation")
Signed-off-by: John Stultz 
---
 drivers/dma-buf/heaps/cma_heap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 3c4e34301172..364fc2f3e499 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -251,6 +251,9 @@ static void cma_heap_dma_buf_release(struct dma_buf *dmabuf)
buffer->vaddr = NULL;
}
 
+   /* free page list */
+   kfree(buffer->pages);
+   /* release memory */
cma_release(cma_heap->cma, buffer->cma_pages, buffer->pagecount);
kfree(buffer);
 }
-- 
2.17.1



Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 12:17 PM Linus Torvalds
 wrote:
>
> I still think the real fix is "Don't do that then", and just take the
> write lock.

The alternative, of course, is to just make sure the page table flush
is done inside the page table lock (and then we make COW do the copy
inside of it).

But this whole "we know UFFD breaks all rules, we'll add even more
crap to it" approach is horrendous.

   Linus


Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Jason Gunthorpe
On Thu, Jan 07, 2021 at 03:04:00PM -0500, Andrea Arcangeli wrote:

> vmsplice syscall API is insecure allowing long term GUP PINs without
> privilege.

Lots of places are relying on pin_user_pages long term pins of memory,
and cannot be converted to notifiers.

I don't think it is reasonable to just declare that insecure and
requires privileges, it is a huge ABI break.

FWIW, vhost tries to use notifiers as a replacement for GUP, and I
think it ended up quite strange and complicated. It is hard to
maintain performance when every access to the pages needs to hold some
protection against parallel invalidation.

Jason


Re: [GIT PULL] regmap fixes for v5.11-rc2

2021-01-07 Thread pr-tracker-bot
The pull request you sent on Thu, 07 Jan 2021 17:09:30 +:

> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git 
> tags/regmap-fix-v5.11-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fc37784dc71bc9dd3a00a2f01906b3966e4034f2

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [GIT PULL] regulator fixes for v5.11-rc2

2021-01-07 Thread pr-tracker-bot
The pull request you sent on Thu, 07 Jan 2021 17:09:55 +:

> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 
> tags/regulator-fix-v5.11-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a1a7b4f32433e91f0fff32cde534eadc67242298

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [GIT PULL] SPI fixes for v5.11-rc2

2021-01-07 Thread pr-tracker-bot
The pull request you sent on Thu, 07 Jan 2021 17:10:22 +:

> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git 
> tags/spi-fix-v5.11-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f5e6c330254ae691f6d7befe61c786eb5056007e

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [EXTERNAL] Re: [PATCH] Hyper-V: pci: x64: Generalize irq/msi set-up and handling

2021-01-07 Thread Borislav Petkov
On Thu, Jan 07, 2021 at 08:16:28PM +, Sunil Muthuswamy wrote:
> > What is this SoB chain supposed to say?
> 
> Quoting from the link you shared:
> "The Signed-off-by: tag indicates that the signer was involved in the 
> development of
> the patch, or that he/she was in the patch's delivery path."
> 
> My intent to include Boqun in the Signed-off by tag was to indicate that he 
> was involved
> in the development of the patch here.

Do you see "Co-developed-by" in the title of that section? This is how
you express co-authorship.

As it is now:

Signed-off-by: Sunil Muthuswamy 
Signed-off-by: Boqun Feng (Microsoft) 

it says that you authored the patch and Boqun handled it further, i.e.,
he's in the patch's delivery path. But he isn't - you're sending it.

HTH.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v2 1/3] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

2021-01-07 Thread Marek Behún
On Thu, 7 Jan 2021 19:45:49 +
Russell King - ARM Linux admin  wrote:

> I think you're not reading the code very well. It checks for bytes at
> offset 1..blocksize-1, blocksize+1..2*blocksize-1, etc are zero. It
> does _not_ check that byte 0 or the byte at N*blocksize is zero - these
> bytes are skipped. In other words, the first byte of each transfer can
> be any value. The other bytes of the _entire_ ID must be zero.

Wouldn't it be better, instead of checking if 1..blocksize-1 are zero,
to check whether reading byte by byte returns the same as reading 16
bytes whole?

Marek


Re: [PATCH v2 2/4] KVM: nSVM: correctly restore nested_run_pending on migration

2021-01-07 Thread Sean Christopherson
On Thu, Jan 07, 2021, Paolo Bonzini wrote:
> On 07/01/21 10:38, Maxim Levitsky wrote:
> > The code to store it on the migration exists, but no code was restoring it.
> > 
> > One of the side effects of fixing this is that L1->L2 injected events
> > are no longer lost when migration happens with nested run pending.
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >   arch/x86/kvm/svm/nested.c | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index ee4f2082ad1bd..cc3130ab612e5 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -1200,6 +1200,10 @@ static int svm_set_nested_state(struct kvm_vcpu 
> > *vcpu,
> >  * in the registers, the save area of the nested state instead
> >  * contains saved L1 state.
> >  */
> > +
> > +   svm->nested.nested_run_pending =
> > +   !!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
> > +
> > copy_vmcb_control_area(>control, >vmcb->control);
> > hsave->save = *save;
> > 
> 
> Nice fix and we need to do it anyway.
> 
> That said, the v1 change had some appeal to it.

Which v1 change are you referring to?

> In the VMX case (if properly implemented) it would allow removing the weird
> nested_run_pending case from prepare_vmcs02_early.  I think it's a valuable
> invariant that there are no events in the VMCS after each KVM_RUN iteration,
> and this special case is breaking the invariant.

Hmm, as weird as that code is, I think it's actually the most architecturally
correct behavior.  Technically, the clearing of VM_ENTRY_INTR_INFO.VALID
shouldn't be visible in vmcs12 until a nested VM-Exit occurs, e.g. copying the
vmcs02 value to vmcs12 in vmx_get_nested_state() would work, but it's wrong at
the same time.  Ditto for L1 (or L2) writing vmcs12.VM_ENTRY_INTR_INFO while L2
is running (ignoring the SDM's very clear warning that doing so is bad); from
L1/L2's perspective, there is no VM-Entry so writing vmcs12.VM_ENTRY_INTR_INFO
should never generate an event in L2, even on CPUs without VMCS caching.


RE: [EXTERNAL] Re: [PATCH] Hyper-V: pci: x64: Generalize irq/msi set-up and handling

2021-01-07 Thread Sunil Muthuswamy
> There seems to be a long tradition of dreaming up random formats for
> the subject lines of Hyper-V-related patches.  Look at all the
> different ways these are spelled, hyphenated, and capitalized:
> 
I am reading this as a suggestion to unify the format of the subject of
the Hyper-V patches. Wei and other maintainers of the Hyper-V branch; do
you have any suggestions? If we have already defined a format and it is me
who is not following it, please forward the document my way.

> On Thu, Jan 07, 2021 at 05:05:36AM +, Sunil Muthuswamy wrote:
> > Currently, operations related to irq/msi in Hyper-V vPCI are
> 
> In comments in the patch, you use "IRQ" and "MSI".  I don't know
> whether "vPCI" means something or is a typo.  I suppose it probably
> means "virtual PCI" as below.
> 

Yes, vPCI here means virtual PCI.


RE: [EXTERNAL] Re: [PATCH] Hyper-V: pci: x64: Generalize irq/msi set-up and handling

2021-01-07 Thread Sunil Muthuswamy
> What is this SoB chain supposed to say?

Quoting from the link you shared:
"The Signed-off-by: tag indicates that the signer was involved in the 
development of
the patch, or that he/she was in the patch's delivery path."

My intent to include Boqun in the Signed-off by tag was to indicate that he was 
involved
in the development of the patch here.

- Sunil


Re: [PATCH 01/22] Add Vision Processing Unit (VPU) documentation.

2021-01-07 Thread mark gross
On Fri, Dec 18, 2020 at 03:30:18PM -0800, Randy Dunlap wrote:
> Hi--
> 
> On 12/1/20 2:34 PM, mgr...@linux.intel.com wrote:
> > From: mark gross 
> > 
> > 
> > Reviewed-by: Mark Gross 
> > Signed-off-by: Mark Gross 
> 
> My reading of submitting-patches.rst seems to indicate that
> the Reviewer and Submitter are probably not the same person.
> 
> Are you sure that you reviewed it?
> 
> 
> > ---
> >  Documentation/index.rst  |   3 +-
> >  Documentation/vpu/index.rst  |  16 ++
> >  Documentation/vpu/vpu-stack-overview.rst | 267 +++
> >  3 files changed, 285 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/vpu/index.rst
> >  create mode 100644 Documentation/vpu/vpu-stack-overview.rst
> > 
> > diff --git a/Documentation/index.rst b/Documentation/index.rst
> > index 57719744774c..0a2cc0204e8f 100644
> > --- a/Documentation/index.rst
> > +++ b/Documentation/index.rst
> > @@ -1,4 +1,4 @@
> > -.. SPDX-License-Identifier: GPL-2.0
> > +.. SPDX-License-Identifier: GPL-2.0-only
> 
> That looks both inappropriate for this patch and incorrect AFAICT.
> 
> >  
> >  
> >  .. The Linux Kernel documentation master file, created by
> > @@ -137,6 +137,7 @@ needed).
> > misc-devices/index
> > scheduler/index
> > mhi/index
> > +   vpu/index
> >  
> >  Architecture-agnostic documentation
> >  ---
> > diff --git a/Documentation/vpu/index.rst b/Documentation/vpu/index.rst
> > new file mode 100644
> > index ..7e290e048910
> > --- /dev/null
> > +++ b/Documentation/vpu/index.rst
> > @@ -0,0 +1,16 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> 
> license-rules.rst says:
> 
>   For 'GNU General Public License (GPL) version 2 only' use:
> SPDX-License-Identifier: GPL-2.0
> 
> > +
> > +
> > +Vision Processor Unit Documentation
> > +
> > +
> > +This documentation contains information for the Intel VPU stack.
> > +
> > +.. class:: toc-title
> > +
> > +  Table of contents
> > +
> > +.. toctree::
> > +   :maxdepth: 2
> > +
> > +   vpu-stack-overview
> > diff --git a/Documentation/vpu/vpu-stack-overview.rst 
> > b/Documentation/vpu/vpu-stack-overview.rst
> > new file mode 100644
> > index ..53c06a7d9a52
> > --- /dev/null
> > +++ b/Documentation/vpu/vpu-stack-overview.rst
> > @@ -0,0 +1,267 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> 
> Nope.
> 
> > +
> > +==
> > +Intel VPU architecture
> > +==
> > +
> > +Overview
> > +
> > +
> > +The Intel Movidius acquisition has developed a Vision Processing Unit (VPU)
> > +roadmap of products starting with Keem Bay (KMB).  The HW configurations 
> > the
> 
> s/HW/hardware/
> 
> > +VPU can support include:
> > +
> > +1. Standalone smart camera that does local CV processing in camera
> 
> Tell us what CV is before using it.
> 
> > +2. Standalone appliance or SBC device connected to a network and tethered
> 
> Tell us what SBC is before using it. (yeah, I know)
> 
> > +   cameras doing local CV processing
> > +3. Embedded in a USB dongle or M.2 as an CV accelerator.
> > +4. Multiple VPU enabled SOC's on a PCIE card as a CV accelerator in a 
> > larger IA
> 
>   PCIe (?)
> 
> > +   box or server.
> > +
> > +Keem Bay is the first instance of this family of products. This document
> > +provides an architectural overview of the SW stack supporting the VPU 
> > enabled
> 
> s/SW/software/
> 
> > +products.
> > +
> > +Keem Bay (KMB) is a Computer Vision AI processing SoC based on ARM A53 CPU 
> > that
> > +provides Edge neural network acceleration (inference) and includes a Vision
> > +Processing Unit (VPU) hardware.  The ARM CPU SubSystem (CPUSS) interfaces
> > +locally to the VPU and enables integration/interfacing with a remote host 
> > over
> > +PCIe or USB or Ethernet interfaces. The interface between the CPUSS and 
> > the VPU
> > +is implemented with HW FIFOs (Control) and coherent memory mapping (Data) 
> > such
> > +that zero copy processing can happen within the VPU.
> > +
> > +The KMB can be used in all 4 of the above classes of designs.
> > +
> > +We refer to the 'local host' as being the ARM part of the SoC, while the
> > +'remote host' as the IA system hosting the KMB device(s).  The KMB SoC 
> > boots
> > +from an eMMC via uBoot and ARM Linux compatible device tree interface with 
> > an
> > +expectation to fully boot within hundreds of milliseconds.  There is also
> > +support for downloading the kernel and root file system image from a remote
> > +host.
> > +
> > +The eMMC can be updated with standard mender update process.
> 
>  Mender
> 
> > +See https://github.com/mendersoftware/mender
> > +
> > +The VPU is started and controlled from the A53 local host.  Its firmware 
> > image
> > +is loaded using the drive FW helper KAPI's.

Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 12:04 PM Andrea Arcangeli  wrote:
>
> However there are two cases that could wrprotecting exclusive anon
> pages with only the mmap_read_lock:

I still think the real fix is "Don't do that then", and just take the
write lock.

The UFFDIO_WRITEPROTECT case simply isn't that critical. It's not a
normal operation. Same goes for softdirty.

Why have those become _so_ magical that they can break the VM for
everybody else?

 Linus


Re: [PATCH] usb: musb: add printf attribute to log function

2021-01-07 Thread Bin Liu
Hi,

On Tue, Dec 22, 2020 at 01:52:48AM -0800, Joe Perches wrote:
> On Tue, 2020-12-22 at 09:52 +0100, Greg KH wrote:
> > On Mon, Dec 21, 2020 at 08:25:47AM -0800, t...@redhat.com wrote:
> > > From: Tom Rix 
> > > 
> > > Attributing the function allows the compiler to more thoroughly
> > > check the use of the function with -Wformat and similar flags.
> > > 
> > > Signed-off-by: Tom Rix 
> > > ---
> > >  drivers/usb/musb/musb_debug.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/usb/musb/musb_debug.h b/drivers/usb/musb/musb_debug.h
> > > index e5b3506c7b3f..dfc0d02695fa 100644
> > > --- a/drivers/usb/musb/musb_debug.h
> > > +++ b/drivers/usb/musb/musb_debug.h
> > > @@ -17,6 +17,7 @@
> > >  #define INFO(fmt, args...) yprintk(KERN_INFO, fmt, ## args)
> > >  #define ERR(fmt, args...) yprintk(KERN_ERR, fmt, ## args)

These should be converted to dev_info or dev_err. I believe the work was only
done on those actively used platform drivers.

Further cleanup patches are welcomed.

> > >  
> > > 
> > > +__printf(2, 3)
> > >  void musb_dbg(struct musb *musb, const char *fmt, ...);
> > 
> > While I understand the need for this, did this find any problems?
> > If not, then it's not worth adding,
> 
> I have to disagree with that Greg.  While the driver isn't in active
> development, a trivial mod to make it less likely a defect is introduced
> by any additional code is still a useful addition.
> 
> > the driver-specific debugging macros
> > should be removed entirely and just use dev_dbg() and friends instead.
> 
> Read the suggested change I posted in reply.
> 
> btw: the musb_dbg function is actually a trace function and not a
> dmesg/logging mechanism.

Yes, musb_dbg() generates ftrace logs instead.

> 
> drivers/usb/musb/musb_trace.c:void musb_dbg(struct musb *musb, const char 
> *fmt, ...)
> drivers/usb/musb/musb_trace.c-{
> drivers/usb/musb/musb_trace.c-  struct va_format vaf;
> drivers/usb/musb/musb_trace.c-  va_list args;
> drivers/usb/musb/musb_trace.c-
> drivers/usb/musb/musb_trace.c-  va_start(args, fmt);
> drivers/usb/musb/musb_trace.c-  vaf.fmt = fmt;
> drivers/usb/musb/musb_trace.c-  vaf.va = 
> drivers/usb/musb/musb_trace.c-
> drivers/usb/musb/musb_trace.c-  trace_musb_log(musb, );
> drivers/usb/musb/musb_trace.c-
> drivers/usb/musb/musb_trace.c-  va_end(args);
> drivers/usb/musb/musb_trace.c-}
> 
> drivers/usb/musb/musb_trace.h:TRACE_EVENT(musb_log,
> drivers/usb/musb/musb_trace.h-  TP_PROTO(struct musb *musb, struct va_format 
> *vaf),
> drivers/usb/musb/musb_trace.h-  TP_ARGS(musb, vaf),
> drivers/usb/musb/musb_trace.h-  TP_STRUCT__entry(
> drivers/usb/musb/musb_trace.h-  __string(name, 
> dev_name(musb->controller))
> drivers/usb/musb/musb_trace.h-  __dynamic_array(char, msg, 
> MUSB_MSG_MAX)
> drivers/usb/musb/musb_trace.h-  ),
> drivers/usb/musb/musb_trace.h-  TP_fast_assign(
> drivers/usb/musb/musb_trace.h-  __assign_str(name, 
> dev_name(musb->controller));
> drivers/usb/musb/musb_trace.h-  vsnprintf(__get_str(msg), 
> MUSB_MSG_MAX, vaf->fmt, *vaf->va);
> drivers/usb/musb/musb_trace.h-  ),
> drivers/usb/musb/musb_trace.h-  TP_printk("%s: %s", __get_str(name), 
> __get_str(msg))
> drivers/usb/musb/musb_trace.h-);
> 
> Is that trace mechanism useful though?  I think it's somewhat odd.

The intention was to convert runtime debug log to ftrace for efficiency.
Some of the original printk() are converted to trace points. Other
unclassified prints are converted to musb_dbg() for further clean up.

-Bin.


[PATCH 0/2] page_count can't be used to decide when wp_page_copy

2021-01-07 Thread Andrea Arcangeli
Hello,

I prepared in 2/2 a fix to make UFFDIO_WRITEPROTECT and
clear_refs_write cope with page_count in do_wp_page. It'd stack
perfectly on top of 1/2 from will which fixes an orthogonal regression
and it'd need to be applied first since its Fixes tag comes first.

I hope this patchset shows and my initial my answer in
https://lkml.kernel.org/r/x+poxcizo392p...@redhat.com shows I tried to
keep an open mind and to try to fix what
09854ba94c6aad7886996bfbee2530b3d8a7f4f4 broke. Even in the commit of
2/2 I wrote "is completely correct" despite I had to change my mind
about that.

It turns out the memory corruption caused by the breakage in the TLB
flushing is a walk in the park to fix for clear_refs and
UFFDIO_WRITEPROTECT, that is only the tip of the icerberg.

To simplify, let's forget the mmap_read_lock and let's assume we
hypothetically throw away the mmap_read_lock from the kernel and
UFFDIO_WRITEPROTECT and clear_refs and everything else takes the
mmap_write_lock only.

Even then, clear_refs and UFFDIO_WRITEPROTECT will remain broken if
the memory they're wrprotecting is GUP pinned by a secondary MMU or
RDMA or something, that is reading the memory through a read GUP
pin.

You can only make fork safe from the page_count by COWing any page
that has a GUP pin, because fork is actually allowed to COW (and that
will also fix the longstanding fork vs threads vs GUP race as result,
which I tried already once in
https://lkml.kernel.org/r/20090311165833.GI27823@random.random
). However it's fundamentally flawed and forbidden to COW after
clear_refs and UFFDIO_WRITEPROTECT, if fork or clone have never been
called and there's any GUP pin on the pages that were wrprotected.

In other words, the page_count check in do_wp_page, extends the old
fork vs gup vs threads race, so that it also happens within a single
thread without fork() involvement.

The above scenario is even more explicit and unfixable when it's not
just a single page but something bigger mapped by multiple processes
that was GUP pinned.

Either COWs would have to be forbidden and features clear_refs dropped
from the kernel and mprotect also would be strictly forbidden to ever
leave any pte_write bit clear for any reason, or do_wp_page requires
full accuracy on exclusive pages in MAP_PRIVATE private COWs or
MAP_ANON do_wp_page.

In simple terms: the page_count check in do_wp_page makes it
impossible to wrprotect memory, if such memory is under a !FOLL_WRITE
GUP pin.

It's a feature (not a bug) that the GUP pin must not trigger a COW,
and this is also already explicitly documented in comments in the
current source in places that are still using mapcount, or we'd be
probably dealing with more breakage than what's reproducible right
now.

Can we take a step back and start looking at what started all this VM
breakage? I mean specifically Jann's discovery that parent can attack
the child after the child does drop privs by using vmsplice long term
unprivileged GUP pins?

vmsplice syscall API is insecure allowing long term GUP PINs without
privilege.

Before touching any of the COW code, something had to be done on
vmsplice because even after 09854ba94c6aad7886996bfbee2530b3d8a7f4f4
there's still no way to way to tame the other VM breakage and working
DoS caused by vmsplice. (I already sent the vmsplice DoS PoC exploit
to who of may concern on Aug 27 2020)

zygote before worrying about COWs, needs to block vmsplice in the
child (and io_uring until it's fixed) with seccomp by default until
the syscall become privileged.

I also recommended such change to podman default allowlist, but
apparently it wasn't committed checking below, vmsplice still there in
the allowlist unfortunately. I'll try to suggest it again in a follow
up.

https://github.com/containers/common/blob/master/pkg/seccomp/seccomp.json

io_uring unlike vmsplice can remain unprivileged, but it needs to use
the mmu notifier to make those GUP pins become VM neutral.

After io_uring is fixed with mmu notifier, and vmsplice becomes a
privileged syscall, the concern that remains for the zygote model is
on par with the fact that there is RAM carelessly shared with L1 and
L2 cache also shared between parent and child. It'll take DMA and
burning the flash in order to keep poking for the parent to write at
the wrong time. So the phone would get so hot or the battery would run
out of juice before the attack can expose data from the parent. So for
an attacker it may be easier to look for a side channel with flush and
reload on the shared L2 that doesn't rely on more costly GUP transient
pins.

So the concern that started all this, once vmsplice and io_uring are
both fixes, in my view becomes theoretical.

Especially on enterprise (non-embedded), this issue is not even
theoretical but it's fully irrelevant, since execve has to be used
after drop privs (or the app needs to use MADV_DONTFORK or unshare all
memory if using a jailer that doesn't execve) to avoid the
aforementioned side channel to remain. Only 

[PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

2021-01-07 Thread Andrea Arcangeli
NOTE: the "Fixes" tag used here is to optimize the backporting, but
09854ba94c6aad7886996bfbee2530b3d8a7f4f4 is completely
correct. Despite being correct it happened to uncover some implicit
assumption some other code made on a very specific behavior in
do_wp_page that had to be altered by such commit.

The page_mapcount is always guaranteed == 1 for an exclusive anon
page, so when it was used to decide if an exclusive page under
wrprotection could be reused (as in wp_page_reuse), the outcome would
always come true.

The page_mapcount had to be replaced with the page_count because it
couldn't account for GUP pins, so after that change, for the first
time, wp_page_copy can now be called also for exclusive anon pages
that are underway wrprotection.

Even then everything is still fine for all cases that wrprotect with
the mmap_write_lock since the COW faults cannot run concurrently in
such case.

However there are two cases that could wrprotecting exclusive anon
pages with only the mmap_read_lock: soft dirty clear_refs_write and
UFFDIO_WRITEPROTECT. Both of them would benefit from keeping their
wrprotection runtime scalable and to keep using the mmap_read_lock
without having to switch to the mmap_write_lock.

To stick to the mmap_read_lock, for both UFFDIO_WRITEPROTECT and
clear_refs_write we need to handle the new reality that there can be
COWs (as in wp_page_copy) happening on exclusive anon pages that are
under wrprotection, but with the respective TLB flush still deferred.

An example of the problematic UFFDIO_USERFAULTFD runtime is shown
below.

 CPU0   CPU 1   CPU 2
 -- ---
 userfaultfd_wrprotect(mode_wp = true)
 PT lock
 atomic set _PAGE_UFFD_WP and clear _PAGE_WRITE
 PT unlock

do_page_fault FAULT_FLAG_WRITE
userfaultfd_wrprotect(mode_wp = false)
PT lock
ATOMIC clear _PAGE_UFFD_WP <- problem
/* _PAGE_WRITE not set */
PT unlock
XX BUG RACE window open here

PT lock
FAULT_FLAG_WRITE is set by CPU
_PAGE_WRITE is still clear in pte
PT unlock

wp_page_copy
cow_user_page runs with stale TLB

 deferred tlb flush <- too late
 XX BUG RACE window close here

userfaultfd_wrprotect(mode_wp = true) is never a problem because as
long as the uffd-wp flag is set in the pte/hugepmd the page fault is
guaranteed to reach a dead end in handle_userfault(). The window for
uffd-wp not to be set while the pte has been wrprotected but the TLB
flush is still pending, is opened when userfaultfd_wrprotect(mode_wp =
false) releases the PT-lock as shown above and it closes when the
first deferred TLB flush runs. If do_wp_page->wp_copy_page runs within
such window, some userland writes can get lost in the copy and they
can end up in the original page that gets discarded.

The softy dirty runtime is similar and it would be like below:

 CPU0   CPU 1   CPU 2
 -- ---
instantiate writable TLB
 clear_refs_write
 PT lock
 pte_wrprotect
 PT unlock
do_page_fault FAULT_FLAG_WRITE
PT lock
FAULT_FLAG_WRITE is set by CPU
_PAGE_WRITE is still clear in pte
PT unlock

wp_page_copy
cow_user_page...
writes through the TLB
...cow_user_page

So to handle this race a wrprotect_tlb_flush_pending atomic counter is
added to the vma.

This counter needs to be elevated while holding the mmap_read_lock
before taking the PT lock to wrprotect the pagetable and it can only
be decreased after the deferred TLB flush is complete.

This way the page fault can trivially serialize against pending TLB
flushes using a new helper: sync_wrprotect_tlb_flush_pending().

Testing with the userfaultfd selftest is showing 100% reproducible mm
corruption with writes getting lost, before this commit.

$ ./userfaultfd anon 100 100
nr_pages: 25600, nr_pages_per_cpu: 3200
bounces: 99, mode: rnd racing read, userfaults: 773 missing 
(215+205+58+114+72+85+18+6), 9011 wp (1714+1714+886+1227+1009+1278+646+537)
[..]
bounces: 72, mode: poll, userfaults: 720 missing (187+148+102+49+92+103+24+15), 
9885 wp (1452+1175+1104+1667+1101+1365+913+1108)
bounces: 71, mode: rnd racing ver read, page_nr 25241 memory corruption 6 7

After the commit the userland memory corruption is gone as expected.

Cc: sta...@kernel.org
Reported-by: Nadav Amit 

[PATCH 1/2] mm: proc: Invalidate TLB after clearing soft-dirty page state

2021-01-07 Thread Andrea Arcangeli
From: Will Deacon 

Since commit 0758cd830494 ("asm-generic/tlb: avoid potential double
flush"), TLB invalidation is elided in tlb_finish_mmu() if no entries
were batched via the tlb_remove_*() functions. Consequently, the
page-table modifications performed by clear_refs_write() in response to
a write to /proc//clear_refs do not perform TLB invalidation.
Although this is fine when simply aging the ptes, in the case of
clearing the "soft-dirty" state we can end up with entries where
pte_write() is false, yet a writable mapping remains in the TLB.

Fix this by avoiding the mmu_gather API altogether: managing both the
'tlb_flush_pending' flag on the 'mm_struct' and explicit TLB
invalidation for the sort-dirty path, much like mprotect() does already.

Fixes: 0758cd830494 ("asm-generic/tlb: avoid potential double flush")
Signed-off-by: Will Deacon 
Signed-off-by: Andrea Arcangeli 
---
 fs/proc/task_mmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ee5a235b3056..a127262ba517 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1189,7 +1189,6 @@ static ssize_t clear_refs_write(struct file *file, const 
char __user *buf,
struct mm_struct *mm;
struct vm_area_struct *vma;
enum clear_refs_types type;
-   struct mmu_gather tlb;
int itype;
int rv;
 
@@ -1234,7 +1233,6 @@ static ssize_t clear_refs_write(struct file *file, const 
char __user *buf,
count = -EINTR;
goto out_mm;
}
-   tlb_gather_mmu(, mm, 0, -1);
if (type == CLEAR_REFS_SOFT_DIRTY) {
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
@@ -1252,15 +1250,18 @@ static ssize_t clear_refs_write(struct file *file, 
const char __user *buf,
break;
}
 
+   inc_tlb_flush_pending(mm);
mmu_notifier_range_init(, MMU_NOTIFY_SOFT_DIRTY,
0, NULL, mm, 0, -1UL);
mmu_notifier_invalidate_range_start();
}
walk_page_range(mm, 0, mm->highest_vm_end, _refs_walk_ops,
);
-   if (type == CLEAR_REFS_SOFT_DIRTY)
+   if (type == CLEAR_REFS_SOFT_DIRTY) {
mmu_notifier_invalidate_range_end();
-   tlb_finish_mmu(, 0, -1);
+   flush_tlb_mm(mm);
+   dec_tlb_flush_pending(mm);
+   }
mmap_read_unlock(mm);
 out_mm:
mmput(mm);



Re: [PATCH v1 0/5] Enable fw_devlink=on by default

2021-01-07 Thread Greg Kroah-Hartman
On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
> 
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
> 
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
>   worry about functional dependencies between modules (depmod is still
>   needed for symbol dependencies).
> 
> Greg/Rafael,
> 
> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> see some issues due to device drivers that aren't following best
> practices (they don't expose the device to driver core). Want to
> identify those early on and try to have them fixed before 5.11 release.
> See [1] for an example of such a case.

Now queued up in my tree, will show up in linux-next in a few days,
let's see what breaks!  :)

And it is scheduled for 5.12-rc1, not 5.11, sorry.

thanks,

greg k-h


Re: [PATCH] RDMA/hns: remove h from printk format specifier

2021-01-07 Thread Jason Gunthorpe
On Wed, Dec 23, 2020 at 11:30:41AM -0800, t...@redhat.com wrote:
> From: Tom Rix 
> 
> This change fixes the checkpatch warning described in this commit
> commit cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of 
> unnecessary %h[xudi] and %hh[xudi]")
> 
> Standard integer promotion is already done and %hx and %hhx is useless
> so do not encourage the use of %hh[xudi] or %h[xudi].
> 
> Signed-off-by: Tom Rix 
> Reviewed-by: Leon Romanovsky 
> ---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to for-next, thanks

Jason


Re: [PATCH 0/3] regulator: mcp16502: make lpm pin optional

2021-01-07 Thread Mark Brown
On Thu, 7 Jan 2021 16:15:24 +0200, Claudiu Beznea wrote:
> This patch makes the LPM pin as optional as this may be controlled
> in the last phase of suspend procedure to decrease the power consumption
> while suspended. Along w/ this update the MAINTAINERS entry for this
> driver.
> 
> Thank you,
> Claudiu Beznea
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 
for-next

Thanks!

[1/3] dt-bindings: regulator: mcp16502: document lpm as optional
  commit: eea0b4e213232b28a25de5b88af9e25667e8d2f2
[2/3] regulator: mcp16502: lpm pin can be optional on some platforms
  commit: 3c42728c18d093e8951ad6caf7daa89fa2f54702
[3/3] MAINTAINERS: add myself as maintainer for mcp16502
  commit: 8aad7fabce6ad9491cc7d23f85d9798a4a0ce399

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH] regulator: bd718x7: Stop using parent data

2021-01-07 Thread Mark Brown
On Thu, 7 Jan 2021 14:23:55 +0200, Matti Vaittinen wrote:
> The ROHM PMIC regulator drivers only need the regmap pointer from
> the parent device. Regmap can be obtained via dev_get_regmap()
> so do not require parent to populate driver data for that.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 
for-next

Thanks!

[1/1] regulator: bd718x7: Stop using parent data
  commit: 907dfdc945aa3d183cdc6a81b963ee3b42ece306

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH] mm/hugetlb: Fix potential double free in hugetlb_register_node() error path

2021-01-07 Thread Mike Kravetz
On 1/7/21 4:32 AM, Miaohe Lin wrote:
> In hugetlb_sysfs_add_hstate(), we would do kobject_put() on hstate_kobjs
> when failed to create sysfs group but forget to set hstate_kobjs to NULL.
> Then in hugetlb_register_node() error path, we may free it again via
> hugetlb_unregister_node().
> 
> Fixes: a3437870160c ("hugetlb: new sysfs interface")
> Signed-off-by: Miaohe Lin 
> Cc: 
> ---
>  mm/hugetlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thanks, this is a potential issue that should be fixed.

Reviewed-by: Mike Kravetz 

This has been around for a long time (more than 12 years).  I suspect
nobody actually experienced this issue.  You just discovered via code
inspection.  Correct?
At one time cc stable would not be accepted for this type of issue,
not sure about today.
-- 
Mike Kravetz


[PATCH] net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE

2021-01-07 Thread Aleksander Jan Bajkowski
Exclude RMII from modes that report 1 GbE support. Reduced MII supports
up to 100 MbE.

Fixes: 14fceff ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski 
---
 drivers/net/dsa/lantiq_gswip.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 4b36d89bec06..662e68a0e7e6 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -1436,11 +1436,12 @@ static void gswip_phylink_validate(struct dsa_switch 
*ds, int port,
phylink_set(mask, Pause);
phylink_set(mask, Asym_Pause);
 
-   /* With the exclusion of MII and Reverse MII, we support Gigabit,
-* including Half duplex
+   /* With the exclusion of MII, Reverse MII and Reduced MII, we
+* support Gigabit, including Half duplex
 */
if (state->interface != PHY_INTERFACE_MODE_MII &&
-   state->interface != PHY_INTERFACE_MODE_REVMII) {
+   state->interface != PHY_INTERFACE_MODE_REVMII &&
+   state->interface != PHY_INTERFACE_MODE_RMII) {
phylink_set(mask, 1000baseT_Full);
phylink_set(mask, 1000baseT_Half);
}
-- 
2.20.1



Re: [PATCH] net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE

2021-01-07 Thread Florian Fainelli
On 1/7/21 11:58 AM, Aleksander Jan Bajkowski wrote:
> Exclude RMII from modes that report 1 GbE support. Reduced MII supports
> up to 100 MbE.
> 
> Fixes: 14fceff ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
> Signed-off-by: Aleksander Jan Bajkowski 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 11:33:36AM -0800, Linus Torvalds wrote:

> In fact, even some threaded app that does what I suspect it could do
> would likely be ok with it 99% of the time. Because the situation
> where you change the fd in the poll array is likely not the common
> case, and even if some -1 file descriptor gets overwritten by a valid
> one by the poll() system call again, it probably ends up being very
> hard to see a failure.
> 
> Which just makes me even more nervous.

Hmm...  But anything like that will have another problem - we do
copyin only once.  And we repeat fdget() on each iteration of
do_poll() loop.  Sure, we don't actually put anything on the
queues after the first time around, and __pollwait() keeps the
ones we are actually waiting for pinned, but...  If another
thread stores -1 to ->fd, then closes what used to be there
and moves on, what will it see?  ->poll() calls will be done
for whatever file we'd reused the descriptor for.  Sure,
the kernel won't break, but the caller of poll() would need
to be very careful about what it sees...

Frankly, I'd consider seeing that kind of games in the userland
as a big red flag; I'm not saying it's OK to break the suckers
even worse than they are now, but I'm curious whether anything
in the userland does it *and* how many bugs does it have around
those uses of poll()...


[watchdog] watchdog: mei_wdt: request stop on unregister

2021-01-07 Thread Tomas Winkler
From: Alexander Usyskin 

Send the stop command to the firmware on watchdog unregister
to eleminate false event on suspend.

Cc: 
Signed-off-by: Alexander Usyskin 
Signed-off-by: Tomas Winkler 
---
 drivers/watchdog/mei_wdt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/watchdog/mei_wdt.c b/drivers/watchdog/mei_wdt.c
index 5391bf3e6b11..c5967d8b4256 100644
--- a/drivers/watchdog/mei_wdt.c
+++ b/drivers/watchdog/mei_wdt.c
@@ -382,6 +382,7 @@ static int mei_wdt_register(struct mei_wdt *wdt)
 
watchdog_set_drvdata(>wdd, wdt);
watchdog_stop_on_reboot(>wdd);
+   watchdog_stop_on_unregister(>wdd);
 
ret = watchdog_register_device(>wdd);
if (ret)
-- 
2.26.2



RE: [PATCH v7 0/2] PCI: cadence: Retrain Link to work around Gen2

2021-01-07 Thread Athani Nadeem Ladkhan
Hello,

Requesting to provide review comments.

Thanks & Regards,
Nadeem Athani

> -Original Message-
> From: Nadeem Athani 
> Sent: Wednesday, December 30, 2020 5:35 PM
> To: Tom Joseph ; lorenzo.pieral...@arm.com;
> r...@kernel.org; bhelg...@google.com; kis...@ti.com; linux-
> o...@vger.kernel.org; linux-...@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Athani Nadeem Ladkhan ; Milind Parab
> ; Swapnil Kashinath Jakhade
> ; Parshuram Raju Thombare
> 
> Subject: [PATCH v7 0/2] PCI: cadence: Retrain Link to work around Gen2
> 
> Cadence controller will not initiate autonomous speed change if strapped as
> Gen2. The Retrain Link bit is set as quirk to enable this speed change.
> Adding a quirk flag for defective IP. In future IP revisions this will not be
> applicable.
> 
> Version history:
> Changes in v7:
> - Changing the commit title of patch 1 in this series.
> - Added a return value for function cdns_pcie_retrain().
> Changes in v6:
> - Move the position of function cdns_pcie_host_wait_for_link to remove
>   compilation error. No changes in code. Separate patch for this.
> Changes in v5:
> - Remove the compatible string based setting of quirk flag.
> - Removed additional Link Up Check
> - Removed quirk from pcie-cadence-plat.c and added in pci-j721e.c Changes
> in v4:
> - Added a quirk flag based on a new compatible string.
> - Change of api for link up: cdns_pcie_host_wait_for_link().
> Changes in v3:
> - To set retrain link bit,checking device capability & link status.
> - 32bit read in place of 8bit.
> - Minor correction in patch comment.
> - Change in variable & macro name.
> Changes in v2:
> - 16bit read in place of 8bit.
> 
> Nadeem Athani (2):
>   PCI: cadence: Shifting of a function to support new code.
>   PCI: cadence: Retrain Link to work around Gen2 training defect.
> 
>  drivers/pci/controller/cadence/pci-j721e.c |  3 +
>  drivers/pci/controller/cadence/pcie-cadence-host.c | 70 -
> -
>  drivers/pci/controller/cadence/pcie-cadence.h  | 11 +++-
>  3 files changed, 65 insertions(+), 19 deletions(-)
> 
> --
> 2.15.0



Re: [PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

2021-01-07 Thread Tom Lendacky

On 1/7/21 12:13 PM, Paolo Bonzini wrote:

On 04/01/21 21:20, Tom Lendacky wrote:

From: Tom Lendacky 

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
   Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
   support. It is up to the guest to transfer control of the AP to the
   proper location.

Subsequent AP boot:
   KVM will expect to receive an AP Reset Hold exit event indicating that
   the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
   awaken it. When the AP Reset Hold exit event is received, KVM will place
   the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
   sequence, KVM will make the vCPU runnable. It is again up to the guest
   to then transfer control of the AP to the proper location.

   To differentiate between an actual HLT and an AP Reset Hold, a new MP
   state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
   placed in upon receiving the AP Reset Hold exit event. Additionally, to
   communicate the AP Reset Hold exit event up to userspace (if needed), a
   new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky 


Queued, thanks.


Thanks, Paolo!

Tom



Paolo



Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls

2021-01-07 Thread Greg KH
On Mon, Jan 04, 2021 at 08:58:23PM -0600, Mike Ximing Chen wrote:
> Introduce the dlb device ioctl layer and the first three ioctls: query
> device version, query available resources, and create a scheduling domain.
> Also introduce the user-space interface file dlb_user.h.
> 
> The device version query is designed to allow each DLB device version/type
> to have its own unique ioctl API through the /dev/dlb%d node. Each such API
> would share in common the device version command as its first command, and
> all subsequent commands can be unique to the particular device.
> 
> The hardware operation for scheduling domain creation will be added in a
> subsequent commit.
> 
> Signed-off-by: Gage Eads 
> Signed-off-by: Mike Ximing Chen 
> Reviewed-by: Magnus Karlsson 
> Reviewed-by: Dan Williams 
> ---
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  drivers/misc/dlb/Makefile |   2 +-
>  drivers/misc/dlb/dlb_bitmap.h |  32 
>  drivers/misc/dlb/dlb_ioctl.c  | 119 +
>  drivers/misc/dlb/dlb_ioctl.h  |  11 ++
>  drivers/misc/dlb/dlb_main.c   |   3 +
>  drivers/misc/dlb/dlb_main.h   |   7 +
>  drivers/misc/dlb/dlb_pf_ops.c |  21 +++
>  drivers/misc/dlb/dlb_resource.c   |  63 +++
>  drivers/misc/dlb/dlb_resource.h   |   5 +
>  include/uapi/linux/dlb.h  | 166 ++
>  11 files changed, 429 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.c
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.h
>  create mode 100644 include/uapi/linux/dlb.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 55a2d9b2ce33..afca043d59f8 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -241,6 +241,7 @@ Code  Seq#Include File
>Comments
>  'h'   00-7F  
> conflict! Charon filesystem
>   
> 
>  'h'   00-1F  linux/hpet.h
> conflict!
> +'h'   00-1F  uapi/linux/dlb.h
> conflict!
>  'h'   80-8F  fs/hfsplus/ioctl.c
>  'i'   00-3F  linux/i2o-dev.h 
> conflict!
>  'i'   0B-1F  linux/ipmi.h
> conflict!
> diff --git a/drivers/misc/dlb/Makefile b/drivers/misc/dlb/Makefile
> index 8a49ea5fd752..aaafb3086d8d 100644
> --- a/drivers/misc/dlb/Makefile
> +++ b/drivers/misc/dlb/Makefile
> @@ -7,4 +7,4 @@
>  obj-$(CONFIG_INTEL_DLB) := dlb.o
>  
>  dlb-objs := dlb_main.o
> -dlb-objs += dlb_pf_ops.o dlb_resource.o
> +dlb-objs += dlb_pf_ops.o dlb_resource.o dlb_ioctl.o
> diff --git a/drivers/misc/dlb/dlb_bitmap.h b/drivers/misc/dlb/dlb_bitmap.h
> index fb3ef52a306d..3ea78b42c79f 100644
> --- a/drivers/misc/dlb/dlb_bitmap.h
> +++ b/drivers/misc/dlb/dlb_bitmap.h
> @@ -73,4 +73,36 @@ static inline void dlb_bitmap_free(struct dlb_bitmap 
> *bitmap)
>   kfree(bitmap);
>  }
>  
> +/**
> + * dlb_bitmap_longest_set_range() - returns longest contiguous range of set
> + *bits
> + * @bitmap: pointer to dlb_bitmap structure.
> + *
> + * Return:
> + * Returns the bitmap's longest contiguous range of set bits upon success,
> + * <0 otherwise.
> + *
> + * Errors:
> + * EINVAL - bitmap is NULL or is uninitialized.
> + */
> +static inline int dlb_bitmap_longest_set_range(struct dlb_bitmap *bitmap)
> +{
> + int max_len, len;
> + int start, end;
> +
> + if (!bitmap || !bitmap->map)
> + return -EINVAL;
> +
> + if (bitmap_weight(bitmap->map, bitmap->len) == 0)
> + return 0;
> +
> + max_len = 0;
> + bitmap_for_each_set_region(bitmap->map, start, end, 0, bitmap->len) {
> + len = end - start;
> + if (max_len < len)
> + max_len = len;
> + }
> + return max_len;
> +}
> +
>  #endif /*  __DLB_OSDEP_BITMAP_H */
> diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c
> new file mode 100644
> index ..c072ed9b921c
> --- /dev/null
> +++ b/drivers/misc/dlb/dlb_ioctl.c
> @@ -0,0 +1,119 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(C) 2016-2020 Intel Corporation. All rights reserved. */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "dlb_ioctl.h"
> +#include "dlb_main.h"
> +
> +/* [7:0]: device revision, [15:8]: device version */
> +#define DLB_SET_DEVICE_VERSION(ver, rev) (((ver) << 8) | (rev))
> +
> +static int
> +dlb_ioctl_get_device_version(struct dlb *dlb __attribute__((unused)),
> +  void *karg)
> +{
> + struct 

Re: [PATCH] Input: cros_ec_keyb: Add support for a front proximity switch

2021-01-07 Thread Dmitry Torokhov
On Thu, Jan 07, 2021 at 06:57:10AM -0800, Doug Anderson wrote:
> Hi,
> 
> On Wed, Jan 6, 2021 at 6:22 PM Dmitry Torokhov
>  wrote:
> >
> > Hi Doug, Stephen,
> >
> > On Wed, Jan 06, 2021 at 05:16:10PM -0800, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Dec 4, 2020 at 4:48 PM Stephen Boyd  wrote:
> > > >
> > > > Some cros ECs support a front proximity MKBP event via
> > > > 'EC_MKBP_FRONT_PROXIMITY'. Map this to the 'SW_FRONT_PROXIMITY' input
> > > > event code so it can be reported up to userspace.
> > > >
> > > > Cc: Dmitry Torokhov 
> > > > Cc: Benson Leung 
> > > > Cc: Guenter Roeck 
> > > > Signed-off-by: Stephen Boyd 
> > > > ---
> > > >  drivers/input/keyboard/cros_ec_keyb.c  | 5 +
> > > >  include/linux/platform_data/cros_ec_commands.h | 1 +
> > > >  2 files changed, 6 insertions(+)
> > >
> > > This seems really straightforward.
> > >
> > > Reviewed-by: Douglas Anderson 
> > >
> > > Given that it touches a header file owned by the Chrome OS maintainers
> > > and a driver owned by input, how should it land?  One maintainer Acks
> > > and the other lands?
> >
> > Sorry about missing this one, however the "front proximity" switch has
> > been introduced for the benefit of phone devices, to be emitted when a
> > device is raised to user's ear, and I do not think we should be using
> > this here.
> >
> > We have just recently had similar discussion with regard to palm- and
> > lap-mode sensors and whether they should be reported over input or IIO
> > as true proximity sensors:
> >
> > https://lore.kernel.org/linux-iio/9f9b0ff6-3bf1-63c4-eb36-901cecd7c...@redhat.com/
> >
> > Based on what we are doing for other Chrome OS devices that expose
> > proximity sensors (for example trogdor) we have decided that we all
> > should be using IIO as it will allow not only on/off, but true proximity
> > reporting with potential of implementing smarter policies by userspace.
> >
> > Because of that we should do the same here and export this as IIO
> > proximity sensor as well.
> 
> For devices with a true proximity sensor that's exactly what we're
> doing.  I've only been involved in the periphery of the discussion,
> but as I understand it there are some models of laptop for which we
> don't have a true proximity sensor.  On these devices, the EC is in
> charge of deciding about proximity based on a number of factors.

Yes, I understand that on some devices the proximity sensors are not
true sensors but rather on/off signals, potentially derived from a
multitude of sources. However there is still a benefit in exposing them
as IIO proximity devices with limited reporting representing
[near, infinity] range/values. This will mean that userspace needs to
monitor only one set of devices (IIO) instead of both IIO and input, and
will not require constantly expanding EV_SW set to account for
ever-growing number of proximity sensors (lap, palm, general presence,
etc).

Thanks.

-- 
Dmitry


Re: [BUG] from x86: Support kmap_local() forced debugging

2021-01-07 Thread Linus Torvalds
On Wed, Jan 6, 2021 at 8:45 PM Willem de Bruijn  wrote:
>
> But there are three other kmap_atomic callers under net/ that do not
> loop at all, so assume non-compound pages. In esp_output_head,
> esp6_output_head and skb_seq_read. The first two directly use
> skb_page_frag_refill, which can allocate compound (but not
> __GFP_HIGHMEM) pages, and the third can be inserted with
> netfilter xt_string in the path of tcp transmit skbs, which can also
> have compound pages. I think that these could similarly access
> data beyond the end of the kmap_atomic mapped page. I'll take
> a closer look.

Thanks.

Note that I have flushed my random one-liner patch from my system, and
expect to get a proper fix through the normal networking pulls.

And _if_ the networking people feel that my one-liner was the proper
fix, you can use it and add my sign-off if you want to, but it really
was more of a "this is the quick ugly fix for testing" rather than
anything else.

  Linus


Re: [PATCH v2 1/3] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

2021-01-07 Thread Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 06:19:23PM +0100, Andrew Lunn wrote:
> > -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base)
> > +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len)
> >  {
> > -   if (!memcmp(base->vendor_name, "VSOL", 16))
> > -   return 1;
> > -   if (!memcmp(base->vendor_name, "OEM ", 16) &&
> > -   !memcmp(base->vendor_pn,   "V2801F  ", 16))
> > -   return 1;
> > +   size_t i, block_size = sfp->i2c_block_size;
> >  
> > -   /* Some modules can't cope with long reads */
> > -   return 16;
> > -}
> > +   /* Already using byte IO */
> > +   if (block_size == 1)
> > +   return false;
> 
> This seems counter intuitive. We don't need byte IO because we are
> doing btye IO? Can we return True here?

It is counter-intuitive, but as this is indicating whether we need to
switch to byte IO, if we're already doing byte IO, then we don't need
to switch.

> > -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base 
> > *base)
> > -{
> > -   sfp->i2c_block_size = sfp_quirk_i2c_block_size(base);
> > +   for (i = 1; i < len; i += block_size) {
> > +   if (memchr_inv(buf + i, '\0', block_size - 1))
> > +   return false;
> > +   }
> 
> Is the loop needed?

I think you're not reading the code very well. It checks for bytes at
offset 1..blocksize-1, blocksize+1..2*blocksize-1, etc are zero. It
does _not_ check that byte 0 or the byte at N*blocksize is zero - these
bytes are skipped. In other words, the first byte of each transfer can
be any value. The other bytes of the _entire_ ID must be zero.

> I also wonder if on the last iteration of the loop you go passed the
> end of buf? Don't you need a min(block_size -1, len - i) or
> similar?

The ID is 64 bytes long, and is fixed. block_size could be a non-power
of two, but that is highly unlikely. block_size will never be larger
than 16 either.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH] arm64: PCI: Enable SMC conduit

2021-01-07 Thread Jeremy Linton

Hi,

On 1/7/21 11:36 AM, Rob Herring wrote:

On Thu, Jan 7, 2021 at 9:24 AM Jeremy Linton  wrote:


Hi,


On 1/7/21 9:28 AM, Rob Herring wrote:

On Mon, Jan 4, 2021 at 9:57 PM Jeremy Linton  wrote:


Given that most arm64 platform's PCI implementations needs quirks
to deal with problematic config accesses, this is a good place to
apply a firmware abstraction. The ARM PCI SMMCCC spec details a
standard SMC conduit designed to provide a simple PCI config
accessor. This specification enhances the existing ACPI/PCI
abstraction and expects power, config, etc functionality is handled


(trimming)



+static int smccc_pcie_check_conduit(u16 seg)


check what? Based on how you use this, perhaps _has_conduit() instead.


Sure.




+{
+   struct arm_smccc_res res;
+
+   if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
+   return -EOPNOTSUPP;
+
+   arm_smccc_smc(SMCCC_PCI_VERSION, 0, 0, 0, 0, 0, 0, 0, );
+   if ((int)res.a0 < 0)
+   return -EOPNOTSUPP;
+
+   arm_smccc_smc(SMCCC_PCI_SEG_INFO, seg, 0, 0, 0, 0, 0, 0, );
+   if ((int)res.a0 < 0)
+   return -EOPNOTSUPP;


Don't you need to check that read and write functions are supported?


In theory no, the first version of the specification makes them
mandatory for all implementations. There isn't a partial access method,
so nothing works if only read or write were implemented.


Then the spec should change:

2.3.3 Caller responsibilities
The caller has the following responsibilities:
• The caller must ensure that this function is implemented before
issuing a call. This function is discoverable
by calling PCI_FEATURES with pci_func_id set to 0x8400_0132.


I guess knowing the version is ensuring, but the 2nd sentence makes it
seem that is how one should ensure.


Ok, yes i understand, I will add the check.



Related, are there any sort of tests for the interface? I generally
don't think the kernel's job is validating firmware (a frequent topic
for DT), but we should have something. Maybe an SMC unittest module?
If nothing else, seems like we should have at least one PCI_FEATURES
call to make sure it works. We don't want to add it later only to find
that it breaks on some firmware implementations. Though we can just
add firmware quirks. ;)


I'm not aware of any unit tests at the moment. My testing so far has 
been against these patches: 
https://review.trustedfirmware.org/q/topic:"Arm_PCI_Config_Space_Interface;


But given the next version does the PCI_FEATURES calls, that will 
satisfy your concern, right?


Re: [RFC 0/2] kbuild: Add support to build overlays (%.dtbo)

2021-01-07 Thread Bill Mills

Hello,

On 1/7/21 2:02 PM, Rob Herring wrote:

On Wed, Jan 6, 2021 at 10:35 PM Masahiro Yamada  wrote:


On Wed, Jan 6, 2021 at 12:21 AM Rob Herring  wrote:


On Tue, Jan 5, 2021 at 4:24 AM Viresh Kumar  wrote:


Hello,

Here is an attempt to make some changes in the kernel to allow building
of device tree overlays.

While at it, I would also like to discuss about how we should mention
the base DT blobs in the Makefiles for the overlays, so they can be
build tested to make sure the overlays apply properly.

A simple way is to mention that with -base extension, like this:

$(overlay-file)-base := platform-base.dtb

Any other preference ?




Viresh's patch is not enough.

We will need to change .gitignore
and scripts/Makefile.dtbinst as well.


In my understanding, the build rule is completely the same
between .dtb and .dtbo
As Rob mentioned, I am not sure if we really need/want
a separate extension.


A counter approach is to use an extension like '.ovl.dtb'
It clarifies it is an overlay fragment without changing
anything in our build system or the upstream DTC project.


*.dtbo is already a well established defato standard.  I see little 
value in changing it and doing so will likely just bifurcate common usage.




We use chained extension in some places, for example,
.dt.yaml for schema yaml files.



dtb-$(CONFIG_ARCH_FOO) += \
 foo-board.dtb \
 foo-overlay1.ovl.dtb \
 foo-overlay2.ovl.dtb


Overlay DT source file names must end with '.ovl.dts'


I like that suggestion as then it's also clear looking at the source
files which ones are overlays. Or we'd need .dtso to be consistent.



I don't think there is much of a problem renaming the source side.
Don't know if that helps if the output is still dtbo.

"Be consistent on dtso" sounds good to me.
Can it be enforced via build time checks?




I think we'll want something similar to how '-objs' works for modules:

foo-board-1-dtbs := foo-board.dtb foo-overlay1.dtbo
foo-board-2-dtbs := foo-board.dtb foo-overlay2.dtbo
foo-board-1-2-dtbs := foo-board.dtb foo-overlay1.dtbo foo-overlay2.dtbo
dtbs-y += foo-board-1.dtb foo-board-2.dtb foo-board-1-2.dtb

(One difference here is we will want all the intermediate targets
unlike .o files.)

You wouldn't necessarily have all the above combinations, but you have
to allow for them. I'm not sure how we'd handle applying any common
overlays where the base and overlay are in different directories.



I guess the motivation for supporting -dtbs is to
add per-board -@ option only when it contains *.dtbo pattern.


I hadn't thought that far, but yeah, that would be good. Really, I
just want it to be controlled per SoC family at least.


But, as you notice, if the overlay files are located
under drivers/, it is difficult to add -@ per board.


Generally, they shouldn't be. The exceptions are what we already have
there which are old dt fixups and unittests.

We want the stripped DT repo (devicetree-rebasing) to have all this
and drivers/ is stripped out. (Which reminds me, the DT repo will need
some work to support all this. It's a different build sys.)


Another scenario is, some people may want to compile
downstream overlay files (i.e. similar concept as external modules),
then we have no idea which base board should be given with the -@ flag.


I'd rather be tempted to add it globally


ifdef CONFIG_OF_OVERLAY
DTC_FLAGS += -@
endif


We've already rejected doing that. Turning on '-@' can grow the dtb
size by a significant amount which could be problematic for some
boards >

Another thing here is adding all the above is not really going to
scale on arm32 where we have a single dts directory. We need to move
things to per vendor/soc family directories. I have the script to do
this. We just need to agree on the vendor names and get Arnd/Olof to
run it. I also want that so we can enable schema checks by default
once a vendor is warning free (the whole tree is going to take
forever).



If this is a big churn, perhaps we could make it extreme
to decouple DT and Linux-arch.


I would be fine with that, but I don't think we'll get agreement
there. With that amount of change, we'll be discussing git submodule
again.

Rereading the thread on vendor directories[1], we may just move boards
one vendor at a time. We could just make that a prerequisite for
vendor supporting overlays.


arch/*/boot/dts/*.dts
  ->  dts//*.dts

Documentation/devicetree/bindings
  -> dts/Bindings/

include/dt-bindings/
  -> dts/include/dt-bindings/



Then, other project can take dts/
to reuse for them.


This is already possible with devicetree-rebasing.git. Though it is
still by arch.

Rob

[1] https://lore.kernel.org/linux-devicetree/20181204183649.GA5716@bogus/



Thanks for digging up this script!

Bill


Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls

2021-01-07 Thread Greg KH
On Mon, Jan 04, 2021 at 08:58:23PM -0600, Mike Ximing Chen wrote:
> Introduce the dlb device ioctl layer and the first three ioctls: query
> device version, query available resources, and create a scheduling domain.
> Also introduce the user-space interface file dlb_user.h.
> 
> The device version query is designed to allow each DLB device version/type
> to have its own unique ioctl API through the /dev/dlb%d node. Each such API
> would share in common the device version command as its first command, and
> all subsequent commands can be unique to the particular device.
> 
> The hardware operation for scheduling domain creation will be added in a
> subsequent commit.
> 
> Signed-off-by: Gage Eads 
> Signed-off-by: Mike Ximing Chen 
> Reviewed-by: Magnus Karlsson 
> Reviewed-by: Dan Williams 
> ---
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  drivers/misc/dlb/Makefile |   2 +-
>  drivers/misc/dlb/dlb_bitmap.h |  32 
>  drivers/misc/dlb/dlb_ioctl.c  | 119 +
>  drivers/misc/dlb/dlb_ioctl.h  |  11 ++
>  drivers/misc/dlb/dlb_main.c   |   3 +
>  drivers/misc/dlb/dlb_main.h   |   7 +
>  drivers/misc/dlb/dlb_pf_ops.c |  21 +++
>  drivers/misc/dlb/dlb_resource.c   |  63 +++
>  drivers/misc/dlb/dlb_resource.h   |   5 +
>  include/uapi/linux/dlb.h  | 166 ++
>  11 files changed, 429 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.c
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.h
>  create mode 100644 include/uapi/linux/dlb.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 55a2d9b2ce33..afca043d59f8 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -241,6 +241,7 @@ Code  Seq#Include File
>Comments
>  'h'   00-7F  
> conflict! Charon filesystem
>   
> 
>  'h'   00-1F  linux/hpet.h
> conflict!
> +'h'   00-1F  uapi/linux/dlb.h
> conflict!
>  'h'   80-8F  fs/hfsplus/ioctl.c
>  'i'   00-3F  linux/i2o-dev.h 
> conflict!
>  'i'   0B-1F  linux/ipmi.h
> conflict!
> diff --git a/drivers/misc/dlb/Makefile b/drivers/misc/dlb/Makefile
> index 8a49ea5fd752..aaafb3086d8d 100644
> --- a/drivers/misc/dlb/Makefile
> +++ b/drivers/misc/dlb/Makefile
> @@ -7,4 +7,4 @@
>  obj-$(CONFIG_INTEL_DLB) := dlb.o
>  
>  dlb-objs := dlb_main.o
> -dlb-objs += dlb_pf_ops.o dlb_resource.o
> +dlb-objs += dlb_pf_ops.o dlb_resource.o dlb_ioctl.o
> diff --git a/drivers/misc/dlb/dlb_bitmap.h b/drivers/misc/dlb/dlb_bitmap.h
> index fb3ef52a306d..3ea78b42c79f 100644
> --- a/drivers/misc/dlb/dlb_bitmap.h
> +++ b/drivers/misc/dlb/dlb_bitmap.h
> @@ -73,4 +73,36 @@ static inline void dlb_bitmap_free(struct dlb_bitmap 
> *bitmap)
>   kfree(bitmap);
>  }
>  
> +/**
> + * dlb_bitmap_longest_set_range() - returns longest contiguous range of set
> + *bits
> + * @bitmap: pointer to dlb_bitmap structure.
> + *
> + * Return:
> + * Returns the bitmap's longest contiguous range of set bits upon success,
> + * <0 otherwise.
> + *
> + * Errors:
> + * EINVAL - bitmap is NULL or is uninitialized.
> + */
> +static inline int dlb_bitmap_longest_set_range(struct dlb_bitmap *bitmap)
> +{
> + int max_len, len;
> + int start, end;
> +
> + if (!bitmap || !bitmap->map)
> + return -EINVAL;
> +
> + if (bitmap_weight(bitmap->map, bitmap->len) == 0)
> + return 0;
> +
> + max_len = 0;
> + bitmap_for_each_set_region(bitmap->map, start, end, 0, bitmap->len) {
> + len = end - start;
> + if (max_len < len)
> + max_len = len;
> + }
> + return max_len;
> +}
> +
>  #endif /*  __DLB_OSDEP_BITMAP_H */
> diff --git a/drivers/misc/dlb/dlb_ioctl.c b/drivers/misc/dlb/dlb_ioctl.c
> new file mode 100644
> index ..c072ed9b921c
> --- /dev/null
> +++ b/drivers/misc/dlb/dlb_ioctl.c
> @@ -0,0 +1,119 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(C) 2016-2020 Intel Corporation. All rights reserved. */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "dlb_ioctl.h"
> +#include "dlb_main.h"
> +
> +/* [7:0]: device revision, [15:8]: device version */
> +#define DLB_SET_DEVICE_VERSION(ver, rev) (((ver) << 8) | (rev))
> +
> +static int
> +dlb_ioctl_get_device_version(struct dlb *dlb __attribute__((unused)),

We don't use __attribute__((unused)) for function variables in Linux.

Re: [PATCH v8 04/20] dlb: add device ioctl layer and first three ioctls

2021-01-07 Thread Greg KH
On Mon, Jan 04, 2021 at 08:58:23PM -0600, Mike Ximing Chen wrote:
> Introduce the dlb device ioctl layer and the first three ioctls: query
> device version, query available resources, and create a scheduling domain.
> Also introduce the user-space interface file dlb_user.h.
> 
> The device version query is designed to allow each DLB device version/type
> to have its own unique ioctl API through the /dev/dlb%d node. Each such API
> would share in common the device version command as its first command, and
> all subsequent commands can be unique to the particular device.
> 
> The hardware operation for scheduling domain creation will be added in a
> subsequent commit.
> 
> Signed-off-by: Gage Eads 
> Signed-off-by: Mike Ximing Chen 
> Reviewed-by: Magnus Karlsson 
> Reviewed-by: Dan Williams 
> ---
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  drivers/misc/dlb/Makefile |   2 +-
>  drivers/misc/dlb/dlb_bitmap.h |  32 
>  drivers/misc/dlb/dlb_ioctl.c  | 119 +
>  drivers/misc/dlb/dlb_ioctl.h  |  11 ++
>  drivers/misc/dlb/dlb_main.c   |   3 +
>  drivers/misc/dlb/dlb_main.h   |   7 +
>  drivers/misc/dlb/dlb_pf_ops.c |  21 +++
>  drivers/misc/dlb/dlb_resource.c   |  63 +++
>  drivers/misc/dlb/dlb_resource.h   |   5 +
>  include/uapi/linux/dlb.h  | 166 ++
>  11 files changed, 429 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.c
>  create mode 100644 drivers/misc/dlb/dlb_ioctl.h
>  create mode 100644 include/uapi/linux/dlb.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 55a2d9b2ce33..afca043d59f8 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -241,6 +241,7 @@ Code  Seq#Include File
>Comments
>  'h'   00-7F  
> conflict! Charon filesystem
>   
> 
>  'h'   00-1F  linux/hpet.h
> conflict!
> +'h'   00-1F  uapi/linux/dlb.h
> conflict!

Why are you taking a range that you know there is a conflict for?



Re: [PATCH v8 03/20] dlb: add resource and device initialization

2021-01-07 Thread Greg KH
On Mon, Jan 04, 2021 at 08:58:22PM -0600, Mike Ximing Chen wrote:
> Introduce dlb_bitmap_* functions, a thin convenience layer wrapping the
> Linux bitmap interfaces, used by the bitmaps in the dlb hardware types.

No, you created custom #defines:

> --- a/drivers/misc/dlb/dlb_hw_types.h
> +++ b/drivers/misc/dlb/dlb_hw_types.h
> @@ -5,6 +5,15 @@
>  #define __DLB_HW_TYPES_H
>  
>  #include 
> +#include 
> +
> +#include "dlb_bitmap.h"
> +
> +#define BITS_SET(x, val, mask)   (x = ((x) & ~(mask)) \
> +  | (((val) << (mask##_LOC)) & (mask)))
> +#define BITS_CLR(x, mask)((x) &= ~(mask))
> +#define BIT_SET(x, mask)((x) |= (mask))
> +#define BITS_GET(x, mask)   (((x) & (mask)) >> (mask##_LOC))

We have functions for this, use them, don't create custom macros for
them.  Use the Linux functions please.

thanks,

greg k-h


Re: [PATCH v2 00/48] Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs

2021-01-07 Thread Dmitry Osipenko
05.01.2021 20:11, Krzysztof Kozlowski пишет:
> On Thu, Dec 17, 2020 at 09:05:50PM +0300, Dmitry Osipenko wrote:
>> Introduce core voltage scaling for NVIDIA Tegra20/30 SoCs, which reduces
>> power consumption and heating of the Tegra chips. Tegra SoC has multiple
>> hardware units which belong to a core power domain of the SoC and share
>> the core voltage. The voltage must be selected in accordance to a minimum
>> requirement of every core hardware unit.
>>
>> The minimum core voltage requirement depends on:
>>
>>   1. Clock enable state of a hardware unit.
>>   2. Clock frequency.
>>   3. Unit's internal idling/active state.
>>
>> This series is tested on Acer A500 (T20), AC100 (T20), Nexus 7 (T30),
>> Ouya (T30), TK1 (T124) and some others. I also added voltage scaling to
>> the Ventana (T20) and Cardhu (T30) boards which are tested by NVIDIA's CI
>> farm. Tegra30 is now couple degrees cooler on Nexus 7 and stays cool on
>> Ouya (instead of becoming burning hot) while system is idling. It should
>> be possible to improve this further by implementing a more advanced power
>> management features for the kernel drivers.
>>
>> The DVFS support is opt-in for all boards, meaning that older DTBs will
>> continue to work like they did it before this series. It should be possible
>> to easily add the core voltage scaling support for Tegra114+ SoCs based on
>> this grounding work later on, if anyone will want to implement it.
> 
> The same comment as for your interconnect work: for sets touching
> multiple systems please mention the dependencies between patches in the
> cover letter. Not as a reply to such remark like I make here, but as a
> separate entry in the cover letter.

I'll describe all the dependencies in the next revision, thanks.


Re: [PATCH 1/1] Revert "init/console: Use ttynull as a fallback when there is no console"

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 11:15 AM Greg Kroah-Hartman
 wrote:
>
> Linus, can you take this directly, or is this going through some other
> tree?

I was _assuming_ that I'd get it through the normal printk pull
request, it doesn't seem to be that timing-critical.

But if there is nothing else pending, I can certainly take it directly
as that patch too.

Petr?

Linus


Re: [PATCH v8 01/20] dlb: add skeleton for DLB driver

2021-01-07 Thread Greg KH
On Mon, Jan 04, 2021 at 08:58:20PM -0600, Mike Ximing Chen wrote:
> +static int dlb_probe(struct pci_dev *pdev,
> +  const struct pci_device_id *pdev_id)
> +{
> + struct dlb *dlb;
> + int ret;
> +
> + dlb = devm_kzalloc(>dev, sizeof(*dlb), GFP_KERNEL);
> + if (!dlb)
> + return -ENOMEM;
> +
> + pci_set_drvdata(pdev, dlb);
> +
> + dlb->pdev = pdev;
> +
> + spin_lock(_ids_lock);
> + dlb->id = idr_alloc(_ids,
> + (void *)dlb,
> + 0,
> + DLB_MAX_NUM_DEVICES - 1,
> + GFP_KERNEL);
> + spin_unlock(_ids_lock);
> +
> + if (dlb->id < 0) {
> + dev_err(>dev, "probe: device ID allocation failed\n");
> +
> + ret = dlb->id;
> + goto alloc_id_fail;
> + }
> +
> + ret = pcim_enable_device(pdev);
> + if (ret != 0) {
> + dev_err(>dev, "pcim_enable_device() returned %d\n", ret);
> +
> + goto pci_enable_device_fail;
> + }
> +
> + ret = pcim_iomap_regions(pdev,
> +  (1U << DLB_CSR_BAR) | (1U << DLB_FUNC_BAR),
> +  "dlb");
> + if (ret != 0) {
> + dev_err(>dev,
> + "pcim_iomap_regions(): returned %d\n", ret);
> +
> + goto pci_enable_device_fail;
> + }
> +
> + pci_set_master(pdev);
> +
> + if (pci_enable_pcie_error_reporting(pdev))
> + dev_info(>dev, "[%s()] Failed to enable AER\n", __func__);

Shouldn't that be dev_err() and you fail here?

And no need for __func__ please, the driver name and device is listed,
that's all that is necessary.

thanks,

greg k-h


Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 11:04 AM Al Viro  wrote:
>
> BTW, changing 'event' field in place from another thread is going to
> be interesting - you have two 16bit values next to each other and
> two CPUs modifying those with no exclusion.  Sounds like a recipe
> for massive trouble...

It's perfectly fine on just about anything else than on an original
pre-ev5 alpha.

The C standard even - finally - made it a requirement that accesses to
different members can't introduce data races.

So I agree with you that it's a bit annoying, and it's likely not even
very common, but I could easily imagine myself writing code that
changes either 'fd' or 'events' in a threaded server.

That's pretty much the whole point of 'poll()' after all - threaded
servers that have that convenient array of pollable file descriptors.

Maybe the pollfd array in many cases ends up being thread-local,
possibly generated from some other data structure each time. But if it
is some performance-critical thing (and I can't imagine a lot of more
performance-critical things than the core poll() loop), I can very
easily imagine it being re-used in between poll() calls, and people
modifying it from signal handlers and other threads as the set of
pollable file descriptors change due to new connections etc.

But I'll be honest - I didn't try to actually find such code, and I
suspect 99% of all cases would be happy with your "copy everything".

In fact, even some threaded app that does what I suspect it could do
would likely be ok with it 99% of the time. Because the situation
where you change the fd in the poll array is likely not the common
case, and even if some -1 file descriptor gets overwritten by a valid
one by the poll() system call again, it probably ends up being very
hard to see a failure.

Which just makes me even more nervous.

But I'm sure that yes, on platforms like s390, that "only write 16
bits out of every 64 bits" loop is probably pretty painful.

On most normal architectures it's probably a wash. I doubt it is
measurable on x86, for example.

   Linus


Re: [PATCH] of: property: Add device link support for interrupts

2021-01-07 Thread Rob Herring
On Thu, Jan 7, 2021 at 12:09 PM Saravana Kannan  wrote:
>
> On Thu, Jan 7, 2021 at 10:39 AM Rob Herring  wrote:
> >
> > On Wed, Jan 6, 2021 at 11:53 AM Saravana Kannan  
> > wrote:
> > >
> > > On Sat, Jan 2, 2021 at 3:37 AM Marc Zyngier  wrote:
> > > >
> > > > On Thu, 31 Dec 2020 21:12:40 +,
> > > > Rob Herring  wrote:
> > > > >
> > > > > On Mon, Dec 21, 2020 at 09:30:45AM +, Marc Zyngier wrote:
> > > > > > On 2020-12-18 21:07, Saravana Kannan wrote:
> > > > > > > Add support for creating device links out of interrupts property.
> > > > > > >
> > > > > > > Cc: Marc Zyngier 
> > > > > > > Cc: Kevin Hilman 
> > > > > > > Signed-off-by: Saravana Kannan 
> > > > > > > ---
> > > > > > > Rob/Greg,
> > > > > > >
> > > > > > > This might need to go into driver-core to avoid conflict
> > > > > > > due to fw_devlink refactor series that merged there.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Saravana
> > > > > > >
> > > > > > >
> > > > > > >  drivers/of/property.c | 17 +
> > > > > > >  1 file changed, 17 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > > index 5f9eed79a8aa..e56a5eae0a0b 100644
> > > > > > > --- a/drivers/of/property.c
> > > > > > > +++ b/drivers/of/property.c
> > > > > > > @@ -1271,6 +1271,22 @@ static struct device_node
> > > > > > > *parse_iommu_maps(struct device_node *np,
> > > > > > >   return of_parse_phandle(np, prop_name, (index * 4) + 1);
> > > > > > >  }
> > > > > > >
> > > > > > > +static struct device_node *parse_interrupts(struct device_node 
> > > > > > > *np,
> > > > > > > + const char *prop_name, int 
> > > > > > > index)
> > > > > > > +{
> > > > > > > + struct device_node *sup;
> > > > > > > +
> > > > > > > + if (strcmp(prop_name, "interrupts") || index)
> > > > > > > + return NULL;
> > > > > > > +
> > > > > > > + of_node_get(np);
> > > > > > > + while (np && !(sup = of_parse_phandle(np, "interrupt-parent", 
> > > > > > > 0)))
> > > > > > > + np = of_get_next_parent(np);
> > > > > > > + of_node_put(np);
> > > > > > > +
> > > > > > > + return sup;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static const struct supplier_bindings of_supplier_bindings[] = {
> > > > > > >   { .parse_prop = parse_clocks, },
> > > > > > >   { .parse_prop = parse_interconnects, },
> > > > > > > @@ -1296,6 +1312,7 @@ static const struct supplier_bindings
> > > > > > > of_supplier_bindings[] = {
> > > > > > >   { .parse_prop = parse_pinctrl6, },
> > > > > > >   { .parse_prop = parse_pinctrl7, },
> > > > > > >   { .parse_prop = parse_pinctrl8, },
> > > > > > > + { .parse_prop = parse_interrupts, },
> > > > > > >   { .parse_prop = parse_regulators, },
> > > > > > >   { .parse_prop = parse_gpio, },
> > > > > > >   { .parse_prop = parse_gpios, },
> > > > > >
> > > > > > You don't really describe what this is for so I'm only guessing
> > > > > > from the context. If you want to follow the interrupt hierarchy,
> > > > > > "interrupt-parent" isn't enough. You also need to track
> > > > > > things like interrupt-map, or anything that carries a phandle
> > > > > > to an interrupt controller.
> > > > >
> > > > > We don't need to follow the hierarchy, we just need the immediate
> > > > > dependencies.
> > > >
> > > > Indeed. I also wonder why this isn't just a irq_find_parent() call, TBH.
> > >
> > > Thanks Rob for explaining it.
> > >
> > > Marc, I wasn't sure if Rob would be okay with including of_irq.h here.
> > > Also, I'm trying to keep of/property.c independent of the framework
> > > code for now. The long term goal is to see if I can move out most of
> > > this into the frameworks. But I want to do that after I sort of some
> > > of the larger problems (like getting fw_devlink=on to work on all
> > > devices  first). Let me know if you have a strong preference for right
> > > now, if not, I'd rather keep property.c independent for now.
> > >
> > > I wasn't aware of interrupt-map until a few weeks ago and didn't know
> > > it carried phandles. I can add support for that too. There's no reason
> > > for all of them to go in one patch though.
> > >
> > > >
> > > > > But you are right that 'interrupt-map' also needs to be tracked.
> > > >
> > > > And 'interrupts-extended', while we're at it.
> > >
> > > This is already handled.
> > >
> > > > >
> > > > > I also noticed that we define 'interrupt-parent' as a dependency to
> > > > > parse, but that's wrong. The dependency is where 'interrupts' appears
> > > > > and where 'interrupt-parent' appears is irrelevant.
> > >
> > > No, the interrupt-parent parsing is correct and it's needed for
> > > interrupt controllers to probe in the right order. But
> > > interrupt-parent is also needs to be looked at for parsing
> > > "interrupts".
> >
> > If you parse 'interrupts' for interrupt controllers (which in turn
> > will use 'interrupt-parent'), then you aren't going to need to track
> > 'interrupt-parent' by itself.
>
> Do all 

Re: [PATCH] usb: dwc3: core: Replace devm_reset_control_array_get()

2021-01-07 Thread Dmitry Osipenko
03.11.2020 06:57, Yejune Deng пишет:
> devm_reset_control_array_get_optional_shared() looks more readable
> 
> Signed-off-by: Yejune Deng 
> ---
>  drivers/usb/dwc3/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 841daec..b87acf0 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1490,7 +1490,7 @@ static int dwc3_probe(struct platform_device *pdev)
>  
>   dwc3_get_properties(dwc);
>  
> - dwc->reset = devm_reset_control_array_get(dev, true, true);
> + dwc->reset = devm_reset_control_array_get_optional_shared(dev);
>   if (IS_ERR(dwc->reset))
>   return PTR_ERR(dwc->reset);
>  
> 

Reviewed-by: Dmitry Osipenko 


Re: [PATCH] usb: dwc3: Replace of_reset_control_array_get()

2021-01-07 Thread Dmitry Osipenko
03.11.2020 12:25, Yejune Deng пишет:
> of_reset_control_array_get_optional_exclusive() looks more readable
> 
> Signed-off-by: Yejune Deng 
> ---
>  drivers/usb/dwc3/dwc3-of-simple.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/dwc3-of-simple.c 
> b/drivers/usb/dwc3/dwc3-of-simple.c
> index e62ecd2..e358ff0 100644
> --- a/drivers/usb/dwc3/dwc3-of-simple.c
> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
> @@ -52,8 +52,7 @@ static int dwc3_of_simple_probe(struct platform_device 
> *pdev)
>   if (of_device_is_compatible(np, "rockchip,rk3399-dwc3"))
>   simple->need_reset = true;
>  
> - simple->resets = of_reset_control_array_get(np, false, true,
> - true);
> + simple->resets = of_reset_control_array_get_optional_exclusive(np);
>   if (IS_ERR(simple->resets)) {
>   ret = PTR_ERR(simple->resets);
>   dev_err(dev, "failed to get device resets, err=%d\n", ret);
> 

Reviewed-by: Dmitry Osipenko 


[PATCH v2] module: harden ELF info handling

2021-01-07 Thread Frank van der Linden
5fdc7db644 ("module: setup load info before module_sig_check()")
moved the ELF setup, so that it was done before the signature
check. This made the module name available to signature error
messages.

However, the checks for ELF correctness in setup_load_info
are not sufficient to prevent bad memory references due to
corrupted offset fields, indices, etc.

So, there's a regression in behavior here: a corrupt and unsigned
(or badly signed) module, which might previously have been rejected
immediately, can now cause an oops/crash.

Harden ELF handling for module loading by doing the following:

- Move the signature check back up so that it comes before ELF
  initialization. It's best to do the signature check to see
  if we can trust the module, before using the ELF structures
  inside it. This also makes checks against info->len
  more accurate again, as this field will be reduced by the
  length of the signature in mod_check_sig().

  The module name is now once again not available for error
  messages during the signature check, but that seems like
  a fair tradeoff.

- Check if sections have offset / size fields that at least don't
  exceed the length of the module.

- Check if sections have section name offsets that don't fall
  outside the section name table.

- Add a few other sanity checks against invalid section indices,
  etc.

This is not an exhaustive consistency check, but the idea is to
at least get through the signature and blacklist checks without
crashing because of corrupted ELF info, and to error out gracefully
for most issues that would have caused problems later on.

Fixes: 5fdc7db644 ("module: setup load info before module_sig_check()")
Signed-off-by: Frank van der Linden 
---
 kernel/module.c   | 143 --
 kernel/module_signature.c |   2 +-
 kernel/module_signing.c   |   2 +-
 3 files changed, 126 insertions(+), 21 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4bf30e4b3eaa..34fc6c85eb65 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2964,7 +2964,7 @@ static int module_sig_check(struct load_info *info, int 
flags)
}
 
if (is_module_sig_enforced()) {
-   pr_notice("%s: loading of %s is rejected\n", info->name, 
reason);
+   pr_notice("Loading of %s is rejected\n", reason);
return -EKEYREJECTED;
}
 
@@ -2977,9 +2977,33 @@ static int module_sig_check(struct load_info *info, int 
flags)
 }
 #endif /* !CONFIG_MODULE_SIG */
 
-/* Sanity checks against invalid binaries, wrong arch, weird elf version. */
-static int elf_header_check(struct load_info *info)
+static int validate_section_offset(struct load_info *info, Elf_Shdr *shdr)
 {
+   unsigned long secend;
+
+   /*
+* Check for both overflow and offset/size being
+* too large.
+*/
+   secend = shdr->sh_offset + shdr->sh_size;
+   if (secend < shdr->sh_offset || secend > info->len)
+   return -ENOEXEC;
+
+   return 0;
+}
+
+/*
+ * Sanity checks against invalid binaries, wrong arch, weird elf version.
+ *
+ * Also do basic validity checks against section offsets and sizes, the
+ * section name string table, and the indices used for it (sh_name).
+ */
+static int elf_validity_check(struct load_info *info)
+{
+   unsigned int i;
+   Elf_Shdr *shdr, *strhdr;
+   int err;
+
if (info->len < sizeof(*(info->hdr)))
return -ENOEXEC;
 
@@ -2989,11 +3013,78 @@ static int elf_header_check(struct load_info *info)
|| info->hdr->e_shentsize != sizeof(Elf_Shdr))
return -ENOEXEC;
 
+   /*
+* e_shnum is 16 bits, and sizeof(Elf_Shdr) is
+* known and small. So e_shnum * sizeof(Elf_Shdr)
+* will not overflow unsigned long on any platform.
+*/
if (info->hdr->e_shoff >= info->len
|| (info->hdr->e_shnum * sizeof(Elf_Shdr) >
info->len - info->hdr->e_shoff))
return -ENOEXEC;
 
+   info->sechdrs = (void *)info->hdr + info->hdr->e_shoff;
+
+   /*
+* Verify if the section name table index is valid.
+*/
+   if (info->hdr->e_shstrndx == SHN_UNDEF
+   || info->hdr->e_shstrndx >= info->hdr->e_shnum)
+   return -ENOEXEC;
+
+   strhdr = >sechdrs[info->hdr->e_shstrndx];
+   err = validate_section_offset(info, strhdr);
+   if (err < 0)
+   return err;
+
+   /*
+* The section name table must be NUL-terminated, as required
+* by the spec. This makes strcmp and pr_* calls that access
+* strings in the section safe.
+*/
+   info->secstrings = (void *)info->hdr + strhdr->sh_offset;
+   if (info->secstrings[strhdr->sh_size - 1] != '\0')
+   return -ENOEXEC;
+
+   /*
+* The code assumes that section 0 has a length of zero and
+* an addr of zero, so check for it.
+*/
+   if 

Re: [PATCH v3 3/5] RISC-V: Initial DTS for Microchip ICICLE board

2021-01-07 Thread Atish Patra
On Thu, Jan 7, 2021 at 3:44 AM  wrote:
>
> Hi Atish,
>
> On 12/4/20 8:58 AM, Atish Patra wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> > content is safe
> >
> > Add initial DTS for Microchip ICICLE board having only
> > essential devices (clocks, sdhci, ethernet, serial, etc).
> > The device tree is based on the U-Boot patch.
> >
> > https://patchwork.ozlabs.org/project/uboot/patch/20201110103414.10142-6-padmarao.beg...@microchip.com/
> >
> > Signed-off-by: Atish Patra 
> > ---
> >   arch/riscv/boot/dts/Makefile  |   1 +
> >   arch/riscv/boot/dts/microchip/Makefile|   2 +
> >   .../microchip/microchip-mpfs-icicle-kit.dts   |  72 
> >   .../boot/dts/microchip/microchip-mpfs.dtsi| 331 ++
> >   4 files changed, 406 insertions(+)
> >   create mode 100644 arch/riscv/boot/dts/microchip/Makefile
> >   create mode 100644 
> > arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
> >   create mode 100644 arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi
> >
> > diff --git a/arch/riscv/boot/dts/Makefile b/arch/riscv/boot/dts/Makefile
> > index ca1f8cbd78c0..3ea94ea0a18a 100644
> > --- a/arch/riscv/boot/dts/Makefile
> > +++ b/arch/riscv/boot/dts/Makefile
> > @@ -1,5 +1,6 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   subdir-y += sifive
> >   subdir-y += kendryte
> > +subdir-y += microchip
> >
> >   obj-$(CONFIG_BUILTIN_DTB) := $(addsuffix /, $(subdir-y))
> > diff --git a/arch/riscv/boot/dts/microchip/Makefile 
> > b/arch/riscv/boot/dts/microchip/Makefile
> > new file mode 100644
> > index ..622b12771fd3
> > --- /dev/null
> > +++ b/arch/riscv/boot/dts/microchip/Makefile
> > @@ -0,0 +1,2 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +dtb-$(CONFIG_SOC_MICROCHIP_POLARFIRE) += microchip-mpfs-icicle-kit.dtb
> > diff --git a/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts 
> > b/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
> > new file mode 100644
> > index ..5b51dad13c72
> > --- /dev/null
> > +++ b/arch/riscv/boot/dts/microchip/microchip-mpfs-icicle-kit.dts
> > @@ -0,0 +1,72 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> > +/* Copyright (c) 2020 Microchip Technology Inc */
> > +
> > +/dts-v1/;
> > +
> > +#include "microchip-mpfs.dtsi"
> > +
> > +/* Clock frequency (in Hz) of the rtcclk */
> > +#define RTCCLK_FREQ100
> > +
> > +/ {
> > +   #address-cells = <2>;
> > +   #size-cells = <2>;
> > +   model = "Microchip PolarFire-SoC Icicle Kit";
> > +   compatible = "microchip,mpfs-icicle-kit", "microchip,polarfire-soc";
> > +
> > +   chosen {
> > +   stdout-path = 
> > +   };
> > +
> > +   cpus {
> > +   timebase-frequency = ;
> > +   };
> > +
> > +   memory@8000 {
> > +   device_type = "memory";
> > +   reg = <0x0 0x8000 0x0 0x4000>;
> > +   clocks = < 26>;
> > +   };
> > +
> > +   soc {
> > +   };
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   status = "okay";
> > +};
> > +
> > + {
> > +   phy-mode = "sgmii";
> > +   phy-handle = <>;
> > +   phy0: ethernet-phy@8 {
> > +   reg = <8>;
> > +   ti,fifo-depth = <0x01>;
> > +   };
> > +};
> > +
> > + {
> > +   status = "okay";
> > +   phy-mode = "sgmii";
> > +   phy-handle = <>;
> > +   phy1: ethernet-phy@9 {
> > +   reg = <9>;
> > +   ti,fifo-depth = <0x01>;
> > +   };
> > +};
> > diff --git a/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi 
> > b/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi
> > new file mode 100644
> > index ..7a076aa4c78d
> > --- /dev/null
> > +++ b/arch/riscv/boot/dts/microchip/microchip-mpfs.dtsi
> > @@ -0,0 +1,331 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> > +/* Copyright (c) 2020 Microchip Technology Inc */
> > +
> > +/dts-v1/;
> > +
> > +/ {
> > +   #address-cells = <2>;
> > +   #size-cells = <2>;
> > +   model = "Microchip PolarFire-SoC";
> > +   compatible = "microchip,polarfire-soc";
> > +
> > +   chosen {
> > +   };
> > +
> > +   cpus {
> > +   #address-cells = <1>;
> > +   #size-cells = <0>;
> > +
> > +   cpu@0 {
> > +   clock-frequency = <0>;
> > +   compatible = "sifive,rocket0", "riscv";
> > +   device_type = "cpu";
> > +   i-cache-block-size = <64>;
> > +   i-cache-sets = <128>;
> > +   i-cache-size = <16384>;
> > +   reg = <0>;
> > +   riscv,isa = "rv64imac";
> > +   status = "disabled";
> > +
> > +   

Re: [PATCH v3 1/5] RISC-V: Add Microchip PolarFire SoC kconfig option

2021-01-07 Thread Atish Patra
On Thu, Jan 7, 2021 at 3:40 AM  wrote:
>
> Hi Atish,
>
> On 12/4/20 8:58 AM, Atish Patra wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> > content is safe
> >
> > Add Microchip PolarFire kconfig option which selects SoC specific
> > and common drivers that is required for this SoC.
> >
> > Signed-off-by: Atish Patra 
> > Reviewed-by: Bin Meng 
> > Reviewed-by: Anup Patel 
> > ---
> >   arch/riscv/Kconfig.socs | 7 +++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/riscv/Kconfig.socs b/arch/riscv/Kconfig.socs
> > index 8a55f6156661..148ab095966b 100644
> > --- a/arch/riscv/Kconfig.socs
> > +++ b/arch/riscv/Kconfig.socs
> > @@ -1,5 +1,12 @@
> >   menu "SoC selection"
> >
> > +config SOC_MICROCHIP_POLARFIRE
> > +   bool "Microchip PolarFire SoCs"
> > +   select MCHP_CLK_PFSOC
> Can you change MCHP_CLK_PFSOC to MCHP_CLK_MPFS to align with the v2
> clock driver?

Sure. Will do that.

> > +   select SIFIVE_PLIC
> > +   help
> > + This enables support for Microchip PolarFire SoC platforms.
> > +
> >   config SOC_SIFIVE
> >  bool "SiFive SoCs"
> >  select SERIAL_SIFIVE if TTY
> > --
> > 2.25.1
> >
> Regards,
>
> Cyril.
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



-- 
Regards,
Atish


Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2021-01-07 Thread Sean Christopherson
On Thu, Jan 07, 2021, Ashish Kalra wrote:
> On Thu, Jan 07, 2021 at 09:26:25AM -0800, Sean Christopherson wrote:
> > On Thu, Jan 07, 2021, Ashish Kalra wrote:
> > > Hello Steve,
> > > 
> > > On Wed, Jan 06, 2021 at 05:01:33PM -0800, Steve Rutherford wrote:
> > > > Avoiding an rbtree for such a small (but unstable) list seems correct.
> > > > 
> > > > For the unencrypted region list strategy, the only questions that I
> > > > have are fairly secondary.
> > > > - How should the kernel upper bound the size of the list in the face
> > > > of malicious guests, but still support large guests? (Something
> > > > similar to the size provided in the bitmap API would work).
> > > 
> > > I am thinking of another scenario, where a malicious guest can make
> > > infinite/repetetive hypercalls and DOS attack the host. 
> > > 
> > > But probably this is a more generic issue, this can be done by any guest
> > > and under any hypervisor, i don't know what kind of mitigations exist
> > > for such a scenario ?
> > > 
> > > Potentially, the guest memory donation model can handle such an attack,
> > > because in this model, the hypervisor will expect only one hypercall,
> > > any repetetive hypercalls can make the hypervisor disable the guest ?
> > 
> > KVM doesn't need to explicitly bound its tracking structures, it just needs 
> > to
> > use GFP_KERNEL_ACCOUNT when allocating kernel memory for the structures so 
> > that
> > the memory will be accounted to the task/process/VM.  Shadow MMU pages are 
> > the
> > only exception that comes to mind; they're still accounted properly, but KVM
> > also explicitly limits them for a variety of reasons.
> > 
> > The size of the list will naturally be bounded by the size of the guest; and
> > assuming KVM proactively merges adjancent regions, that upper bound is 
> > probably
> > reasonably low compared to other allocations, e.g. the aforementioned MMU 
> > pages.
> > 
> > And, using a list means a malicious guest will get automatically throttled 
> > as
> > the latency of walking the list (to merge/delete existing entries) will 
> > increase
> > with the size of the list.
> 
> Just to add here, potentially there won't be any proactive
> merging/deletion of existing entries, as the only static entries will be
> initial guest MMIO regions, which are contigious guest PA ranges but not
> necessarily adjacent. 

My point was that, if the guest is malicious, eventually there will be adjacent
entries, e.g. the worst case scenario is that the encrypted status changes on
every 4k page.  Anyways, not really all that important, I mostly thinking out
loud :-)


Re: [PATCH] usb: dwc3: core: Replace devm_reset_control_array_get()

2021-01-07 Thread Greg KH
On Thu, Jan 07, 2021 at 10:16:50PM +0300, Dmitry Osipenko wrote:
> 03.11.2020 06:57, Yejune Deng пишет:
> > devm_reset_control_array_get_optional_shared() looks more readable
> > 
> > Signed-off-by: Yejune Deng 
> > ---
> >  drivers/usb/dwc3/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> > index 841daec..b87acf0 100644
> > --- a/drivers/usb/dwc3/core.c
> > +++ b/drivers/usb/dwc3/core.c
> > @@ -1490,7 +1490,7 @@ static int dwc3_probe(struct platform_device *pdev)
> >  
> > dwc3_get_properties(dwc);
> >  
> > -   dwc->reset = devm_reset_control_array_get(dev, true, true);
> > +   dwc->reset = devm_reset_control_array_get_optional_shared(dev);
> > if (IS_ERR(dwc->reset))
> > return PTR_ERR(dwc->reset);
> >  
> > 
> 
> Greg / Felipe, could you please pick up this patch?
> 
> I want to add devm_reset_control_array_get_exclusive_released() for
> NVIDIA Tegra drivers and we need to get rid of all the open-coded
> devm_reset_control_array_get() users in order to extend the reset API
> sanely.

Care to ack it or send a reviewed-by for it?

thanks,

greg k-h


Re: [PATCH] arm64: PCI: Enable SMC conduit

2021-01-07 Thread Jeremy Linton

Hi,

On 1/7/21 12:14 PM, Will Deacon wrote:

On Mon, Jan 04, 2021 at 10:57:35PM -0600, Jeremy Linton wrote:

Given that most arm64 platform's PCI implementations needs quirks
to deal with problematic config accesses, this is a good place to
apply a firmware abstraction. The ARM PCI SMMCCC spec details a
standard SMC conduit designed to provide a simple PCI config
accessor. This specification enhances the existing ACPI/PCI
abstraction and expects power, config, etc functionality is handled
by the platform. It also is very explicit that the resulting config
space registers must behave as is specified by the pci specification.

Lets hook the normal ACPI/PCI config path, and when we detect
missing MADT data, attempt to probe the SMC conduit. If the conduit
exists and responds for the requested segment number (provided by the
ACPI namespace) attach a custom pci_ecam_ops which redirects
all config read/write requests to the firmware.

This patch is based on the Arm PCI Config space access document @
https://developer.arm.com/documentation/den0115/latest


Why does firmware need to be involved with this at all? Can't we just
quirk Linux when these broken designs show up in production? We'll need
to modify Linux _anyway_ when the firmware interface isn't implemented
correctly...



IMHO, The short answer is that having the quirk in the firmware keeps it 
centralized over multiple OSs and linux distro versions and avoids a lot 
of costly kernel->distro churning to backport/maintain quirks over a 
dozen distro versions.


There is also a long-term maintenance advantage since its hard for the 
kernel community as a  whole to have a good view of how long a 
particular model of machine is actually in use. For example, today we 
could ask are any of those xgene1's still in use and remove their 
quirks, but no one really has a clear view.


As far as working around the firmware, that is of course potentially 
problematic, but I think it is easier to say "fix the firmware if you 
want/need linux support" than it is to get people to fix their ECAM 
implementations. Hypothetically, if at some point there is a broken 
version of this interface in firmware, the kernel could choose to bypass 
it entirely and talk to whatever broken config space method the hardware 
implements. At which point we aren't any worse off than the situation 
today.


The flip side of this is that a fair number of these platforms have open 
source firmware as well, so it may be trivial to fix the firmware.


Thanks for looking a this!



Re: [PATCH v2 1/3] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

2021-01-07 Thread Pali Rohár
On Thursday 07 January 2021 17:40:06 Russell King - ARM Linux admin wrote:
> On Thu, Jan 07, 2021 at 06:19:23PM +0100, Andrew Lunn wrote:
> > Did we loose the comment:
> > 
> > /* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a
> >  * single read. Switch back to reading 16 byte blocks ...
> > 
> > That explains why 16 is used. Given how broken stuff is and the number
> > of workaround we need, we should try to document as much as we cam, so
> > we don't break stuff when adding more workarounds.
> 
> It is _not_ why 16 is used at all.
> 
> We used to read the whole lot in one go. However, some modules could
> not cope with a full read - also some Linux I2C drivers struggled with
> it.
> 
> So, we reduced it down to 16 bytes. See commit 28e74a7cfd64 ("net: sfp:
> read eeprom in maximum 16 byte increments"). That had nothing to do
> with the 3FE46541AA, which came along later. It has been discovered
> that 3FE46541AA reacts badly to a single byte read to address 0x51 -
> it locks the I2C bus. Hence why we can't just go to single byte reads
> for every module.
> 
> So, the comment needs to be kept to explain why we are unable to go
> to single byte reads for all modules.  The choice of 16 remains
> relatively arbitary.

Do you have an idea where to put a comment?


Re: [PATCH 2/4] arm64: mte: Add asynchronous mode support

2021-01-07 Thread Andrey Konovalov
On Thu, Jan 7, 2021 at 6:25 PM Vincenzo Frascino
 wrote:
>
> Hi Andrey,
>
> On 1/7/21 4:29 PM, Andrey Konovalov wrote:
> > On Wed, Jan 6, 2021 at 12:56 PM Vincenzo Frascino
> >  wrote:
> >>
> >> MTE provides an asynchronous mode for detecting tag exceptions. In
> >> particular instead of triggering a fault the arm64 core updates a
> >> register which is checked by the kernel at the first entry after the tag
> >> exception has occurred.
> >>
> >> Add support for MTE asynchronous mode.
> >>
> >> The exception handling mechanism will be added with a future patch.
> >>
> >> Note: KASAN HW activates async mode via kasan.mode kernel parameter.
> >> The default mode is set to synchronous.
> >>
> >> Cc: Catalin Marinas 
> >> Cc: Will Deacon 
> >> Signed-off-by: Vincenzo Frascino 
> >> ---
> >>  arch/arm64/kernel/mte.c | 31 +--
> >>  1 file changed, 29 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> >> index 24a273d47df1..5d992e16b420 100644
> >> --- a/arch/arm64/kernel/mte.c
> >> +++ b/arch/arm64/kernel/mte.c
> >> @@ -153,8 +153,35 @@ void mte_init_tags(u64 max_tag)
> >>
> >>  void mte_enable_kernel(enum kasan_arg_mode mode)
> >>  {
> >> -   /* Enable MTE Sync Mode for EL1. */
> >> -   sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, 
> >> SCTLR_ELx_TCF_SYNC);
> >> +   const char *m;
> >> +
> >> +   /* Preset parameter values based on the mode. */
> >> +   switch (mode) {
> >> +   case KASAN_ARG_MODE_OFF:
> >> +   return;
> >> +   case KASAN_ARG_MODE_LIGHT:
> >> +   /* Enable MTE Async Mode for EL1. */
> >> +   sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, 
> >> SCTLR_ELx_TCF_ASYNC);
> >> +   m = "asynchronous";
> >> +   break;
> >> +   case KASAN_ARG_MODE_DEFAULT:
> >> +   case KASAN_ARG_MODE_PROD:
> >> +   case KASAN_ARG_MODE_FULL:
> >> +   /* Enable MTE Sync Mode for EL1. */
> >> +   sysreg_clear_set(sctlr_el1, SCTLR_ELx_TCF_MASK, 
> >> SCTLR_ELx_TCF_SYNC);
> >> +   m = "synchronous";
> >> +   break;
> >> +   default:
> >> +   /*
> >> +* kasan mode should be always set hence we should
> >> +* not reach this condition.
> >> +*/
> >> +   WARN_ON_ONCE(1);
> >> +   return;
> >> +   }
> >> +
> >> +   pr_info_once("MTE: enabled in %s mode at EL1\n", m);
> >> +
> >> isb();
> >>  }
> >>
> >> --
> >> 2.29.2
> >>
> >
> > Hi Vincenzo,
> >
> > It would be cleaner to pass a bool to mte_enable_kernel() and have it
> > indicate sync/async mode. This way you don't have to pull all these
> > KASAN constants into the arm64 code.
> >
>
> Boolean arguments are generally bad for legibility, hence I tend to avoid 
> them.
> In this case exposing the constants does not seem a big issue especially 
> because
> the only user of this code is "KASAN_HW_TAGS" and definitely improves its
> legibility hence I would prefer to keep it as is.

I don't like that this spills KASAN internals to the arm64 code. Let's
add another enum with two values and pass it as an argument then.
Something like:

enum mte_mode {
  ARM_MTE_SYNC,
  ARM_MTE_ASYNC
}


Re: [PATCH v2 1/3] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips

2021-01-07 Thread Pali Rohár
On Thursday 07 January 2021 18:19:23 Andrew Lunn wrote:
> > +   if (sfp->i2c_block_size < 2) {
> > +   dev_info(sfp->dev, "skipping hwmon device registration "
> > +  "due to broken EEPROM\n");
> > +   dev_info(sfp->dev, "diagnostic EEPROM area cannot be read "
> > +  "atomically to guarantee data coherency\n");
> 
> Strings like this are the exception to the 80 character rule. People
> grep for them, and when they are split, they are harder to find.

Ok. I will fix it.

> > -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base)
> > +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len)
> >  {
> > -   if (!memcmp(base->vendor_name, "VSOL", 16))
> > -   return 1;
> > -   if (!memcmp(base->vendor_name, "OEM ", 16) &&
> > -   !memcmp(base->vendor_pn,   "V2801F  ", 16))
> > -   return 1;
> > +   size_t i, block_size = sfp->i2c_block_size;
> >  
> > -   /* Some modules can't cope with long reads */
> > -   return 16;
> > -}
> > +   /* Already using byte IO */
> > +   if (block_size == 1)
> > +   return false;
> 
> This seems counter intuitive. We don't need byte IO because we are
> doing btye IO? Can we return True here?

I do not know this part was written by Russel.

Currently function is used in a way if sfp subsystem should switch to
byte IO. So if we are already using byte IO we are not going to do
switch and therefore false is returning.

At least this is how I understood why 'return false' is there.

> >  
> > -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base 
> > *base)
> > -{
> > -   sfp->i2c_block_size = sfp_quirk_i2c_block_size(base);
> > +   for (i = 1; i < len; i += block_size) {
> > +   if (memchr_inv(buf + i, '\0', block_size - 1))
> > +   return false;
> > +   }
> 
> Is the loop needed?

Originally I wanted to use just four memcmp() calls but Russel told me
that code should be generic (in case in future initial block size would
be changed, which is a good argument) and come up with this code with
for-loop.

So I think loop is needed.

> I also wonder if on the last iteration of the loop you go passed the
> end of buf? Don't you need a min(block_size -1, len - i) or
> similar?

You are right, if code is generic this needs to be fixed to prevent
reading reading undefined memory. I will replace it by proposed min(...)
call.


Re: [PATCH] usb: dwc3: core: Replace devm_reset_control_array_get()

2021-01-07 Thread Dmitry Osipenko
03.11.2020 06:57, Yejune Deng пишет:
> devm_reset_control_array_get_optional_shared() looks more readable
> 
> Signed-off-by: Yejune Deng 
> ---
>  drivers/usb/dwc3/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 841daec..b87acf0 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1490,7 +1490,7 @@ static int dwc3_probe(struct platform_device *pdev)
>  
>   dwc3_get_properties(dwc);
>  
> - dwc->reset = devm_reset_control_array_get(dev, true, true);
> + dwc->reset = devm_reset_control_array_get_optional_shared(dev);
>   if (IS_ERR(dwc->reset))
>   return PTR_ERR(dwc->reset);
>  
> 

Greg / Felipe, could you please pick up this patch?

I want to add devm_reset_control_array_get_exclusive_released() for
NVIDIA Tegra drivers and we need to get rid of all the open-coded
devm_reset_control_array_get() users in order to extend the reset API
sanely.

Thanks in advance.


Re: [PATCH 1/1] Revert "init/console: Use ttynull as a fallback when there is no console"

2021-01-07 Thread Greg Kroah-Hartman
On Thu, Jan 07, 2021 at 05:44:00PM +0100, Petr Mladek wrote:
> This reverts commit 757055ae8dedf5333af17b3b5b4b70ba9bc9da4e.
> 
> The commit caused that ttynull was used as the default console
> on many systems. It happened when there was no console configured
> on the command line and ttynull_init() was the first initcall
> calling register_console().
> 
> The commit fixed a historical problem that have been there for ages.
> The primary motivation was the commit 3cffa06aeef7ece30f6
> ("printk/console: Allow to disable console output by using console=""
> or console=null"). It provided a clean solution
> for a workaround that was widely used and worked only by chance.
> 
> This revert causes that the console="" or console=null command line
> options will again work only by chance. These options will cause that
> a particular console will be preferred and the default (tty) ones
> will not get enabled. There will be no console registered at
> all. As a result there won't be stdin, stdout, and stderr for
> the init process. But it worked exactly this way even before.
> 
> The proper solution has to fulfill many conditions:
> 
>   + Register ttynull only when explicitly required or as
> the ultimate fallback.
> 
>   + ttynull must get associated with /dev/console but it must
> not become preferred console when used as a fallback.
> Especially, it must still be possible to replace it
> by a better console later.
> 
> Such a change requires clean up of the register_console() code.
> Otherwise, it would be even harder to follow. Especially, the use
> of has_preferred_console and CON_CONSDEV flag is tricky. The clean
> up is risky. The ordering of consoles is not well defined. And
> any changes tend to break existing user settings.
> 
> Do the revert at the least risky solution for now.
> 
> Signed-off-by: Petr Mladek 

Acked-by: Greg Kroah-Hartman 

Linus, can you take this directly, or is this going through some other
tree?

thanks,

greg k-h


[ANNOUNCE] kmod 28

2021-01-07 Thread Lucas De Marchi

kmod 28 is out:

https://www.kernel.org/pub/linux/utils/kernel/kmod/kmod-28.tar.xz
https://www.kernel.org/pub/linux/utils/kernel/kmod/kmod-28.tar.sign

- Improvements
- Add Zstandard to the supported compression formats using libzstd
  (pass --with-zstd to configure)

- Bug fixes
- Ignore ill-formed kernel command line, e.g. with 
"ivrs_acpihid[00:14.5]=AMD0020:0"
  option in it
- Fix some memory leaks
- Fix 0-length builtin.alias.bin: it needs at least the index header

Shortlog is below:

Lucas De Marchi (17):
  gitignore: ignore release files
  gitignore: ignore .cache.mk when building modules
  libkmod: ignore kcmdline option if we fail to parse modname
  testsuite: check for ill-formed kcmdline
  depmod: do not output .bin to stdout
  libkmod: simplify lookup when builtin.modinfo.bin file is missing
  libkmod: fix return error when opening index
  libkmod: allow modules.alias.builtin to be optional
  testsuite: add check for kmod_load_resources
  ci: update travis distro
  ci: remove semaphoreci
  depmod: unconditionally write builtin.alias.bin
  shared: fix UNIQ definition
  testsuite: add test for empty modules.builtin.aliases.bin
  build: fix distcheck due to missing zstd
  build: add comment with rules for libtool version update
  kmod 28

Samanta Navarro (1):
  man: fix typo

Shuo Wang (1):
  NEWS: fix typo

Torge Matthies (2):
  add Zstandard compression support
  testsuite: add test for zstd-compressed module

Yauheni Kaliuta (3):
  libkmod: kmod_builtin_get_modinfo: free modinfo on error
  depmod: output_builtin_alias_bin: free idx on error path
  libkmod: kmod_log_null: qualify ctx argument as const


Thank you all for the contributions.


Lucas De Marchi


Re: [PATCH v2 1/4] KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit

2021-01-07 Thread Sean Christopherson
On Thu, Jan 07, 2021, Paolo Bonzini wrote:
> On 07/01/21 18:00, Sean Christopherson wrote:
> > Ugh, I assume this is due to one of the "premature" 
> > nested_ops->check_events()
> > calls that are necessitated by the event mess?  I'm guessing 
> > kvm_vcpu_running()
> > is the culprit?
> > 
> > If my assumption is correct, this bug affects nVMX as well.
> 
> Yes, though it may be latent.  For SVM it was until we started allocating
> svm->nested on demand.
> 
> > Rather than clear the request blindly on any nested VM-Exit, what
> > about something like the following?
> 
> I think your patch is overkill, KVM_REQ_GET_NESTED_STATE_PAGES is only set
> from KVM_SET_NESTED_STATE so it cannot happen while the VM runs.

Yeah, which is why I was hoping we could avoid clearing the request on every
nested exit.

> Something like this is small enough and works well.

I've no argument against it working, rather that I dislike clearing the request
on every exit.  Except for the ->check_events() case, hitting the scenario where
there's a pending request at the time of nested VM-Exit would ideally be treated
as a KVM bug.

On the other hand, clearing nested-specific request on nested VM-Exit is
logically sound, so I guess I'm ok with the minimal patch.


Re: [PATCH] mm/hugetlb: Fix potential missing huge page size info

2021-01-07 Thread Mike Kravetz
On 1/7/21 4:34 AM, Miaohe Lin wrote:
> The huge page size is encoded for VM_FAULT_HWPOISON errors only. So if we
> return VM_FAULT_HWPOISON, huge page size would just be ignored.
> 
> Fixes: aa50d3a7aa81 ("Encode huge page size for VM_FAULT_HWPOISON errors")
> Signed-off-by: Miaohe Lin 
> Cc: 
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks!

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz


Re: [PATCH] of: property: Add device link support for interrupts

2021-01-07 Thread Saravana Kannan
On Thu, Jan 7, 2021 at 10:39 AM Rob Herring  wrote:
>
> On Wed, Jan 6, 2021 at 11:53 AM Saravana Kannan  wrote:
> >
> > On Sat, Jan 2, 2021 at 3:37 AM Marc Zyngier  wrote:
> > >
> > > On Thu, 31 Dec 2020 21:12:40 +,
> > > Rob Herring  wrote:
> > > >
> > > > On Mon, Dec 21, 2020 at 09:30:45AM +, Marc Zyngier wrote:
> > > > > On 2020-12-18 21:07, Saravana Kannan wrote:
> > > > > > Add support for creating device links out of interrupts property.
> > > > > >
> > > > > > Cc: Marc Zyngier 
> > > > > > Cc: Kevin Hilman 
> > > > > > Signed-off-by: Saravana Kannan 
> > > > > > ---
> > > > > > Rob/Greg,
> > > > > >
> > > > > > This might need to go into driver-core to avoid conflict
> > > > > > due to fw_devlink refactor series that merged there.
> > > > > >
> > > > > > Thanks,
> > > > > > Saravana
> > > > > >
> > > > > >
> > > > > >  drivers/of/property.c | 17 +
> > > > > >  1 file changed, 17 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > index 5f9eed79a8aa..e56a5eae0a0b 100644
> > > > > > --- a/drivers/of/property.c
> > > > > > +++ b/drivers/of/property.c
> > > > > > @@ -1271,6 +1271,22 @@ static struct device_node
> > > > > > *parse_iommu_maps(struct device_node *np,
> > > > > >   return of_parse_phandle(np, prop_name, (index * 4) + 1);
> > > > > >  }
> > > > > >
> > > > > > +static struct device_node *parse_interrupts(struct device_node *np,
> > > > > > + const char *prop_name, int 
> > > > > > index)
> > > > > > +{
> > > > > > + struct device_node *sup;
> > > > > > +
> > > > > > + if (strcmp(prop_name, "interrupts") || index)
> > > > > > + return NULL;
> > > > > > +
> > > > > > + of_node_get(np);
> > > > > > + while (np && !(sup = of_parse_phandle(np, "interrupt-parent", 0)))
> > > > > > + np = of_get_next_parent(np);
> > > > > > + of_node_put(np);
> > > > > > +
> > > > > > + return sup;
> > > > > > +}
> > > > > > +
> > > > > >  static const struct supplier_bindings of_supplier_bindings[] = {
> > > > > >   { .parse_prop = parse_clocks, },
> > > > > >   { .parse_prop = parse_interconnects, },
> > > > > > @@ -1296,6 +1312,7 @@ static const struct supplier_bindings
> > > > > > of_supplier_bindings[] = {
> > > > > >   { .parse_prop = parse_pinctrl6, },
> > > > > >   { .parse_prop = parse_pinctrl7, },
> > > > > >   { .parse_prop = parse_pinctrl8, },
> > > > > > + { .parse_prop = parse_interrupts, },
> > > > > >   { .parse_prop = parse_regulators, },
> > > > > >   { .parse_prop = parse_gpio, },
> > > > > >   { .parse_prop = parse_gpios, },
> > > > >
> > > > > You don't really describe what this is for so I'm only guessing
> > > > > from the context. If you want to follow the interrupt hierarchy,
> > > > > "interrupt-parent" isn't enough. You also need to track
> > > > > things like interrupt-map, or anything that carries a phandle
> > > > > to an interrupt controller.
> > > >
> > > > We don't need to follow the hierarchy, we just need the immediate
> > > > dependencies.
> > >
> > > Indeed. I also wonder why this isn't just a irq_find_parent() call, TBH.
> >
> > Thanks Rob for explaining it.
> >
> > Marc, I wasn't sure if Rob would be okay with including of_irq.h here.
> > Also, I'm trying to keep of/property.c independent of the framework
> > code for now. The long term goal is to see if I can move out most of
> > this into the frameworks. But I want to do that after I sort of some
> > of the larger problems (like getting fw_devlink=on to work on all
> > devices  first). Let me know if you have a strong preference for right
> > now, if not, I'd rather keep property.c independent for now.
> >
> > I wasn't aware of interrupt-map until a few weeks ago and didn't know
> > it carried phandles. I can add support for that too. There's no reason
> > for all of them to go in one patch though.
> >
> > >
> > > > But you are right that 'interrupt-map' also needs to be tracked.
> > >
> > > And 'interrupts-extended', while we're at it.
> >
> > This is already handled.
> >
> > > >
> > > > I also noticed that we define 'interrupt-parent' as a dependency to
> > > > parse, but that's wrong. The dependency is where 'interrupts' appears
> > > > and where 'interrupt-parent' appears is irrelevant.
> >
> > No, the interrupt-parent parsing is correct and it's needed for
> > interrupt controllers to probe in the right order. But
> > interrupt-parent is also needs to be looked at for parsing
> > "interrupts".
>
> If you parse 'interrupts' for interrupt controllers (which in turn
> will use 'interrupt-parent'), then you aren't going to need to track
> 'interrupt-parent' by itself.

Do all interrupt controllers (that are not the root interrupt
controller) need to have "interrupts" property? If yes, then yeah,
that makes sense. But I vaguely remember that this wasn't the case for
some DT I saw.

Ah, here's one I found.

Re: [PATCH v3 07/15] x86/alternative: support "not feature" and ALTERNATIVE_TERNARY

2021-01-07 Thread Borislav Petkov
On Thu, Dec 17, 2020 at 10:31:25AM +0100, Juergen Gross wrote:
> Instead of only supporting to modify instructions when a specific
> feature is set, support doing so for the case a feature is not set.
> 
> As today a feature is specified using a 16 bit quantity and the highest
> feature number in use is around 600, using a negated feature number for
> specifying the inverted case seems to be appropriate.
> 
>   ALTERNATIVE "default_instr", "patched_instr", ~FEATURE_NR
> 
> will start with "default_instr" and patch that with "patched_instr" in
> case FEATURE_NR is not set.
> 
> Using that add ALTERNATIVE_TERNARY:
> 
>   ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
>   "feature_on_instr", "feature_off_instr"
> 
> which will start with "default_instr" and at patch time will, depending
> on FEATURE_NR being set or not, patch that with either
> "feature_on_instr" or "feature_off_instr".

How about an even simpler one (only build-tested):

---
diff --git a/arch/x86/include/asm/alternative-asm.h 
b/arch/x86/include/asm/alternative-asm.h
index 464034db299f..d52b423d3cab 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -109,6 +109,9 @@
.popsection
 .endm
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)   \
+   ALTERNATIVE_2 oldinstr, newinstr1, feature, newinstr2, 
X86_FEATURE_TERNARY
+
 #endif  /*  __ASSEMBLY__  */
 
 #endif /* _ASM_X86_ALTERNATIVE_ASM_H */
diff --git a/arch/x86/include/asm/alternative.h 
b/arch/x86/include/asm/alternative.h
index 13adca37c99a..f170cbe89539 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -175,6 +175,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)\
".popsection\n"
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)   \
+   ALTERNATIVE_2(oldinstr, newinstr1, feature, newinstr2, 
X86_FEATURE_TERNARY)
+
 #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, 
feat3) \
OLDINSTR_3(oldinsn, 1, 2, 3)
\
".pushsection .altinstructions,\"a\"\n" 
\
@@ -206,6 +209,9 @@ static inline int alternatives_text_reserved(void *start, 
void *end)
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, 
newinstr2, feature2) ::: "memory")
 
+#define alternative_ternary(oldinstr, feature, newinstr1, newinstr2)   \
+   asm_inline volatile(ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, 
newinstr2) ::: "memory")
+
 /*
  * Alternative inline assembly with input.
  *
diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..cc634db0b91f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -108,7 +108,7 @@
 #define X86_FEATURE_EXTD_APICID( 3*32+26) /* Extended APICID 
(8 bits) */
 #define X86_FEATURE_AMD_DCM( 3*32+27) /* AMD multi-node processor 
*/
 #define X86_FEATURE_APERFMPERF ( 3*32+28) /* P-State hardware 
coordination feedback capability (APERF/MPERF MSRs) */
-/* free( 3*32+29) */
+#define X86_FEATURE_TERNARY( 3*32+29) /* "" Synthetic bit for 
ALTERNATIVE_TERNARY() */
 #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 
state */
 #define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8d778e46725d..2cb29d4d8dd9 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,7 +393,7 @@ void __init_or_module noinline apply_alternatives(struct 
alt_instr *start,
replacement = (u8 *)>repl_offset + a->repl_offset;
BUG_ON(a->instrlen > sizeof(insn_buff));
BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
-   if (!boot_cpu_has(a->cpuid)) {
+   if (!boot_cpu_has(a->cpuid) && (a->cpuid != 
X86_FEATURE_TERNARY)) {
if (a->padlen > 1)
optimize_nops(a, instr);
 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH 5/5] media: hantro: Add support for the Rockchip PX30

2021-01-07 Thread Ezequiel Garcia
Hi Paul,

Happy to see this patch. It was on my TODO list,
but I hadn't had time to bringup my rk3326 device.

A few comments.

On Thu, 2021-01-07 at 14:41 +0100, Paul Kocialkowski wrote:
> The PX30 SoC includes both the VDPU2 and VEPU2 blocks which are similar
> to the RK3399 (Hantro G1/H1 with shuffled registers).
> 
> Besides taking an extra clock, it also shares an interrupt with the IOMMU
> so it's necessary to request the interrupt shared.
> 

Could you clarify on the commit description which iommu device interrupt
line is being shared?

> Signed-off-by: Paul Kocialkowski 
> ---
>  drivers/staging/media/hantro/hantro_drv.c    |  5 +++--
>  drivers/staging/media/hantro/hantro_hw.h |  1 +
>  drivers/staging/media/hantro/rk3399_vpu_hw.c | 21 
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/media/hantro/hantro_drv.c 
> b/drivers/staging/media/hantro/hantro_drv.c
> index e5f200e64993..076a7782b476 100644
> --- a/drivers/staging/media/hantro/hantro_drv.c
> +++ b/drivers/staging/media/hantro/hantro_drv.c
> @@ -472,6 +472,7 @@ static const struct v4l2_file_operations hantro_fops = {
>  
>  static const struct of_device_id of_hantro_match[] = {
>  #ifdef CONFIG_VIDEO_HANTRO_ROCKCHIP
> +   { .compatible = "rockchip,px30-vpu", .data = _vpu_variant, },
> { .compatible = "rockchip,rk3399-vpu", .data = _vpu_variant, },
> { .compatible = "rockchip,rk3328-vpu", .data = _vpu_variant, },
> { .compatible = "rockchip,rk3288-vpu", .data = _vpu_variant, },
> @@ -796,8 +797,8 @@ static int hantro_probe(struct platform_device *pdev)
> return -ENXIO;
>  
> ret = devm_request_irq(vpu->dev, irq,
> -  vpu->variant->irqs[i].handler, 0,
> -  dev_name(vpu->dev), vpu);
> +  vpu->variant->irqs[i].handler,
> +  IRQF_SHARED, dev_name(vpu->dev), vpu);

Maybe this irq flag should be part of vpu->variant? It sounds like an IP block
integration specific thing.

Also, you will need a px30-specific interrupt handler now,
since the rk3399 one is not shared-friendly.

> if (ret) {
> dev_err(vpu->dev, "Could not request %s IRQ.\n",
> irq_name);
> diff --git a/drivers/staging/media/hantro/hantro_hw.h 
> b/drivers/staging/media/hantro/hantro_hw.h
> index 34c9e4649a25..07f516fd7a2e 100644
> --- a/drivers/staging/media/hantro/hantro_hw.h
> +++ b/drivers/staging/media/hantro/hantro_hw.h
> @@ -148,6 +148,7 @@ enum hantro_enc_fmt {
> RK3288_VPU_ENC_FMT_UYVY422 = 3,
>  };
>  
> +extern const struct hantro_variant px30_vpu_variant;
>  extern const struct hantro_variant rk3399_vpu_variant;
>  extern const struct hantro_variant rk3328_vpu_variant;
>  extern const struct hantro_variant rk3288_vpu_variant;
> diff --git a/drivers/staging/media/hantro/rk3399_vpu_hw.c 
> b/drivers/staging/media/hantro/rk3399_vpu_hw.c
> index 7a7962cf771e..4112f98baa60 100644
> --- a/drivers/staging/media/hantro/rk3399_vpu_hw.c
> +++ b/drivers/staging/media/hantro/rk3399_vpu_hw.c

Perhaps it's time to rename this to rockchip_vpu_hw.c,
and merge rk3288 and rk3399? It's a nitpick, though.

> @@ -220,3 +220,24 @@ const struct hantro_variant rk3328_vpu_variant = {
> .clk_names = rk3399_clk_names,
> .num_clocks = ARRAY_SIZE(rk3399_clk_names),
>  };
> +
> +static const char * const px30_clk_names[] = {
> +   "aclk", "hclk", "sclk"
> +};
> +
> +const struct hantro_variant px30_vpu_variant = {
> +   .enc_offset = 0x0,
> +   .enc_fmts = rk3399_vpu_enc_fmts,
> +   .num_enc_fmts = ARRAY_SIZE(rk3399_vpu_enc_fmts),
> +   .dec_offset = 0x400,
> +   .dec_fmts = rk3399_vpu_dec_fmts,
> +   .num_dec_fmts = ARRAY_SIZE(rk3399_vpu_dec_fmts),
> +   .codec = HANTRO_JPEG_ENCODER | HANTRO_MPEG2_DECODER |
> +    HANTRO_VP8_DECODER,
> +   .codec_ops = rk3399_vpu_codec_ops,
> +   .irqs = rk3399_irqs,
> +   .num_irqs = ARRAY_SIZE(rk3399_irqs),
> +   .init = rk3399_vpu_hw_init,
> +   .clk_names = px30_clk_names,
> +   .num_clocks = ARRAY_SIZE(px30_clk_names)
> +};

Thanks,
Ezequiel



Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 10:47:07AM -0800, Linus Torvalds wrote:

> Now, the "whole entry" is just 8 bytes, so it's possible that it would
> actually be faster to do a copy of the whole thing rather than write
> just the 16 bits. But I got very nervous about it, because I could
> easily see some threaded app actually changing the 'fd' (or the
> 'event' field) in place (ie writing -1 to it as they close and re-use
> it)

BTW, changing 'event' field in place from another thread is going to
be interesting - you have two 16bit values next to each other and
two CPUs modifying those with no exclusion.  Sounds like a recipe
for massive trouble...

Or am I missing something here?


[RFC PATCH 1/2] userfaultfd: add minor fault registration mode

2021-01-07 Thread Axel Rasmussen
This feature allows userspace to intercept "minor" faults. By "minor"
fault, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

This commit adds the new registration mode, and sets the relevant flag
on the VMAs being registered. In the hugetlb fault path, if we find
that we have huge_pte_none(), but find_lock_page() does indeed find an
existing page, then we have a "minor" fault, and if the VMA has the
userfaultfd registration flag, we call into userfaultfd to handle it.

Why add a new registration mode, as opposed to adding a feature to
MISSING registration, like UFFD_FEATURE_SIGBUS?

- The semantics are significantly different. UFFDIO_COPY or
  UFFDIO_ZEROPAGE do not make sense for these minor faults; userspace
  would instead just memset() or memcpy() or whatever via the non-UFFD
  mapping. Unlike MISSING registration, MINOR registration only makes
  sense for shared memory (hugetlbfs or shmem [to be supported in future
  commits]).

- Doing so would make handle_userfault()'s "reason" argument confusing.
  We'd pass in "MISSING" even if the pages weren't really missing.

Signed-off-by: Axel Rasmussen 
---
 fs/proc/task_mmu.c   |  1 +
 fs/userfaultfd.c | 80 +++-
 include/linux/mm.h   |  1 +
 include/linux/userfaultfd_k.h| 12 -
 include/trace/events/mmflags.h   |  1 +
 include/uapi/linux/userfaultfd.h | 15 +-
 mm/hugetlb.c | 31 +
 7 files changed, 107 insertions(+), 34 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ee5a235b3056..108faf719a83 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct 
vm_area_struct *vma)
[ilog2(VM_MTE)] = "mt",
[ilog2(VM_MTE_ALLOWED)] = "",
 #endif
+   [ilog2(VM_UFFD_MINOR)]  = "ui",
 #ifdef CONFIG_ARCH_HAS_PKEYS
/* These come out via ProtectionKey: */
[ilog2(VM_PKEY_BIT0)]   = "",
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..0a661422eb19 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
+
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
@@ -196,24 +198,21 @@ static inline struct uffd_msg userfault_msg(unsigned long 
address,
msg_init();
msg.event = UFFD_EVENT_PAGEFAULT;
msg.arg.pagefault.address = address;
+   /*
+* These flags indicate why the userfault occurred:
+* - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault.
+* - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault.
+* - Neither of these flags being set indicates a MISSING fault.
+*
+* Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write
+* fault. Otherwise, it was a read fault.
+*/
if (flags & FAULT_FLAG_WRITE)
-   /*
-* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
-* was not set in a UFFD_EVENT_PAGEFAULT, it means it
-* was a read fault, otherwise if set it means it's
-* a write fault.
-*/
msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
if (reason & VM_UFFD_WP)
-   /*
-* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
-* not set in a UFFD_EVENT_PAGEFAULT, it means it was
-* a missing fault, otherwise if set it means it's a
-* write protect fault.
-*/
msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+   if (reason & VM_UFFD_MINOR)
+   msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
if (features & UFFD_FEATURE_THREAD_ID)
msg.arg.pagefault.feat.ptid = task_pid_vnr(current);
return msg;
@@ -400,8 +399,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
BUG_ON(ctx->mm != mm);
 
-   VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
-   VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
+   /* Any unrecognized flag 

[RFC PATCH 2/2] userfaultfd: add UFFDIO_CONTINUE ioctl

2021-01-07 Thread Axel Rasmussen
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page. This in turn causes
hugetlb_mcopy_atomic_pte() to not (double-)add the page to the page
cache, and it also avoid set_page_huge_active() (as this would have been
done when the page was allocated).

Signed-off-by: Axel Rasmussen 
---
 fs/userfaultfd.c | 63 +++
 include/linux/userfaultfd_k.h|  2 +
 include/uapi/linux/userfaultfd.h | 21 +++-
 mm/hugetlb.c | 11 ++--
 mm/userfaultfd.c | 86 
 5 files changed, 154 insertions(+), 29 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0a661422eb19..08ff561426c1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1840,6 +1840,66 @@ static int userfaultfd_writeprotect(struct 
userfaultfd_ctx *ctx,
return ret;
 }
 
+static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+   __s64 ret;
+   struct uffdio_continue uffdio_continue;
+   struct uffdio_continue __user *user_uffdio_continue;
+   struct userfaultfd_wake_range range;
+
+   user_uffdio_continue = (struct uffdio_continue __user *)arg;
+
+   ret = -EAGAIN;
+   if (READ_ONCE(ctx->mmap_changing))
+   goto out;
+
+   ret = -EFAULT;
+   if (copy_from_user(_continue, user_uffdio_continue,
+  /* don't copy the output fields */
+  sizeof(uffdio_continue) - (sizeof(__s64
+   goto out;
+
+   ret = validate_range(ctx->mm, _continue.range.start,
+uffdio_continue.range.len);
+   if (ret)
+   goto out;
+
+   ret = -EINVAL;
+   /* double check for wraparound just in case. */
+   if (uffdio_continue.range.start + uffdio_continue.range.len <=
+   uffdio_continue.range.start) {
+   goto out;
+   }
+   if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+   goto out;
+
+   if (mmget_not_zero(ctx->mm)) {
+   ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
+uffdio_continue.range.len,
+>mmap_changing);
+   mmput(ctx->mm);
+   } else {
+   return -ESRCH;
+   }
+
+   if (unlikely(put_user(ret, _uffdio_continue->mapped)))
+   return -EFAULT;
+   if (ret < 0)
+   goto out;
+
+   /* len == 0 would wake all */
+   BUG_ON(!ret);
+   range.len = ret;
+   if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
+   range.start = uffdio_continue.range.start;
+   wake_userfault(ctx, );
+   }
+   ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
+
+out:
+   return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
/*
@@ -1924,6 +1984,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned 
cmd,
case UFFDIO_WRITEPROTECT:
ret = userfaultfd_writeprotect(ctx, arg);
break;
+   case UFFDIO_CONTINUE:
+   ret = userfaultfd_continue(ctx, arg);
+   break;
}
return ret;
 }
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 0c8c5fa5efc8..b1f5aaec8dad 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -41,6 +41,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
  unsigned long dst_start,
  unsigned long len,
  bool *mmap_changing);
+extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long 
dst_start,
+ unsigned long len, bool *mmap_changing);
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
   unsigned long start, unsigned long len,
  

Re: [RFC PATCH v2 07/14] cxl/mem: Implement polled mode mailbox

2021-01-07 Thread Ben Widawsky
On 20-12-08 16:24:11, Ben Widawsky wrote:

[snip]

> +static int cxl_mem_mbox_send_cmd(struct cxl_mem *cxlm,
> +  struct mbox_cmd *mbox_cmd)
> +{
> + u64 cmd, status;
> + size_t out_len;
> + int rc;
> +
> + lockdep_assert_held(>mbox_lock);
> +
> + /*
> +  * Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
> +  *   1. Caller reads MB Control Register to verify doorbell is clear
> +  *   2. Caller writes Command Register
> +  *   3. Caller writes Command Payload Registers if input payload is 
> non-empty
> +  *   4. Caller writes MB Control Register to set doorbell
> +  *   5. Caller either polls for doorbell to be clear or waits for 
> interrupt if configured
> +  *   6. Caller reads MB Status Register to fetch Return code
> +  *   7. If command successful, Caller reads Command Register to get 
> Payload Length
> +  *   8. If output payload is non-empty, host reads Command Payload 
> Registers
> +  */
> +
> + /* #1 */
> + WARN_ON(cxl_doorbell_busy(cxlm));
> +
> + /* #2 */
> + cxl_write_mbox_reg64(cxlm, CXLDEV_MB_CMD_OFFSET, mbox_cmd->opcode);
> +
> + if (mbox_cmd->size_in) {
> + /* #3 */
> + CXL_SET_FIELD(mbox_cmd->size_in, CXLDEV_MB_CMD_PAYLOAD_LENGTH);
> + cxl_mbox_payload_fill(cxlm, mbox_cmd->payload,
> +   mbox_cmd->size_in);
> + }

There is a bug here where the payload length isn't written on input. It was
working in v1. It will be fixed in v3.

[snip]



[RFC PATCH 0/2] userfaultfd: handle minor faults, add UFFDIO_CONTINUE

2021-01-07 Thread Axel Rasmussen
Overview


This series adds a new userfaultfd registration mode,
UFFDIO_REGISTER_MODE_MINOR. This allows userspace to intercept "minor" faults.
By "minor" fault, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory).
One of the mappings is registered with userfaultfd (in minor mode), and the
other is not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what I'm
calling a "minor" fault. As a concrete example, when working with hugetlbfs, we
have huge_pte_none(), but find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is,
userspace resolves the fault by either a) doing nothing if the contents are
already correct, or b) updating the underlying contents using the second,
non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA,
or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel
"I have ensured the page contents are correct, carry on setting up the mapping".

Use Case


Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to the
   non-UFFD mapping, using the setup described above. The VM is still running
   (and therefore its memory is likely changing), so this may be repeated
   several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
   During this gap, the VM's user(s) will *see* a pause, so it is desirable to
   minimize this window.

3. Between the last time any page was copied from the source to the target, and
   when the VM was paused, the contents of that page may have changed - and
   therefore the copy we have on the target machine is out of date. Although we
   can keep track of which pages are out of date, for VMs with large amounts of
   memory, it is "slow" to transfer this information to the target machine. We
   want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
   touches its memory (via the UFFD-registered mapping), userspace wants to
   intercept this fault. Userspace checks whether or not the page is up to date,
   and if not, copies the updated page from the source machine, via the non-UFFD
   mapping. Finally, whether a copy was performed or not, userspace issues a
   UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
   are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.

Interaction with Existing APIs
==

Because it's possible to combine registration modes (e.g. a single VMA can be
userfaultfd-registered MINOR | MISSING), and because it's up to userspace how to
resolve faults once they are received, I spent some time thinking through how
the existing API interacts with the new feature.

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without
modifications, the existing codepath assumes a new page needs to be allocated.
This is okay, since userspace must have a second non-UFFD-registered mapping
anyway, thus there isn't much reason to want to use these in any case (just
memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
  in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).

Remaining Work
==

This patchset doesn't include updates to userfaultfd's documentation or
selftests. This will be added before I send a non-RFC version of this series
(I want to find out if there are strong objections to the API surface before
spending the time to document it.)

Currently the patchset only supports hugetlbfs. There is no reason it can't work
with shmem, but I expect hugetlbfs to be much more commonly used since we're
talking about backing guest memory for VMs. I plan to implement shmem support in
a follow-up patch series.

Axel Rasmussen (2):
  userfaultfd: add minor fault registration mode
  userfaultfd: add UFFDIO_CONTINUE ioctl

 fs/proc/task_mmu.c 

Re: [RFC 0/2] kbuild: Add support to build overlays (%.dtbo)

2021-01-07 Thread Rob Herring
On Wed, Jan 6, 2021 at 10:35 PM Masahiro Yamada  wrote:
>
> On Wed, Jan 6, 2021 at 12:21 AM Rob Herring  wrote:
> >
> > On Tue, Jan 5, 2021 at 4:24 AM Viresh Kumar  wrote:
> > >
> > > Hello,
> > >
> > > Here is an attempt to make some changes in the kernel to allow building
> > > of device tree overlays.
> > >
> > > While at it, I would also like to discuss about how we should mention
> > > the base DT blobs in the Makefiles for the overlays, so they can be
> > > build tested to make sure the overlays apply properly.
> > >
> > > A simple way is to mention that with -base extension, like this:
> > >
> > > $(overlay-file)-base := platform-base.dtb
> > >
> > > Any other preference ?
>
>
>
> Viresh's patch is not enough.
>
> We will need to change .gitignore
> and scripts/Makefile.dtbinst as well.
>
>
> In my understanding, the build rule is completely the same
> between .dtb and .dtbo
> As Rob mentioned, I am not sure if we really need/want
> a separate extension.
>
>
> A counter approach is to use an extension like '.ovl.dtb'
> It clarifies it is an overlay fragment without changing
> anything in our build system or the upstream DTC project.
>
> We use chained extension in some places, for example,
> .dt.yaml for schema yaml files.
>
>
>
> dtb-$(CONFIG_ARCH_FOO) += \
> foo-board.dtb \
> foo-overlay1.ovl.dtb \
> foo-overlay2.ovl.dtb
>
>
> Overlay DT source file names must end with '.ovl.dts'

I like that suggestion as then it's also clear looking at the source
files which ones are overlays. Or we'd need .dtso to be consistent.


> > I think we'll want something similar to how '-objs' works for modules:
> >
> > foo-board-1-dtbs := foo-board.dtb foo-overlay1.dtbo
> > foo-board-2-dtbs := foo-board.dtb foo-overlay2.dtbo
> > foo-board-1-2-dtbs := foo-board.dtb foo-overlay1.dtbo foo-overlay2.dtbo
> > dtbs-y += foo-board-1.dtb foo-board-2.dtb foo-board-1-2.dtb
> >
> > (One difference here is we will want all the intermediate targets
> > unlike .o files.)
> >
> > You wouldn't necessarily have all the above combinations, but you have
> > to allow for them. I'm not sure how we'd handle applying any common
> > overlays where the base and overlay are in different directories.
>
>
> I guess the motivation for supporting -dtbs is to
> add per-board -@ option only when it contains *.dtbo pattern.

I hadn't thought that far, but yeah, that would be good. Really, I
just want it to be controlled per SoC family at least.

> But, as you notice, if the overlay files are located
> under drivers/, it is difficult to add -@ per board.

Generally, they shouldn't be. The exceptions are what we already have
there which are old dt fixups and unittests.

We want the stripped DT repo (devicetree-rebasing) to have all this
and drivers/ is stripped out. (Which reminds me, the DT repo will need
some work to support all this. It's a different build sys.)

> Another scenario is, some people may want to compile
> downstream overlay files (i.e. similar concept as external modules),
> then we have no idea which base board should be given with the -@ flag.
>
>
> I'd rather be tempted to add it globally
>
>
> ifdef CONFIG_OF_OVERLAY
> DTC_FLAGS += -@
> endif

We've already rejected doing that. Turning on '-@' can grow the dtb
size by a significant amount which could be problematic for some
boards.


> > Another thing here is adding all the above is not really going to
> > scale on arm32 where we have a single dts directory. We need to move
> > things to per vendor/soc family directories. I have the script to do
> > this. We just need to agree on the vendor names and get Arnd/Olof to
> > run it. I also want that so we can enable schema checks by default
> > once a vendor is warning free (the whole tree is going to take
> > forever).
>
>
> If this is a big churn, perhaps we could make it extreme
> to decouple DT and Linux-arch.

I would be fine with that, but I don't think we'll get agreement
there. With that amount of change, we'll be discussing git submodule
again.

Rereading the thread on vendor directories[1], we may just move boards
one vendor at a time. We could just make that a prerequisite for
vendor supporting overlays.

> arch/*/boot/dts/*.dts
>  ->  dts//*.dts
>
> Documentation/devicetree/bindings
>  -> dts/Bindings/
>
> include/dt-bindings/
>  -> dts/include/dt-bindings/
>
>
>
> Then, other project can take dts/
> to reuse for them.

This is already possible with devicetree-rebasing.git. Though it is
still by arch.

Rob

[1] https://lore.kernel.org/linux-devicetree/20181204183649.GA5716@bogus/


Re: Expense of read_iter

2021-01-07 Thread Mikulas Patocka



On Thu, 7 Jan 2021, Matthew Wilcox wrote:

> On Thu, Jan 07, 2021 at 08:15:41AM -0500, Mikulas Patocka wrote:
> > I'd like to ask about this piece of code in __kernel_read:
> > if (unlikely(!file->f_op->read_iter || file->f_op->read))
> > return warn_unsupported...
> > and __kernel_write:
> > if (unlikely(!file->f_op->write_iter || file->f_op->write))
> > return warn_unsupported...
> > 
> > - It exits with an error if both read_iter and read or write_iter and 
> > write are present.
> > 
> > I found out that on NVFS, reading a file with the read method has 10% 
> > better performance than the read_iter method. The benchmark just reads the 
> > same 4k page over and over again - and the cost of creating and parsing 
> > the kiocb and iov_iter structures is just that high.
> 
> Which part of it is so expensive?

The read_iter path is much bigger:
vfs_read- 0x160 bytes
new_sync_read   - 0x160 bytes
nvfs_rw_iter- 0x100 bytes
nvfs_rw_iter_locked - 0x4a0 bytes
iov_iter_advance- 0x300 bytes

If we go with the "read" method, there's just:
vfs_read- 0x160 bytes
nvfs_read   - 0x200 bytes

> Is it worth, eg adding an iov_iter
> type that points to a single buffer instead of a single-member iov?
> 
> +++ b/include/linux/uio.h
> @@ -19,6 +19,7 @@ struct kvec {
>  
>  enum iter_type {
> /* iter types */
> +   ITER_UBUF = 2,
> ITER_IOVEC = 4,
> ITER_KVEC = 8,
> ITER_BVEC = 16,
> @@ -36,6 +36,7 @@ struct iov_iter {
> size_t iov_offset;
> size_t count;
> union {
> +   void __user *buf;
> const struct iovec *iov;
> const struct kvec *kvec;
> const struct bio_vec *bvec;
> 
> and then doing all the appropriate changes to make that work.


I tried this benchmark on nvfs:

#include 
#include 
#include 

int main(void)
{
unsigned long i;
unsigned long l = 1UL << 38;
unsigned s = 4096;
void *a = valloc(s);
if (!a) perror("malloc"), exit(1);
for (i = 0; i < l; i += s) {
if (pread(0, a, s, 0) != s) perror("read"), exit(1);
}
return 0;
}


Result, using the read_iter method:

# To display the perf.data header info, please use --header/--header-only 
options.
#
#
# Total Lost Samples: 0
#
# Samples: 3K of event 'cycles'
# Event count (approx.): 1049885560
#
# Overhead  Command  Shared Object Symbol   
#   ...    .
#
47.32%  pread[kernel.vmlinux]  [k] copy_user_generic_string
 7.83%  pread[kernel.vmlinux]  [k] current_time
 6.57%  pread[nvfs][k] nvfs_rw_iter_locked
 5.59%  pread[kernel.vmlinux]  [k] entry_SYSCALL_64
 4.23%  preadlibc-2.31.so  [.] __libc_pread
 3.51%  pread[kernel.vmlinux]  [k] syscall_return_via_sysret
 2.34%  pread[kernel.vmlinux]  [k] entry_SYSCALL_64_after_hwframe
 2.34%  pread[kernel.vmlinux]  [k] vfs_read
 2.34%  pread[kernel.vmlinux]  [k] __fsnotify_parent
 2.31%  pread[kernel.vmlinux]  [k] new_sync_read
 2.21%  pread[nvfs][k] nvfs_bmap
 1.89%  pread[kernel.vmlinux]  [k] iov_iter_advance
 1.71%  pread[kernel.vmlinux]  [k] __x64_sys_pread64
 1.59%  pread[kernel.vmlinux]  [k] atime_needs_update
 1.24%  pread[nvfs][k] nvfs_rw_iter
 0.94%  pread[kernel.vmlinux]  [k] touch_atime
 0.75%  pread[kernel.vmlinux]  [k] syscall_enter_from_user_mode
 0.72%  pread[kernel.vmlinux]  [k] ktime_get_coarse_real_ts64
 0.68%  pread[kernel.vmlinux]  [k] down_read
 0.62%  pread[kernel.vmlinux]  [k] exit_to_user_mode_prepare
 0.52%  pread[kernel.vmlinux]  [k] syscall_exit_to_user_mode
 0.49%  pread[kernel.vmlinux]  [k] syscall_exit_to_user_mode_prepare
 0.47%  pread[kernel.vmlinux]  [k] __fget_light
 0.46%  pread[kernel.vmlinux]  [k] do_syscall_64
 0.42%  preadpread [.] main
 0.33%  pread[kernel.vmlinux]  [k] up_read
 0.29%  pread[kernel.vmlinux]  [k] iov_iter_init
 0.16%  pread[kernel.vmlinux]  [k] __fdget
 0.10%  pread[kernel.vmlinux]  [k] entry_SYSCALL_64_safe_stack
 0.03%  preadpread [.] pread@plt
 0.00%  perf [kernel.vmlinux]  [k] x86_pmu_enable_all


#
# (Tip: Use --symfs  if your symbol files are in non-standard locations)
#



Result, using the read method:

# To display the perf.data header info, please use --header/--header-only 
options.
#
#
# Total Lost Samples: 0
#
# Samples: 3K of event 'cycles'
# Event count (approx.): 1312158116
#
# Overhead  Command  Shared Object Symbol   
#   ...    .
#
60.77%  pread

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 10:47:07AM -0800, Linus Torvalds wrote:
> On Thu, Jan 7, 2021 at 10:34 AM Al Viro  wrote:
> >
> > I'm not sure it's the best approach, TBH.  How about simply
> > for (walk = head; walk; ufds += walk->len, walk = walk->next) {
> > if (copy_to_user(ufds, walk->entries,
> >  walk->len * sizeof(struct pollfd))
> > goto out_fds;
> > }
> > in there?  It's both simpler (obviously matches the copyin side) and
> > might very well be faster...
> 
> I started doing that, but ..  Nope.
> 
> It's not copying the whole entry. It's literally just modifying one
> 16-bit word in each entry.
> 
> Now, the "whole entry" is just 8 bytes, so it's possible that it would
> actually be faster to do a copy of the whole thing rather than write
> just the 16 bits. But I got very nervous about it, because I could
> easily see some threaded app actually changing the 'fd' (or the
> 'event' field) in place (ie writing -1 to it as they close and re-use
> it)

Point...  Pity, that.


Re: [RFC PATCH] x86/mce: Add ppin and microcode to mce trace

2021-01-07 Thread Steven Rostedt
On Thu,  7 Jan 2021 09:12:25 -0800
Tony Luck  wrote:

> Steven,

Hi Tony,

> 
> I've been remiss about updating the mce_record trace as new fields
> have been added to "struct mce". What are the ABI implications of a
> patch like the one below (sample only ... there are a couple more fields
> that may need to be added)?
> 
> Are there any size limitations that I might hit adding more items to
> this record?

Nope, this should be all fine. I know that rasdaemon uses libtraceevent
(actually a copy, but should now start checking if its installed on the
system and use the shared one), and that is used to parse the binary data.

> 
> Thanks
> 
> -Tony
> 
> ---
>  include/trace/events/mce.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
> index 1391ada0da3b..cf0e8b9920b7 100644
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -33,6 +33,8 @@ TRACE_EVENT(mce_record,
>   __field(u8, cs  )
>   __field(u8, bank)
>   __field(u8, cpuvendor   )
> + __field(u64,ppin)
> + __field(u32,microcode   )
>   ),

 # trace-cmd list -e mce_record -F
system: mce
name: mce_record
ID: 105
format:
field:unsigned short common_type;   offset:0;   size:2; 
signed:0;
field:unsigned char common_flags;   offset:2;   size:1; 
signed:0;
field:unsigned char common_preempt_count;   offset:3;   
size:1;signed:0;
field:int common_pid;   offset:4;   size:4; signed:1;

field:u64 mcgcap;   offset:8;   size:8; signed:0;
field:u64 mcgstatus;offset:16;  size:8; signed:0;
field:u64 status;   offset:24;  size:8; signed:0;
field:u64 addr; offset:32;  size:8; signed:0;
field:u64 misc; offset:40;  size:8; signed:0;
field:u64 synd; offset:48;  size:8; signed:0;
field:u64 ipid; offset:56;  size:8; signed:0;
field:u64 ip;   offset:64;  size:8; signed:0;
field:u64 tsc;  offset:72;  size:8; signed:0;
field:u64 walltime; offset:80;  size:8; signed:0;
field:u32 cpu;  offset:88;  size:4; signed:0;
field:u32 cpuid;offset:92;  size:4; signed:0;
field:u32 apicid;   offset:96;  size:4; signed:0;
field:u32 socketid; offset:100; size:4; signed:0;
field:u8 cs;offset:104; size:1; signed:0;
field:u8 bank;  offset:105; size:1; signed:0;
field:u8 cpuvendor; offset:106; size:1; signed:0;

The event looks to be currently 108 bytes (rounds up to the nearest 4
alignment). The sub buffer size (max event size) is 4080 bytes.

Does this help?

-- Steve



>  
>   TP_fast_assign(
> @@ -53,9 +55,11 @@ TRACE_EVENT(mce_record,
>   __entry->cs = m->cs;
>   __entry->bank   = m->bank;
>   __entry->cpuvendor  = m->cpuvendor;
> + __entry->ppin   = m->ppin;
> + __entry->microcode  = m->microcode;
>   ),
>  
> - TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x",
> + TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, 
> ADDR/MISC/SYND: %016Lx/%016Lx/%016Lx, RIP: %02x:<%016Lx>, TSC: %llx, 
> PROCESSOR: %u:%x, TIME: %llu, SOCKET: %u, APIC: %x, PPIN: %llx, MICROCODE: 
> %x",
>   __entry->cpu,
>   __entry->mcgcap, __entry->mcgstatus,
>   __entry->bank, __entry->status,
> @@ -66,7 +70,7 @@ TRACE_EVENT(mce_record,
>   __entry->cpuvendor, __entry->cpuid,
>   __entry->walltime,
>   __entry->socketid,
> - __entry->apicid)
> + __entry->apicid, __entry->ppin, __entry->microcode)
>  );
>  
>  #endif /* _TRACE_MCE_H */



Re: [PATCH v2] genirq: add IRQF_NO_AUTOEN for request_irq

2021-01-07 Thread Greg KH
On Tue, Jan 05, 2021 at 03:14:11PM +1300, Barry Song wrote:
> Many drivers don't want interrupts enabled automatically due to
> request_irq(). So they are handling this issue by either way of
> the below two:
> (1)
> irq_set_status_flags(irq, IRQ_NOAUTOEN);
> request_irq(dev, irq...);
> (2)
> request_irq(dev, irq...);
> disable_irq(irq);
> 
> The code in the second way is silly and unsafe. In the small time
> gap between request_irq() and disable_irq(), interrupts can still
> come.
> The code in the first way is safe though we might be able to do it
> in the generic irq code.
> 
> With this patch, drivers can request_irq with IRQF_NO_AUTOEN flag.
> They will need neither irq_set_status_flags() nor disable_irq().
> Hundreds of drivers with this problem will be handled afterwards.
> 
> Cc: Dmitry Torokhov 
> Signed-off-by: Barry Song 

Can you also convert some in-kernel drivers to this new api so that we
can see how this works?

thanks,

greg k-h


Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 06:40:58PM +, Al Viro wrote:
> do_sys_poll(): do the wholesale copyout
> 
> Don't bother with patching up just one field - 16 bits out of each 64.
> The amount of memory traffic is not going to be greater (might be
> smaller, actually) and the loop in copy_to_user() is optimized for
> bulk copy.

BTW, considering the access pattern, I would expect it to be
considerably cheaper in a lot of cases; basically, we have a copy
of userland array of 64bit values, then we do a non-trivial amount
of work and modify 16 bits out of each 64.  Then we want that
propagated back to the original array.  I suspect that copying just
those 16bit fields out is going to cost more that a bulk copy of
the entire thing, and not just on s390 and similar oddball cases.

Comments?


[PATCH v2] x86: Treat R_386_PLT32 as R_386_PC32

2021-01-07 Thread Fangrui Song
This is similar to commit b21ebf2fb4cd ("x86: Treat R_X86_64_PLT32 as
R_X86_64_PC32"), but for i386.  As far as Linux kernel is concerned,
R_386_PLT32 can be treated the same as R_386_PC32.

R_386_PC32/R_X86_64_PC32 are PC-relative relocation types with the
requirement that the symbol address is significant.
R_386_PLT32/R_X86_64_PLT32 are PC-relative relocation types without the
address significance requirement.

On x86-64, there is no PIC vs non-PIC PLT distinction and an
R_X86_64_PLT32 relocation is produced for both `call/jmp foo` and
`call/jmp foo@PLT` with newer (2018) GNU as/LLVM integrated assembler.

On i386, there are 2 types of PLTs, PIC and non-PIC. Currently the
convention is to use R_386_PC32 for non-PIC PLT and R_386_PLT32 for PIC
PLT, but R_386_PLT32 is arguably preferable for -fno-pic code as well:
this can avoid a "canonical PLT entry" (st_shndx=0, st_value!=0) if the
symbol turns out to be defined externally.

clang-12 -fno-pic (since
https://github.com/llvm/llvm-project/commit/961f31d8ad14c66829991522d73e14b5a96ff6d4)
can emit R_386_PLT32 for compiler produced symbols (if we drop
-ffreestanding for CONFIG_X86_32, library call optimization can produce
newer declarations) and future GCC -fno-pic may emit R_386_PLT32 as well
if an option like -fno-direct-access-external-data is adopted to avoid
canonical PLT entry/copy relocations.

Link: https://github.com/ClangBuiltLinux/linux/issues/1210
Link: 
https://github.com/llvm/llvm-project/commit/961f31d8ad14c66829991522d73e14b5a96ff6d4
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reported-by: Arnd Bergmann 
Signed-off-by: Fangrui Song 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Nathan Chancellor 
Tested-by: Nick Desaulniers 
Tested-by: Nathan Chancellor 

---
Change in v2:
* Improve commit message
---
 arch/x86/kernel/module.c | 1 +
 arch/x86/tools/relocs.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..5e9a34b5bd74 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -114,6 +114,7 @@ int apply_relocate(Elf32_Shdr *sechdrs,
*location += sym->st_value;
break;
case R_386_PC32:
+   case R_386_PLT32:
/* Add the value, subtract its position */
*location += sym->st_value - (uint32_t)location;
break;
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..717e48ca28b6 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -867,6 +867,7 @@ static int do_reloc32(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
@@ -910,6 +911,7 @@ static int do_reloc_real(struct section *sec, Elf_Rel *rel, 
Elf_Sym *sym,
case R_386_PC32:
case R_386_PC16:
case R_386_PC8:
+   case R_386_PLT32:
/*
 * NONE can be ignored and PC relative relocations don't
 * need to be adjusted.
-- 
2.29.2.729.g45daf8777d-goog



Re: [PATCH v4 2/2] scsi: ufs: handle LINERESET with correct tm_cmd

2021-01-07 Thread Jaegeuk Kim
On 01/07, Can Guo wrote:
> On 2021-01-07 16:46, Jaegeuk Kim wrote:
> > On 01/07, Can Guo wrote:
> > > On 2021-01-07 16:07, Jaegeuk Kim wrote:
> > > > On 01/07, Can Guo wrote:
> > > > > On 2021-01-07 15:47, Jaegeuk Kim wrote:
> > > > > > From: Jaegeuk Kim 
> > > > > >
> > > > > > This fixes a warning caused by wrong reserve tag usage in
> > > > > > __ufshcd_issue_tm_cmd.
> > > > > >
> > > > > > WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 
> > > > > > blk_get_request+0x68/0x70
> > > > > > WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
> > > > > > blk_mq_get_tag+0x438/0x46c
> > > > > >
> > > > > > And, in ufshcd_err_handler(), we can avoid to send tm_cmd before
> > > > > > aborting
> > > > > > outstanding commands by waiting a bit for IO completion like this.
> > > > > >
> > > > > > __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> > > > > >
> > > > > > Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> > > > > > allocate and free TMFs")
> > > > > > Fixes: 2355b66ed20c ("scsi: ufs: Handle LINERESET indication in err
> > > > > > handler")
> > > > >
> > > > > Hi Jaegeuk,
> > > > >
> > > > > Sorry, what is wrong with commit 2355b66ed20c? Clearing pending I/O
> > > > > reqs is a general procedure for handling all non-fatal errors.
> > > >
> > > > Without waiting IOs, I hit the below timeout all the time from
> > > > LINERESET, which
> > > > causes UFS stuck permanently, as mentioned in the description.
> > > >
> > > > "__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out"
> > > 
> > > In that case, ufshcd_try_to_abort_task(), the caller of
> > > __ufshcd_issue_tm_cmd(),
> > > should return -ETIMEOUT, then err_handler would jump to do a full
> > > reset,
> > > then bail.
> > > I am not sure what gets UFS stuck permanently. Could you please
> > > share the
> > > callstack
> > > if possible? I really want to know what is happening. Thanks.
> > 
> > I can't share all the log tho, it entered full reset. While printing out
> > whole registers, the device was hard reset. Thanks,
> 
> Hi Jaegeuk,
> 
> Entering full reset is expected in this case, which is why I am saying
> line-reset handling logic should not be penalized. I think we need to
> find out what caused the hard reset but not just adding a delay before
> clearing pending reqs, because let's say 3 sec expires and you hit the
> same tm req timeout (maybe with a lower possibility), you may still end
> up same at the hard reset. You don't need to share all the log, just the
> last call stacks before hard reset. Is it a QCOM's platform used in your
> case? Can you check the log/dump if NoC error happened?

Hi Can,

I figured out it is caused by verbose kernel logs printed in terminal.
I posted v5, so could you please review it?

Thanks,

> 
> Thanks.
> Can Guo.
> 
> > 
> > > 
> > > Regards,
> > > Can Guo.
> > > 
> > > >
> > > > >
> > > > > Thanks,
> > > > > Can Guo.
> > > > >
> > > > > > Signed-off-by: Jaegeuk Kim 
> > > > > > ---
> > > > > >  drivers/scsi/ufs/ufshcd.c | 35 +++
> > > > > >  1 file changed, 31 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > > > index e6e7bdf99cd7..340dd5e515dd 100644
> > > > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > > > @@ -44,6 +44,9 @@
> > > > > >  /* Query request timeout */
> > > > > >  #define QUERY_REQ_TIMEOUT 1500 /* 1.5 seconds */
> > > > > >
> > > > > > +/* LINERESET TIME OUT */
> > > > > > +#define LINERESET_IO_TIMEOUT_MS(3) /* 30 
> > > > > > sec */
> > > > > > +
> > > > > >  /* Task management command timeout */
> > > > > >  #define TM_CMD_TIMEOUT 100 /* msecs */
> > > > > >
> > > > > > @@ -5826,6 +5829,7 @@ static void ufshcd_err_handler(struct 
> > > > > > work_struct
> > > > > > *work)
> > > > > > int err = 0, pmc_err;
> > > > > > int tag;
> > > > > > bool needs_reset = false, needs_restore = false;
> > > > > > +   ktime_t start;
> > > > > >
> > > > > > hba = container_of(work, struct ufs_hba, eh_work);
> > > > > >
> > > > > > @@ -5911,6 +5915,22 @@ static void ufshcd_err_handler(struct 
> > > > > > work_struct
> > > > > > *work)
> > > > > > }
> > > > > >
> > > > > > hba->silence_err_logs = true;
> > > > > > +
> > > > > > +   /* Wait for IO completion for non-fatal errors to avoid 
> > > > > > aborting IOs
> > > > > > */
> > > > > > +   start = ktime_get();
> > > > > > +   while (hba->outstanding_reqs) {
> > > > > > +   ufshcd_complete_requests(hba);
> > > > > > +   spin_unlock_irqrestore(hba->host->host_lock, flags);
> > > > > > +   schedule();
> > > > > > +   spin_lock_irqsave(hba->host->host_lock, flags);
> > > > > > +   if (ktime_to_ms(ktime_sub(ktime_get(), start)) >
> > > > > > +   
> > > > > > LINERESET_IO_TIMEOUT_MS) {
> > > > > > +   dev_err(hba->dev, "%s: timeout, 
> > > 

Re: [PATCH 3/5] crypto: add RFC5869 HKDF

2021-01-07 Thread Eric Biggers
On Thu, Jan 07, 2021 at 08:53:15AM +0100, Stephan Mueller wrote:
> > 
> > > RFC5869
> > > allows two optional parameters to be provided to the extract operation:
> > > the salt and additional information. Both are to be provided with the
> > > seed parameter where the salt is the first entry of the seed parameter
> > > and all subsequent entries are handled as additional information. If
> > > the caller intends to invoke the HKDF without salt, it has to provide a
> > > NULL/0 entry as first entry in seed.
> > 
> > Where does "additional information" for extract come from?  RFC 5869 has:
> > 
> > HKDF-Extract(salt, IKM) -> PRK
> > 
> > Inputs:
> >   salt optional salt value (a non-secret random value);
> >    if not provided, it is set to a string of HashLen
> > zeros.
> >   IKM  input keying material
> > 
> > There's no "additional information".
> 
> I used the terminology from SP800-108. I will update the description
> accordingly. 

For HKDF, it would be better to stick to the terminology used in RFC 5869
(https://tools.ietf.org/html/rfc5869), as generally that's what people are most
familiar with for HKDF.  It also matches the HKDF paper
(https://eprint.iacr.org/2010/264.pdf) more closely.

- Eric


[PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens

2021-01-07 Thread Jaegeuk Kim
From: Jaegeuk Kim 

When non-fatal error like line-reset happens, ufshcd_err_handler() starts to
abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm request,
we've hit two warnings.

WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 blk_mq_get_tag+0x438/0x46c

After fixing the above warnings, I've hit another tm_cmd timeout, which may be
caused by unstable controller state.

__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out

Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. It turned
out ufshcd_print_trs() printed too many messages in console which requires CPU
locks. Likewise hba->silence_err_logs, we need to avoid too verbose messages.
Actually it came from ufshcd_transfer_rsp_status() when requeuing commands back.
Indeed, this is actually not an error case, so let's fix it.

Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to allocate and 
free TMFs")
Signed-off-by: Jaegeuk Kim 
---
 drivers/scsi/ufs/ufshcd.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..2a715f13fe1d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct 
ufshcd_lrb *lrbp)
break;
} /* end of switch */
 
-   if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
+   if ((host_byte(result) != DID_OK) &&
+   (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
return result;
 }
@@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
 
-   if (enabled_intr_status && retval == IRQ_NONE) {
-   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
-   __func__, intr_status);
+   if (enabled_intr_status && retval == IRQ_NONE &&
+   !ufshcd_eh_in_progress(hba)) {
+   dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 
0x%08x)\n",
+   __func__,
+   intr_status,
+   hba->ufs_stats.last_intr_status,
+   enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
 
@@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba *hba,
 * Even though we use wait_event() which sleeps indefinitely,
 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
 */
-   req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+   req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+   if (IS_ERR(req))
+   return PTR_ERR(req);
+
req->end_io_data = 
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);
-- 
2.29.2.729.g45daf8777d-goog



[PATCH v5 0/2] Two UFS fixes

2021-01-07 Thread Jaegeuk Kim
Change log from v4:
 - remove RESERVE tag for tm command
 - remove waiting IOs and let full reset handle it
 - avoid verbose error log which causes cpu lock up




[PATCH v5 1/2] scsi: ufs: fix livelock of ufshcd_clear_ua_wluns

2021-01-07 Thread Jaegeuk Kim
When gate_work/ungate_work gets an error during hibern8_enter or exit,
 ufshcd_err_handler()
   ufshcd_scsi_block_requests()
   ufshcd_reset_and_restore()
 ufshcd_clear_ua_wluns() -> stuck
   ufshcd_scsi_unblock_requests()

In order to avoid it, ufshcd_clear_ua_wluns() can be called per recovery flows
such as suspend/resume, link_recovery, and error_handler.

Fixes: 1918651f2d7e ("scsi: ufs: Clear UAC for RPMB after ufshcd resets")
Signed-off-by: Jaegeuk Kim 
Reviewed-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bedb822a40a3..e6e7bdf99cd7 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3996,6 +3996,8 @@ int ufshcd_link_recovery(struct ufs_hba *hba)
if (ret)
dev_err(hba->dev, "%s: link recovery failed, err %d",
__func__, ret);
+   else
+   ufshcd_clear_ua_wluns(hba);
 
return ret;
 }
@@ -6003,6 +6005,9 @@ static void ufshcd_err_handler(struct work_struct *work)
ufshcd_scsi_unblock_requests(hba);
ufshcd_err_handling_unprepare(hba);
up(>eh_sem);
+
+   if (!err && needs_reset)
+   ufshcd_clear_ua_wluns(hba);
 }
 
 /**
@@ -6940,14 +6945,11 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba 
*hba)
ufshcd_set_clk_freq(hba, true);
 
err = ufshcd_hba_enable(hba);
-   if (err)
-   goto out;
 
/* Establish the link again and restore the device */
-   err = ufshcd_probe_hba(hba, false);
if (!err)
-   ufshcd_clear_ua_wluns(hba);
-out:
+   err = ufshcd_probe_hba(hba, false);
+
if (err)
dev_err(hba->dev, "%s: Host init failed %d\n", __func__, err);
ufshcd_update_evt_hist(hba, UFS_EVT_HOST_RESET, (u32)err);
@@ -7718,6 +7720,8 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
if (ret)
goto out;
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Initialize devfreq after UFS device is detected */
if (ufshcd_is_clkscaling_supported(hba)) {
memcpy(>clk_scaling.saved_pwr_info.info,
@@ -7919,8 +7923,6 @@ static void ufshcd_async_scan(void *data, async_cookie_t 
cookie)
pm_runtime_put_sync(hba->dev);
ufshcd_exit_clk_scaling(hba);
ufshcd_hba_exit(hba);
-   } else {
-   ufshcd_clear_ua_wluns(hba);
}
 }
 
@@ -8777,6 +8779,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
ufshcd_resume_clkscaling(hba);
hba->clk_gating.is_suspended = false;
hba->dev_info.b_rpm_dev_flush_capable = false;
+   ufshcd_clear_ua_wluns(hba);
ufshcd_release(hba);
 out:
if (hba->dev_info.b_rpm_dev_flush_capable) {
@@ -8887,6 +8890,8 @@ static int ufshcd_resume(struct ufs_hba *hba, enum 
ufs_pm_op pm_op)
cancel_delayed_work(>rpm_dev_flush_recheck_work);
}
 
+   ufshcd_clear_ua_wluns(hba);
+
/* Schedule clock gating in case of no access to UFS device yet */
ufshcd_release(hba);
 
-- 
2.29.2.729.g45daf8777d-goog



Re: [PATCH 5/5] fs: use HKDF implementation from kernel crypto API

2021-01-07 Thread Eric Biggers
On Thu, Jan 07, 2021 at 08:49:52AM +0100, Stephan Mueller wrote:
> > > -int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
> > > +int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, u8 *master_key,
> > >   unsigned int master_key_size);
> > 
> > It shouldn't be necessary to remove const here.
> 
> Unfortunately it is when adding the pointer to struct kvec
> > 
> > >  
> > >  /*
> > > @@ -323,7 +323,7 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const
> > > u8 *master_key,
> > >  #define HKDF_CONTEXT_INODE_HASH_KEY7 /* info=   */
> > >  
> > >  int fscrypt_hkdf_expand(const struct fscrypt_hkdf *hkdf, u8 context,
> > > -   const u8 *info, unsigned int infolen,
> > > +   u8 *info, unsigned int infolen,
> > > u8 *okm, unsigned int okmlen);
> > 
> > Likewise.  In fact some callers rely on 'info' not being modified.
> 
> Same here.

If the HKDF API will have a quirk like this, it's better not to "leak" it into
the prototypes of these fscrypt functions.  Just add the needed casts in
fscrypt_init_hkdf() and fscrypt_hkdf_expand().

> > > -   err = crypto_shash_setkey(hmac_tfm, prk, sizeof(prk));
> > > +   err = crypto_hkdf_setkey(hmac_tfm, seed, ARRAY_SIZE(seed));
> > > if (err)
> > > goto err_free_tfm;
> > 
> > It's weird that the salt and key have to be passed in a kvec.
> > Why not just have normal function parameters like:
> > 
> > int crypto_hkdf_setkey(struct crypto_shash *hmac_tfm,
> >    const u8 *key, size_t keysize,
> >    const u8 *salt, size_t saltsize);
> 
> I wanted to have an identical interface for all types of KDFs to allow turning
> them into a template eventually. For example, SP800-108 KDFs only have one
> parameter. Hence the use of a kvec.

But the API being provided is a library function specifically for HKDF.
So there's no need to make it conform to some other API.

- Eric


Re: [PATCH v7 4/5] PCI: tegra: Remove platform driver support for ZRX-DC compliant PHY

2021-01-07 Thread kernel test robot
Hi Shradha,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on linus/master v5.11-rc2 next-20210104]
[cannot apply to robh/for-next linux/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Shradha-Todi/Add-support-to-handle-ZRX-DC-Compliant-PHYs/20210108-001626
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-a005-20210107 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
5c951623bc8965fa1e89660f2f5f4a2944e4981a)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/a626b81436de9ab5c0c0bc9785fbc64549ea0f3a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Shradha-Todi/Add-support-to-handle-ZRX-DC-Compliant-PHYs/20210108-001626
git checkout a626b81436de9ab5c0c0bc9785fbc64549ea0f3a
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/pci/controller/dwc/pcie-tegra194.c:2063:4: warning: variable 
>> 'phy_zrxdc_count' is uninitialized when used here [-Wuninitialized]
   phy_zrxdc_count++;
   ^~~
   drivers/pci/controller/dwc/pcie-tegra194.c:1953:30: note: initialize the 
variable 'phy_zrxdc_count' to silence this warning
   unsigned int phy_zrxdc_count;
   ^
= 0
   1 warning generated.


vim +/phy_zrxdc_count +2063 drivers/pci/controller/dwc/pcie-tegra194.c

  1948  
  1949  static int tegra_pcie_dw_probe(struct platform_device *pdev)
  1950  {
  1951  const struct tegra_pcie_dw_of_data *data;
  1952  struct device *dev = >dev;
  1953  unsigned int phy_zrxdc_count;
  1954  struct resource *atu_dma_res;
  1955  struct tegra_pcie_dw *pcie;
  1956  struct pcie_port *pp;
  1957  struct dw_pcie *pci;
  1958  struct phy **phys;
  1959  char *name;
  1960  int ret;
  1961  u32 i;
  1962  
  1963  data = of_device_get_match_data(dev);
  1964  
  1965  pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
  1966  if (!pcie)
  1967  return -ENOMEM;
  1968  
  1969  pci = >pci;
  1970  pci->dev = >dev;
  1971  pci->ops = _dw_pcie_ops;
  1972  pci->n_fts[0] = N_FTS_VAL;
  1973  pci->n_fts[1] = FTS_VAL;
  1974  pci->version = 0x490A;
  1975  
  1976  pp = >pp;
  1977  pp->num_vectors = MAX_MSI_IRQS;
  1978  pcie->dev = >dev;
  1979  pcie->mode = (enum dw_pcie_device_mode)data->mode;
  1980  
  1981  ret = tegra_pcie_dw_parse_dt(pcie);
  1982  if (ret < 0) {
  1983  const char *level = KERN_ERR;
  1984  
  1985  if (ret == -EPROBE_DEFER)
  1986  level = KERN_DEBUG;
  1987  
  1988  dev_printk(level, dev,
  1989 dev_fmt("Failed to parse device tree: %d\n"),
  1990 ret);
  1991  return ret;
  1992  }
  1993  
  1994  ret = tegra_pcie_get_slot_regulators(pcie);
  1995  if (ret < 0) {
  1996  const char *level = KERN_ERR;
  1997  
  1998  if (ret == -EPROBE_DEFER)
  1999  level = KERN_DEBUG;
  2000  
  2001  dev_printk(level, dev,
  2002 dev_fmt("Failed to get slot regulators: 
%d\n"),
  2003 ret);
  2004  return ret;
  2005  }
  2006  
  2007  if (pcie->pex_refclk_sel_gpiod)
  2008  gpiod_set_value(pcie->pex_refclk_sel_gpiod, 1);
  2009  
  2010  pcie->pex_ctl_supply = devm_regulator_get(dev, "vddio-pex-ctl");
  2011  if (IS_ERR(pcie->pex_ctl_supply)) {
  2012  ret = PTR_ERR(pcie->pex_ctl_supply);
  2013  if (ret != -EPROBE_DEFER)
  2014  dev_err(dev, "Failed to get regulator: %ld\n",
  2015  

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 10:34 AM Al Viro  wrote:
>
> I'm not sure it's the best approach, TBH.  How about simply
> for (walk = head; walk; ufds += walk->len, walk = walk->next) {
> if (copy_to_user(ufds, walk->entries,
>  walk->len * sizeof(struct pollfd))
> goto out_fds;
> }
> in there?  It's both simpler (obviously matches the copyin side) and
> might very well be faster...

I started doing that, but ..  Nope.

It's not copying the whole entry. It's literally just modifying one
16-bit word in each entry.

Now, the "whole entry" is just 8 bytes, so it's possible that it would
actually be faster to do a copy of the whole thing rather than write
just the 16 bits. But I got very nervous about it, because I could
easily see some threaded app actually changing the 'fd' (or the
'event' field) in place (ie writing -1 to it as they close and re-use
it)

The man-pages even document that only the 'revent' field is an output parameter.

So I think my patch is a _lot_ safer than your arguably simpler one,
because mine keeps the original semantics.

 Linus


[PATCH v1 0/3] cpufreq: intel_pstate: Assorted cleanups

2021-01-07 Thread Rafael J. Wysocki
Hi,

These three patches clean up intel_pstate a bit:

[1/3] makes it always use READ_ONCE() for reading hwp_cap_cached
[2/3] changes the first argument of intel_pstate_get_hwp_max()
[3/3] renames to functions (to avoid possible confusion).

Please see patch changelogs for details.

Thanks!





[PATCH v1 3/3] cpufreq: intel_pstate: Rename two functions

2021-01-07 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Rename intel_cpufreq_adjust_hwp() and intel_cpufreq_adjust_perf_ctl()
to intel_cpufreq_hwp_update() and intel_cpufreq_perf_ctl_update(),
respectively, to avoid possible confusion with the ->adjist_perf()
callback function, intel_cpufreq_adjust_perf().

Signed-off-by: Rafael J. Wysocki 
---
 drivers/cpufreq/intel_pstate.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2527,7 +2527,7 @@ static void intel_cpufreq_trace(struct c
fp_toint(cpu->iowait_boost * 100));
 }
 
-static void intel_cpufreq_adjust_hwp(struct cpudata *cpu, u32 min, u32 max,
+static void intel_cpufreq_hwp_update(struct cpudata *cpu, u32 min, u32 max,
 u32 desired, bool fast_switch)
 {
u64 prev = READ_ONCE(cpu->hwp_req_cached), value = prev;
@@ -2551,7 +2551,7 @@ static void intel_cpufreq_adjust_hwp(str
wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
 }
 
-static void intel_cpufreq_adjust_perf_ctl(struct cpudata *cpu,
+static void intel_cpufreq_perf_ctl_update(struct cpudata *cpu,
  u32 target_pstate, bool fast_switch)
 {
if (fast_switch)
@@ -2573,10 +2573,10 @@ static int intel_cpufreq_update_pstate(s
int max_pstate = policy->strict_target ?
target_pstate : cpu->max_perf_ratio;
 
-   intel_cpufreq_adjust_hwp(cpu, target_pstate, max_pstate, 0,
+   intel_cpufreq_hwp_update(cpu, target_pstate, max_pstate, 0,
 fast_switch);
} else if (target_pstate != old_pstate) {
-   intel_cpufreq_adjust_perf_ctl(cpu, target_pstate, fast_switch);
+   intel_cpufreq_perf_ctl_update(cpu, target_pstate, fast_switch);
}
 
cpu->pstate.current_pstate = target_pstate;
@@ -2674,7 +2674,7 @@ static void intel_cpufreq_adjust_perf(un
 
target_pstate = clamp_t(int, target_pstate, min_pstate, max_pstate);
 
-   intel_cpufreq_adjust_hwp(cpu, min_pstate, max_pstate, target_pstate, 
true);
+   intel_cpufreq_hwp_update(cpu, min_pstate, max_pstate, target_pstate, 
true);
 
cpu->pstate.current_pstate = target_pstate;
intel_cpufreq_trace(cpu, INTEL_PSTATE_TRACE_FAST_SWITCH, old_pstate);





[PATCH v1 2/3] cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument

2021-01-07 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

All of the callers of intel_pstate_get_hwp_max() access the struct
cpudata object that corresponds to the given CPU already and the
function itself needs to access that object (in order to update
hwp_cap_cached), so modify the code to pass a struct cpudata pointer
to it instead of the CPU number.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/cpufreq/intel_pstate.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -819,13 +819,13 @@ static struct freq_attr *hwp_cpufreq_att
NULL,
 };
 
-static void intel_pstate_get_hwp_max(unsigned int cpu, int *phy_max,
+static void intel_pstate_get_hwp_max(struct cpudata *cpu, int *phy_max,
 int *current_max)
 {
u64 cap;
 
-   rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, );
-   WRITE_ONCE(all_cpu_data[cpu]->hwp_cap_cached, cap);
+   rdmsrl_on_cpu(cpu->cpu, MSR_HWP_CAPABILITIES, );
+   WRITE_ONCE(cpu->hwp_cap_cached, cap);
if (global.no_turbo || global.turbo_disabled)
*current_max = HWP_GUARANTEED_PERF(cap);
else
@@ -1213,7 +1213,7 @@ static void update_qos_request(enum freq
continue;
 
if (hwp_active)
-   intel_pstate_get_hwp_max(i, _max, _state);
+   intel_pstate_get_hwp_max(cpu, _max, _state);
else
turbo_max = cpu->pstate.turbo_pstate;
 
@@ -1723,7 +1723,7 @@ static void intel_pstate_get_cpu_pstates
if (hwp_active && !hwp_mode_bdw) {
unsigned int phy_max, current_max;
 
-   intel_pstate_get_hwp_max(cpu->cpu, _max, _max);
+   intel_pstate_get_hwp_max(cpu, _max, _max);
cpu->pstate.turbo_freq = phy_max * cpu->pstate.scaling;
cpu->pstate.turbo_pstate = phy_max;
} else {
@@ -2208,7 +2208,7 @@ static void intel_pstate_update_perf_lim
 * rather than pure ratios.
 */
if (hwp_active) {
-   intel_pstate_get_hwp_max(cpu->cpu, _max, _state);
+   intel_pstate_get_hwp_max(cpu, _max, _state);
} else {
max_state = global.no_turbo || global.turbo_disabled ?
cpu->pstate.max_pstate : cpu->pstate.turbo_pstate;
@@ -2323,7 +2323,7 @@ static void intel_pstate_verify_cpu_poli
if (hwp_active) {
int max_state, turbo_max;
 
-   intel_pstate_get_hwp_max(cpu->cpu, _max, _state);
+   intel_pstate_get_hwp_max(cpu, _max, _state);
max_freq = max_state * cpu->pstate.scaling;
} else {
max_freq = intel_pstate_get_max_freq(cpu);
@@ -2710,7 +2710,7 @@ static int intel_cpufreq_cpu_init(struct
if (hwp_active) {
u64 value;
 
-   intel_pstate_get_hwp_max(policy->cpu, _max, _state);
+   intel_pstate_get_hwp_max(cpu, _max, _state);
policy->transition_delay_us = 
INTEL_CPUFREQ_TRANSITION_DELAY_HWP;
rdmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, );
WRITE_ONCE(cpu->hwp_req_cached, value);





[PATCH v1 1/3] cpufreq: intel_pstate: Always read hwp_cap_cached with READ_ONCE()

2021-01-07 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Because intel_pstate_get_hwp_max() which updates hwp_cap_cached
may run in parallel with the readers of it, annotate all of the
read accesses to it with READ_ONCE().

Signed-off-by: Rafael J. Wysocki 
---
 drivers/cpufreq/intel_pstate.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -914,7 +914,7 @@ static void intel_pstate_hwp_offline(str
}
 
value &= ~GENMASK_ULL(31, 0);
-   min_perf = HWP_LOWEST_PERF(cpu->hwp_cap_cached);
+   min_perf = HWP_LOWEST_PERF(READ_ONCE(cpu->hwp_cap_cached));
 
/* Set hwp_max = hwp_min */
value |= HWP_MAX_PERF(min_perf);
@@ -1750,6 +1750,7 @@ static int hwp_boost_hold_time_ns = 3 *
 static inline void intel_pstate_hwp_boost_up(struct cpudata *cpu)
 {
u64 hwp_req = READ_ONCE(cpu->hwp_req_cached);
+   u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
u32 max_limit = (hwp_req & 0xff00) >> 8;
u32 min_limit = (hwp_req & 0xff);
u32 boost_level1;
@@ -1776,14 +1777,14 @@ static inline void intel_pstate_hwp_boos
cpu->hwp_boost_min = min_limit;
 
/* level at half way mark between min and guranteed */
-   boost_level1 = (HWP_GUARANTEED_PERF(cpu->hwp_cap_cached) + min_limit) 
>> 1;
+   boost_level1 = (HWP_GUARANTEED_PERF(hwp_cap) + min_limit) >> 1;
 
if (cpu->hwp_boost_min < boost_level1)
cpu->hwp_boost_min = boost_level1;
-   else if (cpu->hwp_boost_min < HWP_GUARANTEED_PERF(cpu->hwp_cap_cached))
-   cpu->hwp_boost_min = HWP_GUARANTEED_PERF(cpu->hwp_cap_cached);
-   else if (cpu->hwp_boost_min == HWP_GUARANTEED_PERF(cpu->hwp_cap_cached) 
&&
-max_limit != HWP_GUARANTEED_PERF(cpu->hwp_cap_cached))
+   else if (cpu->hwp_boost_min < HWP_GUARANTEED_PERF(hwp_cap))
+   cpu->hwp_boost_min = HWP_GUARANTEED_PERF(hwp_cap);
+   else if (cpu->hwp_boost_min == HWP_GUARANTEED_PERF(hwp_cap) &&
+max_limit != HWP_GUARANTEED_PERF(hwp_cap))
cpu->hwp_boost_min = max_limit;
else
return;





Re: [PATCH v7 2/5] PCI: dwc: add support to handle ZRX-DC Compliant PHYs

2021-01-07 Thread Bjorn Helgaas
Capitalize subject to match the rest of the series.

"Add support to handle .." is redundant; "Add support for ..." would
be equivalent and shorter.

But this patch actually doesn't add anything at all by itself, since
it checks pci->phy_zrxdc_compliant but never sets it.

On Thu, Jan 07, 2021 at 08:58:40PM +0530, Shradha Todi wrote:
> From: Pankaj Dubey 
> 
> Many platforms use DesignWare controller but the PHY can be different in
> different platforms. If the PHY is compliant is to ZRX-DC specification it
> helps in low power consumption during power states.

s/is to/to/

Even with that, this sentence doesn't quite parse correctly.  Do you
mean something like this?

  If the PHY is compliant to the ZRX-DC specification, it reduces
  power consumption in low power Link states.

(I assume this is related to Link power states (L0, L1, etc), not
device power states (D0, D3hot, etc)).

> If current data rate is 8.0 GT/s or higher and PHY is not compliant to
> ZRX-DC specification, then after every 100ms link should transition to
> recovery state during the low power states.

Not sure this makes sense.  If the Link is in a low power state for 10
seconds, it must transition to the Recovery state every 100ms during
that 10 seconds, i.e., 100 times?

> DesignWare controller provides GEN3_ZRXDC_NONCOMPL field in
> GEN3_RELATED_OFF to specify about ZRX-DC compliant PHY.
> 
> Platforms with ZRX-DC compliant PHY can set phy_zrxdc_compliant variable to
> specify this property to the controller.

If this is a DesignWare-generic register and the "phy-zrxdc-compliant"
property can be used by any DesignWare-based driver, why isn't the
code to look for it in the DesignWare-generic part?

Is there a link to the ZRX-DC specification you can mention somewhere
in this series?

> Signed-off-by: Anvesh Salveru 
> Signed-off-by: Pankaj Dubey 
> Signed-off-by: Shradha Todi 
> Cc: Jingoo Han 
> Cc: Gustavo Pimentel 
> Cc: Lorenzo Pieralisi 
> Cc: Rob Herring 
> Cc: Bjorn Helgaas 
> ---
>  drivers/pci/controller/dwc/pcie-designware.c | 6 ++
>  drivers/pci/controller/dwc/pcie-designware.h | 4 
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c 
> b/drivers/pci/controller/dwc/pcie-designware.c
> index 645fa18..74590c7 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -722,4 +722,10 @@ void dw_pcie_setup(struct dw_pcie *pci)
>  PCIE_PL_CHK_REG_CHK_REG_START;
>   dw_pcie_writel_dbi(pci, PCIE_PL_CHK_REG_CONTROL_STATUS, val);
>   }
> +
> + if (pci->phy_zrxdc_compliant) {
> + val = dw_pcie_readl_dbi(pci, PCIE_PORT_GEN3_RELATED);
> + val &= ~PORT_LOGIC_GEN3_ZRXDC_NONCOMPL;
> + dw_pcie_writel_dbi(pci, PCIE_PORT_GEN3_RELATED, val);
> + }
>  }
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
> b/drivers/pci/controller/dwc/pcie-designware.h
> index 0207840..8b905a2 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -74,6 +74,9 @@
>  #define PCIE_MSI_INTR0_MASK  0x82C
>  #define PCIE_MSI_INTR0_STATUS0x830
>  
> +#define PCIE_PORT_GEN3_RELATED   0x890
> +#define PORT_LOGIC_GEN3_ZRXDC_NONCOMPL   BIT(0)
> +
>  #define PCIE_PORT_MULTI_LANE_CTRL0x8C0
>  #define PORT_MLTI_UPCFG_SUPPORT  BIT(7)
>  
> @@ -273,6 +276,7 @@ struct dw_pcie {
>   u8  n_fts[2];
>   booliatu_unroll_enabled: 1;
>   boolio_cfg_atu_shared: 1;
> + boolphy_zrxdc_compliant;

I raise my eyebrows a little at "bool xx : 1".  I think it's probably
*correct*, but "unsigned int xx : 1" is the overwhelming favorite and
I doubt bool gives any advantage.

  $ git grep -E "int\s+\S+\s*:\s*1" | egrep "^\S*\.[ch]" | wc -l
  3129
  $ git grep -E "bool\s+\S+\s*:\s*1" | egrep "^\S*\.[ch]" | wc -l
  637

pcie-designware.h is the only user in drivers/pci.  But you're
following the existing style in the file, which is good.

>  };
>  
>  #define to_dw_pcie_from_pp(port) container_of((port), struct dw_pcie, pp)
> -- 
> 2.7.4
> 


Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2021-01-07 Thread Ashish Kalra
On Thu, Jan 07, 2021 at 09:26:25AM -0800, Sean Christopherson wrote:
> On Thu, Jan 07, 2021, Ashish Kalra wrote:
> > Hello Steve,
> > 
> > On Wed, Jan 06, 2021 at 05:01:33PM -0800, Steve Rutherford wrote:
> > > Avoiding an rbtree for such a small (but unstable) list seems correct.
> > > 
> > > For the unencrypted region list strategy, the only questions that I
> > > have are fairly secondary.
> > > - How should the kernel upper bound the size of the list in the face
> > > of malicious guests, but still support large guests? (Something
> > > similar to the size provided in the bitmap API would work).
> > 
> > I am thinking of another scenario, where a malicious guest can make
> > infinite/repetetive hypercalls and DOS attack the host. 
> > 
> > But probably this is a more generic issue, this can be done by any guest
> > and under any hypervisor, i don't know what kind of mitigations exist
> > for such a scenario ?
> > 
> > Potentially, the guest memory donation model can handle such an attack,
> > because in this model, the hypervisor will expect only one hypercall,
> > any repetetive hypercalls can make the hypervisor disable the guest ?
> 
> KVM doesn't need to explicitly bound its tracking structures, it just needs to
> use GFP_KERNEL_ACCOUNT when allocating kernel memory for the structures so 
> that
> the memory will be accounted to the task/process/VM.  Shadow MMU pages are the
> only exception that comes to mind; they're still accounted properly, but KVM
> also explicitly limits them for a variety of reasons.
> 
> The size of the list will naturally be bounded by the size of the guest; and
> assuming KVM proactively merges adjancent regions, that upper bound is 
> probably
> reasonably low compared to other allocations, e.g. the aforementioned MMU 
> pages.
> 
> And, using a list means a malicious guest will get automatically throttled as
> the latency of walking the list (to merge/delete existing entries) will 
> increase
> with the size of the list.

Just to add here, potentially there won't be any proactive
merging/deletion of existing entries, as the only static entries will be
initial guest MMIO regions, which are contigious guest PA ranges but not
necessarily adjacent. 

After that, as discussed before, almost all entries will be due to 
DMA I/O with respect to dma_alloc_coherent/dma_free_coherent, and all
these entries will be temporary as these DMA buffers are marked
un-encrypted and immediately marked encrypted as soon as DMA I/O is
completed, so it makes no sense to do merging of temporary entries
that will be deleted from the list immediately after being added to it.

Thanks,
Ashish


Re: [PATCH v3 10/21] x86/fpu/xstate: Update xstate save function to support dynamic xstate

2021-01-07 Thread Bae, Chang Seok


> On Jan 7, 2021, at 17:41, Liu, Jing2  wrote:
> 
> static void kvm_save_current_fpu(struct fpu *fpu)  {
> + struct fpu *src_fpu = >thread.fpu;
> +
>   /*
>* If the target FPU state is not resident in the CPU registers, just
>* memcpy() from current, else save CPU state directly to the target.
>*/
> - if (test_thread_flag(TIF_NEED_FPU_LOAD))
> - memcpy(>state, >thread.fpu.state,
> + if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
> + memcpy(>state, _fpu->state,
>  fpu_kernel_xstate_min_size);
> For kvm, if we assume that it does not support dynamic features until this 
> series,
> memcpy for only fpu->state is correct. 
> I think this kind of assumption is reasonable and we only make original 
> xstate work.
> 
> - else
> + } else {
> + if (fpu->state_mask != src_fpu->state_mask)
> + fpu->state_mask = src_fpu->state_mask;
> 
> Though dynamic feature is not supported in kvm now, this function still need
> consider more things for fpu->state_mask.

Can you elaborate this? Which path might be affected by fpu->state_mask
without dynamic state supported in KVM?

> I suggest that we can set it before if...else (for both cases) and not change 
> other. 

I tried a minimum change here.  The fpu->state_mask value does not impact the
memcpy(). So, why do we need to change it for both?

Thanks,
Chang

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 06:33:58PM +, Al Viro wrote:
> On Thu, Jan 07, 2021 at 09:43:54AM -0800, Linus Torvalds wrote:
> 
> > Before, it would do the whole CLAC/STAC dance inside that loop for
> > every entry (and with that commit d55564cfc22 it would be a function
> > call, of course).
> > 
> > Can you verify that this fixes the regression (and in fact I'd expect
> > it to improve that test-case)?
> 
> I'm not sure it's the best approach, TBH.  How about simply
> for (walk = head; walk; ufds += walk->len, walk = walk->next) {
>   if (copy_to_user(ufds, walk->entries,
>walk->len * sizeof(struct pollfd))
>   goto out_fds;
> }
> in there?  It's both simpler (obviously matches the copyin side) and
> might very well be faster...

Something like

do_sys_poll(): do the wholesale copyout

Don't bother with patching up just one field - 16 bits out of each 64.
The amount of memory traffic is not going to be greater (might be
smaller, actually) and the loop in copy_to_user() is optimized for
bulk copy.

Signed-off-by: Al Viro 
---
diff --git a/fs/select.c b/fs/select.c
index ebfebdfe5c69..288633053c7f 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1011,12 +1011,9 @@ static int do_sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
fdcount = do_poll(head, , end_time);
poll_freewait();
 
-   for (walk = head; walk; walk = walk->next) {
-   struct pollfd *fds = walk->entries;
-   int j;
-
-   for (j = 0; j < walk->len; j++, ufds++)
-   if (__put_user(fds[j].revents, >revents))
+   for (walk = head; walk; ufds += walk->len, walk = walk->next) {
+   if (copy_to_user(ufds, walk->entries,
+walk->len * sizeof(struct pollfd)))
goto out_fds;
}
 


Re: [PATCH] of: property: Add device link support for interrupts

2021-01-07 Thread Rob Herring
On Wed, Jan 6, 2021 at 11:53 AM Saravana Kannan  wrote:
>
> On Sat, Jan 2, 2021 at 3:37 AM Marc Zyngier  wrote:
> >
> > On Thu, 31 Dec 2020 21:12:40 +,
> > Rob Herring  wrote:
> > >
> > > On Mon, Dec 21, 2020 at 09:30:45AM +, Marc Zyngier wrote:
> > > > On 2020-12-18 21:07, Saravana Kannan wrote:
> > > > > Add support for creating device links out of interrupts property.
> > > > >
> > > > > Cc: Marc Zyngier 
> > > > > Cc: Kevin Hilman 
> > > > > Signed-off-by: Saravana Kannan 
> > > > > ---
> > > > > Rob/Greg,
> > > > >
> > > > > This might need to go into driver-core to avoid conflict
> > > > > due to fw_devlink refactor series that merged there.
> > > > >
> > > > > Thanks,
> > > > > Saravana
> > > > >
> > > > >
> > > > >  drivers/of/property.c | 17 +
> > > > >  1 file changed, 17 insertions(+)
> > > > >
> > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > index 5f9eed79a8aa..e56a5eae0a0b 100644
> > > > > --- a/drivers/of/property.c
> > > > > +++ b/drivers/of/property.c
> > > > > @@ -1271,6 +1271,22 @@ static struct device_node
> > > > > *parse_iommu_maps(struct device_node *np,
> > > > >   return of_parse_phandle(np, prop_name, (index * 4) + 1);
> > > > >  }
> > > > >
> > > > > +static struct device_node *parse_interrupts(struct device_node *np,
> > > > > + const char *prop_name, int 
> > > > > index)
> > > > > +{
> > > > > + struct device_node *sup;
> > > > > +
> > > > > + if (strcmp(prop_name, "interrupts") || index)
> > > > > + return NULL;
> > > > > +
> > > > > + of_node_get(np);
> > > > > + while (np && !(sup = of_parse_phandle(np, "interrupt-parent", 0)))
> > > > > + np = of_get_next_parent(np);
> > > > > + of_node_put(np);
> > > > > +
> > > > > + return sup;
> > > > > +}
> > > > > +
> > > > >  static const struct supplier_bindings of_supplier_bindings[] = {
> > > > >   { .parse_prop = parse_clocks, },
> > > > >   { .parse_prop = parse_interconnects, },
> > > > > @@ -1296,6 +1312,7 @@ static const struct supplier_bindings
> > > > > of_supplier_bindings[] = {
> > > > >   { .parse_prop = parse_pinctrl6, },
> > > > >   { .parse_prop = parse_pinctrl7, },
> > > > >   { .parse_prop = parse_pinctrl8, },
> > > > > + { .parse_prop = parse_interrupts, },
> > > > >   { .parse_prop = parse_regulators, },
> > > > >   { .parse_prop = parse_gpio, },
> > > > >   { .parse_prop = parse_gpios, },
> > > >
> > > > You don't really describe what this is for so I'm only guessing
> > > > from the context. If you want to follow the interrupt hierarchy,
> > > > "interrupt-parent" isn't enough. You also need to track
> > > > things like interrupt-map, or anything that carries a phandle
> > > > to an interrupt controller.
> > >
> > > We don't need to follow the hierarchy, we just need the immediate
> > > dependencies.
> >
> > Indeed. I also wonder why this isn't just a irq_find_parent() call, TBH.
>
> Thanks Rob for explaining it.
>
> Marc, I wasn't sure if Rob would be okay with including of_irq.h here.
> Also, I'm trying to keep of/property.c independent of the framework
> code for now. The long term goal is to see if I can move out most of
> this into the frameworks. But I want to do that after I sort of some
> of the larger problems (like getting fw_devlink=on to work on all
> devices  first). Let me know if you have a strong preference for right
> now, if not, I'd rather keep property.c independent for now.
>
> I wasn't aware of interrupt-map until a few weeks ago and didn't know
> it carried phandles. I can add support for that too. There's no reason
> for all of them to go in one patch though.
>
> >
> > > But you are right that 'interrupt-map' also needs to be tracked.
> >
> > And 'interrupts-extended', while we're at it.
>
> This is already handled.
>
> > >
> > > I also noticed that we define 'interrupt-parent' as a dependency to
> > > parse, but that's wrong. The dependency is where 'interrupts' appears
> > > and where 'interrupt-parent' appears is irrelevant.
>
> No, the interrupt-parent parsing is correct and it's needed for
> interrupt controllers to probe in the right order. But
> interrupt-parent is also needs to be looked at for parsing
> "interrupts".

If you parse 'interrupts' for interrupt controllers (which in turn
will use 'interrupt-parent'), then you aren't going to need to track
'interrupt-parent' by itself.

To look at it another way, 'interrupt-parent' can appear in any
ancestor node. Which node the dts author arbitrarily decided to put it
in does not matter at all. It could be at the DT root or duplicated in
every single node with 'interrupts'. Those are logically the same. The
node(s) with the dependency are the ones with 'interrupts'.

Rob


<    1   2   3   4   5   6   7   8   9   10   >