date:20190417

Applied "ASoC: tlv320aic32x4: Remove set but not used variable 'mclk_rate'" to the asoc tree

2019-04-17 Thread Mark Brown

The patch

   ASoC: tlv320aic32x4: Remove set but not used variable 'mclk_rate'

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-5.2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 9f54872adbc2707b44eaf898defce60285aeca69 Mon Sep 17 00:00:00 2001
From: YueHaibing 
Date: Wed, 17 Apr 2019 23:01:57 +0800
Subject: [PATCH] ASoC: tlv320aic32x4: Remove set but not used variable
 'mclk_rate'

Fixes gcc '-Wunused-but-set-variable' warning:

sound/soc/codecs/tlv320aic32x4.c: In function 'aic32x4_setup_clocks':
sound/soc/codecs/tlv320aic32x4.c:669:16: warning: variable 'mclk_rate' set but not used [-Wunused-but-set-variable]

It is not used since introduction in
commit 96c3bb00239d ("ASoC: tlv320aic32x4: Dynamically Determine Clocking")

Signed-off-by: YueHaibing 
Signed-off-by: Mark Brown 
---
 sound/soc/codecs/tlv320aic32x4.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/sound/soc/codecs/tlv320aic32x4.c b/sound/soc/codecs/tlv320aic32x4.c
index 6edee05ff9f0..83608f386aef 100644
--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -684,9 +684,8 @@ static int aic32x4_setup_clocks(struct snd_soc_component *component,
 	u8 madc, nadc, mdac, ndac, max_nadc, min_mdac, max_ndac;
 	u8 dosr_increment;
 	u16 max_dosr, min_dosr;
-	unsigned long mclk_rate, adc_clock_rate, dac_clock_rate;
+	unsigned long adc_clock_rate, dac_clock_rate;
 	int ret;
-	struct clk *mclk;
 
 	struct clk_bulk_data clocks[] = {
 		{ .id = "pll" },
@@ -700,9 +699,6 @@ static int aic32x4_setup_clocks(struct snd_soc_component *component,
 	if (ret)
 		return ret;
 
-	mclk = clk_get_parent(clocks[1].clk);
-	mclk_rate = clk_get_rate(mclk);
-
 	if (sample_rate <= 48000) {
 		aosr = 128;
 		adc_resource_class = 6;
-- 
2.20.1
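
For readers unfamiliar with the warning, the pattern can be reproduced outside the kernel. The following is only a user-space sketch, not the driver code: fake_get_rate(), setup_before() and setup_after() are hypothetical names, and the value returned for rates above 48000 is an illustrative assumption. Building it with gcc and -Wunused-but-set-variable should flag 'rate' in setup_before() only.

```c
/* Hypothetical stand-in for clk_get_rate(); not a kernel API. */
static unsigned long fake_get_rate(void)
{
	return 12000000UL;
}

/* Before: 'rate' is assigned but never read, which is what gcc reports. */
int setup_before(unsigned int sample_rate)
{
	unsigned long rate;

	rate = fake_get_rate();	/* set but not used */

	/* 128 for <= 48000 follows the patch context; 64 is an assumption. */
	return sample_rate <= 48000 ? 128 : 64;
}

/* After: the dead variable and the call that fed it are removed. */
int setup_after(unsigned int sample_rate)
{
	return sample_rate <= 48000 ? 128 : 64;
}
```

Both functions return the same result for every input, which is why removing the variable is a pure cleanup.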

Re: [PATCH v4 08/16] locking/rwsem: Make rwsem_spin_on_owner() return owner state

2019-04-17 Thread Waiman Long

On 04/17/2019 05:00 AM, Peter Zijlstra wrote:
> On Sat, Apr 13, 2019 at 01:22:51PM -0400, Waiman Long wrote:
>> This patch modifies rwsem_spin_on_owner() to return four possible
>> values to better reflect the state of lock holder which enables us to
>> make a better decision of what to do next.
>>
>> In the special case that there is no active lock and the handoff bit
>> is set, optimistic spinning has to be stopped.
>>
>> Signed-off-by: Waiman Long 
>> ---
>>  kernel/locking/rwsem.c | 45 +++---
>>  1 file changed, 38 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index aaab546a890d..2d6850c3e77b 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -156,6 +156,11 @@ static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
>>  return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED);
>>  }
>>  
>> +static inline bool is_rwsem_owner_reader(struct task_struct *owner)
>> +{
>> +return (unsigned long)owner & RWSEM_READER_OWNED;
>> +}
> Move this and the surrounding helpers into the RWSEM_SPIN_ON_OWNER
> block, it is only used there and that way all the code is together.

OK, will do that.
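
For context, the helpers under discussion test flag bits stored in the low bits of the owner pointer. Here is a minimal user-space sketch of that idea. The bit positions, struct task, and the owner_tag()/owner_untag() helpers are assumptions made for illustration; only the two flag names come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define RWSEM_READER_OWNED	(1UL << 0)
#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 1)
#define OWNER_FLAGS_MASK	(RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED)

/* Hypothetical stand-in for task_struct; int alignment leaves 2 low bits free. */
struct task {
	int pid;
};

/* Store flags in the unused low bits of an aligned pointer. */
static inline struct task *owner_tag(struct task *t, unsigned long flags)
{
	return (struct task *)((uintptr_t)t | flags);
}

/* Recover the plain pointer by masking the flag bits off. */
static inline struct task *owner_untag(const struct task *owner)
{
	return (struct task *)((uintptr_t)owner & ~(uintptr_t)OWNER_FLAGS_MASK);
}

static inline bool owner_is_reader(const struct task *owner)
{
	return ((uintptr_t)owner & RWSEM_READER_OWNED) != 0;
}

static inline bool owner_is_spinnable(const struct task *owner)
{
	return !((uintptr_t)owner & RWSEM_ANONYMOUSLY_OWNED);
}
```

Keeping such helpers next to the only code that uses them, as requested above, makes it easier to see which flag combinations the spinning path has to handle.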

Cheers,
Longman

Applied "regulator: ready_mask_table[] can be static" to the regulator tree

2019-04-17 Thread Mark Brown

The patch

   regulator: ready_mask_table[] can be static

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-5.2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 82f26185a91298a21aa33a985893dd5f8ed4c75a Mon Sep 17 00:00:00 2001
From: kbuild test robot 
Date: Tue, 16 Apr 2019 00:52:38 +0800
Subject: [PATCH] regulator: ready_mask_table[] can be static

Fixes: 6cdae8173f67 ("regulator: Add support for stm32 power regulators")
Signed-off-by: kbuild test robot 
Signed-off-by: Mark Brown 
---
 drivers/regulator/stm32-pwr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/regulator/stm32-pwr.c b/drivers/regulator/stm32-pwr.c
index e434b26d4c8b..222d593d76a2 100644
--- a/drivers/regulator/stm32-pwr.c
+++ b/drivers/regulator/stm32-pwr.c
@@ -32,7 +32,7 @@ enum {
 	STM32PWR_REG_NUM_REGS
 };
 
-u32 ready_mask_table[STM32PWR_REG_NUM_REGS] = {
+static u32 ready_mask_table[STM32PWR_REG_NUM_REGS] = {
 	[PWR_REG11] = REG_1_1_RDY,
 	[PWR_REG18] = REG_1_8_RDY,
 	[PWR_USB33] = USB_3_3_RDY,
@@ -44,7 +44,7 @@ struct stm32_pwr_reg {
 	u32 ready_mask;
 };
 
-int stm32_pwr_reg_is_ready(struct regulator_dev *rdev)
+static int stm32_pwr_reg_is_ready(struct regulator_dev *rdev)
 {
 	struct stm32_pwr_reg *priv = rdev_get_drvdata(rdev);
 	u32 val;
@@ -54,7 +54,7 @@ int stm32_pwr_reg_is_ready(struct regulator_dev *rdev)
 	return (val & priv->ready_mask);
 }
 
-int stm32_pwr_reg_is_enabled(struct regulator_dev *rdev)
+static int stm32_pwr_reg_is_enabled(struct regulator_dev *rdev)
 {
 	struct stm32_pwr_reg *priv = rdev_get_drvdata(rdev);
 	u32 val;
-- 
2.20.1

Applied "regulator: wm8350: Switch to SPDX identifier" to the regulator tree

2019-04-17 Thread Mark Brown

The patch

   regulator: wm8350: Switch to SPDX identifier

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-5.2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 87dbc5eb3cff166f8107f5be6a53db858211f23e Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Wed, 17 Apr 2019 22:16:31 +0800
Subject: [PATCH] regulator: wm8350: Switch to SPDX identifier

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/wm8350-regulator.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/regulator/wm8350-regulator.c b/drivers/regulator/wm8350-regulator.c
index 0eb3f3a33caa..56d6168a888d 100644
--- a/drivers/regulator/wm8350-regulator.c
+++ b/drivers/regulator/wm8350-regulator.c
@@ -1,16 +1,11 @@
-/*
- * wm8350.c  --  Voltage and current regulation for the Wolfson WM8350 PMIC
- *
- * Copyright 2007, 2008 Wolfson Microelectronics PLC.
- *
- * Author: Liam Girdwood
- * li...@wolfsonmicro.com
- *
- *  This program is free software; you can redistribute  it and/or modify it
- *  under  the terms of  the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the  License, or (at your
- *  option) any later version.
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// wm8350.c  --  Voltage and current regulation for the Wolfson WM8350 PMIC
+//
+// Copyright 2007, 2008 Wolfson Microelectronics PLC.
+//
+// Author: Liam Girdwood
+// li...@wolfsonmicro.com
 
 #include 
 #include 
-- 
2.20.1

Applied "regulator: wm8400: Switch to SPDX identifier" to the regulator tree

2019-04-17 Thread Mark Brown

The patch

   regulator: wm8400: Switch to SPDX identifier

has been applied to the regulator tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-5.2

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 362af736508134dddcc4cccd575fbb7c22c29712 Mon Sep 17 00:00:00 2001
From: Axel Lin 
Date: Wed, 17 Apr 2019 22:16:32 +0800
Subject: [PATCH] regulator: wm8400: Switch to SPDX identifier

Signed-off-by: Axel Lin 
Signed-off-by: Mark Brown 
---
 drivers/regulator/wm8400-regulator.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/regulator/wm8400-regulator.c b/drivers/regulator/wm8400-regulator.c
index 5ce86b92851b..6f331b51e479 100644
--- a/drivers/regulator/wm8400-regulator.c
+++ b/drivers/regulator/wm8400-regulator.c
@@ -1,16 +1,10 @@
-/*
- * Regulator support for WM8400
- *
- * Copyright 2008 Wolfson Microelectronics PLC.
- *
- * Author: Mark Brown 
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2 of the
- * License, or (at your option) any later version.
- *
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Regulator support for WM8400
+//
+// Copyright 2008 Wolfson Microelectronics PLC.
+//
+// Author: Mark Brown 
 
 #include 
 #include 
-- 
2.20.1

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Casey Schaufler


On 4/17/2019 9:27 AM, Oleg Nesterov wrote:
> On 04/17, Paul Moore wrote:
>> On Wed, Apr 17, 2019 at 10:57 AM Oleg Nesterov wrote:
>>> On 04/17, Paul Moore wrote:
>>>> I'm tempted to simply return an error in selinux_setprocattr() if
>>>> the task's credentials are not the same as its real_cred;
>>> What about other modules? I have no idea what smack_setprocattr() is,
>>> but it too does prepare_creds/commit creds.
>>>
>>> it seems that the simplest workaround should simply add the additional
>>> cred == real_cred into proc_pid_attr_write().
>> Yes, that is simple, but I worry about what other LSMs might want to
>> do.  While I believe failing if the effective creds are not the same
>> as the real_creds is okay for SELinux (possibly Smack too), I worry
>> about what other LSMs may want to do.  After all,
>> proc_pid_attr_write() doesn't change the creds itself, that is
>> something the specific LSMs do.
> Yes, but if proc_pid_attr_write() is called with cred != real_cred then
> something is already wrong?
>
> In fact, I think that something is already wrong if it is not called by
> user-space directly. Too late to ask, but why is this /proc/self/attr/
> magic not implemented via syscall(s) ?


Shell scripts, for one thing. It's a straightforward and appropriate
use of the /proc interface. System calls would require additional change
to existing programs, whereas using the /proc interface allows a good
deal to be done in the containing scripts.

Re: [PATCH 4.19 000/101] 4.19.35-stable review

2019-04-17 Thread Bharath Vedartham

On Wed, Apr 17, 2019 at 08:16:15AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 17, 2019 at 08:15:56AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 17, 2019 at 03:46:09AM +0530, Bharath Vedartham wrote:
> > > Compiled and Booted(defconfig) on my x86 machine. No dmesg regressions.
> > 
> > Thanks for testing 2 of these and letting me know.
> 
> Oops, you tested 3, thanks for that :)
eager to do more!

Re: [PATCH v4 07/16] locking/rwsem: Implement lock handoff to prevent lock starvation

2019-04-17 Thread Waiman Long

On 04/17/2019 04:05 AM, Peter Zijlstra wrote:
> On Tue, Apr 16, 2019 at 02:16:11PM -0400, Waiman Long wrote:
>
>>>> @@ -608,56 +687,63 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
>>>>  	 */
>>>>  	waiter.task = current;
>>>>  	waiter.type = RWSEM_WAITING_FOR_WRITE;
>>>> +	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
>>>>  
>>>>  	raw_spin_lock_irq(&sem->wait_lock);
>>>>  
>>>>  	/* account for this before adding a new element to the list */
>>>> +	wstate = list_empty(&sem->wait_list) ? WRITER_FIRST : WRITER_NOT_FIRST;
>>>>  
>>>>  	list_add_tail(&waiter.list, &sem->wait_list);
>>>>  
>>>>  	/* we're now waiting on the lock */
>>>> +	if (wstate == WRITER_NOT_FIRST) {
>>>>  		count = atomic_long_read(&sem->count);
>>>>  
>>>>  		/*
>>>> +		 * If there were already threads queued before us and:
>>>> +		 *  1) there are no active locks, wake the front
>>>> +		 *     queued process(es) as the handoff bit might be set.
>>>> +		 *  2) there are no active writers and some readers, the lock
>>>> +		 *     must be read owned; so we try to wake any read lock
>>>> +		 *     waiters that were queued ahead of us.
>>>>  		 */
>>>> +		if (!RWSEM_COUNT_LOCKED(count))
>>>> +			__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
>>>> +		else if (!(count & RWSEM_WRITER_MASK) &&
>>>> +			 (count & RWSEM_READER_MASK))
>>>>  			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
>>> Does the above want to be something like:
>>>
>>> if (!(count & RWSEM_WRITER_LOCKED)) {
>>> __rwsem_mark_wake(sem, (count & RWSEM_READER_MASK) ?
>>>RWSEM_WAKE_READERS :
>>>RWSEM_WAKE_ANY, &wake_q);
>>> }
>> Yes.
>>
>>>> +		else
>>>> +			goto wait;
>>>>  
>>>> +		/*
>>>> +		 * The wakeup is normally called _after_ the wait_lock
>>>> +		 * is released, but given that we are proactively waking
>>>> +		 * readers we can deal with the wake_q overhead as it is
>>>> +		 * similar to releasing and taking the wait_lock again
>>>> +		 * for attempting rwsem_try_write_lock().
>>>> +		 */
>>>> +		wake_up_q(&wake_q);
>>> Hurmph.. the reason we do wake_up_q() outside of wait_lock is such that
>>> those tasks don't bounce on wait_lock. Also, it removes a great deal of
>>> hold-time from wait_lock.
>>>
>>> So I'm not sure I buy your argument here.
>>>
>> Actually, we don't want to release the wait_lock, do wake_up_q() and
>> acquire the wait_lock again as the state would have been changed. I
>> didn't change the comment on this patch, but will reword it to discuss that.
> I don't understand, we've queued ourselves, we're on the list, we're not
> first. How would dropping the lock to try and kick waiters before us be
> a problem?
>
> Sure, once we re-acquire the lock we have to re-evaluate @wstate to see
> if we're first now or not, but we need to do that anyway.
>
> So what is wrong with the below?
>
> --- a/include/linux/sched/wake_q.h
> +++ b/include/linux/sched/wake_q.h
> @@ -51,6 +51,11 @@ static inline void wake_q_init(struct wa
>   head->lastp = &head->first;
>  }
>  
> +static inline bool wake_q_empty(struct wake_q_head *head)
> +{
> + return head->first == WAKE_Q_TAIL;
> +}
> +
>  extern void wake_q_add(struct wake_q_head *head, struct task_struct *task);
>  extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task);
>  extern void wake_up_q(struct wake_q_head *head);
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -700,25 +700,22 @@ __rwsem_down_write_failed_common(struct
>* must be read owned; so we try to wake any read lock
>* waiters that were queued ahead of us.
>*/
> - if (!(count & RWSEM_LOCKED_MASK))
> - __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
> - else if (!(count & RWSEM_WRITER_MASK) &&
> - (count & RWSEM_READER_MASK))
> - __rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
> - else
> + if (count & RWSEM_WRITER_LOCKED)
>   goto wait;
> - /*
> -  * The wakeup is normally called _after_ the wait_lock
> -  * is released, but given that we are proactively waking
> -  * readers we can deal with the wake_q overhead as it is
> -  * similar to releasing and taking the wait_lock again
> -  * for attempting rwsem_try_write_lock().
> -  */
> - wake_up_q(&wake_q);
> - /*
> -  * Reinitialize wake_q after use.
> -  */
> - wake_q_init(&wake_q);
> +
> + __rwsem_mark_wake(sem, (count & RWSEM_READER_MASK) ?
> +
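
As background for the wake_q_empty() helper proposed above, here is a rough user-space model of a wake queue that ends in a sentinel rather than NULL. It is a sketch under assumptions, not the kernel implementation: struct wake_node and wake_q_drain() are hypothetical, and draining only counts entries instead of waking tasks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type standing in for the per-task wake_q node. */
struct wake_node {
	struct wake_node *next;
};

/* Non-NULL sentinel marking the end of the list. */
#define WAKE_Q_TAIL ((struct wake_node *)0x01)

struct wake_q_head {
	struct wake_node *first;
	struct wake_node **lastp;	/* where the next node gets linked */
};

static inline void wake_q_init(struct wake_q_head *head)
{
	head->first = WAKE_Q_TAIL;
	head->lastp = &head->first;
}

/* Empty means the head still points straight at the sentinel. */
static inline bool wake_q_empty(const struct wake_q_head *head)
{
	return head->first == WAKE_Q_TAIL;
}

/* Append a node; this would be done while holding the lock. */
static inline void wake_q_add(struct wake_q_head *head, struct wake_node *node)
{
	node->next = WAKE_Q_TAIL;
	*head->lastp = node;
	head->lastp = &node->next;
}

/* Walk and reset the queue; this would be done after dropping the lock. */
static inline int wake_q_drain(struct wake_q_head *head)
{
	struct wake_node *node = head->first;
	int woken = 0;

	while (node != WAKE_Q_TAIL) {
		struct wake_node *next = node->next;

		node->next = NULL;	/* node may be queued again later */
		woken++;
		node = next;
	}
	wake_q_init(head);
	return woken;
}
```

With an emptiness test available, a caller can skip the unlock, wake, relock sequence entirely when nothing was queued.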

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Michal Hocko

On Wed 17-04-19 09:37:39, Keith Busch wrote:
> On Wed, Apr 17, 2019 at 05:39:23PM +0200, Michal Hocko wrote:
> > On Wed 17-04-19 09:23:46, Keith Busch wrote:
> > > On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> > > > On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > > > > Keith Busch had a set of patches to let you specify the demotion order
> > > > > via sysfs for fun.  The rules we came up with were:
> > > > 
> > > > I am not a fan of any sysfs "fun"
> > > 
> > > I'm hung up on the user facing interface, but there should be some way a
> > > user decides if a memory node is or is not a migrate target, right?
> > 
> > Why? Or to put it differently, why do we have to start with a user
> > interface at this stage when we actually barely have any real usecases
> > out there?
> 
> The use case is an alternative to swap, right? The user has to decide
> which storage is the swap target, so operating in the same spirit.

I do not follow. If you use rebalancing you can still deplete the memory
and end up in a swap storage. If you want to reclaim/swap rather than
rebalance then you do not enable rebalancing (by node_reclaim or similar
mechanism).

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] regulator: stm32-pwr: Staticize local symbols

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 03:59:42PM +0800, Axel Lin wrote:
> These symbols are only used by this driver, make them static.

Someone already sent a patch for this.



Re: [PATCH -next] regulator: stm32-pwr: Make some symbols static

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 02:31:12AM +0000, Wei Yongjun wrote:
> Fixes the following sparse warnings:
> 
> drivers/regulator/stm32-pwr.c:35:5: warning:
>  symbol 'ready_mask_table' was not declared. Should it be static?
> drivers/regulator/stm32-pwr.c:47:5: warning:
>  symbol 'stm32_pwr_reg_is_ready' was not declared. Should it be static?
> drivers/regulator/stm32-pwr.c:57:5: warning:
>  symbol 'stm32_pwr_reg_is_enabled' was not declared. Should it be static?

Someone already sent a patch for this.



Re: [PATCH] regulator: stm32-pwr: Fix error checking for of_iomap

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 04:43:35PM +0800, Axel Lin wrote:
> of_iomap returns NULL on error.

Someone already sent a patch for this.
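
To illustrate why the check matters: an IS_ERR()-style test does not treat NULL as an error, so a function that reports failure with NULL needs a plain NULL check. The sketch below is a user-space model only. is_err() and err_ptr() imitate the kernel's error-pointer convention, while fake_iomap() and probe() are hypothetical stand-ins, not the stm32-pwr code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* Hypothetical mapping helper that reports failure as NULL. */
static inline void *fake_iomap(bool fail)
{
	static int reg;

	return fail ? NULL : &reg;
}

/* Returns 0 on success or a negative errno value. */
static inline int probe(bool fail)
{
	void *base = fake_iomap(fail);

	if (!base)	/* an is_err(base) test would miss this failure */
		return -ENOMEM;
	return 0;
}
```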



Re: [PATCH v4 07/16] locking/rwsem: Implement lock handoff to prevent lock starvation

2019-04-17 Thread Waiman Long

On 04/17/2019 03:35 AM, Peter Zijlstra wrote:
> On Tue, Apr 16, 2019 at 02:16:11PM -0400, Waiman Long wrote:
>
>>>> @@ -324,6 +364,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>>>  		adjustment -= RWSEM_FLAG_WAITERS;
>>>>  	}
>>>>  
>>>> +	/*
>>>> +	 * Clear the handoff flag
>>>> +	 */
>>> Right, but that is a trivial comment in the 'increment i' style, it
>>> clearly states what the code does, but completely fails to elucidate the
>>> code.
>>>
>>> Maybe:
>>>
>>> /*
>>>  * When we've woken a reader, we no longer need to force writers
>>>  * to give up the lock and we can clear HANDOFF.
>>>  */
>>>
>>> And I suppose this is required if we were the pickup of the handoff set
>>> above, but is there a guarantee that the HANDOFF was not set by a
>>> writer?
>> I can change the comment. The handoff bit is always cleared in
>> rwsem_try_write_lock() when the lock is successfully acquired. Will add a
>> comment to document that.
> That doesn't help much, because it drops ->wait_lock between setting it
> and acquiring it. So the read-acquire can interleave.
>
> I _think_ it works, but I'm having trouble explaining how exactly. I
> think because readers don't spin yet and thus wakeups abide by queue
> order.
>
> And the other way around should have (write) spinners terminate the
> moment they see HANDOFF set by a reader, but I'm not immediately seeing
> that either.
>
> I'll continue staring at that.
>
All writers acquire the lock via cmpxchg, and they check for the
handoff bit before attempting to acquire. So there is no way for write
spinners to acquire the lock after they see the handoff bit.
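
The argument can be sketched in user space with C11 atomics. This is an illustration under assumptions rather than the rwsem code: the bit layout, the flag names WRITER_LOCKED and FLAG_HANDOFF, and try_write_lock_unqueued() are all hypothetical, and readers are ignored for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout for this sketch only. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_HANDOFF	(1L << 2)

/*
 * Spinner path: attempt the write lock only while neither the lock bit
 * nor the handoff bit is set in the value we last observed.
 */
static inline bool try_write_lock_unqueued(atomic_long *count)
{
	long old = atomic_load(count);

	while (!(old & (WRITER_LOCKED | FLAG_HANDOFF))) {
		/*
		 * The exchange succeeds only if count still equals 'old',
		 * that is, only if nobody set HANDOFF in the meantime.
		 * On failure 'old' is refreshed and the loop re-checks it.
		 */
		if (atomic_compare_exchange_weak(count, &old,
						 old | WRITER_LOCKED))
			return true;
	}
	return false;
}
```

Because the compare value is the same snapshot that passed the handoff check, a handoff bit set after the check makes the exchange fail instead of letting the spinner slip in.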

Cheers,
Longman

Re: [PATCH v4] platform: chrome: Add ChromeOS EC ISHTP driver

2019-04-17 Thread Jett Rink

Reviewed-by: Jett Rink 
Tested-by: Jett Rink 

On Wed, Apr 17, 2019 at 10:27 AM Srinivas Pandruvada wrote:
>
> On Wed, 2019-04-17 at 15:48 +0530, Rushikesh S Kadam wrote:
> > This driver implements a slim layer to enable the ChromeOS
> > EC kernel stack (cros_ec) to communicate with ChromeOS EC
> > firmware running on the Intel Integrated Sensor Hub (ISH).
> >
> > The driver registers a ChromeOS EC MFD device to connect
> > with cros_ec kernel stack (upper layer), and it registers a
> > client with the ISH Transport Protocol bus (lower layer) to
> > talk with the ISH firmware. See description of the ISHTP
> > protocol at Documentation/hid/intel-ish-hid.txt
> >
> > Signed-off-by: Rushikesh S Kadam 
> Acked-by: Srinivas Pandruvada 
>
> I think you have some reviewed and tested by.
>
> Also copy to Jiri, as this may have to go via HID pull as this has
> dependency.
>
> Thanks,
> Srinivas
>
> > ---
> > The patches are baselined to hid git tree, branch for-5.2/ish
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git/log/?h=for-5.2/ish
> >
> > v4
> >  - Coding style related changes. No functional changes. Addresses
> >review comments on v3.
> >
> > v3
> >  - Made several changes to improve code readability. Replaced
> >multiple cl_data_to_dev(client_data) with dev variable. Use
> >reverse Xmas tree for variable definition where it made sense.
> >Dropped few debug prints. Add docstring for function
> >prepare_cros_ec_rx().
> >  - Fix code in function prepare_cros_ec_rx() under label
> >end_cros_ec_dev_init_error.
> >  - Recycle buffer in process_recv() on failing to obtain the
> >semaphore.
> >  - Increase ISHTP TX/RX ring buffer size to 8.
> >  - Alphabetically ordered CROS_EC_ISHTP entries in Makefile and
> >Kconfig.
> >  - Updated commit message.
> >
> > v2
> >  - Dropped unused "reset" parameter in function cros_ec_init()
> >  - Change driver name to cros_ec_ishtp to be consistent with other
> >references in the code.
> >  - Fixed a few typos.
> >
> > v1
> >  - Initial version
> >
> >  drivers/platform/chrome/Kconfig |  13 +
> >  drivers/platform/chrome/Makefile|   1 +
> >  drivers/platform/chrome/cros_ec_ishtp.c | 763
> > 
> >  3 files changed, 777 insertions(+)
> >  create mode 100644 drivers/platform/chrome/cros_ec_ishtp.c
> >
> > diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
> > index 16b1615..5848179 100644
> > --- a/drivers/platform/chrome/Kconfig
> > +++ b/drivers/platform/chrome/Kconfig
> > @@ -62,6 +62,19 @@ config CROS_EC_I2C
> > a checksum. Failing accesses will be retried three times to
> > improve reliability.
> >
> > +config CROS_EC_ISHTP
> > + tristate "ChromeOS Embedded Controller (ISHTP)"
> > + depends on MFD_CROS_EC
> > + depends on INTEL_ISH_HID
> > + help
> > +   If you say Y here, you get support for talking to the ChromeOS EC
> > +   firmware running on Intel Integrated Sensor Hub (ISH), using the
> > +   ISH Transport protocol (ISH-TP). This uses a simple byte-level
> > +   protocol with a checksum.
> > +
> > +   To compile this driver as a module, choose M here: the
> > +   module will be called cros_ec_ishtp.
> > +
> >  config CROS_EC_SPI
> >   tristate "ChromeOS Embedded Controller (SPI)"
> >   depends on MFD_CROS_EC && SPI
> > diff --git a/drivers/platform/chrome/Makefile b/drivers/platform/chrome/Makefile
> > index cd591bf..4efe102 100644
> > --- a/drivers/platform/chrome/Makefile
> > +++ b/drivers/platform/chrome/Makefile
> > @@ -7,6 +7,7 @@ cros_ec_ctl-objs	:= cros_ec_sysfs.o cros_ec_lightbar.o \
> >  			   cros_ec_vbc.o cros_ec_debugfs.o
> >  obj-$(CONFIG_CROS_EC_CTL)		+= cros_ec_ctl.o
> >  obj-$(CONFIG_CROS_EC_I2C)		+= cros_ec_i2c.o
> > +obj-$(CONFIG_CROS_EC_ISHTP)		+= cros_ec_ishtp.o
> >  obj-$(CONFIG_CROS_EC_SPI)		+= cros_ec_spi.o
> >  cros_ec_lpcs-objs			:= cros_ec_lpc.o cros_ec_lpc_reg.o
> >  cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC)	+= cros_ec_lpc_mec.o
> > diff --git a/drivers/platform/chrome/cros_ec_ishtp.c b/drivers/platform/chrome/cros_ec_ishtp.c
> > new file mode 100644
> > index 0000000..997503d
> > --- /dev/null
> > +++ b/drivers/platform/chrome/cros_ec_ishtp.c
> > @@ -0,0 +1,763 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// ISHTP interface for ChromeOS Embedded Controller
> > +//
> > +// Copyright (c) 2019, Intel Corporation.
> > +//
> > +// ISHTP client driver for talking to the Chrome OS EC firmware running
> > +// on Intel Integrated Sensor Hub (ISH) using the ISH Transport protocol
> > +// (ISH-TP).
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/*
> > + * ISH TX/RX ring buffer pool size
> > + *
> > + * The AP->ISH messages and

Re: [PATCH v6 10/12] soc: ti: Add MSI domain bus support for Interrupt Aggregator

2019-04-17 Thread Marc Zyngier

On 10/04/2019 05:13, Lokesh Vutla wrote:
> With the system coprocessor managing the range allocation of the
> inputs to Interrupt Aggregator, it is difficult to represent
> the device IRQs from DT.
> 
> The suggestion is to use MSI in such cases where devices want
> to allocate and group interrupts dynamically.
> 
> Create a MSI domain bus layer that allocates and frees MSIs for
> a device.
> 
> APIs that are implemented:
> - ti_sci_inta_msi_create_irq_domain() that creates a MSI domain
> - ti_sci_inta_msi_domain_alloc_irqs() that creates MSIs for the
>   specified device and resource.
> - ti_sci_inta_msi_domain_free_irqs() frees the irqs attached to the device.
> - ti_sci_inta_msi_get_virq() for getting the virq attached to a specific event.
> 
> Signed-off-by: Lokesh Vutla 
> ---
> Changes since v5:
> - Updated the input parameters to all apis
> - Updated the default chip ops.
> - Prefixed all the apis with ti_sci_inta_
> 
> Marc,
>   Right now ti_sci_resource is being passed for irq allocations.
> I couldn't get to use resources attached to platform_device. Because
> platform_device resources are allocated in of_device_alloc() and number
> of resources are fixed in it. In order to update the resources, driver
> has to do a krealloc(pdev->resources) and update the num of resources.
> Is it allowed to update the pdev->resources during probe time? If yes,
> I'll be happy to update the patch to use platform_dev resources.

My suggestion was for you to define your own bus, device type and co
(much like the fsl-mc stuff), and not reuse platform devices at all.

> 
> 
>  MAINTAINERS|   2 +
>  drivers/soc/ti/Kconfig |   6 +
>  drivers/soc/ti/Makefile|   1 +
>  drivers/soc/ti/ti_sci_inta_msi.c   | 167 +
>  include/linux/irqdomain.h  |   1 +
>  include/linux/msi.h|   6 +
>  include/linux/soc/ti/ti_sci_inta_msi.h |  23 
>  7 files changed, 206 insertions(+)
>  create mode 100644 drivers/soc/ti/ti_sci_inta_msi.c
>  create mode 100644 include/linux/soc/ti/ti_sci_inta_msi.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ba88b3033fe4..dd31d7cb2fc6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15353,6 +15353,8 @@ F:	Documentation/devicetree/bindings/interrupt-controller/ti,sci-intr.txt
>  F:   Documentation/devicetree/bindings/interrupt-controller/ti,sci-inta.txt
>  F:   drivers/irqchip/irq-ti-sci-intr.c
>  F:   drivers/irqchip/irq-ti-sci-inta.c
> +F:   include/linux/soc/ti/ti_sci_inta_msi.h
> +F:   drivers/soc/ti/ti_sci_inta_msi.c
>  
>  Texas Instruments ASoC drivers
>  M:   Peter Ujfalusi 
> diff --git a/drivers/soc/ti/Kconfig b/drivers/soc/ti/Kconfig
> index be4570baad96..82f110fe4953 100644
> --- a/drivers/soc/ti/Kconfig
> +++ b/drivers/soc/ti/Kconfig
> @@ -73,4 +73,10 @@ config TI_SCI_PM_DOMAINS
> called ti_sci_pm_domains. Note this is needed early in boot before
> rootfs may be available.
>  
> +config TI_SCI_INTA_MSI_DOMAIN
> + bool
> + select GENERIC_MSI_IRQ_DOMAIN
> + help
> +   Driver to enable Interrupt Aggregator specific MSI Domain.
> +
>  endif # SOC_TI
> diff --git a/drivers/soc/ti/Makefile b/drivers/soc/ti/Makefile
> index a22edc0b258a..b3868d392d4f 100644
> --- a/drivers/soc/ti/Makefile
> +++ b/drivers/soc/ti/Makefile
> @@ -8,3 +8,4 @@ obj-$(CONFIG_KEYSTONE_NAVIGATOR_DMA)  += knav_dma.o
>  obj-$(CONFIG_AMX3_PM)+= pm33xx.o
>  obj-$(CONFIG_WKUP_M3_IPC)+= wkup_m3_ipc.o
>  obj-$(CONFIG_TI_SCI_PM_DOMAINS)  += ti_sci_pm_domains.o
> +obj-$(CONFIG_TI_SCI_INTA_MSI_DOMAIN) += ti_sci_inta_msi.o
> diff --git a/drivers/soc/ti/ti_sci_inta_msi.c b/drivers/soc/ti/ti_sci_inta_msi.c
> new file mode 100644
> index 000000000000..247a5e5f216b
> --- /dev/null
> +++ b/drivers/soc/ti/ti_sci_inta_msi.c
> @@ -0,0 +1,167 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Texas Instruments' K3 Interrupt Aggregator MSI bus
> + *
> + * Copyright (C) 2018-2019 Texas Instruments Incorporated - http://www.ti.com/
> + *   Lokesh Vutla 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Alphabetical ordering, please.

> +#include 
> +#include 
> +
> +static void ti_sci_inta_msi_write_msg(struct irq_data *data,
> +   struct msi_msg *msg)
> +{
> + /* Nothing to do */
> +}
> +
> +static void ti_sci_inta_msi_compose_msi_msg(struct irq_data *data,
> + struct msi_msg *msg)
> +{
> + /* Nothing to do */
> +}
> +
> +static int ti_sci_inta_msi_request_resources(struct irq_data *data)
> +{
> + data = data->parent_data;
> +
> + return data->chip->irq_request_resources(data);
> +}
> +
> +static void ti_sci_inta_msi_release_resources(struct irq_data *data)
> +{
> + data = data->parent_data;
> + data->chip->irq_release_resources(data);
> +}

The two functions above are an

Re: [PATCH v4] platform: chrome: Add ChromeOS EC ISHTP driver

2019-04-17 Thread Srinivas Pandruvada

On Wed, 2019-04-17 at 15:48 +0530, Rushikesh S Kadam wrote:
> This driver implements a slim layer to enable the ChromeOS
> EC kernel stack (cros_ec) to communicate with ChromeOS EC
> firmware running on the Intel Integrated Sensor Hub (ISH).
> 
> The driver registers a ChromeOS EC MFD device to connect
> with cros_ec kernel stack (upper layer), and it registers a
> client with the ISH Transport Protocol bus (lower layer) to
> talk with the ISH firmware. See description of the ISHTP
> protocol at Documentation/hid/intel-ish-hid.txt
> 
> Signed-off-by: Rushikesh S Kadam 
Acked-by: Srinivas Pandruvada 

I think you have some reviewed and tested by.

Also copy to Jiri, as this may have to go via HID pull as this has
dependency.

Thanks,
Srinivas

> ---
> The patches are baselined to hid git tree, branch for-5.2/ish
> 
https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git/log/?h=for-5.2/ish
> 
> v4
>  - Coding style related changes. No functional changes. Addresses
>review comments on v3.
> 
> v3
>  - Made several changes to improve code readability. Replaced
>multiple cl_data_to_dev(client_data) with dev variable. Use
>reverse Xmas tree for variable definition where it made sense.
>Dropped few debug prints. Add docstring for function
>prepare_cros_ec_rx().
>  - Fix code in function prepare_cros_ec_rx() under label
>end_cros_ec_dev_init_error.
>  - Recycle buffer in process_recv() on failing to obtain the
>semaphore.
>  - Increase ISHTP TX/RX ring buffer size to 8.
>  - Alphabetically ordered CROS_EC_ISHTP entries in Makefile and
>Kconfig.
>  - Updated commit message.
> 
> v2
>  - Dropped unused "reset" parameter in function cros_ec_init()
>  - Change driver name to cros_ec_ishtp to be consistent with other
>references in the code.
>  - Fixed a few typos. 
> 
> v1
>  - Initial version
> 
>  drivers/platform/chrome/Kconfig |  13 +
>  drivers/platform/chrome/Makefile|   1 +
>  drivers/platform/chrome/cros_ec_ishtp.c | 763
> 
>  3 files changed, 777 insertions(+)
>  create mode 100644 drivers/platform/chrome/cros_ec_ishtp.c
> 
> diff --git a/drivers/platform/chrome/Kconfig
> b/drivers/platform/chrome/Kconfig
> index 16b1615..5848179 100644
> --- a/drivers/platform/chrome/Kconfig
> +++ b/drivers/platform/chrome/Kconfig
> @@ -62,6 +62,19 @@ config CROS_EC_I2C
> a checksum. Failing accesses will be retried three times to
> improve reliability.
>  
> +config CROS_EC_ISHTP
> + tristate "ChromeOS Embedded Controller (ISHTP)"
> + depends on MFD_CROS_EC
> + depends on INTEL_ISH_HID
> + help
> +   If you say Y here, you get support for talking to the
> ChromeOS EC
> +   firmware running on Intel Integrated Sensor Hub (ISH), using
> the
> +   ISH Transport protocol (ISH-TP). This uses a simple byte-
> level
> +   protocol with a checksum.
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called cros_ec_ishtp.
> +
>  config CROS_EC_SPI
>   tristate "ChromeOS Embedded Controller (SPI)"
>   depends on MFD_CROS_EC && SPI
> diff --git a/drivers/platform/chrome/Makefile
> b/drivers/platform/chrome/Makefile
> index cd591bf..4efe102 100644
> --- a/drivers/platform/chrome/Makefile
> +++ b/drivers/platform/chrome/Makefile
> @@ -7,6 +7,7 @@ cros_ec_ctl-objs  :=
> cros_ec_sysfs.o cros_ec_lightbar.o \
>  cros_ec_vbc.o
> cros_ec_debugfs.o
>  obj-$(CONFIG_CROS_EC_CTL)+= cros_ec_ctl.o
>  obj-$(CONFIG_CROS_EC_I2C)+= cros_ec_i2c.o
> +obj-$(CONFIG_CROS_EC_ISHTP)  += cros_ec_ishtp.o
>  obj-$(CONFIG_CROS_EC_SPI)+= cros_ec_spi.o
>  cros_ec_lpcs-objs:= cros_ec_lpc.o
> cros_ec_lpc_reg.o
>  cros_ec_lpcs-$(CONFIG_CROS_EC_LPC_MEC)   += cros_ec_lpc_mec.o
> diff --git a/drivers/platform/chrome/cros_ec_ishtp.c
> b/drivers/platform/chrome/cros_ec_ishtp.c
> new file mode 100644
> index 000..997503d
> --- /dev/null
> +++ b/drivers/platform/chrome/cros_ec_ishtp.c
> @@ -0,0 +1,763 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// ISHTP interface for ChromeOS Embedded Controller
> +//
> +// Copyright (c) 2019, Intel Corporation.
> +//
> +// ISHTP client driver for talking to the Chrome OS EC firmware
> running
> +// on Intel Integrated Sensor Hub (ISH) using the ISH Transport
> protocol
> +// (ISH-TP).
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * ISH TX/RX ring buffer pool size
> + *
> + * The AP->ISH messages and corresponding ISH->AP responses are
> + * serialized. We need 1 TX and 1 RX buffer for these.
> + *
> + * The MKBP ISH->AP events are serialized. We need one additional RX
> + * buffer for them.
> + */
> +#define CROS_ISH_CL_TX_RING_SIZE 8
> +#define CROS_ISH_CL_RX_RING_SIZE 8
> +
> +/* ISH CrOS EC Host Commands */
> +enum cros_ec_ish_channel {
>

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Oleg Nesterov

On 04/17, Paul Moore wrote:
>
> On Wed, Apr 17, 2019 at 10:57 AM Oleg Nesterov  wrote:
> > On 04/17, Paul Moore wrote:
> > >
> > > I'm tempted to simply return an error in selinux_setprocattr() if
> > > the task's credentials are not the same as its real_cred;
> >
> > What about other modules? I have no idea what smack_setprocattr() is,
> > but it too does prepare_creds/commit creds.
> >
> > it seems that the simplest workaround should simply add the additional
> > cred == real_cred into proc_pid_attr_write().
>
> Yes, that is simple, but I worry about what other LSMs might want to
> do.  While I believe failing if the effective creds are not the same
> as the real_creds is okay for SELinux (possibly Smack too), I worry
> about what other LSMs may want to do.  After all,
> proc_pid_attr_write() doesn't change the creds itself, that is
> something the specific LSMs do.

Yes, but if proc_pid_attr_write() is called with cred != real_cred then
something is already wrong?

In fact, I think that something is already wrong if it is not called by
user-space directly. Too late to ask, but why is this /proc/self/attr/
magic not implemented via syscall(s) ?

But, Paul, this is up to you. I don't understand this all even remotely.

Oleg.
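
A minimal sketch of the check discussed above, in plain user-space C. The
struct and function names are hypothetical stand-ins, not the kernel's
types, and the -EACCES return value is an assumption; the thread does not
say which error would be returned.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct cred. */
struct sketch_cred {
	int uid;
};

/* Hypothetical stand-in for the two credential pointers of a task. */
struct sketch_task {
	const struct sketch_cred *real_cred;	/* the task's own credentials */
	const struct sketch_cred *cred;		/* credentials currently in effect */
};

/*
 * Return 0 if an attribute write may proceed, or -EACCES (assumed error
 * code) when the effective credentials have been overridden.
 */
static int sketch_attr_write_check(const struct sketch_task *task)
{
	if (task->cred != task->real_cred)
		return -EACCES;
	return 0;
}
```

The point of doing the comparison in one common place is that every module
reached through the write path would see the same precondition.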

Re: [PATCH] asm/io: Correct output operand specification of the MMIO write* routines

2019-04-17 Thread Linus Torvalds

On Wed, Apr 17, 2019 at 1:50 AM Borislav Petkov  wrote:
>
> I'm looking at
>
>   c1f64a58003f ("x86: MMIO and gcc re-ordering issue")
>
> and trying to figure out was there any particular reason the address to
> the MMIO write routines had to be an input operand?

It doesn't have to be an input operand, but as long as it's a "asm
volatile" it simply doesn't matter, and it won't be re-ordered or
optimized wrt other mmio accesses (that are also "asm volatile").

The memory clobber we have is to make sure that it's not re-ordered
with non-mmio accesses to other addresses (and that's true for reads
_or_ writes, so both mmio read and mmio write have the memory
clobber).

So changing the input "m" to an output "+m" simply shouldn't matter.
There's no upside. You can't remove the memory clobber anyway, and you
can't remove the "asm volatile".

The "__" versions lack the memory clobber and aren't ordered wrt
normal memory (but are ordered wrt other mmio due to the "asm
volatile").

So I see no upside to changing it.

  Linus
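
A minimal user-space sketch of the pattern described above, assuming GCC or
Clang inline assembly. The names sketch_writel() and sketch_raw_writel()
are hypothetical; the non-x86 branch is a fallback added only so the sketch
builds elsewhere.

```c
#include <stdint.h>

/*
 * Ordered variant: "asm volatile" keeps it ordered against other
 * "asm volatile" statements, and the "memory" clobber keeps the compiler
 * from moving ordinary memory accesses across it. The address is passed
 * as an "m" input operand.
 */
static inline void sketch_writel(uint32_t val, volatile void *addr)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("movl %0,%1"
			 : /* no outputs */
			 : "r" (val), "m" (*(volatile uint32_t *)addr)
			 : "memory");
#else
	/* Fallback (assumption): volatile store plus a compiler barrier. */
	*(volatile uint32_t *)addr = val;
	__asm__ volatile("" ::: "memory");
#endif
}

/*
 * "__"-style variant: same store without the "memory" clobber, so it is
 * not ordered against normal memory accesses by the compiler.
 */
static inline void sketch_raw_writel(uint32_t val, volatile void *addr)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("movl %0,%1"
			 : /* no outputs */
			 : "r" (val), "m" (*(volatile uint32_t *)addr));
#else
	*(volatile uint32_t *)addr = val;
#endif
}
```

Because the ordering comes from "asm volatile" and the clobber, whether the
memory operand is listed as an input or as an output does not change what
the compiler is allowed to reorder here.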

[PATCH] ASoC: hdmi-codec: fix spelling mistake "plalform" -> "platform"

2019-04-17 Thread Colin King

From: Colin Ian King 

There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King 
---
 sound/soc/codecs/hdmi-codec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/hdmi-codec.c b/sound/soc/codecs/hdmi-codec.c
index 35df73e42cbc..b9d9dde9fbaf 100644
--- a/sound/soc/codecs/hdmi-codec.c
+++ b/sound/soc/codecs/hdmi-codec.c
@@ -757,7 +757,7 @@ static int hdmi_codec_probe(struct platform_device *pdev)
dev_dbg(dev, "%s()\n", __func__);
 
if (!hcd) {
-   dev_err(dev, "%s: No plalform data\n", __func__);
+   dev_err(dev, "%s: No platform data\n", __func__);
return -EINVAL;
}
 
-- 
2.20.1

Re: [PATCH v4 07/16] locking/rwsem: Implement lock handoff to prevent lock starvation

2019-04-17 Thread Waiman Long

On 04/17/2019 03:13 AM, Peter Zijlstra wrote:
> On Tue, Apr 16, 2019 at 05:07:26PM -0400, Waiman Long wrote:
>
>> Thinking about it again. I think I will just change its definition to
>> "((HZ + 249)/250)" for now to make sure that it is at least 1. The
> DIV_ROUND_UP()

Sure.

Thanks,
Longman
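
For reference, a small sketch of the rounding being discussed. The macro
below uses the same arithmetic as the kernel's DIV_ROUND_UP(); the helper
name sketch_timeout_jiffies() is hypothetical.

```c
/* Round-up integer division, same arithmetic as DIV_ROUND_UP(). */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: HZ/250 rounded up, so the result is at least 1
 * even when HZ is smaller than 250.
 */
static unsigned long sketch_timeout_jiffies(unsigned long hz)
{
	return SKETCH_DIV_ROUND_UP(hz, 250UL);
}
```

With HZ=100 a plain HZ/250 would truncate to 0, while the rounded-up form
gives 1; with HZ=1000 both give 4.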

Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

2019-04-17 Thread Nicolas Dufresne

Le mercredi 17 avril 2019 à 17:40 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> On Wed, 2019-04-17 at 11:30 -0400, Nicolas Dufresne wrote:
> > Le dimanche 14 avril 2019 à 18:41 +0200, Paul Kocialkowski a écrit :
> > > Hi,
> > > 
> > > Le vendredi 12 avril 2019 à 16:47 -0400, Nicolas Dufresne a écrit :
> > > > Le mercredi 06 mars 2019 à 17:00 +0900, Alexandre Courbot a écrit :
> > > > > Documents the protocol that user-space should follow when
> > > > > communicating with stateless video decoders.
> > > > > 
> > > > > The stateless video decoding API makes use of the new request and tags
> > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > should probably still be considered staging for a short while.
> > > 
> > > [...]
> > > 
> > > > From an IRC discussion with Paul and some more digging, I have found a
> > > > design problem in the decoding process.
> > > > 
> > > > In H264 and HEVC you can have multiple decoding unit per frames
> > > > (slices). This type of encoding is increasingly popular, specially for
> > > > low latency streaming use cases. The wording of this spec does allow
> > > > for the notion of decoding unit, and in practice it has been proven to
> > > > work through some RFC FFMPEG patches and the Cedrus driver. But
> > > > something important to know is that the FFMPEG RFC implements decoding
> > > > in lock steps. Which means:
> > > > 
> > > >   1. It queues a single free capture buffer
> > > >   2. It queues an output buffer, set controls, queue the request
> > > >   3. It waits for a capture buffer to reach state done
> > > >   4. It dequeues that capture buffer, and queue it back again
> > > >   5. And then it runs step 2,4,3 again with following slices, until we 
> > > >  have a complete frame. After what, it restart at step 1
> > > > 
> > > > So the implementation makes no use of the queues. There is no batch
> > > > processing, so we might not be able to reach the maximum hardware
> > > > throughput.
> > > > 
> > > > So the optimal method would look like the following, but there comes
> > > > the design issue.
> > > > 
> > > >   1. Queue a single free capture buffer
> > > >   2. Queue output buffer for slice 1, set controls, queue the request
> > > >   3. Queue output buffer for slice 2, set controls, queue the request
> > > >   4. Wait for completion
> > > > 
> > > > The problem is in step 4. Completion means that the capture buffer done
> > > > decoding a single unit. So assuming the driver supports matching the
> > > > timestamp against the queued buffer, instead of waiting for a new
> > > > buffer, the driver would have to mark twice the same buffer to done
> > > > state, which is just not working to inform userspace that all slices
> > > > are decoded into the one capture buffer they share.
> > > 
> > > Interestingly, I'm experiencing the exact same problem dealing with a
> > > 2D graphics blitter that has limited output scaling abilities which
> > > imply handling a large scaling operation as multiple clipped smaller
> > > scaling operations. The issue is basically that multiple jobs have to
> > > be submitted to complete a single frame and relying on an indication
> > > from the destination buffer (such as a fence) doesn't work to indicate
> > > that all the operations were completed, since we get the indication at
> > > each step instead of at the end of the batch.
> > 
> > That looks similar to the IMX.6 IPU m2m driver. It splits the image in
> > tiles of 1024x1024 and process each tile separately. This driver has
> > been around for a long time, so I guess they have a solution to that.
> > They don't need requests, because there is nothing to be bundled with
> > the input image. I know that Renesas folks have started working on a
> > de-interlacer. Again, this kind of driver may process and reuse input
> > buffers for motion compensation, but I don't think they need special
> > userspace API for that.
> 
> Thanks for the reference! I hope it's not a blitter that was
> contributed as a V4L2 driver instead of DRM, as it probably would be
> more useful in DRM (but that's way beside the point).

DRM does not offer a generic and discoverable interface for these
accelerators. Note that most of these drivers started as DRM drivers
and their DRM side was later dropped. That was the case for the
Exynos drivers at least.

The thing is that DRM is great if you do immediate display stuff, while
V4L2 is nice if you do streaming, where you expect to be filling queues
and popping buffers from them.

In the end, this is just an interface; nothing prevents you from making
an internal driver (like the Meson Canvas) and simply letting multiple
subsystems expose it, especially since some of these IPs will often
support both signal and memory processing, so they fit equally well
into a media controller ISP, a v4l2 m2m or a DRM driver.

Another driver you might want to look at is the Rockchip RGA driver
(which is a multi-function IP, including blitting).

> 
> > > One

Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

2019-04-17 Thread Nicolas Dufresne

Le mardi 16 avril 2019 à 09:37 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> Le lundi 15 avril 2019 à 11:30 -0400, Nicolas Dufresne a écrit :
> > Le lundi 15 avril 2019 à 15:26 +0200, Paul Kocialkowski a écrit :
> > > Hi,
> > > 
> > > On Mon, 2019-04-15 at 08:24 -0400, Nicolas Dufresne wrote:
> > > > Le lundi 15 avril 2019 à 09:58 +0200, Paul Kocialkowski a écrit :
> > > > > Hi,
> > > > > 
> > > > > On Sun, 2019-04-14 at 18:38 -0400, Nicolas Dufresne wrote:
> > > > > > Le dimanche 14 avril 2019 à 18:41 +0200, Paul Kocialkowski a écrit :
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Le vendredi 12 avril 2019 à 16:47 -0400, Nicolas Dufresne a écrit 
> > > > > > > :
> > > > > > > > Le mercredi 06 mars 2019 à 17:00 +0900, Alexandre Courbot a 
> > > > > > > > écrit :
> > > > > > > > > Documents the protocol that user-space should follow when
> > > > > > > > > communicating with stateless video decoders.
> > > > > > > > > 
> > > > > > > > > The stateless video decoding API makes use of the new request 
> > > > > > > > > and tags
> > > > > > > > > APIs. While it has been implemented with the Cedrus driver so 
> > > > > > > > > far, it
> > > > > > > > > should probably still be considered staging for a short while.
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > From an IRC discussion with Paul and some more digging, I have 
> > > > > > > > found a
> > > > > > > > design problem in the decoding process.
> > > > > > > > 
> > > > > > > > In H264 and HEVC you can have multiple decoding unit per frames
> > > > > > > > (slices). This type of encoding is increasingly popular, 
> > > > > > > > specially for
> > > > > > > > low latency streaming use cases. The wording of this spec does 
> > > > > > > > allow
> > > > > > > > for the notion of decoding unit, and in practice it has been 
> > > > > > > > proven to
> > > > > > > > work through some RFC FFMPEG patches and the Cedrus driver. But
> > > > > > > > something important to know is that the FFMPEG RFC implements 
> > > > > > > > decoding
> > > > > > > > in lock steps. Which means:
> > > > > > > > 
> > > > > > > >   1. It queues a single free capture buffer
> > > > > > > >   2. It queues an output buffer, set controls, queue the request
> > > > > > > >   3. It waits for a capture buffer to reach state done
> > > > > > > >   4. It dequeues that capture buffer, and queue it back again
> > > > > > > >   5. And then it runs step 2,4,3 again with following slices, 
> > > > > > > > until we 
> > > > > > > >  have a complete frame. After what, it restart at step 1
> > > > > > > > 
> > > > > > > > So the implementation makes no use of the queues. There is no 
> > > > > > > > batch
> > > > > > > > processing, so we might not be able to reach the maximum 
> > > > > > > > hardware
> > > > > > > > throughput.
> > > > > > > > 
> > > > > > > > So the optimal method would look like the following, but there 
> > > > > > > > comes
> > > > > > > > the design issue.
> > > > > > > > 
> > > > > > > >   1. Queue a single free capture buffer
> > > > > > > >   2. Queue output buffer for slice 1, set controls, queue the 
> > > > > > > > request
> > > > > > > >   3. Queue output buffer for slice 2, set controls, queue the 
> > > > > > > > request
> > > > > > > >   4. Wait for completion
> > > > > > > > 
> > > > > > > > The problem is in step 4. Completion means that the capture 
> > > > > > > > buffer done
> > > > > > > > decoding a single unit. So assuming the driver supports 
> > > > > > > > matching the
> > > > > > > > timestamp against the queued buffer, instead of waiting for a 
> > > > > > > > new
> > > > > > > > buffer, the driver would have to mark twice the same buffer to 
> > > > > > > > done
> > > > > > > > state, which is just not working to inform userspace that all 
> > > > > > > > slices
> > > > > > > > are decoded into the one capture buffer they share.
> > > > > > > 
> > > > > > > Interestingly, I'm experiencing the exact same problem dealing 
> > > > > > > with a
> > > > > > > 2D graphics blitter that has limited output scaling abilities which
> > > > > > > imply handling a large scaling operation as multiple clipped 
> > > > > > > smaller
> > > > > > > scaling operations. The issue is basically that multiple jobs 
> > > > > > > have to
> > > > > > > be submitted to complete a single frame and relying on an 
> > > > > > > indication
> > > > > > > from the destination buffer (such as a fence) doesn't work to 
> > > > > > > indicate
> > > > > > > that all the operations were completed, since we get the 
> > > > > > > indication at
> > > > > > > each step instead of at the end of the batch.
> > > > > > > 
> > > > > > > One idea I see to solve this is to have a notion of batch in the 
> > > > > > > driver
> > > > > > > (for our situation, that would be in v4l2) and provide means to 
> > > > > > > get a
> > > > > > > done indication for that entity.
> > > > > > > 
> > > > > > > I think we could extend the request API to allow this. We already
> > > > > > >

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

2019-04-17 Thread Joseph Myers

On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:

> > +/* RSEQ_SIG is a signature required before each abort handler code.
> > +
> > +   It is a 32-bit value that maps to actual architecture code compiled
> > +   into applications and libraries. It needs to be defined for each
> > +   architecture. When choosing this value, it needs to be taken into
> > +   account that generating invalid instructions may have ill effects on
> > +   tools like objdump, and may also have impact on the CPU speculative
> > +   execution efficiency in some cases.  */
> > +
> > +#define RSEQ_SIG 0xd428bc00/* BRK #0x45E0.  */
> 
> After further investigation, we should probably do the following
> to handle compiling with -mbig-endian on aarch64, which generates
> binaries with mixed code vs data endianness (little endian code,
> big endian data):

First, the comment on RSEQ_SIG should specify whether it is to be 
interpreted in the code or the data endianness.

> For ARM32, the situation is a bit more complex. Only armv6+
> generates mixed-endianness code vs data with -mbig-endian.
> Prior to armv6, the code and data endianness matches. Therefore,
> I plan to #ifdef the reversed endianness handling with:
> 
> #if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN
> 
> on arm32.

That doesn't work well because BE code (.o files) can be built for v5te 
(for example) and used on a range of different architecture variants with 
both BE32 and BE8 - the choice between BE32 and BE8 is a link-time choice, 
not a compile-time choice.  So if the value for Arm is a compile-time 
constant, it should also work for both BE32 and BE8.

In turn, that suggests to me that RSEQ_SIG should be defined to be a value 
that is always in the code endianness (and whatever corresponding kernel 
code handles RSEQ_SIG values should act accordingly on architectures where 
the two endiannesses can differ).  If the kernel ABI is already fixed in a 
way that prevents such a definition of RSEQ_SIG semantics as using code 
endianness, a value should be chosen for Arm that works for both 
endiannesses.

(Also, installed glibc headers are supposed to work with older compilers, 
and support for __ARM_ARCH was only added in GCC 4.8.  Before that you 
need to test lots of separate macros for different architecture variants 
to determine a version number.)

-- 
Joseph S. Myers
jos...@codesourcery.com
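
To make the endianness concern concrete, here is a small sketch showing how
the same four bytes read back differently depending on which byte order is
assumed. The helper names are hypothetical; the value 0xd428bc00 is the
aarch64 RSEQ_SIG quoted above.

```c
#include <stdint.h>

/* Interpret four bytes as a little-endian 32-bit value. */
static uint32_t sketch_load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes as a big-endian 32-bit value. */
static uint32_t sketch_load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

If the signature 0xd428bc00 is emitted in little-endian code order, the
bytes in memory are 00 bc 28 d4; a big-endian data load of those bytes
yields 0x00bc28d4, so a comparison against RSEQ_SIG only works if both
sides agree on which endianness the constant is expressed in.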

Re: [PATCH RFC 1/1] kernfs: keep kernfs node alive for __kernfs_remove()

2019-04-17 Thread Konstantin Khorenko

On 04/16/2019 10:17 PM, Tejun Heo wrote:
> On Tue, Apr 16, 2019 at 06:53:35PM +0300, Konstantin Khorenko wrote:
>> __kernfs_remove() which is called under kernfs_mutex,
>> assumes nobody kills kernfs node while it's working on it
>> and "get"s current kernfs node for that.
>>
>> But we hit a warning in kernfs_get(): kn->counter == 0 already:
>>   [ cut here ]
>>   WARNING: CPU: 2 PID: 63923 at fs/kernfs/dir.c:377 kernfs_get+0x2f/0x40
>>   ...
>>   Call Trace:
>>[] dump_stack+0x19/0x1b
>>[] __warn+0xd8/0x100
>>[] warn_slowpath_null+0x1d/0x20
>>[] kernfs_get+0x2f/0x40
>>[] __kernfs_remove+0x113/0x260
>>[] kernfs_remove+0x21/0x30
>>[] sysfs_remove_dir+0x50/0x80
>>[] kobject_del+0x18/0x50
>>[] sysfs_slab_remove+0x3d/0x50
>>[] do_kmem_cache_release+0x3b/0x70
>>[] memcg_destroy_kmem_caches+0xb1/0xf0
>>[] mem_cgroup_css_free+0x4c/0x280
>>[] cgroup_free_fn+0x4c/0x120
>>[] process_one_work+0x182/0x440
>>[] worker_thread+0x126/0x3c0
>>[] kthread+0xd1/0xe0
>>
>> This could be for example because of kernfs_notify_workfn() which
>> does kernfs_put(kn) out of kernfs_mutex held section,
>> so move kernfs_put(kn) under the mutex.
>
> This patch doesn't really make sense to me.  Can you give a more
> concrete scenario where this would help?

I don't know the full scenario unfortunately, but the idea is the following:

__kernfs_remove() is called under kernfs_mutex and if
   !(!kn || (kn->parent && RB_EMPTY_NODE(&kn->rb)))

it assumes that nothing can change while we hold the mutex and
that each kernfs descendant has kn->count > 0.

=
 /* deactivate and unlink the subtree node-by-node */
 do {
 pos = kernfs_leftmost_descendant(kn);

 /*
  * kernfs_drain() drops kernfs_mutex temporarily and @pos's
  * base ref could have been put by someone else by the time
  * the function returns.  Make sure it doesn't go away
  * underneath us.
  */
 kernfs_get(pos);
=

At the same time, kernfs_notify_workfn() can do a kernfs_put() outside of
kernfs_mutex, which could be the last put and drop kn->count to 0 at any
moment.


Thank you.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team
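
A single-threaded sketch of the ordering suspected above. The types and
helpers are hypothetical stand-ins, not kernfs code; the only behaviour
modelled is that taking a reference on a node whose count has already
reached 0 is the condition the warning in the trace reports.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted node. */
struct sketch_node {
	int count;
};

/*
 * Take a reference. Returns false when the count was already 0, which
 * models the situation the warning in the trace complains about.
 */
static bool sketch_get(struct sketch_node *n)
{
	bool alive = n->count > 0;

	n->count++;
	return alive;
}

/* Drop a reference. Returns true when this was the last one. */
static bool sketch_put(struct sketch_node *n)
{
	return --n->count == 0;
}
```

If the put from the notify work runs first and is the last one, the later
get in the removal path sees a dead node; if the get runs first, the same
put is harmless.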

Re: [PATCH v4 07/10] drivers: pinctrl: msm: setup GPIO irqchip hierarchy

2019-04-17 Thread Lina Iyer


On Fri, Mar 15 2019 at 10:28 -0600, Stephen Boyd wrote:

Quoting Lina Iyer (2019-03-13 14:18:41)

---
Changes in v4:
- Remove irq_set_wake() on summary IRQ interrupt
Changes in v3:
- Use of_irq_domain_map() and pass PDC pin to parent irqdomain
Changes in v2:
- Call parent mask when masking GPIO interrupt
Changes in v1:
- Fix bug when unmasking PDC interrupt

[...]

+}
+
+/*
+ * TODO: Get rid of this and push it into gpiochip_to_irq()
+ */


Any chance this TODO can be resolved?



I am thinking of something like this. Would there be any issue in
setting the type to IRQ_TYPE_SENSE_MASK instead of any one particular
type?

---8<-
static int gpiochip_to_irq(struct gpio_chip *chip, unsigned offset)
{
#ifdef CONFIG_OF_GPIO
   struct irq_fwspec fwspec;

   if (chip->of_node) {
  fwspec.fwnode = of_node_to_fwnode(chip->of_node);
  fwspec.param[0] = offset;
  fwspec.param[1] = IRQ_TYPE_SENSE_MASK;
  fwspec.param_count = 2;
      return irq_create_fwspec_mapping(&fwspec);
   }
#endif

   if (!gpiochip_irqchip_irq_valid(chip, offset))
   return -ENXIO;

   return irq_create_mapping(chip->irq.domain, offset);
}
---8<


Thanks,
Lina

Re: [PATCH v4 2/2] x86/boot/KASLR: skip the specified crashkernel region

2019-04-17 Thread Borislav Petkov

On Wed, Apr 17, 2019 at 01:53:37PM +0800, Pingfan Liu wrote:
> Take __parse_crashkernel()->parse_crashkernel_simple() for example. If
> no offset given, then it still return 0, but crash_base is dangling.

Well, that is bad design. parse_crashkernel_simple() should return a
*separate* distinct value which denotes that @offset hasn't been passed.

Please fix that by having it return 1 or something else positive to
denote that there wasn't an [@offset] given.

And then correct that crap here:

static void __init reserve_crashkernel(void)
{
...

ret = parse_crashkernel(boot_command_line, total_mem, &crash_size,
&crash_base);
if (ret != 0 || crash_size <= 0) {

where *two*! variables are used as return values from a single function.
That's just sloppy.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
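
A user-space sketch of the return convention suggested above: 0 when both
size and offset were parsed, a positive value when no [@offset] was given,
and a negative errno on malformed input. The function names and the small
suffix parser are hypothetical, written only to illustrate the idea.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical size parser: a number with an optional K/M/G suffix. */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "size[@offset]".
 * Returns 0 if size and offset were given, 1 if only size was given
 * (*base is left untouched), -EINVAL on malformed input.
 */
static int sketch_parse_crashkernel_simple(const char *arg,
					   unsigned long long *size,
					   unsigned long long *base)
{
	char *cur;

	*size = sketch_memparse(arg, &cur);
	if (cur == arg)
		return -EINVAL;

	if (*cur == '@') {
		const char *off = cur + 1;

		*base = sketch_memparse(off, &cur);
		if (cur == off)
			return -EINVAL;
		return (*cur == '\0' || *cur == ' ') ? 0 : -EINVAL;
	}

	if (*cur != '\0' && *cur != ' ')
		return -EINVAL;
	return 1;
}
```

With this shape a caller can tell "no offset given" from "offset is 0"
using the return value alone, instead of inspecting a second output
variable.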

Re: + mm-memcontrol-make-cgroup-stats-and-events-query-api-explicitly-local.patch added to -mm tree

2019-04-17 Thread Johannes Weiner

On Mon, Apr 15, 2019 at 06:51:32PM -0700, a...@linux-foundation.org wrote:
> 
> The patch titled
>  Subject: mm: memcontrol: make cgroup stats and events query API 
> explicitly local
> has been added to the -mm tree.  Its filename is
>  
> mm-memcontrol-make-cgroup-stats-and-events-query-api-explicitly-local.patch

>From 65f026fe5481f8dc32b3dc3b97994f8cdc82dd17 Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Wed, 17 Apr 2019 11:08:47 -0400
Subject: [PATCH] mm: memcontrol: make cgroup stats and events query API
 explicitly local fix

The lruvec_page_state() -> lruvec_page_state_local() rename should
have been part of this patch, not the previous one.

Signed-off-by: Johannes Weiner 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 871c661ca8be..6e99a8b9b2ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2979,7 +2979,7 @@ static void snapshot_refaults(struct mem_cgroup 
*root_memcg, pg_data_t *pgdat)
struct lruvec *lruvec;
 
lruvec = mem_cgroup_lruvec(pgdat, memcg);
-   refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
+   refaults = lruvec_page_state_local(lruvec, WORKINGSET_ACTIVATE);
lruvec->refaults = refaults;
} while ((memcg = mem_cgroup_iter(root_memcg, memcg, NULL)));
 }
-- 
2.21.0

Re: [PATCH v3 13/13] platform/x86: intel_cht_int33fe: Replacing the old connections with references

2019-04-17 Thread Hans de Goede


Hi,

On 17-04-19 12:44, Heikki Krogerus wrote:

On Wed, Apr 17, 2019 at 12:15:18PM +0200, Hans de Goede wrote:

Hi,

On 17-04-19 11:32, Heikki Krogerus wrote:

On Wed, Apr 17, 2019 at 11:19:28AM +0200, Hans de Goede wrote:





That is not going to work since the (virtual) mux / orientation-switch
devices are only registered once the driver binds to the pi3usb30532 i2c
device, so when creating the nodes we only have the pi3usb30532 i2c device.


It's not a problem, that's why we have the software nodes. The nodes
can be created before the device entries. The node for pi3usb30532
will just be the parent node for the new nodes we add for the mux and
switch.


I've been thinking some more about this and an easy fix is to have separate
fwnode_match functions for typec_switch_match and typec_mux_match and have
them check that the dev_name ends in "-mux" resp. "-switch" that requires
only a very minimal change to "usb: typec: Registering real device entries for the 
muxes"
and then everything should be fine.


I don't want to do anymore device name matching unless we have to, and
here we don't have to. We can name the nodes for those virtual mux and
switch, and then just do fwnode_find_named_child_node() in
pi3usb30532.c for both of them.


Thinking more about this, I have a feeling that this makes things needlessly
complicated, checking the dev_name *ends* in "-mux" resp. "-switch" should be
100% reliable since we call:

 dev_set_name(&sw->dev, "%s-switch", dev_name(parent));
 dev_set_name(&mux->dev, "%s-mux", dev_name(parent));

When registering the switch / mux, so I believe doing name (suffix) comparison
here is fine and much simpler. Anyways this is just my 2 cents on this, I'm
happy with either solution, your choice.


You do have a point. I'll take a look how the two options look like,
but maybe your way is better after all.


I whipped up a quick fix using my approach so that I can start working
on debugging the usb_role_switch_get call in tcpm.c returning NULL.

I've attached it, feel free to use this for v4 of the series if you
decide to go with this approach.

Regards,

Hans




thanks,

>From 47154195c05dc7c8b3373de9603b06c2f435588a Mon Sep 17 00:00:00 2001
From: Hans de Goede 
Date: Wed, 17 Apr 2019 17:58:17 +0200
Subject: [PATCH v2] FIXUP: "usb: typec: Registering real device entries for
 the muxes"

Check the dev_name suffix so that we do not return the first registered
device when a mux and switch share the same parent and fwnode.

Signed-off-by: Hans de Goede 
---
 drivers/usb/typec/mux.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/typec/mux.c b/drivers/usb/typec/mux.c
index c7d4a2dd454e..a28803544301 100644
--- a/drivers/usb/typec/mux.c
+++ b/drivers/usb/typec/mux.c
@@ -22,9 +22,21 @@ static int name_match(struct device *dev, const void *name)
 	return !strcmp((const char *)name, dev_name(dev));
 }
 
-static int fwnode_match(struct device *dev, const void *fwnode)
+static bool dev_name_ends_with(struct device *dev, const char *suffix)
 {
-	return dev_fwnode(dev) == fwnode;
+	const char *name = dev_name(dev);
+	const int name_len = strlen(name);
+	const int suffix_len = strlen(suffix);
+
+	if (suffix_len > name_len)
+		return false;
+
+	return strcmp(name + (name_len - suffix_len), suffix) == 0;
+}
+
+static int switch_fwnode_match(struct device *dev, const void *fwnode)
+{
+	return dev_fwnode(dev) == fwnode && dev_name_ends_with(dev, "-switch");
 }
 
 static void *typec_switch_match(struct device_connection *con, int ep,
@@ -37,7 +49,7 @@ static void *typec_switch_match(struct device_connection *con, int ep,
 			return NULL;
 
 		dev = class_find_device(_mux_class, NULL, con->fwnode,
-	fwnode_match);
+	switch_fwnode_match);
 	} else {
 		dev = class_find_device(_mux_class, NULL,
 	con->endpoint[ep], name_match);
@@ -167,6 +179,11 @@ EXPORT_SYMBOL_GPL(typec_switch_get_drvdata);
 
 /* - */
 
+static int mux_fwnode_match(struct device *dev, const void *fwnode)
+{
+	return dev_fwnode(dev) == fwnode && dev_name_ends_with(dev, "-mux");
+}
+
 static void *typec_mux_match(struct device_connection *con, int ep, void *data)
 {
 	const struct typec_altmode_desc *desc = data;
@@ -226,7 +243,7 @@ static void *typec_mux_match(struct device_connection *con, int ep, void *data)
 
 find_mux:
 	dev = class_find_device(_mux_class, NULL, con->fwnode,
-fwnode_match);
+mux_fwnode_match);
 
 	return dev ? to_typec_switch(dev) : ERR_PTR(-EPROBE_DEFER);
 }
-- 
2.21.0
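
The suffix comparison used in the patch above can be exercised in user
space with a sketch like the following. It takes a plain string instead of
a struct device, and the function name and the device names in the usage
note are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Return true when @name ends with @suffix. */
static bool sketch_name_ends_with(const char *name, const char *suffix)
{
	size_t name_len = strlen(name);
	size_t suffix_len = strlen(suffix);

	/* A suffix longer than the name can never match. */
	if (suffix_len > name_len)
		return false;

	return strcmp(name + (name_len - suffix_len), suffix) == 0;
}
```

For a parent named "i2c-pi3usb30532" (made-up name), the registered
children would be "i2c-pi3usb30532-switch" and "i2c-pi3usb30532-mux", so
the two match functions can tell them apart even though both share the same
fwnode.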

Re: [PATCH v3 1/3] dt-bindings: input: add GPIO controllable vibrator

2019-04-17 Thread Luca Weiss

On Freitag, 12. April 2019 17:06:23 CEST Luca Weiss wrote:
> Provide a simple driver for GPIO controllable vibrators.
> It will be used by the Fairphone 2.
> 
> Signed-off-by: Luca Weiss 
> ---
>  .../bindings/input/gpio-vibrator.txt  | 20 +++
>  1 file changed, 20 insertions(+)
>  create mode 100644
> Documentation/devicetree/bindings/input/gpio-vibrator.txt
> 
> diff --git a/Documentation/devicetree/bindings/input/gpio-vibrator.txt
> b/Documentation/devicetree/bindings/input/gpio-vibrator.txt new file mode
> 100644
> index ..93e5a8e7622d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/input/gpio-vibrator.txt
> @@ -0,0 +1,20 @@
> +* GPIO vibrator device tree bindings
> +
> +Registers a GPIO device as vibrator, where the vibration motor just has the
> +capability to turn on or off. If the device is connected to a pwm, you
> should +use the pwm-vibrator driver instead.
> +
> +Required properties:
> +- compatible: should contain "gpio-vibrator"
> +- enable-gpios: Should contain a GPIO handle
> +
> +Optional properties:
> +- vcc-supply: Phandle for the regulator supplying power
> +
> +Example from Fairphone 2:
> +
> +vibrator {
> + compatible = "gpio-vibrator";
> + enable-gpios = < 86 GPIO_ACTIVE_HIGH>;
> + vcc-supply = <_l18>;
> +};

I see that the yaml based device tree binding docs seem to be the new hotness? 
Is there any "policy" / preference about new drivers?

Luca

[GIT PULL] tee subsys for v5.2

2019-04-17 Thread Jens Wiklander

Hello arm-soc maintainers,

Please pull this OP-TEE driver patch. It allows the OP-TEE driver to work
without a static carved out shared memory area.

Thanks,
Jens

The following changes since commit 1c163f4c7b3f621efff9b28a47abb36f7378d783:

  Linux 5.0 (2019-03-03 15:21:29 -0800)

are available in the Git repository at:

  http://git.linaro.org:/people/jens.wiklander/linux-tee.git 
tags/tee-optee-for-5.2

for you to fetch changes up to 9733b072a12a422e2bf17bc7ba8b39769853d4a2:

  optee: allow to work without static shared memory (2019-04-17 17:26:33 +0200)


Allow OP-TEE driver to work without static shared memory


Volodymyr Babchuk (1):
  optee: allow to work without static shared memory

 drivers/tee/optee/core.c | 80 +---
 1 file changed, 49 insertions(+), 31 deletions(-)

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

2019-04-17 Thread Mathieu Desnoyers

- On Apr 16, 2019, at 1:32 PM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

[...]
> diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> new file mode 100644
> index 00..b02471a89a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> @@ -0,0 +1,32 @@
> +/* Restartable Sequences Linux aarch64 architecture header.
> +
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   .  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use  directly; include  instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries. It needs to be defined for each
> +   architecture. When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.  */
> +
> +#define RSEQ_SIG 0xd428bc00  /* BRK #0x45E0.  */

After further investigation, we should probably do the following
to handle compiling with -mbig-endian on aarch64, which generates
binaries with mixed code vs data endianness (little endian code,
big endian data):

#ifdef __ARM_BIG_ENDIAN
#define RSEQ_SIG 0x00bc28d4 /* BRK #0x45E0.  */
#else
#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
#endif

Otherwise, a mismatch between the code endianness of the generated
signatures and the data endianness of the RSEQ_SIG parameter
passed to the rseq registration will trigger application
segmentation faults when the kernel tries to abort rseq
critical sections.

For ARM32, the situation is a bit more complex. Only armv6+
generates mixed-endianness code vs data with -mbig-endian.
Prior to armv6, the code and data endianness match. Therefore,
I plan to #ifdef the reversed endianness handling with:

#if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN

on arm32.
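
To make the byte reversal above concrete, here is a minimal sketch (not the
proposed glibc header). The helper names bswap32() and
rseq_sig_for_data_endianness() and the data_is_big_endian parameter are
made up for this illustration; only the two constants come from the message.
The idea is that the signature value, when stored as data, should end up with
the same bytes in memory as the little-endian instruction encoding.

```c
#include <stdint.h>

/* Little-endian encoding of the aarch64 BRK #0x45E0 instruction (value from the patch). */
#define RSEQ_SIG_CODE 0xd428bc00u

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/*
 * Hypothetical helper: pick the constant to pass as the rseq signature.
 * With big-endian data and little-endian code, the constant is byte-swapped
 * so that a 32-bit data load of the instruction bytes compares equal to it.
 */
static uint32_t rseq_sig_for_data_endianness(int data_is_big_endian)
{
	return data_is_big_endian ? bswap32(RSEQ_SIG_CODE) : RSEQ_SIG_CODE;
}
```

In a real header the choice would be made at preprocessing time with
__ARM_BIG_ENDIAN, as in the #ifdef above; the run-time parameter here only
keeps the sketch testable.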

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

2019-04-17 Thread David Laight

From: Theodore Ts'o
> Sent: 17 April 2019 16:16
> On Wed, Apr 17, 2019 at 09:28:35AM +, David Laight wrote:
> >
> > If you can guarantee back to back requests on the PRNG then it is probably
> > possible to recalculate its state from 'bits of state'/5 calls.
> > Depending on the PRNG this might be computationally expensive.
> > For some PRNG it will be absolutely trivial.
> > ...
> > Stirring in a little bit of entropy doesn't help much either.
> > The entropy bits are effectively initial state bits.
> > Add 4 in with each request and 128 outputs gives 640 linear
> > equations in the (128 + 4 * 128) unknowns - still solvable.
> 
> This is basically a scenario where the attacker has already taken
> control of Ring 3 execution and the question is how hard is it for
> them to perform privilege escalation attack to ring 0, right?

Or extract information that should only be known by ring 0.
I'm fairly sure many of the side-channel attacks require not only
ring 3 access, but also the ability to request that ring 0 repeatedly
perform a specific action on an otherwise idle system.

> I'm sure the security folks will think I'm defeatist, but my personal rule
> of thumb is if the attacker has ring 3 control, you've already lost
> --- I figure there are so many zero days that getting ring 0 control
> is a foregone conclusion.  :-(
> 
> So that basically means if we want to protect against this, we're
> going to do something which involves Real Crypto (tm).  Whether that's
> RDRAND, or using Chacha20, etc., or something that has some attack
> resistance, such as "half MD5", etc., but eminently crackable by
> brute force, is essentially an overhead vs. security argument, and what
> it is we are willing to pay.

Some of these 'random' values have a short lifetime - and would need
to be cracked quickly to be of any use.

I suspect that combining the output of three linear generators with
addition rather than xor would make it computationally much harder to
reverse.
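
A rough sketch of that idea, purely for illustration: three xorshift
generators (each linear over GF(2)) whose outputs are summed. Integer
addition introduces carries, which are not linear over GF(2), so the
combined output is no longer a plain system of linear equations in the state
bits. The struct and function names are invented here, the shift triples
other than 13/17/5 are unvetted picks, and nothing below claims to be
cryptographically secure or to be what any patch in this thread does.

```c
#include <stdint.h>

/* State of three independent 32-bit xorshift generators; each must be nonzero. */
struct combo_prng {
	uint32_t a, b, c;
};

/* One xorshift step with the given shift amounts (linear over GF(2)). */
static uint32_t xorshift_step(uint32_t x, int s1, int s2, int s3)
{
	x ^= x << s1;
	x ^= x >> s2;
	x ^= x << s3;
	return x;
}

/* Advance all three generators and combine their outputs with addition. */
static uint32_t combo_next(struct combo_prng *p)
{
	p->a = xorshift_step(p->a, 13, 17, 5);
	p->b = xorshift_step(p->b, 1, 3, 10);
	p->c = xorshift_step(p->c, 2, 5, 15);
	return p->a + p->b + p->c;	/* unsigned addition wraps modulo 2^32 */
}
```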

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [RFC][PATCHSET] sorting out RCU-delayed stuff in ->destroy_inode()

2019-04-17 Thread David Sterba

On Tue, Apr 16, 2019 at 06:49:00PM +0100, Al Viro wrote:
>   We have a lot of boilerplate in ->destroy_inode()
> instances, and several filesystems got the things wrong
> in that area.  The patchset below attempts to deal with that.
> 
>   New method (void ->free_inode(inode)) is introduced,
> and RCU-delayed parts of ->destroy_inode() are moved there.
> The change is backwards-compatible - unmodified filesystem
> will behave as it used to.  Rules:
>   ->destroy_inode   ->free_inode
>   f                 g              f(), rcu-delayed g()
>   f                 NULL           f()
>   NULL              g              rcu-delayed g()
>   NULL              NULL           rcu-delayed free_inode_nonrcu()
> IOW, NULL/NULL acts as NULL/free_inode_nonrcu.
> 
>   For a lot of filesystems ->destroy_inode() used to consist
> only of call_rcu(foo_i_callback, >i_rcu).  Those simply get
> rid of ->destroy_inode() and have the callback (with saner prototype)
> become their ->free_inode().
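
The four rules quoted above can be modelled in userspace roughly as follows.
This is only a sketch of the decision table, not the VFS code: the struct
model_super_ops type, model_noop() and describe_teardown() are hypothetical
names, and the returned strings simply repeat the right-hand column of the
table.

```c
#include <stddef.h>

struct inode;	/* opaque in this model */

typedef void (*inode_fn)(struct inode *);

/* Hypothetical stand-in for the two super_operations methods discussed. */
struct model_super_ops {
	inode_fn destroy_inode;	/* "f" in the table, or NULL */
	inode_fn free_inode;	/* "g" in the table, or NULL */
};

/* Placeholder callback so the table rows can be populated. */
static void model_noop(struct inode *inode)
{
	(void)inode;
}

/* Return the action the rules prescribe for a given method combination. */
static const char *describe_teardown(const struct model_super_ops *ops)
{
	if (ops->destroy_inode && ops->free_inode)
		return "f(), rcu-delayed g()";
	if (ops->destroy_inode)
		return "f()";
	if (ops->free_inode)
		return "rcu-delayed g()";
	return "rcu-delayed free_inode_nonrcu()";
}
```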

The simplified API looks good to me. For btrfs and affs bits:

Acked-by: David Sterba

Re: + mm-fix-inactive-list-balancing-between-numa-nodes-and-cgroups.patch added to -mm tree

2019-04-17 Thread Johannes Weiner

On Mon, Apr 15, 2019 at 02:27:44PM -0700, a...@linux-foundation.org wrote:
> 
> The patch titled
>  Subject: mm: fix inactive list balancing between NUMA nodes and cgroups
> has been added to the -mm tree.  Its filename is
>  mm-fix-inactive-list-balancing-between-numa-nodes-and-cgroups.patch

---

>From b5a82062b99fd3d2d4f4f7dc220d4acb1aa9b749 Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Wed, 17 Apr 2019 11:08:07 -0400
Subject: [PATCH] mm: fix inactive list balancing between NUMA nodes and
 cgroups fix

lruvec_page_state_local() is only defined later in the series. This is
fallout from reshuffling the patch series to pull a standalone fix
before the bigger stats rework.

Signed-off-by: Johannes Weiner 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c9f8afe61ae3..461720e2ae90 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2979,7 +2979,7 @@ static void snapshot_refaults(struct mem_cgroup 
*root_memcg, pg_data_t *pgdat)
struct lruvec *lruvec;
 
lruvec = mem_cgroup_lruvec(pgdat, memcg);
-   refaults = lruvec_page_state_local(lruvec, WORKINGSET_ACTIVATE);
+   refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE);
lruvec->refaults = refaults;
} while ((memcg = mem_cgroup_iter(root_memcg, memcg, NULL)));
 }
-- 
2.21.0

Re: [PATCH V1 3/3] regulator: slg51000: add slg51000 regulator driver

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 11:25:29AM +, Eric Hyeung Dong Jeong wrote:
> On Wednesday, April 17, 2019 12:25 AM +0900, Mark Brown wrote:

> > This looks like it should be a get_status() operation as it's reading
> > status bits rather than the command we sent to the device - for that
> > just use regulator_is_enabled_regmap().

> I thought that it needs to return the current status of a regulator when the
> function is called.
> I am wondering whether the *is_enabled()* function is just meant to check
> if a regulator has been turned on or not, rather than getting the current
> status of the regulator.

It should say if we told the hardware to enable the regulator so the
former - basically, is the hardware in the same state it'd be in after
we called _enable() or _disable().  If there's a fault then that should
only affect is_enabled() if it resets the enable bit that _enable() set.


Re: [PATCH v3] proc/sysctl: add shared variables for range check

2019-04-17 Thread Matthew Wilcox

On Wed, Apr 17, 2019 at 03:15:31PM +0200, Matteo Croce wrote:
> In the sysctl code the proc_dointvec_minmax() function is often used to
> validate the user supplied value between an allowed range. This function
> uses the extra1 and extra2 members from struct ctl_table as minimum and
> maximum allowed value.
> 
> On sysctl handler declaration, in every source file there are some readonly
> variables containing just an integer whose address is assigned to the
> extra1 and extra2 members, so the sysctl range is enforced.
> 
> The special values 0, 1 and INT_MAX are very often used as range boundary,
> leading to duplication of variables like zero=0, one=1, int_max=INT_MAX in
> different source files:
> 
> $ git grep -E '\.extra[12].*&(zero|one|int_max)\b' |wc -l
> 245
> 
> This patch adds three const variables for the most commonly used values,
> and uses them instead of creating a local one for every object file.

Does this actually cause the kernel size to shrink?  EXPORT_SYMBOL isn't
free, you know.
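
For readers unfamiliar with the pattern under discussion, here is a small
userspace model of a min/max range check driven by pointers to shared
constants. It is a sketch only: struct model_ctl_table, model_set_minmax()
and the shared_* variables are invented names, the kernel's extra1/extra2
members are untyped pointers rather than const int pointers, and the real
handler's return values differ.

```c
#include <limits.h>
#include <stddef.h>

/* Shared read-only bounds, defined once instead of once per source file. */
static const int shared_zero = 0;
static const int shared_one = 1;
static const int shared_int_max = INT_MAX;

/* Hypothetical, simplified table entry: a value plus optional bounds. */
struct model_ctl_table {
	int *data;		/* value being controlled */
	const int *extra1;	/* minimum allowed value, or NULL for none */
	const int *extra2;	/* maximum allowed value, or NULL for none */
};

/* Store val if it lies within [*extra1, *extra2]; return 0 on success, -1 otherwise. */
static int model_set_minmax(const struct model_ctl_table *t, int val)
{
	if (t->extra1 && val < *t->extra1)
		return -1;
	if (t->extra2 && val > *t->extra2)
		return -1;
	*t->data = val;
	return 0;
}
```

Whether sharing the constants actually shrinks the image is exactly the
question raised above; the sketch says nothing about that.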

Re: [PATCH v1] perf record: collect user registers set jointly with dwarf stacks

2019-04-17 Thread Arnaldo Carvalho de Melo

On April 17, 2019 11:40:02 AM GMT-03:00, Jiri Olsa  wrote:
>On Wed, Apr 17, 2019 at 11:35:42AM -0300, Arnaldo Carvalho de Melo
>wrote:
>> On Wed, Apr 17, 2019 at 09:39:52AM +0200, Jiri Olsa wrote:
>> > On Mon, Apr 15, 2019 at 06:36:13PM +0300, Alexey Budankov wrote:
>> > > 
>> > > When dwarf stacks are collected jointly with user specified register
>> > > set using --user-regs option like below the full register context is
>> > > still captured on a sample:
>> > > 
>> > >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- matrix.gcc.g.O3
>> > > 
>> > >   188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac
>> > >   ... FP chain: nr:0
>> > >   ... user regs: mask 0xff0fff ABI 64-bit
>> > >    AX    0x53b
>> > >    BX    0x7ffedbdd3cc0
>> > >    CX    0x
>> > >    DX    0x33d3a
>> > >    SI    0x7f09b74c38d0
>> > >    DI    0x0
>> > >    BP    0x401260
>> > >    SP    0x7ffedbdd3cc0
>> > >    IP    0x401236
>> > >    FLAGS 0x20a
>> > >    CS    0x33
>> > >    SS    0x2b
>> > >    R8    0x7f09b74c3800
>> > >    R9    0x7f09b74c2da0
>> > >    R10   0xf3ce
>> > >    R11   0x246
>> > >    R12   0x401070
>> > >    R13   0x7ffedbdd5db0
>> > >    R14   0x0
>> > >    R15   0x0
>> > >   ... ustack: size 1024, offset 0xe0
>> > >. data_src: 0x5080021
>> > >... thread: stack_test2.g.O:23828
>> > >.. dso: /root/abudanko/stacks/stack_test2.g.O3
>> > > 
>> > > After applying the change suggested in the patch the sample data contain
>> > > only user specified register values:
>> > > 
>> > >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- matrix.gcc.g.03
>> > > 
>> > >   188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c
>> > >   ... FP chain: nr:0
>> > >   ... user regs: mask 0x1c0 ABI 64-bit
>> > >    BP    0x401260
>> > >    SP    0x7ffd3d85cc20
>> > >    IP    0x401236
>> > >   ... ustack: size 1024, offset 0x58
>> > >. data_src: 0x5080021
>> > >... thread: stack_test2.g.O:23839
>> > >.. dso: /root/abudanko/stacks/stack_test2.g.O3
>> > > 
>> > > Signed-off-by: Alexey Budankov 
>> > 
>> > Acked-by: Jiri Olsa 
>> 
>> So, there are registers that are needed to do the DWARF unwinding,
>> right? But at the same time, if the user says only some are needed, he
>> better know what they're doing and ask for at least the registers needed
>> for the unwinding process to be successful, right?
>
>yep, that's how I understand that

So we need to document that, stating that specifying a set of registers 
together with requesting DWARF callchains may break things.
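
As a side note on the "user regs: mask" values in the quoted output: the mask
is a bitmap with one bit per register. A minimal sketch of how
--user-regs=IP,SP,BP maps to 0x1c0 follows; the enum below mirrors the first
entries of the x86 perf register numbering as I understand it from the
kernel's UAPI perf_regs.h, while the REG_* names and regs_mask() are local to
this example.

```c
#include <stdint.h>

/* Register bit positions (first entries of the x86 perf register numbering). */
enum {
	REG_AX, REG_BX, REG_CX, REG_DX, REG_SI, REG_DI,
	REG_BP, REG_SP, REG_IP
};

/* Build a sample_regs-style bitmap from a list of register indices. */
static uint64_t regs_mask(const int *regs, int n)
{
	uint64_t mask = 0;

	for (int i = 0; i < n; i++)
		mask |= UINT64_C(1) << regs[i];
	return mask;
}
```

With BP, SP and IP at bits 6, 7 and 8, the mask is 0x40 | 0x80 | 0x100 =
0x1c0, matching the second sample above.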

- Arnaldo

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Keith Busch

On Wed, Apr 17, 2019 at 05:39:23PM +0200, Michal Hocko wrote:
> On Wed 17-04-19 09:23:46, Keith Busch wrote:
> > On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> > > On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > > > Keith Busch had a set of patches to let you specify the demotion order
> > > > via sysfs for fun.  The rules we came up with were:
> > > 
> > > I am not a fan of any sysfs "fun"
> > 
> > I'm hung up on the user facing interface, but there should be some way a
> > user decides if a memory node is or is not a migrate target, right?
> 
> Why? Or to put it differently, why do we have to start with a user
> interface at this stage when we actually barely have any real usecases
> out there?

The use case is an alternative to swap, right? The user has to decide
which storage is the swap target, so operating in the same spirit.

Re: [PATCH] regulator: core: do not report EPROBE_DEFER as error.

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 10:54:11AM +0200, Jorge Ramirez-Ortiz wrote:
> Do not log a temporary failure to get a regulator (EPROBE_DEFER) while
> the driver is requesting retries.

> - dev_err(dev, "Failed to get supply '%s': %d\n",
> - consumers[i].supply, ret);
> + if (ret != -EPROBE_DEFER)
> + dev_err(dev, "Failed to get supply '%s': %d\n",
> + consumers[i].supply, ret);

Please leave at least a debug log in place, it's not good to just
silently fail - even if we will retry someone might still need some help
debugging (eg, figuring out that they need to enable whatever driver is
providing the supply in their config) so we should tell them why we're
deferring.
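
Something along these lines, shown here as a userspace model rather than
actual regulator core code (in the kernel this would use dev_dbg() and
dev_err(); the report_supply_failure() helper and the LVL_* levels are made
up for the sketch):

```c
#include <stdio.h>

#define EPROBE_DEFER 517	/* value used by the kernel's include/linux/errno.h */

enum model_log_level { LVL_NONE, LVL_DEBUG, LVL_ERROR };

/*
 * Hypothetical helper: log a failed supply lookup at a level that depends
 * on whether the failure is a probe deferral, and report the level used.
 */
static enum model_log_level report_supply_failure(const char *supply, int ret)
{
	if (ret == 0)
		return LVL_NONE;

	if (ret == -EPROBE_DEFER) {
		/* Still say why we are deferring, just not as an error. */
		fprintf(stderr, "debug: Failed to get supply '%s', deferring probe\n",
			supply);
		return LVL_DEBUG;
	}

	fprintf(stderr, "error: Failed to get supply '%s': %d\n", supply, ret);
	return LVL_ERROR;
}
```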

BTW does an e-mail address need updating somewhere here?


[PATCH v3] x86: mm: Do not use set_{pud,pmd}_safe when splitting the large page

2019-04-17 Thread Singh, Brijesh

The following commit 0a9fe8ca844d ("x86/mm: Validate kernel_physical_mapping_init()
PTE population") triggers the below warning in the SEV guest.

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x30d/0x386
Call Trace:
 kernel_physical_mapping_init+0xce/0x259
 early_set_memory_enc_dec+0x10f/0x160
 kvm_smp_prepare_boot_cpu+0x71/0x9d
 start_kernel+0x1c9/0x50b
 secondary_startup_64+0xa4/0xb0

The SEV guest calls kernel_physical_mapping_init() to clear the encryption
mask from an existing mapping. While clearing the encryption mask,
kernel_physical_mapping_init() splits large pages into smaller ones.
To split a page, kernel_physical_mapping_init() allocates a new page
and updates the existing entry. set_{pud,pmd}_safe() triggers a warning
when updating an entry that is already in the present state.

Add a new kernel_physical_mapping_change() which uses the non-safe variants
of the set_{pmd,pud,p4d}() and {pmd,pud,p4d}_populate() routines when updating
the entry. Since kernel_physical_mapping_change() may replace an
existing entry with a new one, the caller is responsible for issuing the
TLB flushes. Update early_set_memory_enc_dec() to use
kernel_physical_mapping_change() when it wants to clear the memory
encryption mask from the page table entry.

Signed-off-by: Brijesh Singh 
Fixes: 0a9fe8ca844d (x86/mm: Validate kernel_physical_mapping_init() ...)
Cc: Peter Zijlstra 
Cc: Dave Hansen 
Cc: Dan Williams 
Cc: Kirill A. Shutemov 
Cc: Peter Zijlstra (Intel) 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
---

Changes since v2:
- rename variable safe->init
- rename __set_* -> set_*_init()
- rename __*_populate -> *_populate_init()


Changes since v1:
-  add kernel_physical_mapping_change() which uses non-safe variants of
   set_{pmd,pud,pte}.

 arch/x86/mm/init_64.c | 137 --
 arch/x86/mm/mem_encrypt.c |  10 ++-
 arch/x86/mm/mm_internal.h |   3 +
 3 files changed, 110 insertions(+), 40 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bccff68e3267..70065726c39a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -58,6 +58,37 @@
 
 #include "ident_map.c"
 
+#define DEFINE_POPULATE(fname, type1, type2, init) \
+static inline void fname##_init(struct mm_struct *mm,  \
+   type1##_t *arg1, type2##_t *arg2, bool init)\
+{  \
+   if (init)   \
+   fname##_safe(mm, arg1, arg2);   \
+   else\
+   fname(mm, arg1, arg2);  \
+}
+
+DEFINE_POPULATE(p4d_populate, p4d, pud, init)
+DEFINE_POPULATE(pgd_populate, pgd, p4d, init)
+DEFINE_POPULATE(pud_populate, pud, pmd, init)
+DEFINE_POPULATE(pmd_populate_kernel, pmd, pte, init)
+
+#define DEFINE_ENTRY(type1, type2, init)   \
+static inline void set_##type1##_init(type1##_t *arg1, \
+   type2##_t arg2, bool init)  \
+{  \
+   if (init)   \
+   set_##type1##_safe(arg1, arg2); \
+   else\
+   set_##type1(arg1, arg2);\
+}
+
+DEFINE_ENTRY(p4d, p4d, init)
+DEFINE_ENTRY(pud, pud, init)
+DEFINE_ENTRY(pmd, pmd, init)
+DEFINE_ENTRY(pte, pte, init)
+
+
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
  * physical space so we can cache the place of the first one and move
@@ -414,7 +445,7 @@ void __init cleanup_highmap(void)
  */
 static unsigned long __meminit
 phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
- pgprot_t prot)
+ pgprot_t prot, bool init)
 {
unsigned long pages = 0, paddr_next;
unsigned long paddr_last = paddr_end;
@@ -432,7 +463,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 E820_TYPE_RESERVED_KERN))
-   set_pte_safe(pte, __pte(0));
+   set_pte_init(pte, __pte(0), init);
continue;
}
 
@@ -452,7 +483,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
pr_info("   pte=%p addr=%lx pte=%016lx\n", pte, paddr,
pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte);
pages++;
-   set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
+

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Paul Moore

On Wed, Apr 17, 2019 at 10:57 AM Oleg Nesterov  wrote:
> On 04/17, Paul Moore wrote:
> >
> > I'm tempted to simply return an error in selinux_setprocattr() if
> > the task's credentials are not the same as its real_cred;
>
> What about other modules? I have no idea what smack_setprocattr() is,
> but it too does prepare_creds/commit creds.
>
> it seems that the simplest workaround should simply add the additional
> cred == real_cred into proc_pid_attr_write().

Yes, that is simple, but I worry about what other LSMs might want to
do.  While I believe failing if the effective creds are not the same
as the real_creds is okay for SELinux (possibly Smack too), I worry
about what other LSMs may want to do.  After all,
proc_pid_attr_write() doesn't change the creds itself, that is
something the specific LSMs do.

-- 
paul moore
www.paul-moore.com

Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

2019-04-17 Thread Kees Cook

On Wed, Apr 17, 2019 at 10:17 AM Theodore Ts'o  wrote:
>
> On Wed, Apr 17, 2019 at 09:28:35AM +, David Laight wrote:
> >
> > If you can guarantee back to back requests on the PRNG then it is probably
> > possible to recalculate its state from 'bits of state'/5 calls.
> > Depending on the PRNG this might be computationally expensive.
> > For some PRNG it will be absolutely trivial.
> > ...
> > Stirring in a little bit of entropy doesn't help much either.
> > The entropy bits are effectively initial state bits.
> > Add 4 in with each request and 128 outputs gives 640 linear
> > equations in the (128 + 4 * 128) unknowns - still solvable.
>
> This is basically a scenario where the attacker has already taken
> control of Ring 3 execution and the question is how hard is it for
> them to perform privilege escalation attack to ring 0, right?  I'm
> sure the security folks will think I'm defeatist, but my personal rule
> of thumb is if the attacker has ring 3 control, you've already lost
> --- I figure there are so many zero days that getting ring 0 control
> is a foregone conclusion.  :-(

I think this attitude comes from Linux traditionally having had such a
weak line between ring 3 and ring 0. That's what we're trying to fix,
generally speaking. :)

> So that basically means if we want to protect against this, we're
> going to do something which involves Real Crypto (tm).  Whether that's
> RDRAND, or using Chacha20, etc., or something that has some attack
> resistance, such as "half MD5", etc., but eminently crackable by
> brute force, is essentially an overhead vs. security argument, and what
> it is we are willing to pay.

I wonder how a separate per-cpu state combined with frequent reseeding
would compare to chacha20 (or RDRAND)?

Another point to consider is that this weakness depends on a separate
bug existing, which is becoming less and less likely, given the
always-init options now available. I don't think we should try to
over-engineer this too much. Best-effort here seems fine. Using a
stack leak when the stack is randomized may also prove difficult, so
there's some chicken-and-egg problems with the proposed threat...

-- 
Kees Cook

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Casey Schaufler


On 4/17/2019 7:57 AM, Oleg Nesterov wrote:
> On 04/17, Paul Moore wrote:
>> I'm tempted to simply return an error in selinux_setprocattr() if
>> the task's credentials are not the same as its real_cred;
>
> What about other modules? I have no idea what smack_setprocattr() is,
> but it too does prepare_creds/commit creds.

For what it's worth, my test for Smack does not reproduce
the problem.

> it seems that the simplest workaround should simply add the additional
> cred == real_cred into proc_pid_attr_write().
>
> Oleg.

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Michal Hocko

On Wed 17-04-19 09:23:46, Keith Busch wrote:
> On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> > On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > > Keith Busch had a set of patches to let you specify the demotion order
> > > via sysfs for fun.  The rules we came up with were:
> > 
> > I am not a fan of any sysfs "fun"
> 
> I'm hung up on the user facing interface, but there should be some way a
> user decides if a memory node is or is not a migrate target, right?

Why? Or to put it differently, why do we have to start with a user
interface at this stage when we actually barely have any real usecases
out there?

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 1/7] s390: zcrypt: driver callback to indicate resource in use

2019-04-17 Thread Halil Pasic

On Mon, 15 Apr 2019 18:43:24 -0400
Tony Krowiak  wrote:

> On 4/15/19 2:59 PM, Halil Pasic wrote:
> > On Mon, 15 Apr 2019 12:51:23 -0400
> > Tony Krowiak  wrote:
> > 
> >> Having said that, I understand your concern about a driver hogging
> >> resources. I think I can provide a solution that serves both the
> >> purpose of preventing problems associated with accidental removal
> >> of AP resources as well as allowing root to remove them
> >> forcefully. I'll work on that for v2.
> > 
> > Can you tell us some more about this solution? Should we stop reviewing
> > v1 because v2 is going to be different anyway?
> 
> Patch 1 and 2 will be removed. There will not be a major design change
> between these patches and v2. In order to avoid a long explanation of
> my proposed changes, I'd prefer to state that the patch set will 
> establish and enforce the following rules:
> 
>  1. An APQN can be assigned to an mdev device iff it is NOT
> reserved for use by a zcrypt driver and is not assigned to
> another mdev device.
> 
>  2. Once an APQN is assigned to an mdev device, it will remain
> assigned until it is explicitly unassigned.
> 
>  3. A queue's APQN can be set in the guest's CRYCB iff the APQN is
> assigned to the mdev device used by the guest; however, if the
> queue is also in the host configuration (i.e., online), it MUST
> also be bound to the vfio_ap device driver.
> 
>  4. When a queue is bound to the vfio_ap driver and its APQN
> is assigned to an mdev device in use by a guest, the guest will
> be given access to the queue.
> 
>  5. When a queue is unbound from the vfio_ap driver and its APQN
> is assigned to an mdev device in use by the guest, access to
> the card containing the queue will be removed from the guest.
> Keep in mind that we can not deny access to a specific queue
> due to the architecture (i.e., clearing a bit in the AQM
> removes access to the queue for all adapters)
> 
>  6. When an adapter is assigned to an mdev device that is in use
> by a guest, the guest will be given access to the adapter.
> 
>  7. When an adapter is unassigned from an mdev device that is in use
> by a guest, access to the adapter will be removed from the guest.
> 
>  8. When a domain is assigned to an mdev device that is in use
> by a guest, the guest will be given access to the domain.
> 
>  9. When a domain is unassigned from an mdev device that is in use
> by a guest, access to the domain will be removed from the guest.
> 

Based on our off-the-list chat and this list I think I know
where you are heading :). I think it's actually the design that I
currently prefer the most. But in that case, it may be wise to touch
base with Reinhard -- AFAIR he was the strongest proponent of the 'do
not let a[pq]mask changes take away queues from guests' design.

Regards,
Halil

Re: [PATCH v3 3/3] module: Make __tracepoints_ptrs as read-only

2019-04-17 Thread Paul E. McKenney

On Wed, Apr 17, 2019 at 05:16:18PM +0200, Jessica Yu wrote:
> +++ Steven Rostedt [10/04/19 20:44 -0400]:
> >On Wed, 10 Apr 2019 16:29:02 -0400
> >Joel Fernandes  wrote:
> >
> >>The srcu structure pointer array is modified at module load time because the
> >>array is fixed up by the module loader at load-time with the final locations
> >>of the tracepoints right?  Basically relocation fixups. At compile time, I
> >>believe it is not known what the values in the ptr array are. I believe the same
> >>is true for the tracepoint ptrs array.
> >>
> >>Also it needs to be in a separate __tracepoint_ptrs so that this code works:
> >>
> >>
> >>#ifdef CONFIG_TRACEPOINTS
> >>mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
> >> sizeof(*mod->tracepoints_ptrs),
> >> >num_tracepoints);
> >>#endif
> >>
> >>Did I  miss some point? Thanks,
> >
> >But there's a lot of others too. Hmm, does this mean that the RO data
> >sections that are in modules are not set to RO?
> >
> >There's a bunch of separate sections that are RO. Just look in
> >include/asm-generic/vmlinux.lds.h under the RO_DATA_SECTION() macro.
> >
> >A lot of the sections saved in module.c:find_module_sections() are in
> >that RO_DATA when compiled as a builtin. Are they not RO when loaded via
> >a module?
> 
> Unlike the kernel, the module loader does not rely on a linker script
> to determine which sections get what protections. On module load, all
> sections in a module are looped through and those sections without the
> SHF_WRITE flag will be set to RO. For example, when there is a section
> filled with structs declared as const or if the section was explicitly
> given only the SHF_ALLOC attribute, those will be read-only. As long
> as the sections were given the correct section attributes for
> read-only, it'll have read-only protection. I see this is already the
> case for __param and  __ksymtab*/__kcrctab* sections, but I agree that
> a full audit would be useful to be consistent with builtin RO
> protections.
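
The rule described above can be sketched as a simple predicate on a
section's ELF flags. This is an illustration only, not the module loader's
code: section_gets_ro() is a made-up name, the MODEL_SHF_* constants repeat
the standard ELF values for SHF_WRITE and SHF_ALLOC, and requiring SHF_ALLOC
is an assumption of this sketch (sections that are not loaded into memory get
no protection at all). Executable sections are ignored here.

```c
#include <stdint.h>

#define MODEL_SHF_WRITE 0x1u	/* same value as SHF_WRITE in the ELF spec */
#define MODEL_SHF_ALLOC 0x2u	/* same value as SHF_ALLOC in the ELF spec */

/*
 * Return 1 if a section with these flags would end up read-only under the
 * rule "loaded and not marked writable", 0 otherwise.
 */
static int section_gets_ro(uint64_t sh_flags)
{
	return (sh_flags & MODEL_SHF_ALLOC) && !(sh_flags & MODEL_SHF_WRITE);
}
```

Under this model, a section given only the SHF_ALLOC attribute is read-only,
while one carrying SHF_ALLOC | SHF_WRITE stays writable.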

Thank you very much for the explanation!

Thanx, Paul

Re: rseq/arm32: choosing rseq code signature

2019-04-17 Thread Mathieu Desnoyers

- On Apr 17, 2019, at 10:43 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

> - On Apr 17, 2019, at 6:37 AM, richard earnshaw richard.earns...@arm.com
> wrote:
> 
>> On 16/04/2019 14:39, Mathieu Desnoyers wrote:
>>> - On Apr 15, 2019, at 9:37 AM, Mathieu Desnoyers
>>> mathieu.desnoy...@efficios.com wrote:
>>> 
 - On Apr 15, 2019, at 9:30 AM, peter maydell peter.mayd...@linaro.org 
 wrote:

> On Mon, 15 Apr 2019 at 14:11, Mathieu Desnoyers
>  wrote:
>>
>> - On Apr 11, 2019, at 3:55 PM, peter maydell 
>> peter.mayd...@linaro.org wrote:
>>
>>> On Thu, 11 Apr 2019 at 18:51, Mathieu Desnoyers
>>>  wrote:
  * This translates to the following instruction pattern in the T16 instruction
  * set:
  *
  * little endian:
  * def3        udf    #243      ; 0xf3
  * e7f5        b.n    <7f5>
  *
  * big endian:
  * e7f5        b.n    <7f5>
  * def3        udf    #243      ; 0xf3
>>>
>>> Do we really care about big-endian instruction-ordering for Thumb?
>>> It requires (AIUI) either an ARMv7R CPU which implements and sets
>>> SCTLR.IE to 1, or a v6-or-earlier CPU using BE32, and it's going to
>>> be even rarer than normal BE8 big-endian...
>>
>> I don't think we care enough about it to look for a trick to
>> turn the branch into something else (which would not branch away from the
>> udf instruction), but considering this signature will be ABI, it's good 
>> to
>> be thorough documentation-wise and cover all existing cases.
>
> I think if you want to document it it would be helpful to
> readers to make it clear that this is the ultra-rare
> big-endian-instruction-order "big endian Thumb", not the only
> moderately-rare little-endian-instructions-big-endian-data
> "big endian Thumb".

 I'm actually very much concerned about environments with big endian
 data and little endian code. Which gcc compiler flags do I need to
 use to test it ?

 I'm concerned about a signature mismatch between what is passed to
 the rseq system call ("data-endian signature") and what is generated
 in the code ("instruction-endian signature").
>>> 
>>> Based on this page:
>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFBBCHB.html
>>> 
>>> My understanding is that the situation is as follows (please confirm):
>>> 
>>> - Prior to ARMv6, you could build and run code that is either big or little endian,
>>>   given you had a matching Linux kernel endianness. Code and data endianness needed
>>>   to match,
>>> - Starting from ARMv6, only little endian code is supported. The endianness for data
>>>   access can be changed through bit [9], the E bit, of the Program Status Register,
>>>   (mixed endianness)
>>> 
>>> Looking at ARM build options for gcc, it seems you can select either big or little
>>> endian (-mbig-endian or -mlittle-endian (default)) which affects both instruction and
>>> data endianness. So I suspect the -mbig-endian option is really only useful for
>>> pre-ARMv6.
>> 
>> -mbig-endian is still correct, even on later architectures.  The linker
>> gets involved, however, and (using the mapping symbol information) swaps
>> the code segments to little-endian form (this is why you have to use
>> .inst rather than .word when inserting instructions, so that the correct
>> mapping symbols are inserted).
> 
> So what you're saying is that if I have:
> 
> void main()
> {
>         asm volatile (
>                 ".arm\n\t"
>                 ".inst 0xe7f5def3\n\t"
>                 ".long 0xe7f5def3\n\t");
> }
> 
> and compile it with:
> 
> arm-linux-gnueabihf-gcc -mbig-endian -march=armv6 -c -o arm-big-endianv6.o arm-test-endian.c
> 
> It's expected that the generated .o will have big endian instructions,
> matching the endianness of the data, e.g.:
> 
> hexdump arm-big-endianv6.o
> 
> [...]
> 030 0a00 0900 80b5 00af f5e7 f3de f5e7 f3de
> 
> But it's then at the linking stage that the linker will
> reverse the endianness of the ".inst" (but not .long).
> 
> Let's see:
> 
> arm-linux-gnueabihf-gcc -nodefaultlibs -nostdlib -mbig-endian -march=armv6 -o arm-big-endianv6 arm-big-endianv6.o
> /usr/lib/gcc-cross/arm-linux-gnueabihf/7/../../../../arm-linux-gnueabihf/bin/ld: warning: cannot find entry symbol _start; defaulting to 01b0
> 
> hexdump gives me:
> [...]
> 1b0 80b5 00af f5e7 f3de f5e7 f3de c046 bd46
> 
> So it has not reversed the instruction endianness.
> 
> What am I doing wrong ?

It seems to be specific to using armv6 and armv7* with gcc 7.
gcc 8 seems to indeed reverse the code vs data endianness.
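
To read the dumps quoted above, here is a minimal Python sketch. It assumes hexdump ran on a little-endian host with its default 16-bit word format; the helper name hexdump_words is made up for this illustration.

```python
import struct

INSN = 0xe7f5def3  # the value passed to .inst and .long in the test program

def hexdump_words(data):
    """Mimic hexdump's default display: 16-bit words read in little-endian host order."""
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % w for w in words)

big = struct.pack(">I", INSN)     # bytes e7 f5 de f3 (big-endian storage)
little = struct.pack("<I", INSN)  # bytes f3 de f5 e7 (little-endian storage)

print(hexdump_words(big))     # f5e7 f3de
print(hexdump_words(little))  # def3 e7f5
```

Under these assumptions, the repeated "f5e7 f3de" in both the .o and the gcc 7 binary means both words are stored big-endian; a byte-swapped .inst would show up as "def3 e7f5".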

So we need to figure out whether .inst is the right thing to
do to declare a signature, or if it's better to use ".long"
which would probably generate an invalid

Re: [PATCH v4] media: docs-rst: Document m2m stateless video decoder interface

2019-04-17 Thread Nicolas Dufresne

Le dimanche 14 avril 2019 à 18:41 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> Le vendredi 12 avril 2019 à 16:47 -0400, Nicolas Dufresne a écrit :
> > Le mercredi 06 mars 2019 à 17:00 +0900, Alexandre Courbot a écrit :
> > > Documents the protocol that user-space should follow when
> > > communicating with stateless video decoders.
> > > 
> > > The stateless video decoding API makes use of the new request and tags
> > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > should probably still be considered staging for a short while.
> 
> [...]
> 
> > From an IRC discussion with Paul and some more digging, I have found a
> > design problem in the decoding process.
> > 
> > In H264 and HEVC you can have multiple decoding units per frame
> > (slices). This type of encoding is increasingly popular, especially for
> > low latency streaming use cases. The wording of this spec does allow
> > for the notion of decoding unit, and in practice it has been proven to
> > work through some RFC FFMPEG patches and the Cedrus driver. But
> > something important to know is that the FFMPEG RFC implements decoding
> > in lock steps. Which means:
> > 
> >   1. It queues a single free capture buffer
> >   2. It queues an output buffer, set controls, queue the request
> >   3. It waits for a capture buffer to reach state done
> >   4. It dequeues that capture buffer, and queue it back again
> >   5. And then it runs steps 2 to 4 again with the following slices, until we
> >      have a complete frame. After that, it restarts at step 1
> > 
> > So the implementation makes no use of the queues. There is no batch
> > processing, so we might not be able to reach the maximum hardware
> > throughput.
> > 
> > So the optimal method would look like the following, but there comes
> > the design issue.
> > 
> >   1. Queue a single free capture buffer
> >   2. Queue output buffer for slice 1, set controls, queue the request
> >   3. Queue output buffer for slice 2, set controls, queue the request
> >   4. Wait for completion
> > 
> > The problem is in step 4. Completion means that the capture buffer is done
> > decoding a single unit. So assuming the driver supports matching the
> > timestamp against the queued buffer, instead of waiting for a new
> > buffer, the driver would have to mark the same buffer done twice,
> > which just does not work as a way to inform userspace that all slices
> > are decoded into the one capture buffer they share.
> 
> Interestingly, I'm experiencing the exact same problem dealing with a
> 2D graphics blitter that has limited output scaling abilities, which
> implies handling a large scaling operation as multiple clipped smaller
> scaling operations. The issue is basically that multiple jobs have to
> be submitted to complete a single frame and relying on an indication
> from the destination buffer (such as a fence) doesn't work to indicate
> that all the operations were completed, since we get the indication at
> each step instead of at the end of the batch.

That looks similar to the i.MX6 IPU m2m driver. It splits the image into
tiles of 1024x1024 and processes each tile separately. This driver has
been around for a long time, so I guess they have a solution to that.
They don't need requests, because there is nothing to be bundled with
the input image. I know that Renesas folks have started working on a
de-interlacer. Again, this kind of driver may process and reuse input
buffers for motion compensation, but I don't think they need special
userspace API for that.

> 
> One idea I see to solve this is to have a notion of batch in the driver
> (for our situation, that would be in v4l2) and provide means to get a
> done indication for that entity.

Can't you just make this part of your driver state machine ?
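
As a rough illustration of what such a driver-side state machine could look like, here is a toy Python model. It is only a sketch: the class and method names are hypothetical, and it assumes the driver can tell slices of the same frame apart by their timestamp, holding the shared capture buffer until a slice with a different timestamp (or a flush) arrives.

```python
class SliceDecoder:
    """Toy model: one capture buffer is shared by all slices of a frame."""

    def __init__(self):
        self.held_ts = None  # timestamp of the frame currently being decoded
        self.done = []       # timestamps of capture buffers returned to userspace

    def queue_slice(self, timestamp):
        # A slice with a new timestamp starts a new frame, so the previously
        # held capture buffer is complete and can be marked done exactly once.
        if self.held_ts is not None and timestamp != self.held_ts:
            self.done.append(self.held_ts)
        self.held_ts = timestamp

    def flush(self):
        # End of stream: release whatever is still held.
        if self.held_ts is not None:
            self.done.append(self.held_ts)
            self.held_ts = None

dec = SliceDecoder()
for ts in (1, 1, 1, 2, 2):  # three slices for frame 1, two for frame 2
    dec.queue_slice(ts)
dec.flush()
print(dec.done)  # [1, 2]
```

The cost of this particular choice is latency: frame 1 is only reported once the first slice of frame 2 has been queued.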

> 
> I think we could extend the request API to allow this. We already
> represent requests as individual file descriptors, we could totally
> group requests in batches and get a sync fd for the batch to poll on
> when we need to return the frames. It would be good if we could expose
> this in a way that makes it work with DRM as an in fence for display.
> Then we can pretty much schedule our flip + decoding together (which is
> quite nice to have when we're running late on the decoding side).
> 
> What do you think?

I'm not sure why this specific thing needs a userspace exposition.

> 
> It feels to me like the request API was designed to open up the way for
> these kinds of improvements, so I'm sure we can find an agreeable
> solution that extends the API.
> 
> > To me, multi slice encoded streams are just too common, and they will
> > also exist for AV1. So we really need a solution to this that does not
> > require operating in lock steps. Especially since some HW can decode
> > multiple slices in parallel (multi core), we would not want to prevent
> > that HW from being used efficiently. On top of this, we need a solution
> > so that we can also keep queuing slice of the following frames if

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Keith Busch

On Wed, Apr 17, 2019 at 11:23:18AM +0200, Michal Hocko wrote:
> On Tue 16-04-19 14:22:33, Dave Hansen wrote:
> > Keith Busch had a set of patches to let you specify the demotion order
> > via sysfs for fun.  The rules we came up with were:
> 
> I am not a fan of any sysfs "fun"

I'm hung up on the user facing interface, but there should be some way a
user decides if a memory node is or is not a migrate target, right?

[PATCH] ASoC: Mediatek: MT8183: Fix build err while CONFIG_I2C set to module

2019-04-17 Thread Yue Haibing

From: YueHaibing 

During randconfig builds, I occasionally run into an invalid configuration

WARNING: unmet direct dependencies detected for SND_SOC_TS3A227E
  Depends on [m]: SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && I2C [=m]
  Selected by [y]:
  - SND_SOC_MT8183_MT6358_TS3A227E_MAX98357A [=y] && SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && SND_SOC_MT8183 [=y]

sound/soc/codecs/ts3a227e.o: In function `ts3a227e_i2c_probe':
ts3a227e.c:(.text+0x684): undefined reference to `__devm_regmap_init_i2c'
sound/soc/codecs/ts3a227e.o: In function `ts3a227e_driver_init':
ts3a227e.c:(.init.text+0x18): undefined reference to `i2c_register_driver'
sound/soc/codecs/ts3a227e.o: In function `ts3a227e_driver_exit':
ts3a227e.c:(.exit.text+0x14): undefined reference to `i2c_del_driver'

This patch adds an I2C dependency to fix this.
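
The mismatch in the warning can be modelled in a few lines of Python. This is a simplified sketch of tristate handling, not Kconfig's actual implementation; selected_value is a made-up helper.

```python
N, M, Y = 0, 1, 2  # tristate values, ordered n < m < y

def selected_value(selector, dep_limit):
    """Return (value, warn) for a symbol that is 'select'ed.

    'select' forces the symbol to the selector's value and ignores the
    symbol's own 'depends on' limit; warn is True when that limit is exceeded.
    """
    value = selector
    return value, value > dep_limit

# SND_SOC_TS3A227E depends on I2C (=m) but is selected by a =y machine driver.
print(selected_value(Y, M))          # (2, True)

# With 'depends on I2C' on the selector, the selector itself is capped at m.
print(selected_value(min(Y, M), M))  # (1, False)
```

In the first case the codec driver is built in while the I2C core is a module, which is what produces the undefined references above.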

Reported-by: Hulk Robot 
Fixes: ebbddc75bbe8 ("ASoC: Mediatek: MT8183: Add machine driver with DA7219")
Signed-off-by: YueHaibing 
---
 sound/soc/mediatek/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/mediatek/Kconfig b/sound/soc/mediatek/Kconfig
index 874404b..f70b710 100644
--- a/sound/soc/mediatek/Kconfig
+++ b/sound/soc/mediatek/Kconfig
@@ -118,6 +118,7 @@ config SND_SOC_MT8183
 
 config SND_SOC_MT8183_MT6358_TS3A227E_MAX98357A
tristate "ASoC Audio driver for MT8183 with MT6358 TS3A227E MAX98357A codec"
+   depends on I2C
depends on SND_SOC_MT8183
select SND_SOC_MT6358
select SND_SOC_MAX98357A
-- 
2.7.4

[PATCH -next] drm/panfrost: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST

2019-04-17 Thread Steven Price

Since panfrost has a 'select' on IOMMU_IO_PGTABLE_LPAE we must depend on
the same set of flags. Otherwise IOMMU_IO_PGTABLE_LPAE will be forced on
even though it cannot build (no support for cmpxchg64).

This fixes the following warning from kconfig:

WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE
  Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=y] && !GENERIC_ATOMIC64 [=y])
  Selected by [y]:
  - DRM_PANFROST [=y] && HAS_IOMEM [=y] && DRM [=y] && (ARM || ARM64 || COMPILE_TEST [=y]) && MMU [=y]

Reported-by: kbuild test robot 
Signed-off-by: Steven Price 
---
 drivers/gpu/drm/panfrost/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
index 7f5e572daa2d..591611dc4e34 100644
--- a/drivers/gpu/drm/panfrost/Kconfig
+++ b/drivers/gpu/drm/panfrost/Kconfig
@@ -3,7 +3,7 @@
 config DRM_PANFROST
tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
depends on DRM
-   depends on ARM || ARM64 || COMPILE_TEST
+   depends on ARM || ARM64 || (COMPILE_TEST && !GENERIC_ATOMIC64)
depends on MMU
select DRM_SCHED
select IOMMU_SUPPORT
-- 
2.20.1

fsi: use put_device to release resource on error path

2019-04-17 Thread Pan Bian

In the function fsi_slave_init, kfree is used to release slave if an error
occurs while setting smode. Some fields of slave are then not released,
resulting in a memory leak. Instead, put_device should be used to
correctly release resources.

Fixes: d1dcd6782576 ("fsi: Add cfam char devices")
Signed-off-by: Pan Bian 
---
 drivers/fsi/fsi-core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c
index 2c31563..2bb6625 100644
--- a/drivers/fsi/fsi-core.c
+++ b/drivers/fsi/fsi-core.c
@@ -1056,8 +1056,7 @@ static int fsi_slave_init(struct fsi_master *master, int link, uint8_t id)
dev_warn(>dev,
"can't set smode on slave:%02x:%02x %d\n",
link, id, rc);
-   kfree(slave);
-   return -ENODEV;
+   goto err_free;
}
if (master->link_config)
master->link_config(master, link,
-- 
2.7.4

[PATCH v5 2/5] ARM: dts: da850-lego-ev3: enable cpufreq

2019-04-17 Thread Bartosz Golaszewski

From: David Lechner 

Add a fixed regulator for the LEGO EV3 board along with board-specific
CPU configuration.

Signed-off-by: David Lechner 
Signed-off-by: Bartosz Golaszewski 
---
 arch/arm/boot/dts/da850-lego-ev3.dts | 30 
 1 file changed, 30 insertions(+)

diff --git a/arch/arm/boot/dts/da850-lego-ev3.dts b/arch/arm/boot/dts/da850-lego-ev3.dts
index 66fcadf0ba91..553717f84483 100644
--- a/arch/arm/boot/dts/da850-lego-ev3.dts
+++ b/arch/arm/boot/dts/da850-lego-ev3.dts
@@ -125,6 +125,15 @@
amp-supply = <>;
};
 
+   cvdd: regulator0 {
+   compatible = "regulator-fixed";
+   regulator-name = "cvdd";
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-always-on;
+   regulator-boot-on;
+   };
+
/*
 * This is a 5V current limiting regulator that is shared by USB,
 * the sensor (input) ports, the motor (output) ports and the A/DC.
@@ -204,6 +213,27 @@
clock-frequency = <2400>;
 };
 
+ {
+   cpu-supply = <>;
+};
+
+/* since we have a fixed regulator, we can't run at these points */
+_100 {
+   status = "disabled";
+};
+
+_200 {
+   status = "disabled";
+};
+
+/*
+ * The SoC is actually the 456MHz version, but because of the fixed regulator
+ * This is the fastest we can go.
+ */
+_375 {
+   status = "okay";
+};
+
 _core {
status = "okay";
 
-- 
2.21.0

[PATCH v5 3/5] ARM: dts: da850-lcdk: enable cpufreq

2019-04-17 Thread Bartosz Golaszewski

From: David Lechner 

Add a fixed regulator for the da850-lcdk board along with board-specific
CPU configuration.

Signed-off-by: David Lechner 
Signed-off-by: Bartosz Golaszewski 
---
 arch/arm/boot/dts/da850-lcdk.dts | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/arm/boot/dts/da850-lcdk.dts b/arch/arm/boot/dts/da850-lcdk.dts
index 26f453dc8370..b36d5e36bcf1 100644
--- a/arch/arm/boot/dts/da850-lcdk.dts
+++ b/arch/arm/boot/dts/da850-lcdk.dts
@@ -155,12 +155,48 @@
};
};
};
+
+   cvdd: regulator0 {
+   compatible = "regulator-fixed";
+   regulator-name = "cvdd";
+   regulator-min-microvolt = <130>;
+   regulator-max-microvolt = <130>;
+   regulator-always-on;
+   regulator-boot-on;
+   };
 };
 
 _clk {
clock-frequency = <2400>;
 };
 
+ {
+   cpu-supply = <>;
+};
+
+/*
+ * LCDK has a fixed CVDD of 1.3V, so only operating points >= 300MHz are
+ * valid. Unfortunately due to a problem with the DA8XX OHCI controller, we
+ * can't enable more than one OPP by default, since the controller sometimes
+ * becomes unresponsive after a transition. Fix the frequency at 456 MHz.
+ */
+
+_100 {
+   status = "disabled";
+};
+
+_200 {
+   status = "disabled";
+};
+
+_300 {
+   status = "disabled";
+};
+
+_456 {
+   status = "okay";
+};
+
 _core {
status = "okay";
 
-- 
2.21.0
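
The rule in the comment above (a fixed CVDD limits which operating points are valid) can be sketched in Python. The microvolt windows below are assumptions chosen to be consistent with the comments in these patches; the archived diffs show truncated numbers, so treat them as illustrative only. usable_opps is a made-up helper.

```python
# (frequency in MHz, min microvolts, max microvolts); assumed values
OPPS = [
    (100,  950000, 1050000),
    (200, 1050000, 1160000),
    (300, 1140000, 1320000),
    (375, 1140000, 1320000),
    (456, 1250000, 1350000),
]

def usable_opps(fixed_uv):
    """Frequencies whose allowed voltage window contains the fixed supply."""
    return [mhz for mhz, lo, hi in OPPS if lo <= fixed_uv <= hi]

print(usable_opps(1300000))  # [300, 375, 456]
print(usable_opps(1200000))  # [300, 375]
```

With a 1.3V supply only the points from 300MHz up qualify, and with a 1.2V supply 375MHz is the fastest usable point, which matches the board comments in this series.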

[PATCH v5 5/5] ARM: davinci_all_defconfig: Enable CPUFREQ_DT

2019-04-17 Thread Bartosz Golaszewski

From: David Lechner 

This sets CONFIG_CPUFREQ_DT=m in davinci_all_defconfig. This is used for
frequency scaling on device tree boards.

Signed-off-by: David Lechner 
Signed-off-by: Bartosz Golaszewski 
---
 arch/arm/configs/davinci_all_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/davinci_all_defconfig b/arch/arm/configs/davinci_all_defconfig
index 207962a656a2..c3502236132e 100644
--- a/arch/arm/configs/davinci_all_defconfig
+++ b/arch/arm/configs/davinci_all_defconfig
@@ -45,6 +45,7 @@ CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
 CONFIG_CPU_FREQ_GOV_PERFORMANCE=m
 CONFIG_CPU_FREQ_GOV_POWERSAVE=m
 CONFIG_CPU_FREQ_GOV_ONDEMAND=m
+CONFIG_CPUFREQ_DT=m
 CONFIG_CPU_IDLE=y
 CONFIG_NET=y
 CONFIG_PACKET=y
-- 
2.21.0

[PATCH v5 4/5] ARM: dts: da850-evm: enable cpufreq

2019-04-17 Thread Bartosz Golaszewski

From: Bartosz Golaszewski 

Enable cpufreq-dt support for da850-evm. The cvdd is supplied by the
tps65070 pmic with configurable output voltage. By default da850-evm
boards support frequencies up to 375MHz so enable this operating
point.

Signed-off-by: Bartosz Golaszewski 
Reviewed-by: Adam Ford 
---
 arch/arm/boot/dts/da850-evm.dts | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
index f04bc3e15332..f94bb38fdad9 100644
--- a/arch/arm/boot/dts/da850-evm.dts
+++ b/arch/arm/boot/dts/da850-evm.dts
@@ -191,6 +191,19 @@
};
 };
 
+ {
+   cpu-supply = <_reg>;
+};
+
+/*
+ * The standard da850-evm kits and SOM's are 375MHz so enable this operating
+ * point by default. Higher frequencies must be enabled for custom boards with
+ * other variants of the SoC.
+ */
+_375 {
+   status = "okay";
+};
+
  {
status = "okay";
 };
-- 
2.21.0

[PATCH v5 0/5] ARM: da850: enable cpufreq in DT mode

2019-04-17 Thread Bartosz Golaszewski

From: Bartosz Golaszewski 

This series adds cpufreq-dt operating points for da850 boards supported
with device tree (da850-lcdk, da850-lego-ev3, da850-evm).

Last patch enables CPUFREQ_DT in davinci_all_defconfig.

v1 -> v2:
- use the VDCDC3_1.2V regulator as cpu-supply on da850-evm

v2 -> v3:
- drop patch 1, as the revision tag is in fact correctly passed to the kernel
  by u-boot
- only enable the 375 operating point for da850-evm as this is the standard
  frequency for this board

v3 -> v4:
- split the first patch into three separate changesets: one adding the
  operating points to the main dtsi and two enabling cpufreq on da850-lego-ev3
  and da850-lcdk
- remove the operating point not mentioned in the datasheet (415 MHz)
- fix commit message in patch 4/5

v4 -> v5:
- only enable a single OPP for da850-lcdk due to the problem with the OHCI
  controller becoming unresponsive after cpufreq transitions
- fix the name of the pmic on da850-evm

Bartosz Golaszewski (1):
  ARM: dts: da850-evm: enable cpufreq

David Lechner (4):
  ARM: dts: da850: add cpu node and operating points to DT
  ARM: dts: da850-lego-ev3: enable cpufreq
  ARM: dts: da850-lcdk: enable cpufreq
  ARM: davinci_all_defconfig: Enable CPUFREQ_DT

 arch/arm/boot/dts/da850-evm.dts| 13 +++
 arch/arm/boot/dts/da850-lcdk.dts   | 36 +++
 arch/arm/boot/dts/da850-lego-ev3.dts   | 30 
 arch/arm/boot/dts/da850.dtsi   | 50 ++
 arch/arm/configs/davinci_all_defconfig |  1 +
 5 files changed, 130 insertions(+)

-- 
2.21.0

[PATCH v5 1/5] ARM: dts: da850: add cpu node and operating points to DT

2019-04-17 Thread Bartosz Golaszewski

From: David Lechner 

This adds a cpu node and operating points to the common da850.dtsi file.

All operating points above 300MHz are disabled by default.

Regulators need to be hooked up on a per-board basis.

Signed-off-by: David Lechner 
Signed-off-by: Bartosz Golaszewski 
---
 arch/arm/boot/dts/da850.dtsi | 50 
 1 file changed, 50 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 559659b399d0..0c9a8e78f748 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -20,6 +20,56 @@
reg = <0xc000 0x0>;
};
 
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu: cpu@0 {
+   compatible = "arm,arm926ej-s";
+   device_type = "cpu";
+   reg = <0>;
+   clocks = < 14>;
+   operating-points-v2 = <_table>;
+   };
+   };
+
+   opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   opp_100: opp100-1 {
+   opp-hz = /bits/ 64 <1>;
+   opp-microvolt = <100 95 105>;
+   };
+
+   opp_200: opp110-2 {
+   opp-hz = /bits/ 64 <2>;
+   opp-microvolt = <110 105 116>;
+   };
+
+   opp_300: opp120-3 {
+   opp-hz = /bits/ 64 <3>;
+   opp-microvolt = <120 114 132>;
+   };
+
+   /*
+* Original silicon was 300MHz max, so higher frequencies
+* need to be enabled on a per-board basis if the chip is
+* capable.
+*/
+
+   opp_375: opp120-37500 {
+   status = "disabled";
+   opp-hz = /bits/ 64 <37500>;
+   opp-microvolt = <120 114 132>;
+   };
+
+   opp_456: opp130-45600 {
+   status = "disabled";
+   opp-hz = /bits/ 64 <45600>;
+   opp-microvolt = <130 125 135>;
+   };
+   };
+
arm {
#address-cells = <1>;
#size-cells = <1>;
-- 
2.21.0

Re: [PATCH v1 3/4] signal: support CLONE_PIDFD with pidfd_send_signal

2019-04-17 Thread Oleg Nesterov

On 04/17, Christian Brauner wrote:
>
> On Wed, Apr 17, 2019 at 04:01:06PM +0200, Oleg Nesterov wrote:
> > On 04/16, Christian Brauner wrote:
> > >
> > > @@ -3581,12 +3588,12 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
> > >   if (flags)
> > >   return -EINVAL;
> > >
> > > - f = fdget_raw(pidfd);
> > > + f = fdget(pidfd);
> >
> > could you explain this change?
> >
> > I am just curious, I don't understand why should we disallow O_PATH and how
> > this connects to this patch.
>
> Sending a signal through a pidfd is considered to be on a par with a
> "write" to that pidfd.

OK, but how does this connect to "support pidfds" ?

> Additionally, we use the fops associated with the fd to detect whether
> it is actually a pidfd or not. This is not possible with O_PATH since
> f_ops will be set to dummy fops.

indeed... I didn't know this, thanks!

But this means that pidfd_send_signal() will return -EBADF with or without
this change; pidfd_to_pid() will return -EBADF even if fdget_raw() succeeds,
right?

To clarify, I am not arguing. I am trying to understand why exactly we
need this s/fdget_raw/fdget/ change, and why it doesn't come as a separate
patch. Can you add a note into the changelog?

Oleg.

Re: [v2 RFC PATCH 0/9] Another Approach to Use PMEM as NUMA Node

2019-04-17 Thread Keith Busch

On Tue, Apr 16, 2019 at 04:17:44PM -0700, Yang Shi wrote:
> On 4/16/19 4:04 PM, Dave Hansen wrote:
> > On 4/16/19 2:59 PM, Yang Shi wrote:
> > > On 4/16/19 2:22 PM, Dave Hansen wrote:
> > > > Keith Busch had a set of patches to let you specify the demotion order
> > > > via sysfs for fun.  The rules we came up with were:
> > > > 1. Pages keep no history of where they have been
> > > > 2. Each node can only demote to one other node
> > > Does this mean any remote node? Or just DRAM to PMEM, but remote PMEM
> > > might be ok?
> > In Keith's code, I don't think we differentiated.  We let any node
> > demote to any other node you want, as long as it follows the cycle rule.
> 
> I recall Keith's code let the userspace define the target node.

Right, you have to opt-in in my original proposal since it may be a
bit presumptuous of the kernel to decide how a node's memory is going
to be used. User applications have other intentions for it.

It wouldn't be too difficult to have HMAT create a reasonable initial
migration graph too, and that could also be an opt-in user choice.

> Anyway, we may need add one rule: not migrate-on-reclaim from PMEM
> node.  Demoting from  PMEM to DRAM sounds pointless.

I really don't think we should be making such hard rules on PMEM. It
makes more sense to consider performance and locality for migration
rules than on a persistence attribute.
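
For illustration, the quoted rules (each node demotes to at most one other node, subject to the cycle rule) can be checked with a short Python sketch. It assumes the cycle rule means that following demotion targets never leads back to a node already visited; the node numbers are hypothetical.

```python
def has_cycle(demote_to):
    """demote_to maps a node id to its single demotion target."""
    for start in demote_to:
        seen = {start}
        node = demote_to.get(start)
        while node is not None:
            if node in seen:
                return True
            seen.add(node)
            node = demote_to.get(node)
    return False

print(has_cycle({0: 2, 1: 3}))  # False
print(has_cycle({0: 2, 2: 0}))  # True
```

A user-supplied (opt-in) mapping or an HMAT-derived default could both be validated this way before being accepted.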

Re: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

2019-04-17 Thread Theodore Ts'o

On Wed, Apr 17, 2019 at 09:28:35AM +, David Laight wrote:
> 
> If you can guarantee back to back requests on the PRNG then it is probably
> possible to recalculate its state from 'bits of state'/5 calls.
> Depending on the PRNG this might be computationally expensive.
> For some PRNG it will be absolutely trivial.
> ...
> Stirring in a little bit of entropy doesn't help much either.
> The entropy bits are effectively initial state bits.
> Add 4 in with each request and 128 outputs gives 640 linear
> equations in the (128 + 4 * 128) unknowns - still solvable.
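
The point about linear equations can be made concrete with a small Python sketch. It uses xorshift32 purely as a stand-in for "some linear PRNG" (it is not the generator from the patch) and assumes the attacker sees the low 5 bits of 32 back-to-back outputs. Every state bit is a XOR of seed bits, so the leaked bits form a linear system over GF(2) that Gaussian elimination solves.

```python
MASK32 = 0xFFFFFFFF
LEAK_BITS = 5  # bits of each output the attacker observes

def xorshift32(x):
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return x

def leak(seed, count):
    """Run the generator and return only the low LEAK_BITS of each output."""
    out, x = [], seed
    for _ in range(count):
        x = xorshift32(x)
        out.append(x & ((1 << LEAK_BITS) - 1))
    return out

def sym_step(sym):
    """Same update as xorshift32, applied to symbolic bits.

    sym[i] is a 32-bit mask: state bit i equals the XOR of the seed bits set in it.
    """
    def shl(v, k):
        return [0] * k + v[:32 - k]
    def shr(v, k):
        return v[k:] + [0] * k
    def xor(a, b):
        return [p ^ q for p, q in zip(a, b)]
    sym = xor(sym, shl(sym, 13))
    sym = xor(sym, shr(sym, 17))
    sym = xor(sym, shl(sym, 5))
    return sym

def recover_seed(leaked):
    sym = [1 << i for i in range(32)]
    pivots = {}  # lowest set coefficient bit -> equation (bit 32 holds the right-hand side)
    for value in leaked:
        sym = sym_step(sym)
        for b in range(LEAK_BITS):
            row = sym[b] | (((value >> b) & 1) << 32)
            # Gaussian elimination over GF(2): reduce against known pivots.
            for bit in range(32):
                if not (row >> bit) & 1:
                    continue
                if bit in pivots:
                    row ^= pivots[bit]
                else:
                    pivots[bit] = row
                    break
    if len(pivots) < 32:
        raise ValueError("not enough independent equations")
    seed = 0
    for bit in range(31, -1, -1):  # back-substitute from the highest bit down
        row = pivots[bit]
        known = row & seed         # coefficients on already-solved (higher) bits
        parity = bin(known).count("1") & 1
        seed |= ((row >> 32) ^ parity) << bit
    return seed

print(hex(recover_seed(leak(0x1234ABCD, 32))))  # 0x1234abcd
```

Mixing in a few unknown entropy bits per call only adds unknowns to the same kind of system, which is the argument made in the quoted text.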

This is basically a scenario where the attacker has already taken
control of Ring 3 execution and the question is how hard is it for
them to perform privilege escalation attack to ring 0, right?  I'm
sure the security folks will think I'm defeatist, but my personal rule
of thumb is if the attacker has ring 3 control, you've already lost
--- I figure there are so many zero days that getting ring 0 control
is a foregone conclusion.  :-(

So that basically means if we want to protect against this, we're
going to do something which involves Real Crypto (tm).  Whether that's
RDRAND, or using Chacha20, etc., or something that has some attack
resistance, such as "half MD5", etc., but eminently crackable by
brute force, is essentially an overhead vs. security argument, and what
it is we are willing to pay.

- Ted

Re: [PATCH v3 3/3] module: Make __tracepoints_ptrs as read-only

2019-04-17 Thread Jessica Yu


+++ Steven Rostedt [10/04/19 20:44 -0400]:

On Wed, 10 Apr 2019 16:29:02 -0400
Joel Fernandes  wrote:


The srcu structure pointer array is modified at module load time because the
array is fixed up by the module loader at load-time with the final locations
of the tracepoints right?  Basically relocation fixups. At compile time, I
believe it is not known what the values in the ptr array are. I believe same
is true for the tracepoint ptrs array.

Also it needs to be in a separate __tracepoint_ptrs so that this code works:


#ifdef CONFIG_TRACEPOINTS
mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
 sizeof(*mod->tracepoints_ptrs),
 >num_tracepoints);
#endif

Did I  miss some point? Thanks,


But there's a lot of others too. Hmm, does this mean that the RO data
sections that are in modules are not set to RO?

There's a bunch of separate sections that are RO. Just look in
include/asm-generic/vmlinux.lds.h under the RO_DATA_SECTION() macro.

A lot of the sections saved in module.c:find_module_sections() are in
that RO_DATA when compiled as a builtin. Are they not RO when loaded via
a module?


Unlike the kernel, the module loader does not rely on a linker script
to determine which sections get what protections. On module load, all
sections in a module are looped through and those sections without the
SHF_WRITE flag will be set to RO. For example, when there is a section
filled with structs declared as const or if the section was explicitly
given only the SHF_ALLOC attribute, those will be read-only. As long
as the sections were given the correct section attributes for
read-only, it'll have read-only protection. I see this is already the
case for __param and  __ksymtab*/__kcrctab* sections, but I agree that
a full audit would be useful to be consistent with builtin RO
protections.
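
As a simplified illustration of that rule, here is a Python sketch that classifies a section from its ELF flags. SHF_WRITE and SHF_ALLOC are the standard ELF flag bits; the protection helper and its return strings are made up, and the real module loader does considerably more than this.

```python
SHF_WRITE, SHF_ALLOC = 0x1, 0x2  # standard ELF section flag bits

def protection(sh_flags):
    """Simplified rule from the text: a loaded section without SHF_WRITE is read-only."""
    if not sh_flags & SHF_ALLOC:
        return "not loaded"
    return "rw" if sh_flags & SHF_WRITE else "ro"

print(protection(SHF_ALLOC))              # ro
print(protection(SHF_ALLOC | SHF_WRITE))  # rw
```

Under this rule, a section such as __tracepoints_ptrs becomes read-only in a module simply by being emitted without the write flag.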

Hope that helps,

Jessica

Re: [PATCH v2] panic: add an option to replay all the printk message in buffer

2019-04-17 Thread Feng Tang

On Wed, Apr 17, 2019 at 02:24:58PM +0200, Petr Mladek wrote:
> On Wed 2019-04-17 19:50:10, Sergey Senozhatsky wrote:
> > On (04/17/19 18:46), Sergey Senozhatsky wrote:
> > > 
> > > Does not look too complex/ugly.
> > 
> > Looks simpler than adding one more global state to the
> > console_unlock() printing loop.
> > 
> > // Not tested at all //
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index cd73af35ec66..50eacfc9bc7e 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -51,6 +51,7 @@ EXPORT_SYMBOL_GPL(panic_timeout);
> >  #define PANIC_PRINT_TIMER_INFO 0x0004
> >  #define PANIC_PRINT_LOCK_INFO  0x0008
> >  #define PANIC_PRINT_FTRACE_INFO0x0010
> > +#define PANIC_PRINT_ALL_PRINTK_MSG 0x0020
> >  unsigned long panic_print;
> >  
> >  ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
> > @@ -148,6 +149,9 @@ static void panic_print_sys_info(void)
> >  
> > if (panic_print & PANIC_PRINT_FTRACE_INFO)
> > ftrace_dump(DUMP_ALL);
> > +
> > +   if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > +   console_flush_on_panic(CONSOLE_FLUSH_ALL);
> 
> The console must be replayed as the first thing in
> panic_print_sys_info(). Otherwise, the original messages
> are replaced by the other dumps. Especially ftrace_dump()
> might be pretty long.
> 
> Also the names of the function and the parameter are misleading.
> All messages are already flushed when this is called.
> 
> >  }
> >  
> >  /**
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 17102fd4c136..da60a185dbbb 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2549,6 +2549,14 @@ void console_flush_on_panic(void)
> >  */
> > console_trylock();
> > console_may_schedule = 0;
> > +   if (flush_mode == CONSOLE_FLUSH_ALL) {
> > +   /*
> > +* Can be done under logbuf lock, but it's unlikely that
> > +* we will have any race conditions here.
> > +*/
> > +   console_seq = log_first_seq;
> > +   console_idx = log_first_idx;

This is very similar to my V1 patch :), except I used a bool
as the parameter.

> 
> I agree that it is easier. The cost is that the same messages are
> printed again without any explanation.
> 
> I still think that it would be convenient to write a header line.
> It would help to understand the log for any, even 3rd-party, reader.
> Also it would help to find the beginning in a very long log.

My thought is, the replay is only a debug option and disabled by default,
so when a user specifically enables the PANIC_PRINT_ALL_PRINTK_MSG bit,
the whole replay of printk msg should be expected.
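
A toy Python model of the replay idea, for illustration only: the class is hypothetical and merely mirrors the cursor reset in the quoted hunk (console_seq = log_first_seq), with none of the locking or ring-buffer wrapping of the real code.

```python
class LogBuffer:
    """Toy model of the printk buffer and the console cursor."""

    def __init__(self):
        self.records = []     # messages still in the buffer
        self.first_seq = 0    # sequence number of records[0]
        self.console_seq = 0  # next sequence number to print

    def log(self, msg):
        self.records.append(msg)

    def flush(self, replay=False):
        if replay:
            # Rewind the console cursor to the oldest record.
            self.console_seq = self.first_seq
        start = self.console_seq - self.first_seq
        out = self.records[start:]
        self.console_seq = self.first_seq + len(self.records)
        return out

buf = LogBuffer()
buf.log("boot")
buf.log("panic")
print(buf.flush())             # ['boot', 'panic']
print(buf.flush())             # []
print(buf.flush(replay=True))  # ['boot', 'panic']
```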

Thanks,
Feng

> 
> If the complexity of console_unlock() is the concern, we could
> refactor the code, e.g. put this "reset log" code into a separate
> function.
> 
> Best Regards,
> Petr

[PATCH v3] ARM: debug-ll: add default address for digicolor

2019-04-17 Thread Arnd Bergmann

The digicolor platform has three UARTs, but the Kconfig.debug
file explicitly lists port zero as the one to be used for the
console, while not providing any default values.

This can get an automated randconfig build stuck in a loop
waiting for the user to input the number. As we already know
the physical address, this patch provides that number as
default, along with a reasonable default value for the virtual
address.

Cc: Baruch Siach 
Signed-off-by: Arnd Bergmann 
---
 arch/arm/Kconfig.debug | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index a8a1e14f20ab..3e33753bd60c 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1676,6 +1676,7 @@ config DEBUG_UART_PHYS
default 0xe6e68000 if DEBUG_RCAR_GEN2_SCIF1
default 0xe6ee if DEBUG_RCAR_GEN2_SCIF4
default 0xe8008000 if DEBUG_R7S72100_SCIF2
+   default 0xf000 if DEBUG_DIGICOLOR_UA0
default 0xfbe0 if ARCH_EBSA110
default 0xf1012000 if DEBUG_MVEBU_UART0_ALTERNATE
default 0xf1012100 if DEBUG_MVEBU_UART1_ALTERNATE
@@ -1728,6 +1729,7 @@ config DEBUG_UART_VIRT
default 0xe0010fe0 if ARCH_RPC
default 0xfbe0 if ARCH_EBSA110
default 0xf001 if DEBUG_ASM9260_UART
+   default 0xf010 if DEBUG_DIGICOLOR_UA0
default 0xf01fb000 if DEBUG_NOMADIK_UART
default 0xf0201000 if DEBUG_BCM2835 || DEBUG_BCM2836
default 0xf1000300 if DEBUG_BCM_5301X
-- 
2.20.0

Re: [PATCH v2] ARM: debug-ll: add default address for digicolor

2019-04-17 Thread Arnd Bergmann

On Wed, Apr 17, 2019 at 4:18 PM Baruch Siach  wrote:
> On Wed, Apr 17 2019, Arnd Bergmann wrote:

> > diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
> > index 6d6e0330930b..12c0d29b75e3 100644
> > --- a/arch/arm/Kconfig.debug
> > +++ b/arch/arm/Kconfig.debug
> > @@ -1677,6 +1677,7 @@ config DEBUG_UART_PHYS
> >   default 0xe6ee if DEBUG_RCAR_GEN2_SCIF4
> >   default 0xe8008000 if DEBUG_R7S72100_SCIF2
> >   default 0xfbe0 if ARCH_EBSA110
> > + default 0xf010 if DEBUG_DIGICOLOR_UA0
>
> Should be 0xf000 for the physical address.
>
> >   default 0xf1012000 if DEBUG_MVEBU_UART0_ALTERNATE
> >   default 0xf1012100 if DEBUG_MVEBU_UART1_ALTERNATE
> >   default 0xf7fc9000 if DEBUG_BERLIN_UART
> > @@ -1784,6 +1785,7 @@ config DEBUG_UART_VIRT
> >   default 0xfd012000 if DEBUG_MVEBU_UART0_ALTERNATE && ARCH_MV78XX0
> >   default 0xfd883000 if DEBUG_ALPINE_UART0
> >   default 0xfde12000 if DEBUG_MVEBU_UART0_ALTERNATE && ARCH_DOVE
> > + default 0xfe00 if DEBUG_DIGICOLOR_UA0
>
> Maybe you intended to change the virtual address to 0xf010?

Ouch, sorry for messing it up again. Sending a v3 now.

   Arnd

[PATCH -next] ASoC: Intel: Haswell: Remove set but not used variable 'stage_type'

2019-04-17 Thread Yue Haibing

From: YueHaibing 

Fixes gcc '-Wunused-but-set-variable' warning:

sound/soc/intel/haswell/sst-haswell-ipc.c: In function 'hsw_stream_message':
sound/soc/intel/haswell/sst-haswell-ipc.c:669:29: warning: variable 'stage_type' set but not used [-Wunused-but-set-variable]

It is never used since introduction in
commit ba57f68235cf ("ASoC: Intel: create haswell folder and move haswell platform files in")

Signed-off-by: YueHaibing 
---
 sound/soc/intel/haswell/sst-haswell-ipc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/intel/haswell/sst-haswell-ipc.c b/sound/soc/intel/haswell/sst-haswell-ipc.c
index 31fcdf12..4d3de99 100644
--- a/sound/soc/intel/haswell/sst-haswell-ipc.c
+++ b/sound/soc/intel/haswell/sst-haswell-ipc.c
@@ -672,7 +672,6 @@ static int hsw_stream_message(struct sst_hsw *hsw, u32 header)
 
stream_msg = msg_get_stream_type(header);
stream_id = msg_get_stream_id(header);
-   stage_type = msg_get_stage_type(header);
 
stream = get_stream_by_id(hsw, stream_id);
if (stream == NULL)
-- 
2.7.4
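
For readers unfamiliar with this warning class, here is a minimal userspace sketch of the pattern. The function name and the header bit layout are invented for illustration and are not taken from sst-haswell-ipc.c.

```c
#include <stdint.h>

/*
 * Hypothetical header layout, for illustration only: bits 16..19 carry
 * a stream id.
 *
 * A "set but not used" variant of this function would also contain
 *
 *	uint32_t stage_type;
 *	stage_type = (header >> 20) & 0xf;
 *
 * with no later read of stage_type. gcc reports that under
 * -Wunused-but-set-variable. Dropping the dead store changes no
 * behaviour as long as the right-hand side has no side effects.
 */
uint32_t sketch_decode_stream_id(uint32_t header)
{
	uint32_t stream_id;

	stream_id = (header >> 16) & 0xf;
	return stream_id;
}
```

The same reasoning applies to any getter that only masks and shifts its argument: if the result is never read, the call can go.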

Re: [PATCH] iio: accel: add missing sensor for some 2-in-1 based ultrabooks

2019-04-17 Thread Hans de Goede


Hi,

On 15-04-19 17:40, l...@aurorafoss.org wrote:

April 6, 2019 10:36 AM, "Hans de Goede"  wrote:

Hi,


Yes that seems the best way forward with this.

Note I think "base" is better than "keyboard" for the sensor which
is in the base/keyboard. But neither is perfect, so go with whatever
you prefer.


Reference to:
- https://github.com/hadess/iio-sensor-proxy/pull/262
- https://github.com/systemd/systemd/pull/12322


Thank you for your work on this, I see that Bastien has
already reviewed the iio-sensor-proxy changes.

I've just added one small remark to the systemd changes,
except for that small remark the systemd changes look good to me.

Regards,

Hans





On 06-04-19 01:01, Luís Ferreira wrote:


Hi,
Basically we need to come up with a convention to (optionally) indicate
the sensor's location with a udev attribute set by:
/lib/udev/hwdb.d/60-sensor.hwdb

So should we start adding `ACCEL_LOCATION=display` and
`ACCEL_LOCATION=keyboard` attributes to that file and patch
iio-sensor-proxy to ignore the keyboard ones as a first step ?


Yes that seems the best way forward with this.

Note I think "base" is better than "keyboard" for the sensor which
is in the base/keyboard. But neither is perfect, so go with whatever
you prefer.

Thanks & Regards,

Hans


On Wed, 3 Apr 2019 at 10:10, Hans de Goede  wrote:


Hi,

On 02-04-19 18:04, Luís Ferreira wrote:

Some ultrabooks, like Teclast F6 Pro, use KIOX010A sensor on display
and KIOX020A sensor on keyboard base, to detect tablet mode or screen
orientation.


I deliberately left out the KIOX020A id for now, because currently
userspace cannot really deal with having 2 sensors.

See:
https://github.com/systemd/systemd/issues/6557
https://github.com/hadess/iio-sensor-proxy/issues/166

Basically we need to come up with a convention to (optionally) indicate
the sensor's location with a udev attribute set by:
/lib/udev/hwdb.d/60-sensor.hwdb

And then patch iio-sensor-proxy to consume that attribute and ignore
the one which has e.g. ACCEL_LOCATION=keyboard in its udev properties
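
The consumer side of that convention could look roughly like the sketch below. It is only an illustration in plain C: the helper name is hypothetical, it is not iio-sensor-proxy code, and the property values were still under discussion in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether an accelerometer should drive
 * screen rotation, given the value of its ACCEL_LOCATION udev property
 * (NULL when the property is not set).
 */
bool sketch_accel_usable_for_rotation(const char *accel_location)
{
	/* No property set: keep the existing behaviour and use the sensor. */
	if (accel_location == NULL)
		return true;

	/* The sensor in the keyboard half does not follow the screen. */
	if (strcmp(accel_location, "keyboard") == 0)
		return false;

	return true;
}
```

Treating a missing property as "usable" keeps single-sensor machines working without any hwdb entry.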

Ignoring would be a first step, maybe later it can do something useful
with it, see e.g. : https://github.com/alesguzik/linux_detect_tablet_mode

IMHO we really should minimally get code in place for iio-sensor-proxy
to ignore the keyboard accelerometer before merging this patch.

I realize that having the code in place will not magically get it on
all users' machines, but I believe this is the minimum which needs to
happen before we push this out and potentially break people's screen
rotation.

I've had working on this on my TODO list for a long long time now,
but -ENOTIME. If you have some time to work on this then that would
be great.

Regards,

Hans

Signed-off-by: Luís Ferreira 
---
drivers/iio/accel/kxcjk-1013.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/iio/accel/kxcjk-1013.c b/drivers/iio/accel/kxcjk-1013.c
index 7096e577b23f..9a5e445facc1 100644
--- a/drivers/iio/accel/kxcjk-1013.c
+++ b/drivers/iio/accel/kxcjk-1013.c
@@ -1492,6 +1492,7 @@ static const struct acpi_device_id kx_acpi_match[] = {
{"KIOX0009", KXTJ21009},
{"KIOX000A", KXCJ91008},
{"KIOX010A", KXCJ91008}, /* KXCJ91008 inside the display of a 2-in-1 */
+ {"KIOX020A", KXCJ91008},
{"KXTJ1009", KXTJ21009},
{"KXJ2109", KXTJ21009},
{"SMO8500", KXCJ91008},

Re: [PATCH] mm: fix false-positive OVERCOMMIT_GUESS failures

2019-04-17 Thread Johannes Weiner

On Wed, Apr 17, 2019 at 02:04:17PM +0200, Vlastimil Babka wrote:
> On 4/12/19 10:06 PM, Roman Gushchin wrote:
> > On Fri, Apr 12, 2019 at 03:14:18PM -0400, Johannes Weiner wrote:
> >> With the default overcommit==guess we occasionally run into mmap
> >> rejections despite plenty of memory that would get dropped under
> >> pressure but just isn't accounted reclaimable. One example of this is
> >> dying cgroups pinned by some page cache. A previous case was auxiliary
> >> path name memory associated with dentries; we have since annotated
> >> those allocations to avoid overcommit failures (see d79f7aa496fc ("mm:
> >> treat indirectly reclaimable memory as free in overcommit logic")).
> >>
> >> But trying to classify all allocated memory reliably as reclaimable
> >> and unreclaimable is a bit of a fool's errand. There could be a myriad
> >> of dependencies that constantly change with kernel versions.
> 
> Just wondering, did you find at least one another reclaimable case like
> those path names?

I'm only aware of the cgroup structures which can be pinned by a
dentry, inode, or page cache page. But they're an entire tree of
memory allocations, per-cpu memory regions etc. that would be
impossible to annotate correctly; it's also unreclaimable while the
cgroup is user-visible and only becomes reclaimable once rmdir'd.

Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()

2019-04-17 Thread Mickaël Salaün

On 17/04/2019 12:01, Florian Weimer wrote:
> * Steve Grubb:
>
>> On Tuesday, April 16, 2019 7:49:39 AM EDT Florian Weimer wrote:
>>> * Steve Grubb:
>>>> This flag that is being proposed means that you would have to patch all
>>>> interpreters to use it. If you are sure that upstreams will accept that,
>>>> why not just change the policy to interpreters shouldn't execute
>>>> anything unless the execute bit is set? That is simpler and doesn't need
>>>> a kernel change. And setting the execute bit is an auditable event.
>>>
>>> I think we need something like O_MAYEXEC so that security policies can
>>> be enforced and noexec mounts can be detected.
>>
>> Application whitelisting can already today stop unknown software without
>> needing O_MAYEXEC.

Whitelisting may be a lot of things (path/TPE, signed binaries…), but
being able to handle this with a global system configuration (instead of
app-specific hardcoded configuration) is a good idea. ;)

>
> I'm somewhat interested in using this to add a proper check for
> executability to explicit dynamic loader invocations.  In other words,
> this
>
>   /lib64/ld-linux-x86-64.so.2 /path/to/noexec/fs/program
>
> should refuse to run the program if the program is located on a file
> system mounted with the noexec attribute.

What if a sysadmin needs to do this on an executable mount point? Being
able to enforce a security policy according to a configuration may fit
many more use cases.

>
>> The problem is that passing O_MAYEXEC is opt-in. You can use ptrace/seccomp/
>> bpf/LD_PRELOAD/LD_AUDIT to remove that bit from an otherwise normal program.
>> This does not require privs to do so.
>
> That doesn't really help with the above.

Right, ptrace/LD_PRELOAD and so on must be addressed by something other
than O_MAYEXEC alone.

>
>> But let's consider that this comes to pass and every interpreter is
>> updated and IMA can see the O_MAYEXEC flag. Attackers now simply pivot
>> to running programs via stdin. It never touches disk and therefore
>> nothing enforces security policy. This already is among the most
>> common ways that malware runs today to evade detection.

As in my previous reply, use cases like stdin may be restricted as well.

>
> Are you referring to Windows malware using Powershell?
>
> I'm not sure this is applicable to Linux.  We do not have much
> behavioral monitoring anyway.
>
> Thanks,
> Florian
>

--
Mickaël Salaün
ANSSI/SDE/ST/LAM

The personal data collected and processed during this exchange aims solely at
completing a business relationship and is limited to the necessary duration of
that relationship. If you wish to use your rights of consultation,
rectification and deletion of your data, please contact:
contact.r...@sgdsn.gouv.fr. If you have received this message in error, we
thank you for informing the sender and destroying the message.

Re: [RFC PATCH v1 1/5] fs: Add support for an O_MAYEXEC flag on sys_open()

2019-04-17 Thread Mickaël Salaün



On 15/04/2019 20:47, Steve Grubb wrote:
> Hello,
>
> On Wednesday, December 12, 2018 9:43:06 AM EDT Jan Kara wrote:
>> On Wed 12-12-18 09:17:08, Mickaël Salaün wrote:
>>> When the O_MAYEXEC flag is passed, sys_open() may be subject to
>>> additional restrictions depending on a security policy implemented by an
>>> LSM through the inode_permission hook.
>>>
>>> The underlying idea is to be able to restrict scripts interpretation
>>> according to a policy defined by the system administrator.  For this to
>>> be possible, script interpreters must use the O_MAYEXEC flag
>>> appropriately.  To be fully effective, these interpreters also need to
>>> handle the other ways to execute code (for which the kernel can't help):
>>> command line parameters (e.g., option -e for Perl), module loading
>>> (e.g., option -m for Python), stdin, file sourcing, environment
>>> variables, configuration files...  According to the threat model, it may
>>> be acceptable to allow some script interpreters (e.g. Bash) to interpret
>>> commands from stdin, may it be a TTY or a pipe, because it may not be
>>> enough to (directly) perform syscalls.
>>>
>>> A simple security policy implementation is available in a following
>>> patch for Yama.
>>>
>>> This is an updated subset of the patch initially written by Vincent
>>> Strubel for CLIP OS:
>>> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d
>>> 6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch This patch has
>>> been used for more than 10 years with customized script interpreters.
>>> Some examples can be found here:
>>> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYE
>>> XEC
>>>
>>> Signed-off-by: Mickaël Salaün 
>>> Signed-off-by: Thibaut Sautereau 
>>> Signed-off-by: Vincent Strubel 
>>> Reviewed-by: Philippe Trébuchet 
>>> Cc: Al Viro 
>>> Cc: Kees Cook 
>>> Cc: Mickaël Salaün 
>>
>> ...
>>
>>> diff --git a/fs/open.c b/fs/open.c
>>> index 0285ce7dbd51..75479b79a58f 100644
>>> --- a/fs/open.c
>>> +++ b/fs/open.c
>>> @@ -974,6 +974,10 @@ static inline int build_open_flags(int flags,
>>> umode_t mode, struct open_flags *o>
>>> if (flags & O_APPEND)
>>>
>>> acc_mode |= MAY_APPEND;
>>>
>>> +   /* Check execution permissions on open. */
>>> +   if (flags & O_MAYEXEC)
>>> +   acc_mode |= MAY_OPENEXEC;
>>> +
>>>
>>> op->acc_mode = acc_mode;
>>>
>>> op->intent = flags & O_PATH ? 0 : LOOKUP_OPEN;
>>
>> I don't feel experienced enough in security to tell whether we want this
>> functionality or not. But if we do this, shouldn't we also set FMODE_EXEC
>> on the resulting struct file? That way also security_file_open() can be
>> used to arbitrate such executable opens and in particular
>> fanotify permission event FAN_OPEN_EXEC will get properly generated which I
>> guess is desirable (support for it is sitting in my tree waiting for the
>> merge window) - adding some audit people involved in FAN_OPEN_EXEC to CC.
>> Just an idea...
>
> Late in replying. But I think it's important to have a deep look into the
> issue.
>
> TL;DR - This is a gentle man's handshake. It won't _really_ solve the
> problem.

Thanks for your comments. You should find most answers in this thread:
https://lore.kernel.org/lkml/20181212081712.32347-4-...@digikod.net/

The threat model targets persistent attacks. This O_MAYEXEC flag is not
a silver bullet but it's a needed block to enforce a security policy on
a trusted system. This means that every component executable on the
system must be controlled, which means they may need some bit of
customization. Today no userspace application uses this flag (except in
CLIP OS), but we need to first create a feature before it can be used.
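
To make the opt-in nature of the flag concrete, the fs/open.c hunk quoted above can be modelled in userspace as follows. This is only a sketch: every `SKETCH_` constant has a made-up value, and nothing here is the kernel's actual definition.

```c
/* Hypothetical flag values; the real ones are defined by the patch. */
#define SKETCH_O_APPEND		0x1u
#define SKETCH_O_MAYEXEC	0x2u

/* Hypothetical access-mode bits checked later by the permission hooks. */
#define SKETCH_MAY_APPEND	0x10u
#define SKETCH_MAY_OPENEXEC	0x20u

/* Translate open() flags into the access mask that gets checked. */
unsigned int sketch_acc_mode(unsigned int flags)
{
	unsigned int acc_mode = 0;

	if (flags & SKETCH_O_APPEND)
		acc_mode |= SKETCH_MAY_APPEND;

	/* Opt-in: only callers that pass the flag get the extra check. */
	if (flags & SKETCH_O_MAYEXEC)
		acc_mode |= SKETCH_MAY_OPENEXEC;

	return acc_mode;
}
```

A caller that never sets the flag produces the same mask as before, which is why interpreters have to be adapted before a policy can take effect.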

It is very important to have in mind that a system security policy needs
to have a (central) security manager, in this case the kernel thanks to
Yama's policy (but it could be SELinux, IMA or any other LSM). The goal
is not to give to the developer the job of defining a security policy
for the *system*; this job is for the system administrator (or the distro).

>
> This flag that is being proposed means that you would have to patch all
> interpreters to use it. If you are sure that upstreams will accept that, why
> not just change the policy to interpreters shouldn't execute anything unless
> the execute bit is set? That is simpler and doesn't need a kernel change. And
> setting the execute bit is an auditable event.

As said above, the definition of the security policy is the job of the
system administrator. Moreover, the security policy may be defined by
the mount point restrictions (i.e. noexec) but it should be definable
with something else (e.g. a SELinux or IMA policy which may be agnostic
to the mount points).
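
For comparison, the interpreter-only alternative suggested in the quoted text (refuse files without an execute bit) might be sketched like this. The function names are hypothetical, and the check deliberately looks only at the mode bits.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if any of the owner/group/other execute bits is set in mode. */
bool sketch_mode_has_exec_bit(mode_t mode)
{
	return (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}

/*
 * Hypothetical interpreter-side check: accept a script only if it is a
 * regular file with at least one execute bit. Returns false when
 * stat() fails.
 */
bool sketch_script_has_exec_bit(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;

	return S_ISREG(st.st_mode) && sketch_mode_has_exec_bit(st.st_mode);
}
```

Such a check knows nothing about noexec mounts or about an LSM policy, which is the limitation the reply above points at.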

>
> The bottom line is that any interpreter has to become a security policy
> enforcement point whether by indicating it wants to execute by setting a flag
> or by refusing to use a file without execute bit set. But this just moves the
>

[PATCH -next] ASoC: tlv320aic32x4: Remove set but not used variable 'mclk_rate'

2019-04-17 Thread Yue Haibing

From: YueHaibing 

Fixes gcc '-Wunused-but-set-variable' warning:

sound/soc/codecs/tlv320aic32x4.c: In function 'aic32x4_setup_clocks':
sound/soc/codecs/tlv320aic32x4.c:669:16: warning: variable 'mclk_rate' set but 
not used [-Wunused-but-set-variable]

It is not used since introduction in
commit 96c3bb00239d ("ASoC: tlv320aic32x4: Dynamically Determine Clocking")

Signed-off-by: YueHaibing 
---
 sound/soc/codecs/tlv320aic32x4.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/sound/soc/codecs/tlv320aic32x4.c b/sound/soc/codecs/tlv320aic32x4.c
index 6edee05..83608f3 100644
--- a/sound/soc/codecs/tlv320aic32x4.c
+++ b/sound/soc/codecs/tlv320aic32x4.c
@@ -684,9 +684,8 @@ static int aic32x4_setup_clocks(struct snd_soc_component *component,
u8 madc, nadc, mdac, ndac, max_nadc, min_mdac, max_ndac;
u8 dosr_increment;
u16 max_dosr, min_dosr;
-   unsigned long mclk_rate, adc_clock_rate, dac_clock_rate;
+   unsigned long adc_clock_rate, dac_clock_rate;
int ret;
-   struct clk *mclk;
 
struct clk_bulk_data clocks[] = {
{ .id = "pll" },
@@ -700,9 +699,6 @@ static int aic32x4_setup_clocks(struct snd_soc_component *component,
if (ret)
return ret;
 
-   mclk = clk_get_parent(clocks[1].clk);
-   mclk_rate = clk_get_rate(mclk);
-
if (sample_rate <= 48000) {
aosr = 128;
adc_resource_class = 6;
-- 
2.7.4

Re: [PATCH v2 RESEND 1/2] x86/mm/KASLR: Fix the size of the direct mapping section

2019-04-17 Thread Borislav Petkov

On Wed, Apr 17, 2019 at 04:35:36PM +0800, Baoquan He wrote:
> I made a new one to add this fact, I can repost if it's OK to you.

No, it looks ok and I can take it from here.

Also, resending too often is annoying, as I'm sure you know. Try to
stick to resending once a week.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

[PATCH -next] staging: most: configfs: Make mdev_link_list static

2019-04-17 Thread Yue Haibing

From: YueHaibing 

Fix sparse warning:

drivers/staging/most/configfs.c:34:18: warning:
 symbol 'mdev_link_list' was not declared. Should it be static?

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/staging/most/configfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/most/configfs.c b/drivers/staging/most/configfs.c
index 934fb6d..1d8bf29 100644
--- a/drivers/staging/most/configfs.c
+++ b/drivers/staging/most/configfs.c
@@ -31,7 +31,7 @@ struct mdev_link {
char comp_params[PAGE_SIZE];
 };
 
-struct list_head mdev_link_list;
+static struct list_head mdev_link_list;
 
 static int set_cfg_buffer_size(struct mdev_link *link)
 {
-- 
2.7.4

Re: [PATCH 0/7] arm64: meson: g12a: add audio devices

2019-04-17 Thread Kevin Hilman

Jerome Brunet  writes:

> On Tue, 2019-04-16 at 12:22 -0700, Kevin Hilman wrote:
>> On Tue, Apr 16, 2019 at 11:52 AM Kevin Hilman  wrote:
>> > Jerome Brunet  writes:
>> > 
>> > > This patchset adds most the audio devices of the g12a SoCs.
>> > > 
>> > > Kevin, couple of things worth noting:
>> > >  * This patch depends of the new audio clocks binding recently applied
>> > >by Neil [0].
>> > 
>> > Was this supposed to be part of the clk-headers dependency PR I
>> > received?  /me looks... It looks like it's applied right after the tag I
>> > pulled, so I doesn't look like it.
>> > 
>> > If I should queue $SUBJECT series up for 5.2, I'll need another stable
>> > tag.
>> 
>> Also, this doesn't apply cleanly anymore to my v5.2/dt64 branch.  When
>> the stable tag is ready, do you mind doing a rebase? (or telling me
>> what it currently applies to, and I will do the rebase.)
>
> Hi Kevin,
>
> Until we have a solution for the drive strength in pinctrl, I won't be able to
> submit the audio pins. I doubt it will happen in this cycle.
>
> Without the pins, audio won't work anyway ... so no rush
>
> If it is ok with you, I'll resend at the beginning of the next cycle. Clk and
> ASoC deps will be merged by then, so things should be more straight forward.

Works for me, thanks.

Kevin

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Oleg Nesterov

On 04/17, Paul Moore wrote:
>
> I'm tempted to simply return an error in selinux_setprocattr() if
> the task's credentials are not the same as its real_cred;

What about other modules? I have no idea what smack_setprocattr() is,
but it too does prepare_creds()/commit_creds().

It seems that the simplest workaround would be to add the additional
cred == real_cred check to proc_pid_attr_write().

Oleg.

Re: [PATCH] prctl: Don't compile some of prctl functions when CRUI

2019-04-17 Thread Cyrill Gorcunov

On Wed, Apr 17, 2019 at 04:44:54PM +0200, Michal Koutný wrote:
> On Wed, Apr 17, 2019 at 03:38:41PM +0300, Cyrill Gorcunov 
>  wrote:
> > I've a bit vague memory what we've ended up with, but iirc there was
> > a problem with brk() syscall or similar. Then I think we left everything
> > as is.
>
> Was this related to the removal of non PR_SET_MM_MAP operations too?
> Do you have any pointers to the topic?

Gimme some time, will reply later.

Re: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

2019-04-17 Thread Michal Hocko

On Wed 17-04-19 16:41:42, Michal Koutny wrote:
> On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko  
> wrote:
> > Don't we need to use the lock in prctl_set_mm as well then?
> 
> Correct. The patch alone just moves the race from
> get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.
> 
> arg_lock could be used in prctl_set_mm but the better idea (IMO) is
> complete removal of that code in favor of prctl_set_mm_map [1].

Ohh, I have missed that patch. As long as both are merged together then
no objections from me and you can add

Acked-by: Michal Hocko 

> Michal
> 
> [1] https://lore.kernel.org/lkml/20180405182651.gm15...@uranus.lan/

-- 
Michal Hocko
SUSE Labs

[PATCH] perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()

2019-04-17 Thread Jiri Olsa

We currently don't return NULL in case we don't find the
bpf_prog_info_node, fixing that.

Signed-off-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Song Liu 
Fixes: 3792cb2ff43b ("perf bpf: Save BTF in a rbtree in perf_env")
Link: http://lkml.kernel.org/n/tip-99g9rg4p20a1o99vr0nkj...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/env.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 34a363f2e71b..9494f9dc61ec 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -111,10 +111,12 @@ struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id)
else if (btf_id > node->id)
n = n->rb_right;
else
-   break;
+   goto out;
}
+   node = NULL;
 
up_read(&env->bpf_progs.lock);
+out:
return node;
 }
 
-- 
2.17.2
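
The lookup pattern the patch above is concerned with can be sketched in plain C with an ordinary binary search tree. The `sketch_` types are stand-ins invented here, not the kernel rbtree API, and the sketch does no locking.

```c
#include <stddef.h>

/* Simplified stand-in for a tree node keyed by id. */
struct sketch_node {
	unsigned int id;
	struct sketch_node *left;
	struct sketch_node *right;
};

/* Return the node with the given id, or NULL if the tree has no such id. */
struct sketch_node *sketch_find(struct sketch_node *root, unsigned int id)
{
	struct sketch_node *n = root;

	while (n) {
		if (id < n->id)
			n = n->left;
		else if (id > n->id)
			n = n->right;
		else
			return n;	/* exact match */
	}

	/*
	 * Falling out of the loop means "not found". Returning the last
	 * node visited instead of NULL is the kind of bug the commit
	 * message describes.
	 */
	return NULL;
}
```

In the kernel function the walk runs under bpf_progs.lock, so every exit path there also has to release the lock; that part is left out of the sketch.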

[RFC 2/2] clocksource: timer-davinci: add support for clocksource

2019-04-17 Thread Bartosz Golaszewski

From: Bartosz Golaszewski 

Extend the davinci-timer driver to also register a clock source.

Signed-off-by: Bartosz Golaszewski 
---
 drivers/clocksource/timer-davinci.c | 70 +
 1 file changed, 70 insertions(+)

diff --git a/drivers/clocksource/timer-davinci.c b/drivers/clocksource/timer-davinci.c
index d30f81a4088e..d630fca98123 100644
--- a/drivers/clocksource/timer-davinci.c
+++ b/drivers/clocksource/timer-davinci.c
@@ -42,6 +42,8 @@
 #define DAVINCI_TIMER_MIN_DELTA 0x01
 #define DAVINCI_TIMER_MAX_DELTA 0xfffe
 
+#define DAVINCI_TIMER_CLKSRC_BITS  32
+
 #define DAVINCI_TIMER_TGCR_DEFAULT \
(DAVINCI_TIMER_TIMMODE_32BIT_UNCHAINED | DAVINCI_TIMER_UNRESET)
 
@@ -59,6 +61,16 @@ struct davinci_clockevent {
unsigned int enamode_mask;
 };
 
+/*
+ * This must be globally accessible by davinci_timer_read_sched_clock(), so
+ * let's keep it here.
+ */
+static struct {
+   struct clocksource dev;
+   void __iomem *base;
+   unsigned int tim_off;
+} davinci_clocksource;
+
 static struct davinci_clockevent *
 to_davinci_clockevent(struct clock_event_device *clockevent)
 {
@@ -148,6 +160,32 @@ static irqreturn_t davinci_timer_irq_timer(int irq, void *data)
return IRQ_HANDLED;
 }
 
+static u64 notrace davinci_timer_read_sched_clock(void)
+{
+   return readl_relaxed(davinci_clocksource.base +
+davinci_clocksource.tim_off);
+}
+
+static u64 davinci_clocksource_read(struct clocksource *dev)
+{
+   return davinci_timer_read_sched_clock();
+}
+
+static void davinci_clocksource_init(void __iomem *base, unsigned int tim_off,
+unsigned int prd_off, unsigned int shift)
+{
+   davinci_reg_update(base, DAVINCI_TIMER_REG_TCR,
+  DAVINCI_TIMER_ENAMODE_MASK << shift,
+  DAVINCI_TIMER_ENAMODE_DISABLED << shift);
+
+   writel_relaxed(0x0, base + tim_off);
+   writel_relaxed(UINT_MAX, base + prd_off);
+
+   davinci_reg_update(base, DAVINCI_TIMER_REG_TCR,
+  DAVINCI_TIMER_ENAMODE_MASK << shift,
+  DAVINCI_TIMER_ENAMODE_PERIODIC << shift);
+}
+
 static void davinci_timer_init(void __iomem *base)
 {
/* Set clock to internal mode and disable it. */
@@ -235,6 +273,38 @@ int __init davinci_timer_register(struct clk *clk,
DAVINCI_TIMER_MIN_DELTA,
DAVINCI_TIMER_MAX_DELTA);
 
+   davinci_clocksource.dev.rating = 300;
+   davinci_clocksource.dev.read = davinci_clocksource_read;
+   davinci_clocksource.dev.mask =
+   CLOCKSOURCE_MASK(DAVINCI_TIMER_CLKSRC_BITS);
+   davinci_clocksource.dev.flags = CLOCK_SOURCE_IS_CONTINUOUS;
+   davinci_clocksource.base = base;
+
+   if (timer_cfg->cmp_off) {
+   davinci_clocksource.dev.name = "tim12";
+   davinci_clocksource.tim_off = DAVINCI_TIMER_REG_TIM12;
+   davinci_clocksource_init(base,
+DAVINCI_TIMER_REG_TIM12,
+DAVINCI_TIMER_REG_PRD12,
+DAVINCI_TIMER_ENAMODE_SHIFT_TIM12);
+   } else {
+   davinci_clocksource.dev.name = "tim34";
+   davinci_clocksource.tim_off = DAVINCI_TIMER_REG_TIM34;
+   davinci_clocksource_init(base,
+DAVINCI_TIMER_REG_TIM34,
+DAVINCI_TIMER_REG_PRD34,
+DAVINCI_TIMER_ENAMODE_SHIFT_TIM34);
+   }
+
+   rv = clocksource_register_hz(&davinci_clocksource.dev, tick_rate);
+   if (rv) {
+   pr_err("Unable to register clocksource");
+   return rv;
+   }
+
+   sched_clock_register(davinci_timer_read_sched_clock,
+DAVINCI_TIMER_CLKSRC_BITS, tick_rate);
+
return 0;
 }
 
-- 
2.21.0
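
A 32-bit free-running counter wraps regularly, so whoever consumes it has to compute elapsed cycles with a masked subtraction. The sketch below shows that idea in userspace C for a width of 32 bits, matching DAVINCI_TIMER_CLKSRC_BITS above. The `sketch_` helpers are invented for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Mask covering the low `bits` bits, for 0 < bits <= 64. */
uint64_t sketch_clocksource_mask(unsigned int bits)
{
	return bits < 64 ? (UINT64_C(1) << bits) - 1 : ~UINT64_C(0);
}

/*
 * Cycles elapsed between two reads of a free-running counter. The
 * subtraction is done modulo 2^64 and then truncated to the counter
 * width, so a single wraparound between the reads is handled.
 */
uint64_t sketch_cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}
```

The result is only meaningful if the counter is read at least once per wrap period, which for a 32-bit counter depends on the tick rate.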

[RFC 1/2] clocksource: davinci-timer: add support for clockevents

2019-04-17 Thread Bartosz Golaszewski

From: Bartosz Golaszewski 

Currently the clocksource and clockevent support for davinci platforms
lives in mach-davinci. It hard-codes many things, uses global variables,
implements functionalities unused by any platform and has code fragments
scattered across many (often unrelated) files.

Implement a new, modern and simplified timer driver and put it into
drivers/clocksource. We still need to support legacy board files so
export a config structure and a function that allows machine code to
register the timer.

The timer we're using is 64-bit but can be programmed in dual 32-bit
mode (both chained and unchained). We're using dual 32-bit mode to
have separate counters for clockevents and clocksource.

This patch contains the core code and support for clockevent. The
clocksource code will be included in a subsequent patch.
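
Since both 32-bit halves are enabled through fields of the same TCR register (shifts 6 and 22 in the defines further down), each half has to be reprogrammed with a read-modify-write that leaves the other half alone. A minimal sketch of such an update, using invented `SKETCH_` names and operating on a plain value rather than on MMIO:

```c
#include <stdint.h>

#define SKETCH_ENAMODE_MASK		0x3u	/* two-bit field */
#define SKETCH_ENAMODE_DISABLED		0x0u
#define SKETCH_ENAMODE_PERIODIC		0x2u
#define SKETCH_ENAMODE_SHIFT_TIM12	6
#define SKETCH_ENAMODE_SHIFT_TIM34	22

/* Replace the bits selected by mask with val, leaving the rest untouched. */
uint32_t sketch_reg_update(uint32_t reg, uint32_t mask, uint32_t val)
{
	return (reg & ~mask) | (val & mask);
}
```

A driver would wrap this in a register read and write; the sketch assumes nothing about how the actual helper in the patch is implemented.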

Signed-off-by: Bartosz Golaszewski 
---
 drivers/clocksource/Kconfig |   5 +
 drivers/clocksource/Makefile|   1 +
 drivers/clocksource/timer-davinci.c | 272 
 include/clocksource/timer-davinci.h |  44 +
 4 files changed, 322 insertions(+)
 create mode 100644 drivers/clocksource/timer-davinci.c
 create mode 100644 include/clocksource/timer-davinci.h

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 171502a356aa..974f9b50ebf4 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -42,6 +42,11 @@ config BCM_KONA_TIMER
help
  Enables the support for the BCM Kona mobile timer driver.
 
+config DAVINCI_TIMER
+   bool "Texas Instruments DaVinci timer driver" if COMPILE_TEST
+   help
+ Enables the support for the TI DaVinci timer driver.
+
 config DIGICOLOR_TIMER
bool "Digicolor timer driver" if COMPILE_TEST
select CLKSRC_MMIO
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index be6e0fbc7489..3c73d0e58b45 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_SH_TIMER_TMU)+= sh_tmu.o
 obj-$(CONFIG_EM_TIMER_STI) += em_sti.o
 obj-$(CONFIG_CLKBLD_I8253) += i8253.o
 obj-$(CONFIG_CLKSRC_MMIO)  += mmio.o
+obj-$(CONFIG_DAVINCI_TIMER)+= timer-davinci.o
 obj-$(CONFIG_DIGICOLOR_TIMER)  += timer-digicolor.o
 obj-$(CONFIG_OMAP_DM_TIMER)+= timer-ti-dm.o
 obj-$(CONFIG_DW_APB_TIMER) += dw_apb_timer.o
diff --git a/drivers/clocksource/timer-davinci.c b/drivers/clocksource/timer-davinci.c
new file mode 100644
index ..d30f81a4088e
--- /dev/null
+++ b/drivers/clocksource/timer-davinci.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0-only
+//
+// TI DaVinci clocksource driver
+//
+// Copyright (C) 2019 Texas Instruments
+// Author: Bartosz Golaszewski 
+// (with tiny parts adopted from code by Kevin Hilman )
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#undef pr_fmt
+#define pr_fmt(fmt) "%s: " fmt "\n", __func__
+
+#define DAVINCI_TIMER_REG_TIM12 0x10
+#define DAVINCI_TIMER_REG_TIM34 0x14
+#define DAVINCI_TIMER_REG_PRD12 0x18
+#define DAVINCI_TIMER_REG_PRD34 0x1c
+#define DAVINCI_TIMER_REG_TCR  0x20
+#define DAVINCI_TIMER_REG_TGCR 0x24
+
+#define DAVINCI_TIMER_TIMMODE_MASK GENMASK(3, 2)
+#define DAVINCI_TIMER_RESET_MASK   GENMASK(1, 0)
+#define DAVINCI_TIMER_TIMMODE_32BIT_UNCHAINED  BIT(2)
+#define DAVINCI_TIMER_UNRESET  GENMASK(1, 0)
+
+#define DAVINCI_TIMER_ENAMODE_MASK GENMASK(1, 0)
+#define DAVINCI_TIMER_ENAMODE_DISABLED 0x00
+#define DAVINCI_TIMER_ENAMODE_ONESHOT  BIT(0)
+#define DAVINCI_TIMER_ENAMODE_PERIODIC BIT(1)
+
+#define DAVINCI_TIMER_ENAMODE_SHIFT_TIM12  6
+#define DAVINCI_TIMER_ENAMODE_SHIFT_TIM34  22
+
+#define DAVINCI_TIMER_MIN_DELTA 0x01
+#define DAVINCI_TIMER_MAX_DELTA 0xfffe
+
+#define DAVINCI_TIMER_TGCR_DEFAULT \
+   (DAVINCI_TIMER_TIMMODE_32BIT_UNCHAINED | DAVINCI_TIMER_UNRESET)
+
+struct davinci_clockevent {
+   struct clock_event_device dev;
+   void __iomem *base;
+
+   unsigned int tim_off;
+   unsigned int prd_off;
+   unsigned int cmp_off;
+
+   unsigned int enamode_disabled;
+   unsigned int enamode_oneshot;
+   unsigned int enamode_periodic;
+   unsigned int enamode_mask;
+};
+
+static struct davinci_clockevent *
+to_davinci_clockevent(struct clock_event_device *clockevent)
+{
+   return container_of(clockevent, struct davinci_clockevent, dev);
+}
+
+static unsigned int
+davinci_clockevent_read(struct davinci_clockevent *clockevent,
+   unsigned int reg)
+{
+   return readl_relaxed(clockevent->base + reg);
+}
+
+static void davinci_clockevent_write(struct davinci_clockevent *clockevent,
+unsigned int

[RFC 0/2] clocksource: davinci-timer: new driver

2019-04-17 Thread Bartosz Golaszewski

From: Bartosz Golaszewski 

Hi Daniel,

as discussed - this is the davinci timer driver split into the clockevent
and clocksource parts.

Since it won't work without all the other (left out for now) changes, I'm
marking it as RFC.

The code has been simplified as requested, the duplicated enums and the
davinci_timer structure have been removed.

Please let me know if that's better. I retested it on da850-lcdk, da830-evm
and dm365-evm.

Bartosz Golaszewski (2):
  clocksource: davinci-timer: add support for clockevents
  clocksource: timer-davinci: add support for clocksource

 drivers/clocksource/Kconfig |   5 +
 drivers/clocksource/Makefile|   1 +
 drivers/clocksource/timer-davinci.c | 342 
 include/clocksource/timer-davinci.h |  44 
 4 files changed, 392 insertions(+)
 create mode 100644 drivers/clocksource/timer-davinci.c
 create mode 100644 include/clocksource/timer-davinci.h

-- 
2.21.0

Re: [PATCH] prctl: Don't compile some of prctl functions when CRUI

2019-04-17 Thread Michal Koutný

On Wed, Apr 17, 2019 at 03:38:41PM +0300, Cyrill Gorcunov  
wrote:
> I've a bit vague memory what we've ended up with, but iirc there was
> a problem with brk() syscall or similar. Then I think we left everything
> as is.
Was this related to the removal of non PR_SET_MM_MAP operations too?
Do you have any pointers to the topic?

Thanks,
Michal


signature.asc
Description: Digital signature

Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER

2019-04-17 Thread Kairui Song

On Wed, Apr 17, 2019 at 4:16 AM Josh Poimboeuf  wrote:
>
> On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> > On Tue, Apr 16, 2019 at 7:30 PM Kairui Song  wrote:
> > >
> > > On Tue, Apr 16, 2019 at 12:59 AM Josh Poimboeuf  
> > > wrote:
> > > >
> > > > On Mon, Apr 15, 2019 at 05:36:22PM +0200, Peter Zijlstra wrote:
> > > > >
> > > > > I'll mostly defer to Josh on unwinding, but a few comments below.
> > > > >
> > > > > On Tue, Apr 09, 2019 at 12:59:42AM +0800, Kairui Song wrote:
> > > > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > > > index e2b1447192a8..6075a4f94376 100644
> > > > > > --- a/arch/x86/events/core.c
> > > > > > +++ b/arch/x86/events/core.c
> > > > > > @@ -2355,6 +2355,12 @@ void arch_perf_update_userpage(struct 
> > > > > > perf_event *event,
> > > > > > cyc2ns_read_end();
> > > > > >  }
> > > > > >
> > > > > > +static inline int
> > > > > > +valid_perf_registers(struct pt_regs *regs)
> > > > > > +{
> > > > > > +   return (regs->ip && regs->bp && regs->sp);
> > > > > > +}
> > > > >
> > > > > I'm unconvinced by this, with both guess and orc having !bp is 
> > > > > perfectly
> > > > > valid.
> > > > >
> > > > > >  void
> > > > > >  perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, 
> > > > > > struct pt_regs *regs)
> > > > > >  {
> > > > > > @@ -2366,11 +2372,17 @@ perf_callchain_kernel(struct 
> > > > > > perf_callchain_entry_ctx *entry, struct pt_regs *re
> > > > > > return;
> > > > > > }
> > > > > >
> > > > > > -   if (perf_callchain_store(entry, regs->ip))
> > > > > > +   if (valid_perf_registers(regs)) {
> > > > > > +   if (perf_callchain_store(entry, regs->ip))
> > > > > > +   return;
> > > > > > +   unwind_start(&state, current, regs, NULL);
> > > > > > +   } else if (regs->sp) {
> > > > > > +   unwind_start(&state, current, NULL, (unsigned long 
> > > > > > *)regs->sp);
> > > > > > +   } else {
> > > > > > return;
> > > > > > +   }
> > > > >
> > > > > AFAICT if we, by pure accident, end up with !bp for ORC, then we
> > > > > initialize the unwind wrong.
> > > > >
> > > > > Note that @regs is mostly trivially correct, except for that 
> > > > > tracepoint
> > > > > case. So I don't think we should magic here.
> > > >
> > > > Ah, I didn't quite understand this code before, and I still don't
> > > > really, but I guess the issue is that @regs can be either real or fake.
> > > >
> > > > In the real @regs case, we just want to always unwind starting from
> > > > regs->sp.
> > > >
> > > > But in the fake @regs case, we should instead unwind from the current
> > > > frame, skipping all frames until we hit the fake regs->sp.  Because
> > > > starting from fake/incomplete regs is most likely going to cause
> > > > problems with ORC (or DWARF for other arches).
> > > >
> > > > The idea of a fake regs is fragile and confusing.  Is it possible to
> > > > just pass in the "skip" stack pointer directly instead?  That should
> > > > work for both FP and non-FP.  And I _think_ there's no need to ever
> > > > capture regs->bp anyway -- the stack pointer should be sufficient.
> > >
> > > Hi, that will break some other usage, if perf_callchain_kernel is
> > > called but it won't unwind to the callsite (could be produced by
> > > attach an ebpf call to kprobe), things will also go wrong. It should
> > > start with given registers when the register is valid.
> > > And it's true with omit frame pointer BP value could be anything, so 0
> > > is also valid, I think I need to find a better way to tell if we could
> > > start with the registers value or direct start unwinding and skip
> > > until got the stack.
> > >
> >
> > Hi, sorry I might have some misunderstanding. Adding an extra argument
> > (eg. skip_sp) to indicate if it should just unwind from the current
> > frame, and use SP as the "skip mark", should work well.
> >
> > And I also think the "fake"/"real" reg is fragile, could we abuse
> > another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> > partially dumped fake registers? So perf_callchain_kernel just check
> > if it's a "partial registers", and in such case it can start unwinding
> > and skip until it get to SP. This make it easier to tell if the
> > registers are "fake".
>
> If you do the regs->eflags thing to mark the regs as fake in
> (perf_arch_fetch_caller_regs()), then I don't think skip_sp would be
> needed, because regs->sp could probably mark the skip point.
>
> Instead I was actually hoping we could get rid of fake regs and
> perf_arch_fetch_caller_regs() altogether, because it's a nasty hack.
> But I don't know what else those fake regs are used for.
>

Even though it is hacky, it does not seem necessary to dump every register.
And is there a straightforward way to get the caller's regs in the
tracepoint? That seems more troublesome. Alternatively we could just use the
regs inside the tracepoint, but that would need even more workarounds (e.g.
unwinding one frame before doing anything).
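
To make the "unwind from the current frame and skip until SP" idea from
the quoted discussion concrete, here is a minimal userspace sketch. It is
not the kernel unwinder: struct sketch_frame and sketch_callchain() are
hypothetical names, and the frame list is assumed to be already available
as an array ordered from innermost to outermost.

```c
#include <stddef.h>

/* Hypothetical stand-in for one unwound stack frame. */
struct sketch_frame {
    unsigned long sp;   /* stack pointer of this frame */
    unsigned long ip;   /* return address recorded for this frame */
};

/*
 * Walk frames from innermost (index 0) outwards. The stack grows down on
 * x86, so outer frames have higher sp values. Frames whose sp is below
 * skip_sp are assumed to belong to the tracing glue and are dropped.
 * Returns the number of addresses stored in out[].
 */
static size_t sketch_callchain(const struct sketch_frame *frames, size_t nr,
                               unsigned long skip_sp,
                               unsigned long *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < nr && n < max; i++) {
        if (frames[i].sp < skip_sp)
            continue;   /* not yet at the frame the caller asked for */
        out[n++] = frames[i].ip;
    }
    return n;
}
```

With this shape, only the stack pointer of the (possibly fake) regs is
needed as the skip mark; ip and bp are never consulted.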

-- 
Best

Re: rseq/arm32: choosing rseq code signature

2019-04-17 Thread Mathieu Desnoyers

- On Apr 17, 2019, at 6:37 AM, richard earnshaw richard.earns...@arm.com 
wrote:

> On 16/04/2019 14:39, Mathieu Desnoyers wrote:
>> - On Apr 15, 2019, at 9:37 AM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>> 
>>> - On Apr 15, 2019, at 9:30 AM, peter maydell peter.mayd...@linaro.org 
>>> wrote:
>>>
>>>> On Mon, 15 Apr 2019 at 14:11, Mathieu Desnoyers
>>>>  wrote:
>>>>>
>>>>> - On Apr 11, 2019, at 3:55 PM, peter maydell peter.mayd...@linaro.org 
>>>>> wrote:
>>>>>
>>>>>> On Thu, 11 Apr 2019 at 18:51, Mathieu Desnoyers
>>>>>>  wrote:
>>>>>>>  * This translates to the following instruction pattern in the T16
>>>>>>>  * instruction set:
>>>>>>>  *
>>>>>>>  * little endian:
>>>>>>>  * def3    udf    #243  ; 0xf3
>>>>>>>  * e7f5    b.n    <7f5>
>>>>>>>  *
>>>>>>>  * big endian:
>>>>>>>  * e7f5    b.n    <7f5>
>>>>>>>  * def3    udf    #243  ; 0xf3
>>>>>>
>>>>>> Do we really care about big-endian instruction-ordering for Thumb?
>>>>>> It requires (AIUI) either an ARMv7R CPU which implements and sets
>>>>>> SCTLR.IE to 1, or a v6-or-earlier CPU using BE32, and it's going to
>>>>>> be even rarer than normal BE8 big-endian...
>>>>>
>>>>> I don't think we care enough about it to look for a trick to
>>>>> turn the branch into something else (which would not branch away from the
>>>>> udf instruction), but considering this signature will be ABI, it's good to
>>>>> be thorough documentation-wise and cover all existing cases.
>>>>
>>>> I think if you want to document it it would be helpful to
>>>> readers to make it clear that this is the ultra-rare
>>>> big-endian-instruction-order "big endian Thumb", not the only
>>>> moderately-rare little-endian-instructions-big-endian-data
>>>> "big endian Thumb".
>>>
>>> I'm actually very much concerned about environments with big endian
>>> data and little endian code. Which gcc compiler flags do I need to
>>> use to test it ?
>>>
>>> I'm concerned about a signature mismatch between what is passed to
>>> the rseq system call ("data-endian signature") and what is generated
>>> in the code ("instruction-endian signature").
>> 
>> Based on this page:
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFBBCHB.html
>> 
>> My understanding is that the situation is as follows (please confirm):
>> 
>> - Prior to ARMv6, you could build and run code that is either big or little
>>   endian, given you had a matching Linux kernel endianness. Code and data
>>   endianness needed to match,
>> - Starting from ARMv6, only little endian code is supported. The endianness
>>   for data access can be changed through bit [9], the E bit, of the Program
>>   Status Register (mixed endianness).
>> 
>> Looking at ARM build options for gcc, it seems you can select either big or
>> little endian (-mbig-endian or -mlittle-endian (default)) which affects both
>> instruction and data endianness. So I suspect the -mbig-endian option is
>> really only useful for pre-ARMv6.
> 
> -mbig-endian is still correct, even on later architectures.  The linker
> gets involved, however, and (using the mapping symbol information) swaps
> the code segments to little-endian form (this is why you have to use
> .inst rather than .word when inserting instructions, so that the correct
> mapping symbols are inserted).

So what you're saying is that if I have:

void main()
{
    asm volatile (
        ".arm\n\t"
        ".inst 0xe7f5def3\n\t"
        ".long 0xe7f5def3\n\t");
}

and compile it with:

arm-linux-gnueabihf-gcc -mbig-endian -march=armv6 -c -o arm-big-endianv6.o 
arm-test-endian.c

It's expected that the generated .o will have big endian instructions, matching
the endianness of the data, e.g.:

hexdump arm-big-endianv6.o

[...]
030 0a00 0900 80b5 00af f5e7 f3de f5e7 f3de

But it's then at the linking stage that the linker will
reverse the endianness of the ".inst" (but not .long).

Let's see:

arm-linux-gnueabihf-gcc -nodefaultlibs -nostdlib -mbig-endian -march=armv6 -o 
arm-big-endianv6 arm-big-endianv6.o 
/usr/lib/gcc-cross/arm-linux-gnueabihf/7/../../../../arm-linux-gnueabihf/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 01b0

hexdump gives me:
[...]
1b0 80b5 00af f5e7 f3de f5e7 f3de c046 bd46

So it has not reversed the instruction endianness.
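
As a reading aid for the dumps above, here is a small sketch of how the
word 0xe7f5def3 is laid out in each byte order and how it would then be
displayed. It assumes hexdump was run with its default format on a
little-endian host, where each pair of bytes is printed as one 16-bit unit
in host order; the sketch_* helpers are illustrative names only.

```c
#include <stdint.h>

/* Store a 32-bit word most-significant byte first (big endian). */
static void sketch_store_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store a 32-bit word least-significant byte first (little endian). */
static void sketch_store_le32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/*
 * Value shown for two consecutive file bytes when they are read as one
 * 16-bit unit on a little-endian host (assumed hexdump default).
 */
static uint16_t sketch_hexdump_unit(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

Under those assumptions a big-endian word shows up as "f5e7 f3de", which
is what both dumps contain for the .inst and the .long, while a word
swapped to little-endian would show up as "def3 e7f5".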

What am I doing wrong ?

I'm using:

gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04)
GNU ld (GNU Binutils for Ubuntu) 2.30

Thanks,

Mathieu

> 
>> 
>> For ARMv6+ mixed-endianness, it seems to be a mode that temporarily swaps
>> the endianness of load/store instructions for specific memory accesses
>> communicating with DMA devices, so I don't see any scenario where we can
>> generate a binary that has little endian code and big endian data. If that
>> is true, then it should be fine to declare the signature with ".arm .inst"
>> and expect the data

Re: [RFC PATCH v2] perf/x86: make perf callchain work without CONFIG_FRAME_POINTER

2019-04-17 Thread Kairui Song

On Wed, Apr 17, 2019 at 1:45 AM Peter Zijlstra  wrote:
>
> On Wed, Apr 17, 2019 at 01:39:19AM +0800, Kairui Song wrote:
> > And I also think the "fake"/"real" reg is fragile, could we abuse
> > another eflag (just like PERF_EFLAGS_EXACT) to indicate the regs are
> > partially dumped fake registers?
>
> Sure, the SDM seems to suggest bits 1,3,5,15 are 'available'. We've
> already used 3 and 5, and I think we can use !X86_EFLAGS_FIXED to
> indicate a fake regs set. Any real regs set will always have that set.

Thanks! This is a good idea. Will update accordingly in V3 later.
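
A minimal sketch of the check being proposed, written as plain userspace
C rather than kernel code. SKETCH_EFLAGS_FIXED mirrors the always-set
bit 1 of EFLAGS that the quoted mail refers to as X86_EFLAGS_FIXED;
struct sketch_regs and the helper names are made up for illustration and
are not the eventual V3 implementation.

```c
#include <stdbool.h>

/* Bit 1 of EFLAGS always reads as 1 in a register set saved by hardware. */
#define SKETCH_EFLAGS_FIXED 0x2UL

/* Hypothetical, cut-down register set. */
struct sketch_regs {
    unsigned long ip;
    unsigned long sp;
    unsigned long bp;
    unsigned long flags;
};

/* A partially filled ("fake") set is marked by clearing the fixed bit. */
static void sketch_mark_fake(struct sketch_regs *regs)
{
    regs->flags &= ~SKETCH_EFLAGS_FIXED;
}

/* Real register sets always carry the fixed bit, fake ones never do. */
static bool sketch_regs_are_fake(const struct sketch_regs *regs)
{
    return !(regs->flags & SKETCH_EFLAGS_FIXED);
}
```

The callchain code could then start from the registers when the set is
real and fall back to "unwind from here, skip until regs->sp" otherwise,
without having to guess from ip/bp/sp being zero.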





--
Best Regards,
Kairui Song

Re: [PATCH] mm: get_cmdline use arg_lock instead of mmap_sem

2019-04-17 Thread Michal Koutný

On Wed, Apr 17, 2019 at 03:41:52PM +0200, Michal Hocko  
wrote:
> Don't we need to use the lock in prctl_set_mm as well then?

Correct. The patch alone just moves the race from
get_cmdline/prctl_set_mm_map to get_cmdline/prctl_set_mm.

arg_lock could be used in prctl_set_mm but the better idea (IMO) is
complete removal of that code in favor of prctl_set_mm_map [1].

Michal

[1] https://lore.kernel.org/lkml/20180405182651.gm15...@uranus.lan/

Re: [PATCH 05/12] dma-buf: add explicit buffer pinning

2019-04-17 Thread Daniel Vetter

On Wed, Apr 17, 2019 at 04:30:51PM +0200, Daniel Vetter wrote:
> On Wed, Apr 17, 2019 at 04:20:02PM +0200, Daniel Vetter wrote:
> > On Tue, Apr 16, 2019 at 08:38:34PM +0200, Christian König wrote:
> > > Add optional explicit pinning callbacks instead of implicitly assuming the
> > > exporter pins the buffer when a mapping is created.
> > > 
> > > Signed-off-by: Christian König 
> > 
> > Don't we need this together with the invalidate callback and the dynamic
> > stuff? Also I'm assuming that pin/unpin is pretty much required for
> > dynamic bo, so could we look at these callbacks instead of the dynamic
> > flag you add in patch 1.
> > 
> > I'm assuming following rules hold:
> > no pin/unpin from exporter:
> > 
> > dma-buf is not dynamic, and pinned for the duration of map/unmap. I'm
> > not 100% sure whether really everyone wants the mapping to be cached for
> > the entire attachment, only drm_prime does that. And that's not the only
> > dma-buf importer.
> > 
> > pin/unpin calls are noops.
> > 
> > pin/unpin exist in the exporter, but importer has not provided an
> > invalidate callback:
> > 
> > We map at attach time, and we also have to pin, since the importer can't
> > handle the buffer disappearing, at attach time. We unmap/unpin at detach.
> 
> For this case we should have a WARN in pin/unpin, to make sure importers
> don't do something stupid. One more thought below on pin/unpin.
> 
> > pin/unpin from exporter, invalidate from importer:
> > 
> > Full dynamic mapping. We assume the importer will do caching, attach
> > fences as needed, and pin the underlying bo when it needs it
> > permanently, without attaching fences (i.e. the scanout case).
> > 
> > Assuming I'm not terribly off with my understanding, then I think it'd be
> > best to introduce the entire new dma-buf api in the first patch, and flesh
> > it out later. Instead of spread over a few patches. Plus the above (maybe
> > prettier) as a nice kerneldoc overview comment for how dynamic dma-buf is
> > supposed to work really.
> > -Daniel
> > 
> > > ---
> > >  drivers/dma-buf/dma-buf.c | 39 +++
> > >  include/linux/dma-buf.h   | 37 +++--
> > >  2 files changed, 70 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index a3738fab3927..f23ff8355505 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -630,6 +630,41 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct 
> > > dma_buf_attachment *attach)
> > >  }
> > >  EXPORT_SYMBOL_GPL(dma_buf_detach);
> > >  
> > > +/**
> > > + * dma_buf_pin - Lock down the DMA-buf
> > > + *
> > > + * @dmabuf:  [in]DMA-buf to lock down.
> > > + *
> > > + * Returns:
> > > + * 0 on success, negative error code on failure.
> > > + */
> > > +int dma_buf_pin(struct dma_buf *dmabuf)
> 
> Hm, I think it'd be better to pin the attachment, not the underlying
> buffer. Attachment is the thing the importer will have to pin, and it's at
> attach/detach time where dma-buf needs to pin for importers who don't
> understand dynamic buffer sharing.
> 
> Plus when we put that onto attachments, we can do a
> 
>   WARN_ON(!attach->invalidate);
> 
> sanity check. I think that would be good to have.

Another validation idea: dma-buf.c could track the pin_count on the struct
dma_buf, and if an exporter tries to invalidate while pinned WARN and bail
out. Because that's clearly a driver bug.

All in the interest of making the contract between importers and exporters
as clear as possible.
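
Roughly what I have in mind for the pin_count check, as a standalone
userspace sketch. struct sketch_dma_buf and the sketch_* functions are
made-up names, the fprintf() stands in for WARN, and returning -EBUSY on
a pinned buffer is an assumption about what "bail out" would mean, not
existing dma-buf API:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, cut-down buffer object that only tracks pins. */
struct sketch_dma_buf {
    unsigned int pin_count;
};

static int sketch_pin(struct sketch_dma_buf *buf)
{
    buf->pin_count++;
    return 0;
}

static void sketch_unpin(struct sketch_dma_buf *buf)
{
    if (buf->pin_count)
        buf->pin_count--;
}

/*
 * Exporter-side invalidate: refuse to move the buffer while anyone holds
 * a pin, and complain loudly because that is a driver bug.
 */
static int sketch_invalidate(struct sketch_dma_buf *buf)
{
    if (buf->pin_count) {
        fprintf(stderr, "WARN: invalidate while pinned (pin_count=%u)\n",
                buf->pin_count);
        return -EBUSY;
    }
    return 0;
}
```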
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH v1] perf record: collect user registers set jointly with dwarf stacks

2019-04-17 Thread Jiri Olsa

On Wed, Apr 17, 2019 at 11:35:42AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Apr 17, 2019 at 09:39:52AM +0200, Jiri Olsa escreveu:
> > On Mon, Apr 15, 2019 at 06:36:13PM +0300, Alexey Budankov wrote:
> > > 
> > > When dwarf stacks are collected jointly with user specified register
> > > set using --user-regs option like below the full register context is
> > > still captured on a sample:
> > > 
> > >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- 
> > > matrix.gcc.g.O3
> > > 
> > >   188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 
> > > 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac
> > >   ... FP chain: nr:0
> > >   ... user regs: mask 0xff0fff ABI 64-bit
> > >    AX    0x53b
> > >    BX    0x7ffedbdd3cc0
> > >    CX    0x
> > >    DX    0x33d3a
> > >    SI    0x7f09b74c38d0
> > >    DI    0x0
> > >    BP    0x401260
> > >    SP    0x7ffedbdd3cc0
> > >    IP    0x401236
> > >    FLAGS 0x20a
> > >    CS    0x33
> > >    SS    0x2b
> > >    R8    0x7f09b74c3800
> > >    R9    0x7f09b74c2da0
> > >    R10   0xf3ce
> > >    R11   0x246
> > >    R12   0x401070
> > >    R13   0x7ffedbdd5db0
> > >    R14   0x0
> > >    R15   0x0
> > >   ... ustack: size 1024, offset 0xe0
> > >. data_src: 0x5080021
> > >... thread: stack_test2.g.O:23828
> > >.. dso: /root/abudanko/stacks/stack_test2.g.O3
> > > 
> > > After applying the change suggested in the patch the sample data contain
> > > only user specified register values:
> > > 
> > >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- 
> > > matrix.gcc.g.O3
> > > 
> > >   188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 
> > > 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c
> > >   ... FP chain: nr:0
> > >   ... user regs: mask 0x1c0 ABI 64-bit
> > >    BP    0x401260
> > >    SP    0x7ffd3d85cc20
> > >    IP    0x401236
> > >   ... ustack: size 1024, offset 0x58
> > >. data_src: 0x5080021
> > >... thread: stack_test2.g.O:23839
> > >.. dso: /root/abudanko/stacks/stack_test2.g.O3
> > > 
> > > Signed-off-by: Alexey Budankov 
> > 
> > Acked-by: Jiri Olsa 
> 
> So, there are registers that are needed to do the DWARF unwinding,
> right? But at the same time, if the user says only some are needed, they
> had better know what they're doing and ask for at least the registers needed
> for the unwinding process to be successful, right?

yep, that's how I understand it

jirka

> 
> - Arnaldo
>  
> > thanks,
> > jirka
> > 
> > > ---
> > >  tools/perf/util/evsel.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > index 84cfb9fe2fc6..29a223b4c699 100644
> > > --- a/tools/perf/util/evsel.c
> > > +++ b/tools/perf/util/evsel.c
> > > @@ -702,7 +702,8 @@ static void __perf_evsel__config_callchain(struct 
> > > perf_evsel *evsel,
> > >   if (!function) {
> > >   perf_evsel__set_sample_bit(evsel, REGS_USER);
> > >   perf_evsel__set_sample_bit(evsel, STACK_USER);
> > > - attr->sample_regs_user |= PERF_REGS_MASK;
> > > + if (!opts->sample_user_regs)
> > > + attr->sample_regs_user |= PERF_REGS_MASK;
> > >   attr->sample_stack_user = param->dump_size;
> > >   attr->exclude_callchain_user = 1;
> > >   } else {
> > > -- 
> > > 2.20.1
> > > 
> 
> -- 
> 
> - Arnaldo

Re: [PATCH] dt-bindings: spi: spi-mt65xx: add support for MT8516

2019-04-17 Thread Mark Brown

On Wed, Apr 17, 2019 at 04:47:16PM +0800, Leilk Liu wrote:
> Add binding documentation of spi-mt65xx for MT8516 SOC.

Please use subject lines matching the style for the subsystem.  This
makes it easier for people to identify relevant patches.


signature.asc
Description: PGP signature

Re: IOMMU Page faults when running DMA transfers from PCIe device

2019-04-17 Thread Jerome Glisse

On Wed, Apr 17, 2019 at 04:17:09PM +0200, Patrick Brunner wrote:
> Am Dienstag, 16. April 2019, 17:33:07 CEST schrieb Jerome Glisse:
> > On Mon, Apr 15, 2019 at 06:04:11PM +0200, Patrick Brunner wrote:
> > > Dear all,
> > > 
> > > I'm encountering very nasty problems regarding DMA transfers from an
> > > external PCIe device to the main memory while the IOMMU is enabled, and
> > > I'm running out of ideas. I'm not even sure, whether it's a kernel issue
> > > or not. But I would highly appreciate any hints from experienced
> > > developers how to proceed to solve that issue.
> > > 
> > > The problem: An FPGA (see details below) should write a small amount of
> > > data (~128 bytes) over a PCIe 2.0 x1 link to an address in the CPU's
> > > memory space. The destination address (64 bits) for the Mem Write TLP is
> > > written to a BAR- mapped register before-hand.
> > > 
> > > On the system side, the driver consists of the usual setup code:
> > > - request PCI regions
> > > - pci_set_master
> > > - I/O remapping of BARs
> > > - setting DMA mask (dma_set_mask_and_coherent), tried both 32/64 bits
> > > - allocating DMA buffers with dma_alloc_coherent (4096 bytes, but also
> > > tried smaller numbers)
> > > - allocating IRQ lines (MSI) with pci_alloc_irq_vectors and pci_irq_vector
> > > - writing the DMA buffers' logical address (as returned in dma_handle_t
> > > from dma_alloc_coherent) to a BAR-mapped register
> > > 
> > > There is also an IRQ handler dumping the first 2 DWs from the DMA buffer
> > > when triggered.
> > > 
> > > The FPGA part will initiate following transfers at an interval of 2.5ms:
> > > - Memory write to DMA address
> > > - Send MSI (to signal that transfer is done)
> > > - Memory read from DMA address+offset
> > > 
> > > And now, the clue: everything works fine with the IOMMU disabled
> > > (iommu=off), i.e. the 2 DWs dumped in the ISR handler contain valid data.
> > > But if the IOMMU is enabled (iommu=soft or force), I receive an IO page
> > > fault (sometimes even more, depending on the payload size) on every
> > > transfer, and the data is all zeros:
> > > 
> > > [   49.001605] IO_PAGE_FAULT device=00:00.0 domain=0x
> > > address=0xffbf8000 flags=0x0070]
> > > 
> > > Where the device ID corresponds to the Host bridge, and the address
> > > corresponds to the DMA handle I got from dma_alloc_coherent respectively.
> > 
> > > I am no expert, but I am guessing your FPGA sets the request field in the
> > > PCIe TLP write packet to 00:00.0. This might work when the IOMMU is off but
> > > not when it is on: with the IOMMU enabled, your device should set the
> > > request field to the FPGA's PCIe ID so that the IOMMU knows which device
> > > the PCIe write or read packet comes from, and thus which IOMMU page table
> > > to use.
> > 
> > Cheers,
> > Jérôme
> 
> Hi Jérôme
> 
> Thank you very much for your response.
> 
> You hit the nail! That was exactly the root cause of the problem. The request 
> field was properly filled in for the Memory Read TLP, but not for the Memory 
> Write TLP, where it was all-zeroes.
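
For reference, the ID the IOMMU keys on is the 16-bit bus/device/function
value carried in the TLP header. A minimal sketch of the usual packing
follows (8-bit bus, 5-bit device, 3-bit function, no ARI);
sketch_requester_id() is an illustrative helper, not a kernel function,
and treating 01:00.0 from the later fault line as the FPGA's address is
an assumption:

```c
#include <stdint.h>

/* Pack bus:device.function into the 16-bit ID carried in a TLP header. */
static uint16_t sketch_requester_id(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}
```

A write tagged with an all-zero ID looks like it comes from 00:00.0 (the
host bridge), so it is checked against that device's page table rather
than the one holding the mapping made for the FPGA.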
> 
> If I may ask another question: Is it possible to remap a buffer for DMA which 
> was allocated by other means? For the second phase, we are going to use the 
> RTAI extension(*) which provides its own memory allocation routines (e.g. 
> rt_shm_alloc()). There, you may pass the flag USE_GFP_DMA to indicate that 
> this buffer should be suitable for DMA. I've tried to remap this memory area 
> using virt_to_phys() and use the resulting address for the DMA transfer from 
> the FPGA, getting other IO page faults. E.g.:
> 
> [   70.100140] IO_PAGE_FAULT device=01:00.0 domain=0x0001 
> address=0x0008 flags=0x0020]
> 
> It's remarkable that the logical addresses returned from dma_alloc_coherent 
> (e.g. ffbd8000) look quite different from those returned by rt_shm_alloc
> +virt_to_phys (e.g. 0008).
> 
> Unfortunately, it does not seem possible to do that the other way round, i.e. 
> forcing RTAI to use the buffer from dma_alloc_coherent.

You can use pci_map_page() or dma_map_page(). First you must get the page
that corresponds to the virtual address (maybe with get_user_pages*(), but
I would advise against it as it comes with a long list of gotchas, and there
is no other alternative unless your device is advanced enough).

Once you have the page for the virtual address, you can call either
dma_map_page() or pci_map_page(). I am sure you can find examples of their
usage within the kernel.

It is also documented somewhere in Documentation/

Hope this helps.

Cheers,
Jérôme

Re: [PATCH v10 0/7] Add Fieldbus subsystem + support HMS Profinet card

2019-04-17 Thread Sven Van Asbroeck

On Tue, Apr 16, 2019 at 5:21 PM Enrico Weigelt, metux IT consult
 wrote:
>
> Yet another question: does each fieldbus_dev instance talk to exactly
> one plc process memory, or can there be many ?

I'm by no means a fieldbus expert, so I had a little chat with one of the
fieldbus people in the company here.

AFAIK if multiple plcs connect to the same device, all of them 'see' the same
process memory for that device. Any mechanism to prevent these plcs from
racing with each other is implemented in the application software. Could
be handshaking, could be each plc dealing with a separate section of
the process memory.

So for a fieldbus *device* (which is what this subsystem deals with) there can
only be a single process memory. So we have a single devnode per device.

A fieldbus *controller* would have one process memory per device it connects to.
But this subsystem does not deal with controllers.

This is my understanding, but I'd appreciate your input if I have overlooked
something.

Re: [PATCH v4 03/10] of/irq: document properties for wakeup interrupt parent

2019-04-17 Thread Linus Walleij

On Wed, Mar 13, 2019 at 10:19 PM Lina Iyer  wrote:

> +Mapping the interrupt specifiers in the device tree can be done using the
> +"irqdomain-map" property. The property contains interrupt specifier at the
> +current interrupt controller followed by the interrupt specifier at the 
> mapped
> +interrupt controller.
> +
> +   irqdomain-map = 
> +
> +The optional properties "irqdomain-map-mask" and "irqdomain-map-pass-thru" 
> may
> +be provided to help interpret the valid bits of the incoming and mapped
> +interrupt specifiers respectively.
> +
> +   Example:
> +   irqdomain-map = <22 0  36 0>, <24 0  37 0>;
> +   irqdomain-map-mask = <0xff 0>;
> +   irqdomain-map-pass-thru = <0 0xff>;

This looks rather similar to the existing interrupt-map that is
used for PCI interrupts and swizzling back to a set of PCI
host interrupts.

I tried to document interrupt-map here:
https://elinux.org/Device_Tree_Usage#Advanced_Interrupt_Mapping

interrupt-map is a bit convoluted, so I don't know if it would be subject
to reuse for this. I suspect that interrupt-map, despite the name,
is for PCI only.
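
To make the quoted mask/pass-thru text concrete, here is how I read the
lookup, as a plain C sketch. The semantics are an assumption modelled on
how interrupt-map-mask style properties usually behave; the two-cell
layout, struct sketch_map_entry and sketch_map_lookup() are made up for
illustration and the parent controller phandle is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CELLS 2

/* One map entry: incoming specifier and the specifier at the parent. */
struct sketch_map_entry {
    uint32_t child[SKETCH_CELLS];
    uint32_t parent[SKETCH_CELLS];
};

/*
 * Match the incoming specifier against each entry under the mask, then
 * build the output from the entry's parent specifier, copying the bits
 * selected by pass_thru straight from the incoming specifier.
 * Returns 0 on a match, -1 if no entry matches.
 */
static int sketch_map_lookup(const struct sketch_map_entry *map, size_t nr,
                             const uint32_t mask[SKETCH_CELLS],
                             const uint32_t pass_thru[SKETCH_CELLS],
                             const uint32_t in[SKETCH_CELLS],
                             uint32_t out[SKETCH_CELLS])
{
    for (size_t i = 0; i < nr; i++) {
        int match = 1;

        for (size_t c = 0; c < SKETCH_CELLS; c++) {
            if ((in[c] & mask[c]) != (map[i].child[c] & mask[c]))
                match = 0;
        }
        if (!match)
            continue;

        for (size_t c = 0; c < SKETCH_CELLS; c++)
            out[c] = (map[i].parent[c] & ~pass_thru[c]) |
                     (in[c] & pass_thru[c]);
        return 0;
    }
    return -1;
}
```

With the values from the quoted example, an incoming <22 4> would match
the first entry and come out as <36 4>: the first cell is translated and
the second cell is passed through unchanged.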

Yours,
Linus Walleij

Re: [PATCH v1] perf record: collect user registers set jointly with dwarf stacks

2019-04-17 Thread Arnaldo Carvalho de Melo

Em Wed, Apr 17, 2019 at 09:39:52AM +0200, Jiri Olsa escreveu:
> On Mon, Apr 15, 2019 at 06:36:13PM +0300, Alexey Budankov wrote:
> > 
> > When dwarf stacks are collected jointly with user specified register
> > set using --user-regs option like below the full register context is
> > still captured on a sample:
> > 
> >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- 
> > matrix.gcc.g.O3
> > 
> >   188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 
> > 23828/23828: 0x401236 period: 1363819 addr: 0x7ffedbdd51ac
> >   ... FP chain: nr:0
> >   ... user regs: mask 0xff0fff ABI 64-bit
> >    AX    0x53b
> >    BX    0x7ffedbdd3cc0
> >    CX    0x
> >    DX    0x33d3a
> >    SI    0x7f09b74c38d0
> >    DI    0x0
> >    BP    0x401260
> >    SP    0x7ffedbdd3cc0
> >    IP    0x401236
> >    FLAGS 0x20a
> >    CS    0x33
> >    SS    0x2b
> >    R8    0x7f09b74c3800
> >    R9    0x7f09b74c2da0
> >    R10   0xf3ce
> >    R11   0x246
> >    R12   0x401070
> >    R13   0x7ffedbdd5db0
> >    R14   0x0
> >    R15   0x0
> >   ... ustack: size 1024, offset 0xe0
> >. data_src: 0x5080021
> >... thread: stack_test2.g.O:23828
> >.. dso: /root/abudanko/stacks/stack_test2.g.O3
> > 
> > After applying the change suggested in the patch the sample data contain
> > only user specified register values:
> > 
> >   $ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- 
> > matrix.gcc.g.O3
> > 
> >   188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 
> > 23839/23839: 0x401236 period: 1260507 addr: 0x7ffd3d85e96c
> >   ... FP chain: nr:0
> >   ... user regs: mask 0x1c0 ABI 64-bit
> >    BP    0x401260
> >    SP    0x7ffd3d85cc20
> >    IP    0x401236
> >   ... ustack: size 1024, offset 0x58
> >. data_src: 0x5080021
> >... thread: stack_test2.g.O:23839
> >.. dso: /root/abudanko/stacks/stack_test2.g.O3
> > 
> > Signed-off-by: Alexey Budankov 
> 
> Acked-by: Jiri Olsa 

So, there are registers that are needed to do the DWARF unwinding,
right? But at the same time, if the user says only some are needed, they
had better know what they're doing and ask for at least the registers needed
for the unwinding process to be successful, right?
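
To spell out what the one-line change does to the mask, here is a
simplified userspace sketch. The bit positions are picked to agree with
the 0x1c0 mask shown for IP,SP,BP in the quoted output, and 0xff0fff is
the full mask from the first dump; the SKETCH_* names and both helpers
are illustrative only, and which registers the unwinder really needs is
left to the caller since the thread does not enumerate them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions consistent with the "mask 0x1c0" line for BP, SP, IP. */
#define SKETCH_REG_BP 6
#define SKETCH_REG_SP 7
#define SKETCH_REG_IP 8

/* Full register mask as printed in the first sample ("mask 0xff0fff"). */
#define SKETCH_FULL_MASK 0xff0fffULL

/*
 * Patched behaviour, simplified: keep the user-selected mask if there is
 * one, otherwise fall back to sampling every register.
 */
static uint64_t sketch_sample_regs_user(uint64_t user_mask)
{
    uint64_t mask = user_mask;

    if (!user_mask)
        mask |= SKETCH_FULL_MASK;
    return mask;
}

/* True if every register in 'required' is present in 'mask'. */
static bool sketch_has_regs(uint64_t mask, uint64_t required)
{
    return (mask & required) == required;
}
```

A check along the lines of sketch_has_regs() is what a user would be
relying on implicitly when passing a reduced --user-regs set together
with dwarf call graphs.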

- Arnaldo
 
> thanks,
> jirka
> 
> > ---
> >  tools/perf/util/evsel.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 84cfb9fe2fc6..29a223b4c699 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -702,7 +702,8 @@ static void __perf_evsel__config_callchain(struct 
> > perf_evsel *evsel,
> > if (!function) {
> > perf_evsel__set_sample_bit(evsel, REGS_USER);
> > perf_evsel__set_sample_bit(evsel, STACK_USER);
> > -   attr->sample_regs_user |= PERF_REGS_MASK;
> > +   if (!opts->sample_user_regs)
> > +   attr->sample_regs_user |= PERF_REGS_MASK;
> > attr->sample_stack_user = param->dump_size;
> > attr->exclude_callchain_user = 1;
> > } else {
> > -- 
> > 2.20.1
> > 

-- 

- Arnaldo

Re: [PATCH 1/5] regulator: arizona: Switch to SPDX identifier

2019-04-17 Thread Richard Fitzgerald


On 17/04/19 15:16, Axel Lin wrote:

Signed-off-by: Axel Lin 
---
  drivers/regulator/arizona-ldo1.c| 19 +++
  drivers/regulator/arizona-micsupp.c | 19 +++
  2 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/regulator/arizona-ldo1.c b/drivers/regulator/arizona-ldo1.c
index bf3ab405eed1..e4bc7b1e5ccd 100644
--- a/drivers/regulator/arizona-ldo1.c
+++ b/drivers/regulator/arizona-ldo1.c
@@ -1,15 +1,10 @@
-/*
- * arizona-ldo1.c  --  LDO1 supply for Arizona devices
- *
- * Copyright 2012 Wolfson Microelectronics PLC.
- *
- * Author: Mark Brown 
- *
- *  This program is free software; you can redistribute  it and/or modify it
- *  under  the terms of  the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the  License, or (at your
- *  option) any later version.
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// arizona-ldo1.c  --  LDO1 supply for Arizona devices
+//
+// Copyright 2012 Wolfson Microelectronics PLC.
+//
+// Author: Mark Brown 
  
  #include 

  #include 
diff --git a/drivers/regulator/arizona-micsupp.c 
b/drivers/regulator/arizona-micsupp.c
index 120de94caf02..be0d46da51a1 100644
--- a/drivers/regulator/arizona-micsupp.c
+++ b/drivers/regulator/arizona-micsupp.c
@@ -1,15 +1,10 @@
-/*
- * arizona-micsupp.c  --  Microphone supply for Arizona devices
- *
- * Copyright 2012 Wolfson Microelectronics PLC.
- *
- * Author: Mark Brown 
- *
- *  This program is free software; you can redistribute  it and/or modify it
- *  under  the terms of  the GNU General  Public License as published by the
- *  Free Software Foundation;  either version 2 of the  License, or (at your
- *  option) any later version.
- */
+// SPDX-License-Identifier: GPL-2.0+
+//
+// arizona-micsupp.c  --  Microphone supply for Arizona devices
+//
+// Copyright 2012 Wolfson Microelectronics PLC.
+//
+// Author: Mark Brown 
  
  #include 

  #include 



Acked-by: Richard Fitzgerald

[RFC PATCH 3/3] x86: mm: Switch to using generic pt_dump

2019-04-17 Thread Steven Price

Instead of providing our own callbacks for walking the page tables,
switch to using the generic version instead.

Signed-off-by: Steven Price 
---
 arch/x86/Kconfig  |   1 +
 arch/x86/Kconfig.debug|  20 +--
 arch/x86/mm/Makefile  |   4 +-
 arch/x86/mm/dump_pagetables.c | 297 +++---
 4 files changed, 62 insertions(+), 260 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1f9b3cf437c..122c24055f02 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -106,6 +106,7 @@ config X86
select GENERIC_IRQ_RESERVATION_MODE
select GENERIC_IRQ_SHOW
select GENERIC_PENDING_IRQ  if SMP
+   select GENERIC_PTDUMP
select GENERIC_SMP_IDLE_THREAD
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 15d0fbe27872..dc1dfe213657 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -62,26 +62,10 @@ config EARLY_PRINTK_USB_XDBC
 config MCSAFE_TEST
def_bool n
 
-config X86_PTDUMP_CORE
-   def_bool n
-
-config X86_PTDUMP
-   tristate "Export kernel pagetable layout to userspace via debugfs"
-   depends on DEBUG_KERNEL
-   select DEBUG_FS
-   select X86_PTDUMP_CORE
-   ---help---
- Say Y here if you want to show the kernel pagetable layout in a
- debugfs file. This information is only useful for kernel developers
- who are working in architecture specific areas of the kernel.
- It is probably not a good idea to enable this feature in a production
- kernel.
- If in doubt, say "N"
-
 config EFI_PGT_DUMP
bool "Dump the EFI pagetable"
depends on EFI
-   select X86_PTDUMP_CORE
+   select PTDUMP_CORE
---help---
  Enable this if you want to dump the EFI page table before
  enabling virtual mode. This can be used to debug miscellaneous
@@ -90,7 +74,7 @@ config EFI_PGT_DUMP
 
 config DEBUG_WX
bool "Warn on W+X mappings at boot"
-   select X86_PTDUMP_CORE
+   select PTDUMP_CORE
---help---
  Generate a warning if any W+X mappings are found at boot.
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..5233190fc6bf 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -28,8 +28,8 @@ obj-$(CONFIG_X86_PAT) += pat_rbtree.o
 obj-$(CONFIG_X86_32)   += pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP_CORE)  += dump_pagetables.o
-obj-$(CONFIG_X86_PTDUMP)   += debug_pagetables.o
+obj-$(CONFIG_PTDUMP_CORE)  += dump_pagetables.o
+obj-$(CONFIG_PTDUMP_DEBUGFS)   += debug_pagetables.o
 
 obj-$(CONFIG_HIGHMEM)  += highmem_32.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index f6b814aaddf7..955824c7cddb 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -30,15 +31,12 @@
  * when a "break" in the continuity is found.
  */
 struct pg_state {
+   struct ptdump_state ptdump;
int level;
-   pgprot_t current_prot;
+   pgprotval_t current_prot;
pgprotval_t effective_prot;
-   pgprotval_t effective_prot_pgd;
-   pgprotval_t effective_prot_p4d;
-   pgprotval_t effective_prot_pud;
-   pgprotval_t effective_prot_pmd;
+   pgprotval_t prot_levels[5];
unsigned long start_address;
-   unsigned long current_address;
const struct addr_marker *marker;
unsigned long lines;
bool to_dmesg;
@@ -179,9 +177,8 @@ static struct addr_marker address_markers[] = {
 /*
  * Print a readable form of a pgprot_t to the seq_file
  */
-static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
+static void printk_prot(struct seq_file *m, pgprotval_t pr, int level, bool dmsg)
 {
-   pgprotval_t pr = pgprot_val(prot);
static const char * const level_name[] =
{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };
 
@@ -228,24 +225,11 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
pt_dump_cont_printf(m, dmsg, "%s\n", level_name[level]);
 }
 
-/*
- * On 64 bits, sign-extend the 48 bit address to 64 bit
- */
-static unsigned long normalize_addr(unsigned long u)
-{
-   int shift;
-   if (!IS_ENABLED(CONFIG_X86_64))
-   return u;
-
-   shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
-   return (signed long)(u << shift) >> shift;
-}
-
-static void note_wx(struct pg_state *st)
+static void note_wx(struct pg_state *st, unsigned long addr)
 {
unsigned long npages;
 
-   npages = (st->current_address - st->start_address) / PAGE_SIZE;
+   npages = (addr - st->start_address) / PAGE_SIZE;
 
 #ifdef CONFIG_PCI_BIOS
/*
@@ -253,7 +237,7 @@ static void

[RFC PATCH 2/3] arm64: mm: Switch to using generic pt_dump

2019-04-17 Thread Steven Price

Instead of providing our own callbacks for walking the page tables,
switch to using the generic version instead.

Signed-off-by: Steven Price 
---
 arch/arm64/Kconfig  |   1 +
 arch/arm64/Kconfig.debug|  19 +-
 arch/arm64/include/asm/ptdump.h |   8 +--
 arch/arm64/mm/Makefile  |   4 +-
 arch/arm64/mm/dump.c| 104 +---
 arch/arm64/mm/ptdump_debugfs.c  |   2 +-
 6 files changed, 37 insertions(+), 101 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 117b2541ef3d..4ff55b3ce8dc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -97,6 +97,7 @@ config ARM64
select GENERIC_IRQ_SHOW
select GENERIC_IRQ_SHOW_LEVEL
select GENERIC_PCI_IOMAP
+   select GENERIC_PTDUMP
select GENERIC_SCHED_CLOCK
select GENERIC_SMP_IDLE_THREAD
select GENERIC_STRNCPY_FROM_USER
diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
index 69c9170bdd24..570dba4d4a0e 100644
--- a/arch/arm64/Kconfig.debug
+++ b/arch/arm64/Kconfig.debug
@@ -1,21 +1,4 @@
 
-config ARM64_PTDUMP_CORE
-   def_bool n
-
-config ARM64_PTDUMP_DEBUGFS
-   bool "Export kernel pagetable layout to userspace via debugfs"
-   depends on DEBUG_KERNEL
-   select ARM64_PTDUMP_CORE
-   select DEBUG_FS
-help
- Say Y here if you want to show the kernel pagetable layout in a
- debugfs file. This information is only useful for kernel developers
- who are working in architecture specific areas of the kernel.
- It is probably not a good idea to enable this feature in a production
- kernel.
-
- If in doubt, say N.
-
 config PID_IN_CONTEXTIDR
bool "Write the current PID to the CONTEXTIDR register"
help
@@ -41,7 +24,7 @@ config ARM64_RANDOMIZE_TEXT_OFFSET
 
 config DEBUG_WX
bool "Warn on W+X mappings at boot"
-   select ARM64_PTDUMP_CORE
+   select PTDUMP_CORE
---help---
  Generate a warning if any W+X mappings are found at boot.
 
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 9e948a93d26c..f8fecda7b61d 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -16,7 +16,7 @@
 #ifndef __ASM_PTDUMP_H
 #define __ASM_PTDUMP_H
 
-#ifdef CONFIG_ARM64_PTDUMP_CORE
+#ifdef CONFIG_PTDUMP_CORE
 
 #include 
 #include 
@@ -32,15 +32,15 @@ struct ptdump_info {
unsigned long   base_addr;
 };
 
-void ptdump_walk_pgd(struct seq_file *s, struct ptdump_info *info);
-#ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
+void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
+#ifdef CONFIG_PTDUMP_DEBUGFS
 void ptdump_debugfs_register(struct ptdump_info *info, const char *name);
 #else
 static inline void ptdump_debugfs_register(struct ptdump_info *info,
   const char *name) { }
 #endif
 void ptdump_check_wx(void);
-#endif /* CONFIG_ARM64_PTDUMP_CORE */
+#endif /* CONFIG_PTDUMP_CORE */
 
 #ifdef CONFIG_DEBUG_WX
 #define debug_checkwx()ptdump_check_wx()
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 849c1df3d214..d91030f0ffee 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -4,8 +4,8 @@ obj-y   := dma-mapping.o extable.o fault.o init.o \
   ioremap.o mmap.o pgd.o mmu.o \
   context.o proc.o pageattr.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-obj-$(CONFIG_ARM64_PTDUMP_CORE)+= dump.o
-obj-$(CONFIG_ARM64_PTDUMP_DEBUGFS) += ptdump_debugfs.o
+obj-$(CONFIG_PTDUMP_CORE)  += dump.o
+obj-$(CONFIG_PTDUMP_DEBUGFS)   += ptdump_debugfs.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)+= physaddr.o
 KASAN_SANITIZE_physaddr.o  += n
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index ea20c1213498..e68df2ad8863 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -69,6 +70,7 @@ static const struct addr_marker address_markers[] = {
  * dumps out a description of the range.
  */
 struct pg_state {
+   struct ptdump_state ptdump;
struct seq_file *seq;
const struct addr_marker *marker;
unsigned long start_address;
@@ -172,6 +174,10 @@ static struct pg_level pg_level[] = {
.name   = "PGD",
.bits   = pte_bits,
.num= ARRAY_SIZE(pte_bits),
+   }, { /* p4d */
+   .name   = "P4D",
+   .bits   = pte_bits,
+   .num= ARRAY_SIZE(pte_bits),
}, { /* pud */
.name   = (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
.bits   = pte_bits,
@@ -234,9 +240,10 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
 }

[RFC PATCH 1/3] mm: Add generic ptdump

2019-04-17 Thread Steven Price

Add a generic version of page table dumping that architectures can
opt-in to

Signed-off-by: Steven Price 
---
 include/linux/ptdump.h |  19 +
 mm/Kconfig.debug   |  21 ++
 mm/Makefile|   1 +
 mm/ptdump.c| 159 +
 4 files changed, 200 insertions(+)
 create mode 100644 include/linux/ptdump.h
 create mode 100644 mm/ptdump.c

diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
new file mode 100644
index ..eb8e78154be3
--- /dev/null
+++ b/include/linux/ptdump.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_PTDUMP_H
+#define _LINUX_PTDUMP_H
+
+struct ptdump_range {
+   unsigned long start;
+   unsigned long end;
+};
+
+struct ptdump_state {
+   void (*note_page)(struct ptdump_state *st, unsigned long addr,
+ int level, unsigned long val);
+   const struct ptdump_range *range;
+};
+
+void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm);
+
+#endif /* _LINUX_PTDUMP_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index e3df921208c0..21bbf559408b 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -111,3 +111,24 @@ config DEBUG_RODATA_TEST
 depends on STRICT_KERNEL_RWX
 ---help---
   This option enables a testcase for the setting rodata read-only.
+
+config GENERIC_PTDUMP
+   bool
+
+config PTDUMP_CORE
+   bool
+
+config PTDUMP_DEBUGFS
+   bool "Export kernel pagetable layout to userspace via debugfs"
+   depends on DEBUG_KERNEL
+   depends on DEBUG_FS
+   depends on GENERIC_PTDUMP
+   select PTDUMP_CORE
+   help
+ Say Y here if you want to show the kernel pagetable layout in a
+ debugfs file. This information is only useful for kernel developers
+ who are working in architecture specific areas of the kernel.
+ It is probably not a good idea to enable this feature in a production
+ kernel.
+
+ If in doubt, say N.
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..59d653c3250d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
 obj-$(CONFIG_HMM) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
+obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
diff --git a/mm/ptdump.c b/mm/ptdump.c
new file mode 100644
index ..c8e4c08ce206
--- /dev/null
+++ b/mm/ptdump.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+
+static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
+   unsigned long next, struct mm_walk *walk)
+{
+   struct ptdump_state *st = walk->private;
+   pgd_t val = READ_ONCE(*pgd);
+
+   if (pgd_large(val))
+   st->note_page(st, addr, 1, pgd_val(val));
+
+   return 0;
+}
+
+static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
+   unsigned long next, struct mm_walk *walk)
+{
+   struct ptdump_state *st = walk->private;
+   p4d_t val = READ_ONCE(*p4d);
+
+   if (p4d_large(val))
+   st->note_page(st, addr, 2, p4d_val(val));
+
+   return 0;
+}
+
+static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
+   unsigned long next, struct mm_walk *walk)
+{
+   struct ptdump_state *st = walk->private;
+   pud_t val = READ_ONCE(*pud);
+
+   if (pud_large(val))
+   st->note_page(st, addr, 3, pud_val(val));
+
+   return 0;
+}
+
+static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
+   unsigned long next, struct mm_walk *walk)
+{
+   struct ptdump_state *st = walk->private;
+   pmd_t val = READ_ONCE(*pmd);
+
+   if (pmd_large(val))
+   st->note_page(st, addr, 4, pmd_val(val));
+
+   return 0;
+}
+
+static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
+   unsigned long next, struct mm_walk *walk)
+{
+   struct ptdump_state *st = walk->private;
+
+   st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
+
+   return 0;
+}
+
+#ifdef CONFIG_KASAN
+/*
+ * This is an optimization for KASAN=y case. Since all kasan page tables
+ * eventually point to the kasan_early_shadow_page we could call note_page()
+ * right away without walking through lower level page tables. This saves
+ * us dozens of seconds (minutes for 5-level config) while checking for
+ * W+X mapping or reading kernel_page_tables debugfs file.
+ */
+static inline bool kasan_page_table(struct ptdump_state *st, void *pt,
+   unsigned long addr)
+{
+   if (__pa(pt) == __pa(kasan_early_shadow_pmd) ||
+   (pgtable_l5_enabled() &&
+   __pa(pt) == __pa(kasan_early_shadow_p4d)) ||
+   __pa(pt) == __pa(kasan_early_shadow_pud)) {
+   st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
+
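The callback interface added in include/linux/ptdump.h can be tried out in
userspace. The following is only a minimal sketch, assuming the struct layout
shown in the patch: the walker (demo_walk), the per-arch state (demo_pg_state)
and the helper demo_count_pages are hypothetical stand-ins, not part of the
series. It shows how an architecture could embed struct ptdump_state in its
own state and recover it from the note_page() callback, much as the x86 and
arm64 pg_state structs embed it in the follow-up patches.

```c
#include <stddef.h>

/* Mirrors the declarations from include/linux/ptdump.h in the patch. */
struct ptdump_range {
	unsigned long start;
	unsigned long end;
};

struct ptdump_state {
	void (*note_page)(struct ptdump_state *st, unsigned long addr,
			  int level, unsigned long val);
	const struct ptdump_range *range;
};

/* Userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-arch state embedding the generic state. */
struct demo_pg_state {
	struct ptdump_state ptdump;
	unsigned long entries;	/* number of note_page() calls seen */
};

static void demo_note_page(struct ptdump_state *pt_st, unsigned long addr,
			   int level, unsigned long val)
{
	struct demo_pg_state *st =
		demo_container_of(pt_st, struct demo_pg_state, ptdump);

	(void)addr;
	(void)level;
	(void)val;
	st->entries++;
}

/*
 * Hypothetical stand-in for ptdump_walk_pgd(): instead of walking real
 * page tables it reports one level-5 (PTE) entry per 4K page in the range.
 */
static void demo_walk(struct ptdump_state *st)
{
	unsigned long addr;

	for (addr = st->range->start; addr < st->range->end; addr += 4096)
		st->note_page(st, addr, 5, 0);
}

/* Walk [start, end) and return how many entries the callback saw. */
static unsigned long demo_count_pages(unsigned long start, unsigned long end)
{
	const struct ptdump_range range = { start, end };
	struct demo_pg_state st = {
		.ptdump = { .note_page = demo_note_page, .range = &range },
	};

	demo_walk(&st.ptdump);
	return st.entries;
}
```

Because the generic code only ever sees struct ptdump_state, any
arch-specific bookkeeping (markers, W+X counters and so on) can stay private
to the architecture.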

Re: [PATCH v8 00/20] Convert x86 & arm64 to use generic page walk

2019-04-17 Thread Steven Price

On 12/04/2019 15:44, Dave Hansen wrote:
> On 4/10/19 7:56 AM, Steven Price wrote:
>> Gentle ping: who can take this? Is there anything blocking this series?
> 
> First of all, I really appreciate that you tried this.  Every open-coded
> page walk has a set of common pitfalls, but is pretty unbounded in what
> kinds of bugs it can contain.  I think this at least gets us to the
> point where some of those pitfalls won't happen.  That's cool, but I'm
> worried that it hasn't gotten easier in the end.

My plan was to implement the generic infrastructure and then work to
remove the per-arch code for ptdump debugfs where possible. This patch
series doesn't actually get that far because I wanted to get some
confidence that the general approach would be accepted.

> Linus also had some strong opinions in the past on how page walks should
> be written.  He needs to have a look before we go much further.

Fair enough. I'll post the initial work I've done on unifying the
x86/arm64 ptdump code - the diffstat is a bit nicer on that - but
there's still work to be done so I'm posting just as an RFC.

Thanks,

Steve

Re: kernel BUG at kernel/cred.c:434!

2019-04-17 Thread Paul Moore

On Tue, Apr 16, 2019 at 10:46 AM chengjian (D)  wrote:
> On 2019/4/16 11:40, Kees Cook wrote:
> > On Mon, Apr 15, 2019 at 11:20 AM Paul Moore  wrote:
> >> On Mon, Apr 15, 2019 at 11:05 AM Oleg Nesterov  wrote:
> >>> On 04/15, Paul Moore wrote:
>  On Mon, Apr 15, 2019 at 9:43 AM Oleg Nesterov  wrote:
> > Well, acct("/proc/self/attr/current") doesn't look like a good idea, 
> > but I do
> > not know where should we put the additional check... And probably
> > "echo /proc/self/attr/current > /proc/sys/kernel/core_pattern" can hit 
> > the
> > same problem, do_coredump() does override_creds() too.
> >
> > Maybe just add
> >
> >  if (current->cred != current->real_cred)
> >  return -EACCES;
> >
> > into proc_pid_attr_write(), I dunno.
>  Is the problem that do_acct_process() is calling override_creds() and
>  the returned/old credentials are being freed before do_acct_process()
>  can reinstall the creds via revert_creds()?  Presumably because the
>  process accounting is causing the credentials to be replaced?
> >>> Afaics, the problem is that do_acct_process() does override_creds() and
> >>> then __kernel_write(). Which calls proc_pid_attr_write(), which in turn 
> >>> calls
> >>> selinux_setprocattr(), which does another prepare_creds() + 
> >>> commit_creds();
> >>> and commit_creds() hits
> >>>
> >>>  BUG_ON(task->cred != old);
> >> Gotcha.  In the process of looking at the backtrace I forgot about the
> >> BUG_ON() at the top of the oops message.
> >>
> >> I wonder what terrible things would happen if we changed the BUG_ON()
> >> in commit_creds to simply returning an error code to the
> >> caller.  There is a warning/requirement in commit_creds() function
> >> header comment that it should always return 0.
> > Would callers be expected to call abort_creds() on failure? There are
> > a number of places where it'd need fixing up. And would likely be best
> > with a __must_check marking.
> >
> > It seems like avoiding the pathological case might be simpler?
>
> Yeah, Avoiding this pathological case is a better solution.

No argument that this is a particularly messed-up scenario; I'm just
trying to arrive at a solution that isn't too ugly.

> From: Yang Yingliang 
> Date: Sat, 13 Apr 2019 21:56:01 +0800
> Subject: [PATCH] fix cred bug_on
>
> Signed-off-by: Yang Yingliang 
> ---
>   kernel/acct.c| 3 ++-
>   kernel/cred.c| 6 ++
>   security/selinux/hooks.c | 2 ++
>   3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index addf7732fb56..f2065f899eee 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -522,7 +522,8 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>   }
>   out:
>   current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
> -revert_creds(orig_cred);
> +if (orig_cred == current->real_cred)// [2]
> +revert_creds(orig_cred);
>   }
>
>   /**
> diff --git a/kernel/cred.c b/kernel/cred.c
> index ecf03657e71c..c4d5ba92fb9b 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -522,6 +522,9 @@ const struct cred *override_creds(const struct cred *new)
>   {
>   const struct cred *old = current->cred;
>
> +if (old == new)//  [3]
> +return old;
> +
>   kdebug("override_creds(%p{%d,%d})", new,
>  atomic_read(&new->usage),
>  read_cred_subscribers(new));
> @@ -551,6 +554,9 @@ void revert_creds(const struct cred *old)
>   {
>   const struct cred *override = current->cred;
>
> +if (override == old)// [3]
> +return;
> +
>   kdebug("revert_creds(%p{%d,%d})", old,
>  atomic_read(&old->usage),
>  read_cred_subscribers(old));
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index b5017beb4ef7..bc8108e4e90f 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6590,6 +6590,8 @@ static int selinux_setprocattr(const char *name, void *value, size_t size)
>   goto abort_change;
>   }
>
> +if (current->cred != current->real_cred)// [1]
> +revert_creds(current->real_cred);
>   commit_creds(new);
>   return size;

Doing the revert only to then commit the creds seems really ugly to
me.  I'm tempted to simply return an error in selinux_setprocattr() if
the task's credentials are not the same as its real_cred; if we do
that I believe we should resolve this problem.  The accounting write
to the SELinux file in /proc would fail of course, but I think we can
all consider that as a positive side-effect.

While I don't think this should have a negative impact on anything
else, I haven't convinced myself of that just yet.
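
To make the idea concrete, here is a rough userspace model of that check. It
is only a sketch under stated assumptions: the demo_* types and helpers are
hypothetical stand-ins for the kernel's cred handling, and -EACCES is
borrowed from Oleg's earlier suggestion, not something settled in this
thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct cred. */
struct demo_cred {
	int id;
};

/* Hypothetical stand-in for the two credential pointers in task_struct. */
struct demo_task {
	const struct demo_cred *real_cred;	/* objective credentials */
	const struct demo_cred *cred;		/* effective, possibly overridden */
};

/* Models override_creds(): swap the effective creds, return the old ones. */
static const struct demo_cred *demo_override_creds(struct demo_task *t,
						   const struct demo_cred *new)
{
	const struct demo_cred *old = t->cred;

	t->cred = new;
	return old;
}

/* Models revert_creds(): restore the effective creds saved earlier. */
static void demo_revert_creds(struct demo_task *t, const struct demo_cred *old)
{
	t->cred = old;
}

/*
 * Models the proposed guard: refuse to commit new credentials while the
 * task is running with overridden creds, instead of reaching the BUG_ON().
 */
static long demo_setprocattr(struct demo_task *t, const struct demo_cred *new,
			     size_t size)
{
	if (t->cred != t->real_cred)
		return -EACCES;	/* assumed error code, see lead-in */

	/* Models commit_creds(): both pointers move to the new creds. */
	t->real_cred = new;
	t->cred = new;
	return (long)size;
}

/* The acct-style path: the attribute write happens under override_creds(). */
static long demo_write_while_overridden(void)
{
	static const struct demo_cred orig = { 1 }, file = { 2 }, wanted = { 3 };
	struct demo_task t = { &orig, &orig };
	const struct demo_cred *saved = demo_override_creds(&t, &file);
	long ret = demo_setprocattr(&t, &wanted, 8);

	demo_revert_creds(&t, saved);
	return ret;
}

/* The ordinary path: the task writes its own attribute with no override. */
static long demo_write_normal(void)
{
	static const struct demo_cred orig = { 1 }, wanted = { 3 };
	struct demo_task t = { &orig, &orig };

	return demo_setprocattr(&t, &wanted, 8);
}
```

In this model the overridden write fails cleanly and the saved creds are
restored as usual, while the ordinary write still succeeds.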

-- 
paul moore
www.paul-moore.com

Re: [PATCH v1 2/4] clone: add CLONE_PIDFD

2019-04-17 Thread Christian Brauner

On Wed, Apr 17, 2019 at 04:25:51PM +0200, Christian Brauner wrote:
> On Wed, Apr 17, 2019 at 04:22:54PM +0200, Oleg Nesterov wrote:
> > On 04/16, Christian Brauner wrote:
> > >
> > > + if (clone_flags & CLONE_PIDFD) {
> > > + retval = pidfd_create(pid, &pidfdf);
> > > + if (retval < 0)
> > > + goto bad_fork_free_pid;
> > > + pidfd = retval;
> > > + }
> > 
> > ...
> > 
> > > + if (clone_flags & CLONE_PIDFD) {
> > > + fd_install(pidfd, pidfdf);
> > > + put_user(pidfd, parent_tidptr);
> > 
> > put_user() can fail, I don't think this error should be silently ignored,

Fwiw, the same is currently done for PARENT_SETTID which seems odd as
well...
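
One possible way to avoid ignoring the failure is to attempt the copy to
userspace before the fd becomes visible, so the reservation can still be
undone. The following is only a userspace sketch of that ordering, not the
patch itself: demo_put_user, demo_fdtable and demo_publish_pidfd are
hypothetical stand-ins, and a NULL destination is used to simulate a
faulting user pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for put_user(): a NULL destination "faults". */
static int demo_put_user(int val, int *uptr)
{
	if (!uptr)
		return -EFAULT;
	*uptr = val;
	return 0;
}

/* Hypothetical model of an fd slot: reserved first, installed later. */
struct demo_fdtable {
	int reserved;
	int installed;
};

/*
 * Report the pidfd to the caller first and only then install it.
 * Installing is treated as the point of no return, so on failure the
 * reserved slot is released and the error is propagated.
 */
static int demo_publish_pidfd(struct demo_fdtable *tbl, int pidfd,
			      int *parent_tidptr)
{
	int ret;

	tbl->reserved = 1;		/* models reserving an unused fd */

	ret = demo_put_user(pidfd, parent_tidptr);
	if (ret) {
		tbl->reserved = 0;	/* models releasing the unused fd */
		return ret;
	}

	tbl->installed = 1;		/* models fd_install() */
	return 0;
}
```

Whether PARENT_SETTID should get the same treatment is a separate question,
since its put_user() result has been ignored for a long time.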
