Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

2018-09-04 Thread Srinivas Pandruvada


[...]

> > > 
> > > This patch causes a number of statistically significant
> > > regressions
> > > (with significance of 1%) on the two systems I've tested it
> > > on.  On
> > > my
> > 
> > Sure. These patches are targeted to Atom clients where some of
> > these
> > server like workload may have some minor regression on few watts
> > TDP
> > parts.
> 
> Neither the 36% regression of fs-mark, the 21% regression of sqlite,
> nor
> the 10% regression of warsaw qualify as small.  And most of the test
> cases on the list of regressions aren't exclusively server-like, if
> at
> all.  Warsaw, gtkperf, jxrendermark and lightsmark are all graphics
> benchmarks -- Latency is as important if not more for interactive
> workloads than it is for server workloads.  In the case of a conflict
> like the one we're dealing with right now between optimizing for
> throughput (e.g. for the maximum number of requests per second) and
> optimizing for latency (e.g. for the minimum request duration), you
> are
> more likely to be concerned about the former than about the latter in
> a
> server setup.

Eero,
Please add your test results here.

No matter which algorithm you do, there will be variations. So you have
to look at the platforms which you are targeting. For this platform 
number one item is use of less turbo and hope you know why?
On this platform GFX patch caused this issue as it was submitted after
io boost patchset. So rather that should be reverted first before
optimizing anything.


> 
> > But weighing against reduced TURBO usage (which is enough trigger)
> > and
> > improvement in tests done by Eero (which was primary complaint to
> > us).
> > 
> > It is trivial patch. Yes, the patch is not perfect and doesn't
> > close
> > doors for any improvements.
> > 
> 
> It's sort of self-contained because it's awfully incomplete.Don't
> agtr

> 
> > I see your idea, but how to implement in acceptable way is a
> > challenge.
> 
> Main challenge was getting the code to work without regressions in
> latency-sensitive workloads, which I did, because you told me that it
> wasn't acceptable for it to cause any regressions on the Phoronix
> daily-system-tracker, whether latency-bound or not.
Yes, because your intention was to have a global change not a low power
Atom specific,

>   What is left in
> order to address Peter's concerns is for the most part plumbing,
> that's
> guaranteed not to have any functional impact on the heuristic.  The
> fact
> that we don't expect it to change the performance of the system makes
> it
> substantially less time-consuming to validate than altering the
> performance trade-offs of the heuristic dynamically in order to avoid
> regressions (which is what has kept my systems busy most of the time
> lately).  Seems like my series, even in its current state without the
> changes requested by Peter is closer to being ready for production
> than
> this patch is.

Sorry, Not at all. We call such patches as experimental series.
You caused 100% regression to idle power. There is no version 2 after
that, even if you fixed locally even to look.

Thanks,
Srinivas



Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

2018-09-04 Thread Srinivas Pandruvada


[...]

> > > 
> > > This patch causes a number of statistically significant
> > > regressions
> > > (with significance of 1%) on the two systems I've tested it
> > > on.  On
> > > my
> > 
> > Sure. These patches are targeted to Atom clients where some of
> > these
> > server like workload may have some minor regression on few watts
> > TDP
> > parts.
> 
> Neither the 36% regression of fs-mark, the 21% regression of sqlite,
> nor
> the 10% regression of warsaw qualify as small.  And most of the test
> cases on the list of regressions aren't exclusively server-like, if
> at
> all.  Warsaw, gtkperf, jxrendermark and lightsmark are all graphics
> benchmarks -- Latency is as important if not more for interactive
> workloads than it is for server workloads.  In the case of a conflict
> like the one we're dealing with right now between optimizing for
> throughput (e.g. for the maximum number of requests per second) and
> optimizing for latency (e.g. for the minimum request duration), you
> are
> more likely to be concerned about the former than about the latter in
> a
> server setup.

Eero,
Please add your test results here.

No matter which algorithm you do, there will be variations. So you have
to look at the platforms which you are targeting. For this platform 
number one item is use of less turbo and hope you know why?
On this platform GFX patch caused this issue as it was submitted after
io boost patchset. So rather that should be reverted first before
optimizing anything.


> 
> > But weighing against reduced TURBO usage (which is enough trigger)
> > and
> > improvement in tests done by Eero (which was primary complaint to
> > us).
> > 
> > It is trivial patch. Yes, the patch is not perfect and doesn't
> > close
> > doors for any improvements.
> > 
> 
> It's sort of self-contained because it's awfully incomplete.Don't
> agtr

> 
> > I see your idea, but how to implement in acceptable way is a
> > challenge.
> 
> Main challenge was getting the code to work without regressions in
> latency-sensitive workloads, which I did, because you told me that it
> wasn't acceptable for it to cause any regressions on the Phoronix
> daily-system-tracker, whether latency-bound or not.
Yes, because your intention was to have a global change not a low power
Atom specific,

>   What is left in
> order to address Peter's concerns is for the most part plumbing,
> that's
> guaranteed not to have any functional impact on the heuristic.  The
> fact
> that we don't expect it to change the performance of the system makes
> it
> substantially less time-consuming to validate than altering the
> performance trade-offs of the heuristic dynamically in order to avoid
> regressions (which is what has kept my systems busy most of the time
> lately).  Seems like my series, even in its current state without the
> changes requested by Peter is closer to being ready for production
> than
> this patch is.

Sorry, Not at all. We call such patches as experimental series.
You caused 100% regression to idle power. There is no version 2 after
that, even if you fixed locally even to look.

Thanks,
Srinivas



Re: [GIT PULL] s390 patches for the 4.19 merge window #2

2018-09-04 Thread Martin Schwidefsky
On Tue, 4 Sep 2018 17:16:31 -0700
Kees Cook  wrote:

> On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky
>  wrote:
> > Harald Freudenberger (5):
> >   s390/zcrypt: hex string mask improvements for apmask and aqmask.
> 
> This (and an earlier 2017 commit) adds VLAs, which are being
> removed[1] from the kernel:
> 
> drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable
> length array ‘clrm’ [-Wvla]
> drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable
> length array ‘setm’ [-Wvla]
> drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable
> length array ‘setm’ [-Wvla]
> 
> static int process_mask_arg(const char *str,
> unsigned long *bitmap, int bits,
> struct mutex *lock)
> ...
> DECLARE_BITMAP(clrm, bits);
> DECLARE_BITMAP(setm, bits);
> 
> Can someone please adjust this to make these fixed size again?

Oops, sorry about that. We will provide a solution for this.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [GIT PULL] s390 patches for the 4.19 merge window #2

2018-09-04 Thread Martin Schwidefsky
On Tue, 4 Sep 2018 17:16:31 -0700
Kees Cook  wrote:

> On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky
>  wrote:
> > Harald Freudenberger (5):
> >   s390/zcrypt: hex string mask improvements for apmask and aqmask.
> 
> This (and an earlier 2017 commit) adds VLAs, which are being
> removed[1] from the kernel:
> 
> drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable
> length array ‘clrm’ [-Wvla]
> drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable
> length array ‘setm’ [-Wvla]
> drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable
> length array ‘setm’ [-Wvla]
> 
> static int process_mask_arg(const char *str,
> unsigned long *bitmap, int bits,
> struct mutex *lock)
> ...
> DECLARE_BITMAP(clrm, bits);
> DECLARE_BITMAP(setm, bits);
> 
> Can someone please adjust this to make these fixed size again?

Oops, sorry about that. We will provide a solution for this.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Re: [PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Uwe Kleine-König
Hello,

On Wed, Sep 05, 2018 at 09:24:26AM +0800, Anson Huang wrote:
> In noirq suspend/resume stage with no_console_suspend enabled,
> imx_uart_console_write() may be called to print out log_buf
> message by printk(), so there will be race condition between
> imx_uart_console_write() and imx_uart_save/restore_context(),
> need to add lock to protect the registers save/restore operations.
> 
> Signed-off-by: Anson Huang 
> ---
>  drivers/tty/serial/imx.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 239c0fa..c15332d 100644

Acked-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Uwe Kleine-König
Hello,

On Wed, Sep 05, 2018 at 09:24:26AM +0800, Anson Huang wrote:
> In noirq suspend/resume stage with no_console_suspend enabled,
> imx_uart_console_write() may be called to print out log_buf
> message by printk(), so there will be race condition between
> imx_uart_console_write() and imx_uart_save/restore_context(),
> need to add lock to protect the registers save/restore operations.
> 
> Signed-off-by: Anson Huang 
> ---
>  drivers/tty/serial/imx.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
> index 239c0fa..c15332d 100644

Acked-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend

2018-09-04 Thread Uwe Kleine-König
On Wed, Sep 05, 2018 at 09:24:27AM +0800, Anson Huang wrote:
> On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power
> state retention), UART iomux settings will be lost, need to add
> pinctrl sleep/default mode switch during suspend/resume to make
> sure UART iomux settings are correct after resume.
> 
> Signed-off-by: Anson Huang 

Acked-by: Uwe Kleine-König 

Best regards
Uwe
-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend

2018-09-04 Thread Uwe Kleine-König
On Wed, Sep 05, 2018 at 09:24:27AM +0800, Anson Huang wrote:
> On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power
> state retention), UART iomux settings will be lost, need to add
> pinctrl sleep/default mode switch during suspend/resume to make
> sure UART iomux settings are correct after resume.
> 
> Signed-off-by: Anson Huang 

Acked-by: Uwe Kleine-König 

Best regards
Uwe
-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH] cpu/hotplug: Fix acquire of st->should_run in cpuhp_thread_fun()

2018-09-04 Thread Neeraj Upadhyay
The smp_mb() in cpuhp_thread_fun() appears to be misplaced, and
need to be after the load of st->should_run, to prevent
reordering of the later load/stores w.r.t. the load of
st->should_run.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..eb4041f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu)
bool bringup = st->bringup;
enum cpuhp_state state;
 
+   if (WARN_ON_ONCE(!st->should_run))
+   return;
+
/*
 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
 * that if we see ->should_run we also see the rest of the state.
 */
smp_mb();
 
-   if (WARN_ON_ONCE(!st->should_run))
-   return;
-
cpuhp_lock_acquire(bringup);
 
if (st->single) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



[PATCH] cpu/hotplug: Fix acquire of st->should_run in cpuhp_thread_fun()

2018-09-04 Thread Neeraj Upadhyay
The smp_mb() in cpuhp_thread_fun() appears to be misplaced, and
need to be after the load of st->should_run, to prevent
reordering of the later load/stores w.r.t. the load of
st->should_run.

Signed-off-by: Neeraj Upadhyay 
---
 kernel/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85..eb4041f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu)
bool bringup = st->bringup;
enum cpuhp_state state;
 
+   if (WARN_ON_ONCE(!st->should_run))
+   return;
+
/*
 * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures
 * that if we see ->should_run we also see the rest of the state.
 */
smp_mb();
 
-   if (WARN_ON_ONCE(!st->should_run))
-   return;
-
cpuhp_lock_acquire(bringup);
 
if (st->single) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
On (09/05/18 14:36), Sergey Senozhatsky wrote:
> 
> OK, thanks for the report!
> So I think that your case is CON_PRINTBUFFER related as well. We have
> a number of logbuf messages before we parse quiet and console_loglevel.
> Those messages pass the suppress_message() test. Then when we finally
> register a CON_PRINTBUFFER console we update console_loglevel and flush
> all logbuf messages, but console_loglevel does not matter anymore.
> 

Hmm, earlycon can also be affected, probably. But I'm not sure if people
do "earlycon" and "quiet" at the same time. May be, who knows...

-ss


Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
On (09/05/18 14:36), Sergey Senozhatsky wrote:
> 
> OK, thanks for the report!
> So I think that your case is CON_PRINTBUFFER related as well. We have
> a number of logbuf messages before we parse quiet and console_loglevel.
> Those messages pass the suppress_message() test. Then when we finally
> register a CON_PRINTBUFFER console we update console_loglevel and flush
> all logbuf messages, but console_loglevel does not matter anymore.
> 

Hmm, earlycon can also be affected, probably. But I'm not sure if people
do "earlycon" and "quiet" at the same time. May be, who knows...

-ss


[PATCH] x86/paravirt: fix compile warning

2018-09-04 Thread Juergen Gross
Fix a compile warning "SAVE_FLAGS redefined" introduced recently by
tip.git commit: 6da63eb241a05b0e676d6 ("x86/paravirt: Move the
pv_irq_ops under the PARAVIRT_XXL umbrella").

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 969343c72a9b..4bf42f9e4eea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -914,7 +914,6 @@ extern void default_banner(void);
PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-#endif
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
@@ -924,6 +923,7 @@ extern void default_banner(void);
  call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);\
  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
 #endif
+#endif
 
 #endif /* CONFIG_X86_32 */
 
-- 
2.16.4



[PATCH] x86/paravirt: fix compile warning

2018-09-04 Thread Juergen Gross
Fix a compile warning "SAVE_FLAGS redefined" introduced recently by
tip.git commit: 6da63eb241a05b0e676d6 ("x86/paravirt: Move the
pv_irq_ops under the PARAVIRT_XXL umbrella").

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 969343c72a9b..4bf42f9e4eea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -914,7 +914,6 @@ extern void default_banner(void);
PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-#endif
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
@@ -924,6 +923,7 @@ extern void default_banner(void);
  call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);\
  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
 #endif
+#endif
 
 #endif /* CONFIG_X86_32 */
 
-- 
2.16.4



Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
On (09/05/18 06:53), Hans de Goede wrote:
> > 
> > Do you use earlycon?
> 
> No, I'm seeing this with the regular/normal console stuff.

OK, thanks for the report!
So I think that your case is CON_PRINTBUFFER related as well. We have
a number of logbuf messages before we parse quiet and console_loglevel.
Those messages pass the suppress_message() test. Then when we finally
register a CON_PRINTBUFFER console we update console_loglevel and flush
all logbuf messages, but console_loglevel does not matter anymore.

> > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
> > > messages and doing the loglevel check for these while flushing as before.
> > > 
> > 
> > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
> > the case that we sort of broke CON_PRINTBUFFER handling.
> > 
> > console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
> > register CON_PRINTBUFFER console
> > -> no debug output
> > 
> > So I think that when console_unlock() re-flushes already seen logbuf
> > messages to a newly registered exclusive [CON_PRINTBUFFER] console we
> > probably need to look at the current console_loglevel in console_unlock()
> > loop.
> 
> So if it breaks quiet and the above use-case maybe we should revert
> 375899cddcbb for now?

Maybe all we need to do is to re-introduce suppress_message() check to
console_unlock() when we have exclusive console set (IOW, when we re-flush
logbuf messages to a CON_PRINTBUFFER console).

Just a demonstration of the idea. It does not look very good, tho.
I'd rather have just one suppress_message_printing() in printk code.

// This is not a proposed patch, hence the 80-cols violation.

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c036f128cdc3..231ac18423e1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2416,7 +2416,7 @@ void console_unlock(void)
break;
 
msg = log_from_idx(console_idx);
-   if (msg->flags & LOG_NOCONS) {
+   if (msg->flags & LOG_NOCONS || (exclusive_console && 
suppress_message_printing(msg->level))) {
/*
 * Skip record if !ignore_loglevel, and
 * record has level above the console loglevel.

---

We are still early in 4.19 -rcs, let's wait a bit and hear from Steven
and Petr, they have bright ideas oftentimes.

-ss


Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
On (09/05/18 06:53), Hans de Goede wrote:
> > 
> > Do you use earlycon?
> 
> No, I'm seeing this with the regular/normal console stuff.

OK, thanks for the report!
So I think that your case is CON_PRINTBUFFER related as well. We have
a number of logbuf messages before we parse quiet and console_loglevel.
Those messages pass the suppress_message() test. Then when we finally
register a CON_PRINTBUFFER console we update console_loglevel and flush
all logbuf messages, but console_loglevel does not matter anymore.

> > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
> > > messages and doing the loglevel check for these while flushing as before.
> > > 
> > 
> > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
> > the case that we sort of broke CON_PRINTBUFFER handling.
> > 
> > console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
> > register CON_PRINTBUFFER console
> > -> no debug output
> > 
> > So I think that when console_unlock() re-flushes already seen logbuf
> > messages to a newly registered exclusive [CON_PRINTBUFFER] console we
> > probably need to look at the current console_loglevel in console_unlock()
> > loop.
> 
> So if it breaks quiet and the above use-case maybe we should revert
> 375899cddcbb for now?

Maybe all we need to do is to re-introduce suppress_message() check to
console_unlock() when we have exclusive console set (IOW, when we re-flush
logbuf messages to a CON_PRINTBUFFER console).

Just a demonstration of the idea. It does not look very good, tho.
I'd rather have just one suppress_message_printing() in printk code.

// This is not a proposed patch, hence the 80-cols violation.

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c036f128cdc3..231ac18423e1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2416,7 +2416,7 @@ void console_unlock(void)
break;
 
msg = log_from_idx(console_idx);
-   if (msg->flags & LOG_NOCONS) {
+   if (msg->flags & LOG_NOCONS || (exclusive_console && 
suppress_message_printing(msg->level))) {
/*
 * Skip record if !ignore_loglevel, and
 * record has level above the console loglevel.

---

We are still early in 4.19 -rcs, let's wait a bit and hear from Steven
and Petr, they have bright ideas oftentimes.

-ss


Re: [PATCH] HID: core: fix NULL pointer dereference

2018-09-04 Thread Stefan Agner
On 29.08.2018 08:22, Gustavo A. R. Silva wrote:
> There is a NULL pointer dereference in case memory resources
> for *parse* are not successfully allocated.
> 
> Fix this by adding a new goto label and make the execution
> path jump to it in case vzalloc() fails.
> 
> Addresses-Coverity-ID: 1473081 ("Dereference after null check")
> Fixes: b2dd9f2e5a8a ("HID: core: fix memory leak on probe")
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Stefan Agner 

--
Stefan

> ---
>  drivers/hid/hid-core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
> index 4548dae..5bec924 100644
> --- a/drivers/hid/hid-core.c
> +++ b/drivers/hid/hid-core.c
> @@ -1000,7 +1000,7 @@ int hid_open_report(struct hid_device *device)
>   parser = vzalloc(sizeof(struct hid_parser));
>   if (!parser) {
>   ret = -ENOMEM;
> - goto err;
> + goto alloc_err;
>   }
>  
>   parser->device = device;
> @@ -1049,6 +1049,7 @@ int hid_open_report(struct hid_device *device)
>   hid_err(device, "item fetching failed at offset %d\n", (int)(end - 
> start));
>  err:
>   kfree(parser->collection_stack);
> +alloc_err:
>   vfree(parser);
>   hid_close_report(device);
>   return ret;


Re: [PATCH] HID: core: fix NULL pointer dereference

2018-09-04 Thread Stefan Agner
On 29.08.2018 08:22, Gustavo A. R. Silva wrote:
> There is a NULL pointer dereference in case memory resources
> for *parse* are not successfully allocated.
> 
> Fix this by adding a new goto label and make the execution
> path jump to it in case vzalloc() fails.
> 
> Addresses-Coverity-ID: 1473081 ("Dereference after null check")
> Fixes: b2dd9f2e5a8a ("HID: core: fix memory leak on probe")
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Stefan Agner 

--
Stefan

> ---
>  drivers/hid/hid-core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
> index 4548dae..5bec924 100644
> --- a/drivers/hid/hid-core.c
> +++ b/drivers/hid/hid-core.c
> @@ -1000,7 +1000,7 @@ int hid_open_report(struct hid_device *device)
>   parser = vzalloc(sizeof(struct hid_parser));
>   if (!parser) {
>   ret = -ENOMEM;
> - goto err;
> + goto alloc_err;
>   }
>  
>   parser->device = device;
> @@ -1049,6 +1049,7 @@ int hid_open_report(struct hid_device *device)
>   hid_err(device, "item fetching failed at offset %d\n", (int)(end - 
> start));
>  err:
>   kfree(parser->collection_stack);
> +alloc_err:
>   vfree(parser);
>   hid_close_report(device);
>   return ret;


Re: [PATCH 3/3] of: make default address and size cells sizes private

2018-09-04 Thread Mark Cave-Ayland
On 05/09/18 02:55, Frank Rowand wrote:

> On 08/30/18 12:05, Rob Herring wrote:
>> Only some old OpenFirmware implementations rely on default sizes. Any
>> FDT and modern implementation should have explicit properties. Make the
>> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside
>> users.
>>
>> This also gets us one step closer to removing the asm/prom.h dependency on
>> Sparc.

Just for the record: you say "any FDT and modern implementation should
have explicit properties", however the default values of these
properties when missing are clearly defined in the IEEE-1275
specification (Annex A):

"#address-cells"
Standard property name to define the package’s address format.
...
In a package with a "decode-unit" method, a missing "#address-cells"
property signifies that the number of
address cells is two.

"#size-cells"
Standard property name to define the package’s address size format.
...
A missing "#size-cells" property signifies the default value of one.

I can't speak for FDT but it isn't completely unreasonable for a guest
parsing a DT without these properties to assume these default values.


ATB,

Mark.


Re: [PATCH 3/3] of: make default address and size cells sizes private

2018-09-04 Thread Mark Cave-Ayland
On 05/09/18 02:55, Frank Rowand wrote:

> On 08/30/18 12:05, Rob Herring wrote:
>> Only some old OpenFirmware implementations rely on default sizes. Any
>> FDT and modern implementation should have explicit properties. Make the
>> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside
>> users.
>>
>> This also gets us one step closer to removing the asm/prom.h dependency on
>> Sparc.

Just for the record: you say "any FDT and modern implementation should
have explicit properties", however the default values of these
properties when missing are clearly defined in the IEEE-1275
specification (Annex A):

"#address-cells"
Standard property name to define the package’s address format.
...
In a package with a "decode-unit" method, a missing "#address-cells"
property signifies that the number of
address cells is two.

"#size-cells"
Standard property name to define the package’s address size format.
...
A missing "#size-cells" property signifies the default value of one.

I can't speak for FDT but it isn't completely unreasonable for a guest
parsing a DT without these properties to assume these default values.


ATB,

Mark.


Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Hans de Goede

Hi,

On 05-09-18 04:35, Sergey Senozhatsky wrote:

Hi,

On (09/04/18 20:01), Hans de Goede wrote:

Commit 375899cddcbb ("printk: make sure to print log on console."), moved
the checking of the loglevel of messages from flush time to the actual
log time.

This introduces one problem, some early boot messages are printed before
parse_early_param() gets called and thus before kernel commandline options
such as quiet, loglevel and ignore_loglevel are parsed.


Do you use earlycon?


No, I'm seeing this with the regular/normal console stuff.


This causes e.g. the following messages to get printed on x86 systems,
despite the presence of the "quiet" option:

[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00057fff] usable
...
[0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable

This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
messages and doing the loglevel check for these while flushing as before.



Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
the case that we sort of broke CON_PRINTBUFFER handling.

console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
register CON_PRINTBUFFER console
-> no debug output

So I think that when console_unlock() re-flushes already seen logbuf
messages to a newly registered exclusive [CON_PRINTBUFFER] console we
probably need to look at the current console_loglevel in console_unlock()
loop.


So if it breaks quiet and the above use-case maybe we should revert
375899cddcbb for now?

Regards,

Hans


Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Hans de Goede

Hi,

On 05-09-18 04:35, Sergey Senozhatsky wrote:

Hi,

On (09/04/18 20:01), Hans de Goede wrote:

Commit 375899cddcbb ("printk: make sure to print log on console."), moved
the checking of the loglevel of messages from flush time to the actual
log time.

This introduces one problem, some early boot messages are printed before
parse_early_param() gets called and thus before kernel commandline options
such as quiet, loglevel and ignore_loglevel are parsed.


Do you use earlycon?


No, I'm seeing this with the regular/normal console stuff.


This causes e.g. the following messages to get printed on x86 systems,
despite the presence of the "quiet" option:

[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00057fff] usable
...
[0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable

This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
messages and doing the loglevel check for these while flushing as before.



Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
the case that we sort of broke CON_PRINTBUFFER handling.

console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
register CON_PRINTBUFFER console
-> no debug output

So I think that when console_unlock() re-flushes already seen logbuf
messages to a newly registered exclusive [CON_PRINTBUFFER] console we
probably need to look at the current console_loglevel in console_unlock()
loop.


So if it breaks quiet and the above use-case maybe we should revert
375899cddcbb for now?

Regards,

Hans


Re: [RFC PATCH 3/5] RISC-V: Select useful GENERIC_IRQ kconfig options

2018-09-04 Thread Anup Patel
On Wed, Sep 5, 2018 at 12:26 AM, Christoph Hellwig  wrote:
> On Tue, Sep 04, 2018 at 06:15:12PM +0530, Anup Patel wrote:
>> This patch selects following GENERIC_IRQ kconfig options:
>> GENERIC_IRQ_MULTI_HANDLER
>
> This is already selected by arch/riscv/Kconfig.
>
>> GENERIC_IRQ_PROBE
>
> This is something only used by ISA drivers.  Why would we want that
> on RISC-V?

Yes, thanks for pointing. GENERIC_IRQ_PROBE is not required at
this time. I will drop this selection. May be will re-consider later.

>
>> GENERIC_IRQ_SHOW_LEVEL
>
> We don't really have any special level triggerd irq handling in
> RISC-V.  That being said this is trivial and I don't see why it
> even is a Kconfig option.  Please have a discussion with Thomas
> and Marc on why we have this option instead of a default.

Most of MMIO device interrupts are level-interrupts. In fact, all
HW interrupt lines coming to PLIC will be level-interrupts. It's
just that PLIC does not implement state machine to sample
Level-IRQs and Edge-IRQs differently. Even the interrupt-controller
virtualization in hypervisors deal with Level and Edge interrupts
differently.

I am sure we will see both Level and Edge triggered interrupts
in RISC-V system. The MMIO device interrupts will be mostly
Level triggered and PCI MSIs will be mapped as Edge triggered
by MSI controller.

We should definitely select GENERIC_IRQ_SHOW_LEVEL
so that nature of IRQ interrupt line is evident in /proc/interrupts.

>
>> HANDLE_DOMAIN_IRQ
>
> We aren't using handle_domain_irq anywhere in RISC-V, no need to
> build this.

The new RISC-V local interrupt controller driver introduced by
this patchset uses handle_domain_irq().

The main advantage of handle_domain_irq() is that it helps reduce
few lines of code which is otherwise common across interrupt-controller
drivers (mostly code related to irq_enter(), irq_exit(), and set_irq_regs()).

Regards,
Anup


Re: [RFC PATCH 3/5] RISC-V: Select useful GENERIC_IRQ kconfig options

2018-09-04 Thread Anup Patel
On Wed, Sep 5, 2018 at 12:26 AM, Christoph Hellwig  wrote:
> On Tue, Sep 04, 2018 at 06:15:12PM +0530, Anup Patel wrote:
>> This patch selects following GENERIC_IRQ kconfig options:
>> GENERIC_IRQ_MULTI_HANDLER
>
> This is already selected by arch/riscv/Kconfig.
>
>> GENERIC_IRQ_PROBE
>
> This is something only used by ISA drivers.  Why would we want that
> on RISC-V?

Yes, thanks for pointing. GENERIC_IRQ_PROBE is not required at
this time. I will drop this selection. May be will re-consider later.

>
>> GENERIC_IRQ_SHOW_LEVEL
>
> We don't really have any special level triggerd irq handling in
> RISC-V.  That being said this is trivial and I don't see why it
> even is a Kconfig option.  Please have a discussion with Thomas
> and Marc on why we have this option instead of a default.

Most of MMIO device interrupts are level-interrupts. In fact, all
HW interrupt lines coming to PLIC will be level-interrupts. It's
just that PLIC does not implement state machine to sample
Level-IRQs and Edge-IRQs differently. Even the interrupt-controller
virtualization in hypervisors deal with Level and Edge interrupts
differently.

I am sure we will see both Level and Edge triggered interrupts
in RISC-V system. The MMIO device interrupts will be mostly
Level triggered and PCI MSIs will be mapped as Edge triggered
by MSI controller.

We should definitely select GENERIC_IRQ_SHOW_LEVEL
so that nature of IRQ interrupt line is evident in /proc/interrupts.

>
>> HANDLE_DOMAIN_IRQ
>
> We aren't using handle_domain_irq anywhere in RISC-V, no need to
> build this.

The new RISC-V local interrupt controller driver introduced by
this patchset uses handle_domain_irq().

The main advantage of handle_domain_irq() is that it helps reduce
few lines of code which is otherwise common across interrupt-controller
drivers (mostly code related to irq_enter(), irq_exit(), and set_irq_regs()).

Regards,
Anup


Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible

2018-09-04 Thread Anup Patel
On Wed, Sep 5, 2018 at 12:20 AM, Christoph Hellwig  wrote:
> On Tue, Sep 04, 2018 at 06:15:10PM +0530, Anup Patel wrote:
>> The mechanism to trigger IPI is generally part of interrupt-controller
>> driver for various architectures. On RISC-V, we have an option to trigger
>> IPI using SBI or SOC vendor can implement RISC-V CPU where IPI will be
>> triggered using SOC interrupt-controller (e.g. custom PLIC).
>
> Which is exactly what we want to avoid, and should not make it easy.
>
> The last thing we need is non-standard whacky IPI mechanisms, and
> that is why we habe SBI calls for it.  I think we should simply
> stat that if an RISC-V cpu design bypasse the SBI for no good reason
> we'll simply not support it.

It's outrageous to call IPI mechanisms using interrupt-controller as "wacky".

Lot of architectures have well thought-out interrupt-controller designs with
IPI support.

In fact having IPIs through interrupt-controller drivers is much faster because
SBI call will have it's own overhead and M-mode code with eventually write
to some platform-specific/interrupt-controller register. The SBI call only makes
sense for very simple interrupt-controller (such as PLIC) which do not provide
IPI mechanism. It totally seems like SBI call for triggering IPIs was added as
workaround to address limitations of current PLIC.

RISC-V systems require a more mature and feature complete interrupt-controllers
which supports IPIs, PCI MSI, and Virtualization Extensions.

I am sure will see a much better interrupt controller (PLIC++ or something else)
soon.

>
> So NAK for this patch.

I think you jumped the gun to quickly here.

This patch does two things:
1. Adds a pluggable IPI triggering mechanism
2. Make IPI handling mechanism more generic so that we can
call IPI handler from interrupt-controller driver.

Your primary objection seems to be for point1 above. I will drop that
part only keep changes related to point2 above.

Regards,
Anup


Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible

2018-09-04 Thread Anup Patel
On Wed, Sep 5, 2018 at 12:20 AM, Christoph Hellwig  wrote:
> On Tue, Sep 04, 2018 at 06:15:10PM +0530, Anup Patel wrote:
>> The mechanism to trigger IPI is generally part of interrupt-controller
>> driver for various architectures. On RISC-V, we have an option to trigger
>> IPI using SBI or SOC vendor can implement RISC-V CPU where IPI will be
>> triggered using SOC interrupt-controller (e.g. custom PLIC).
>
> Which is exactly what we want to avoid, and should not make it easy.
>
> The last thing we need is non-standard whacky IPI mechanisms, and
> that is why we habe SBI calls for it.  I think we should simply
> stat that if an RISC-V cpu design bypasse the SBI for no good reason
> we'll simply not support it.

It's outrageous to call IPI mechanisms using interrupt-controller as "wacky".

Lot of architectures have well thought-out interrupt-controller designs with
IPI support.

In fact having IPIs through interrupt-controller drivers is much faster because
SBI call will have it's own overhead and M-mode code with eventually write
to some platform-specific/interrupt-controller register. The SBI call only makes
sense for very simple interrupt-controller (such as PLIC) which do not provide
IPI mechanism. It totally seems like SBI call for triggering IPIs was added as
workaround to address limitations of current PLIC.

RISC-V systems require a more mature and feature complete interrupt-controllers
which supports IPIs, PCI MSI, and Virtualization Extensions.

I am sure will see a much better interrupt controller (PLIC++ or something else)
soon.

>
> So NAK for this patch.

I think you jumped the gun to quickly here.

This patch does two things:
1. Adds a pluggable IPI triggering mechanism
2. Make IPI handling mechanism more generic so that we can
call IPI handler from interrupt-controller driver.

Your primary objection seems to be for point1 above. I will drop that
part only keep changes related to point2 above.

Regards,
Anup


Re: [PATCH] of: Add device_type access helper functions

2018-09-04 Thread Frank Rowand
On 08/30/18 11:52, Rob Herring wrote:
> In preparation to remove direct access to device_node.type, add
> of_node_is_type() and of_node_get_device_type() helpers to check and
> retrieve the device type.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  include/linux/of.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/of.h b/include/linux/of.h
> index a40f63a36afa..506beca9588d 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -980,6 +980,18 @@ static inline struct device_node *of_find_matching_node(
>   return of_find_matching_node_and_match(from, matches, NULL);
>  }
>  
> +static inline const char *of_node_get_device_type(const struct device_node 
> *np)
> +{
> + return of_get_property(np, "type", NULL);

I'm a little confused.  The patch comment says "remove direct access to 
device_node.type.
device_node.type is set to the value of the property "device_type" in 
populate_node(),
but of_node_get_device_type() is looking for property "type".


> +}> +
> +static inline bool of_node_is_type(const struct device_node *np, const char 
> *type)
> +{
> + const char *match = of_node_get_device_type(np);
> +
> + return np && match && type && !of_prop_cmp(match, type);
> +}
> +
>  /**
>   * of_property_count_u8_elems - Count the number of u8 elements in a property
>   *
> 

I'd like to see one or two examples of how you plan to use 
of_node_get_device_type()
(outside of of_node_is_type()) and examples of planned use of of_node_is_type() 
so
I can better understand anticipated usage.  Complete, clean patches not needed 
for
the examples, just enough text for me to see the resulting code change where the
helpers are used.

-Frank


Re: [PATCH] of: Add device_type access helper functions

2018-09-04 Thread Frank Rowand
On 08/30/18 11:52, Rob Herring wrote:
> In preparation to remove direct access to device_node.type, add
> of_node_is_type() and of_node_get_device_type() helpers to check and
> retrieve the device type.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  include/linux/of.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/of.h b/include/linux/of.h
> index a40f63a36afa..506beca9588d 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -980,6 +980,18 @@ static inline struct device_node *of_find_matching_node(
>   return of_find_matching_node_and_match(from, matches, NULL);
>  }
>  
> +static inline const char *of_node_get_device_type(const struct device_node 
> *np)
> +{
> + return of_get_property(np, "type", NULL);

I'm a little confused.  The patch comment says "remove direct access to 
device_node.type.
device_node.type is set to the value of the property "device_type" in 
populate_node(),
but of_node_get_device_type() is looking for property "type".


> +}> +
> +static inline bool of_node_is_type(const struct device_node *np, const char 
> *type)
> +{
> + const char *match = of_node_get_device_type(np);
> +
> + return np && match && type && !of_prop_cmp(match, type);
> +}
> +
>  /**
>   * of_property_count_u8_elems - Count the number of u8 elements in a property
>   *
> 

I'd like to see one or two examples of how you plan to use 
of_node_get_device_type()
(outside of of_node_is_type()) and examples of planned use of of_node_is_type() 
so
I can better understand anticipated usage.  Complete, clean patches not needed 
for
the examples, just enough text for me to see the resulting code change where the
helpers are used.

-Frank


Re: VirtIO console hangs

2018-09-04 Thread Nicholas Piggin
On Fri, 31 Aug 2018 15:17:44 +
Matteo Croce  wrote:

> On Fri, Aug 31, 2018 at 3:49 AM Nicholas Piggin  wrote:
> >
> > On Tue, 28 Aug 2018 15:00:14 +
> > Matteo Croce  wrote:
> >  
> > > On Tue, Aug 28, 2018 at 2:35 PM Nicholas Piggin  
> > > wrote:  
> > > >
> > > > On Tue, 28 Aug 2018 12:54:08 +
> > > > Matteo Croce  wrote:
> > > >  
> > > > > With kernel 4.19.0-rc1 virtio_console hangs very often.
> > > > > I can always trigger the bug by pasting some characters in the
> > > > > terminal window, the console will stop receiving keypresses, but I can
> > > > > still see output from the console.
> > > > > Stangely, logging in the VM via SSH and sending lot of data to hvc0,
> > > > > like 'dmesg >/dev/hvc0' will fix the issue until the next paste.
> > > > >
> > > > > I did a git bisect and I've found that this is the offending commit,
> > > > > reverting it fixes it.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > commit ec97eaad1383ab2500fcf9a07ade6044fbcc67f5
> > > > > Author: Nicholas Piggin 
> > > > > Date:   Tue May 1 00:55:54 2018 +1000
> > > > >
> > > > > tty: hvc: hvc_poll() break hv read loop  
> > > >
> > > > Thanks for the report. I can't immediately see what the problem
> > > > is. Can you try get a stack trace of where it is stuck?
> > > >  
> > >
> > > I tried but didn't get one.
> > >  
> > > > Perhaps try this patch if you have time (it's a bit of a shot
> > > > in the dark).
> > > >  
> > >
> > > Yes it seems to fix. Thanks!  
> >
> > Okay sorry for the delay, I can reproduce it here and found a better
> > fix, if I could trouble you to test again.
> >
> > [PATCH] tty: hvc: hvc_poll() fix read loop hang
> >  
> 
> Hi Nicholas,
> 
> the patch works, but now pasting text inside the terminal is extremely
> slow, it feels worse than a 9600 baud serial line.
> 
> Btw, I had to apply the patch by hand as it was corrupted, some lines
> were collapsed into one.

Not sure why that happened, I seem to be able to send myself
uncorrupted patches...

Can you try this patch for performance? Hopefully it applies.
--

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5414c4a87bea..f5fc3ba49130 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -49,6 +49,8 @@
 #define N_OUTBUF   16
 #define N_INBUF16
 
+#define HVC_ATOMIC_READ_MAX128
+
 #define __ALIGNED__ __attribute__((__aligned__(sizeof(long
 
 static struct tty_driver *hvc_driver;
@@ -522,6 +524,8 @@ static int hvc_write(struct tty_struct *tty, const unsigned 
char *buf, int count
return -EIO;
 
while (count > 0) {
+   int ret;
+
spin_lock_irqsave(>lock, flags);
 
rsize = hp->outbuf_size - hp->n_outbuf;
@@ -537,10 +541,13 @@ static int hvc_write(struct tty_struct *tty, const 
unsigned char *buf, int count
}
 
if (hp->n_outbuf > 0)
-   hvc_push(hp);
+   ret = hvc_push(hp);
 
spin_unlock_irqrestore(>lock, flags);
 
+   if (!ret)
+   break;
+
if (count) {
if (hp->n_outbuf > 0)
hvc_flush(hp);
@@ -669,8 +676,8 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep)
if (!hp->irq_requested)
poll_mask |= HVC_POLL_READ;
 
+ read_again:
/* Read data if any */
-
count = tty_buffer_request_room(>port, N_INBUF);
 
/* If flip is full, just reschedule a later read */
@@ -717,9 +724,23 @@ static int __hvc_poll(struct hvc_struct *hp, bool 
may_sleep)
 #endif /* CONFIG_MAGIC_SYSRQ */
tty_insert_flip_char(>port, buf[i], 0);
}
-   if (n == count)
-   poll_mask |= HVC_POLL_READ;
-   read_total = n;
+   read_total += n;
+
+   if (may_sleep) {
+   /* Keep going until the flip is full */
+   spin_unlock_irqrestore(>lock, flags);
+   cond_resched();
+   spin_lock_irqsave(>lock, flags);
+   goto read_again;
+   } else if (read_total < HVC_ATOMIC_READ_MAX) {
+   /* Break and defer if it's a large read in atomic */
+   goto read_again;
+   }
+
+   /*
+* Latency break, schedule another poll immediately.
+*/
+   poll_mask |= HVC_POLL_READ;
 
  out:
/* Wakeup write queue if necessary */


Re: VirtIO console hangs

2018-09-04 Thread Nicholas Piggin
On Fri, 31 Aug 2018 15:17:44 +
Matteo Croce  wrote:

> On Fri, Aug 31, 2018 at 3:49 AM Nicholas Piggin  wrote:
> >
> > On Tue, 28 Aug 2018 15:00:14 +
> > Matteo Croce  wrote:
> >  
> > > On Tue, Aug 28, 2018 at 2:35 PM Nicholas Piggin  
> > > wrote:  
> > > >
> > > > On Tue, 28 Aug 2018 12:54:08 +
> > > > Matteo Croce  wrote:
> > > >  
> > > > > With kernel 4.19.0-rc1 virtio_console hangs very often.
> > > > > I can always trigger the bug by pasting some characters in the
> > > > > terminal window, the console will stop receiving keypresses, but I can
> > > > > still see output from the console.
> > > > > Stangely, logging in the VM via SSH and sending lot of data to hvc0,
> > > > > like 'dmesg >/dev/hvc0' will fix the issue until the next paste.
> > > > >
> > > > > I did a git bisect and I've found that this is the offending commit,
> > > > > reverting it fixes it.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > commit ec97eaad1383ab2500fcf9a07ade6044fbcc67f5
> > > > > Author: Nicholas Piggin 
> > > > > Date:   Tue May 1 00:55:54 2018 +1000
> > > > >
> > > > > tty: hvc: hvc_poll() break hv read loop  
> > > >
> > > > Thanks for the report. I can't immediately see what the problem
> > > > is. Can you try get a stack trace of where it is stuck?
> > > >  
> > >
> > > I tried but didn't get one.
> > >  
> > > > Perhaps try this patch if you have time (it's a bit of a shot
> > > > in the dark).
> > > >  
> > >
> > > Yes it seems to fix. Thanks!  
> >
> > Okay sorry for the delay, I can reproduce it here and found a better
> > fix, if I could trouble you to test again.
> >
> > [PATCH] tty: hvc: hvc_poll() fix read loop hang
> >  
> 
> Hi Nicholas,
> 
> the patch works, but now pasting text inside the terminal is extremely
> slow, it feels worse than a 9600 baud serial line.
> 
> Btw, I had to apply the patch by hand as it was corrupted, some lines
> were collapsed into one.

Not sure why that happened, I seem to be able to send myself
uncorrupted patches...

Can you try this patch for performance? Hopefully it applies.
--

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5414c4a87bea..f5fc3ba49130 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -49,6 +49,8 @@
 #define N_OUTBUF   16
 #define N_INBUF16
 
+#define HVC_ATOMIC_READ_MAX128
+
 #define __ALIGNED__ __attribute__((__aligned__(sizeof(long
 
 static struct tty_driver *hvc_driver;
@@ -522,6 +524,8 @@ static int hvc_write(struct tty_struct *tty, const unsigned 
char *buf, int count
return -EIO;
 
while (count > 0) {
+   int ret;
+
spin_lock_irqsave(>lock, flags);
 
rsize = hp->outbuf_size - hp->n_outbuf;
@@ -537,10 +541,13 @@ static int hvc_write(struct tty_struct *tty, const 
unsigned char *buf, int count
}
 
if (hp->n_outbuf > 0)
-   hvc_push(hp);
+   ret = hvc_push(hp);
 
spin_unlock_irqrestore(>lock, flags);
 
+   if (!ret)
+   break;
+
if (count) {
if (hp->n_outbuf > 0)
hvc_flush(hp);
@@ -669,8 +676,8 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep)
if (!hp->irq_requested)
poll_mask |= HVC_POLL_READ;
 
+ read_again:
/* Read data if any */
-
count = tty_buffer_request_room(>port, N_INBUF);
 
/* If flip is full, just reschedule a later read */
@@ -717,9 +724,23 @@ static int __hvc_poll(struct hvc_struct *hp, bool 
may_sleep)
 #endif /* CONFIG_MAGIC_SYSRQ */
tty_insert_flip_char(>port, buf[i], 0);
}
-   if (n == count)
-   poll_mask |= HVC_POLL_READ;
-   read_total = n;
+   read_total += n;
+
+   if (may_sleep) {
+   /* Keep going until the flip is full */
+   spin_unlock_irqrestore(>lock, flags);
+   cond_resched();
+   spin_lock_irqsave(>lock, flags);
+   goto read_again;
+   } else if (read_total < HVC_ATOMIC_READ_MAX) {
+   /* Break and defer if it's a large read in atomic */
+   goto read_again;
+   }
+
+   /*
+* Latency break, schedule another poll immediately.
+*/
+   poll_mask |= HVC_POLL_READ;
 
  out:
/* Wakeup write queue if necessary */


linux-next: Tree for Sep 5

2018-09-04 Thread Stephen Rothwell
Hi all,

Changes since 20180904:

Dropped trees: xarray, ida (temporarily)

The net-next tree still had its build failure for which I reverted
a commit.

The devicetree tree gained a build failure for which I applied a patch.

Non-merge commits (relative to Linus' tree): 1853
 2252 files changed, 64880 insertions(+), 38423 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 286 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (28619527b8a7 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging fixes/master (217c3e019675 disable stringop truncation warnings for now)
Merging kbuild-current/fixes (914b087ff9e0 kbuild: make missing $DEPMOD a 
Warning instead of an Error)
Merging arc-current/for-curr (dd45210b6dd4 ARC: don't check for HIGHMEM pages 
in arch_dma_alloc)
Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return 
for binutils 2.29+)
Merging arm64-fixes/for-next/fixes (f52bb98f5ade arm64: mm: always enable 
CONFIG_HOLES_IN_ZONE)
Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response 
format)
Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global 
invalidations when removing copro)
Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (a33710bdb6b2 net: phy: sfp: Handle unimplemented hwmon 
limits and alarms)
Merging bpf/master (597222f72a94 bpf: avoid misuse of psock when TCP_ULP_BPF 
collides with another ULP)
Merging ipsec/master (215ab0f021c9 xfrm6: call kfree_skb when skb is toobig)
Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in 
flushing set)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (53ae914d898e net/rds: Use rdma_read_gids to 
get connection SGID/DGID in IPv6)
Merging mac80211/master (fc3e3bf55f48 Merge tag 'mac80211-for-davem-2018-09-03' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211)
Merging rdma-fixes/for-rc (5b394b2ddf03 Linux 4.19-rc1)
Merging sound-current/for-linus (f7c50fa636f7 ALSA: hda: Fix several mismatch 
for register mask and value)
Merging sound-asoc-fixes/for-linus (9c9b8f2dc626 Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (5b394b2ddf03 Linux 4.19-rc1)
Merging regulator-fixes/for-linus (25be6316a30a Merge branch 'regulator-4.19' 
into regulator-linus)
Merging spi-fixes/for-linus (051e00694c64 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (5b394b2ddf03 Linux 4.19-rc1)
Merging driver-core.current/driver-core-linus (57361846b52b Linux 4.19-rc2)
Merging tty.current/tty-linus (57361846b52b Linux 4.19-rc2)
Merging usb.current/usb-linus (57361846b52b Linux 4.19-rc2)
Merging usb-gadget-fixes/fixes (5b394b2ddf03 Linux 4.19-rc1)
Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: 
fix array underflow in completion handler)
Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: 
Always build ULPI code)
Merging phy/fixes (5b394b2ddf03 Linux 4.19-rc1)
Merging sta

linux-next: Tree for Sep 5

2018-09-04 Thread Stephen Rothwell
Hi all,

Changes since 20180904:

Dropped trees: xarray, ida (temporarily)

The net-next tree still had its build failure for which I reverted
a commit.

The devicetree tree gained a build failure for which I applied a patch.

Non-merge commits (relative to Linus' tree): 1853
 2252 files changed, 64880 insertions(+), 38423 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 286 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (28619527b8a7 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging fixes/master (217c3e019675 disable stringop truncation warnings for now)
Merging kbuild-current/fixes (914b087ff9e0 kbuild: make missing $DEPMOD a 
Warning instead of an Error)
Merging arc-current/for-curr (dd45210b6dd4 ARC: don't check for HIGHMEM pages 
in arch_dma_alloc)
Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return 
for binutils 2.29+)
Merging arm64-fixes/for-next/fixes (f52bb98f5ade arm64: mm: always enable 
CONFIG_HOLES_IN_ZONE)
Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response 
format)
Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global 
invalidations when removing copro)
Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (a33710bdb6b2 net: phy: sfp: Handle unimplemented hwmon 
limits and alarms)
Merging bpf/master (597222f72a94 bpf: avoid misuse of psock when TCP_ULP_BPF 
collides with another ULP)
Merging ipsec/master (215ab0f021c9 xfrm6: call kfree_skb when skb is toobig)
Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in 
flushing set)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (53ae914d898e net/rds: Use rdma_read_gids to 
get connection SGID/DGID in IPv6)
Merging mac80211/master (fc3e3bf55f48 Merge tag 'mac80211-for-davem-2018-09-03' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211)
Merging rdma-fixes/for-rc (5b394b2ddf03 Linux 4.19-rc1)
Merging sound-current/for-linus (f7c50fa636f7 ALSA: hda: Fix several mismatch 
for register mask and value)
Merging sound-asoc-fixes/for-linus (9c9b8f2dc626 Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (5b394b2ddf03 Linux 4.19-rc1)
Merging regulator-fixes/for-linus (25be6316a30a Merge branch 'regulator-4.19' 
into regulator-linus)
Merging spi-fixes/for-linus (051e00694c64 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (5b394b2ddf03 Linux 4.19-rc1)
Merging driver-core.current/driver-core-linus (57361846b52b Linux 4.19-rc2)
Merging tty.current/tty-linus (57361846b52b Linux 4.19-rc2)
Merging usb.current/usb-linus (57361846b52b Linux 4.19-rc2)
Merging usb-gadget-fixes/fixes (5b394b2ddf03 Linux 4.19-rc1)
Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: 
fix array underflow in completion handler)
Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: 
Always build ULPI code)
Merging phy/fixes (5b394b2ddf03 Linux 4.19-rc1)
Merging sta

[GIT PULL] nds32 updates for v4.19

2018-09-04 Thread Greentime Hu
Hi Linus,

The following changes since commit 57361846b52bc686112da6ca5368d11210796804:

  Linux 4.19-rc2 (2018-09-02 14:37:30 -0700)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/greentime/linux.git
tags/nds32-for-linus-4.19-tag1

for you to fetch changes up to 3350139c0ff3c95724b784f7109987d533cb3ecd:

  nds32: linker script: GCOV kernel may refers data in __exit
(2018-09-05 10:16:26 +0800)


nds32 patches for 4.19

Here is the nds32 patch set based on 4.19-rc2.
Contained in here are the bug fixes, building error fixes and ftrace support
for nds32.

These are the LTP20170427 testing results.

Total Tests: 1902
Total Skipped Tests: 592
Total Failures: 420
Kernel Version: 4.19.0-rc2-00018-g2c9d30cc16f0-dirty
Machine Architecture: nds32

Signed-off-by: Greentime Hu 


Greentime Hu (5):
  nds32: fix logic for module
  nds32: Only print one page of stack when die to prevent printing
too much information.
  nds32: Fix a kernel panic issue because of wrong frame pointer access.
  nds32: fix build error because of wrong semicolon
  nds32: linker script: GCOV kernel may refers data in __exit

YueHaibing (1):
  nds32: add NULL entry to the end of_device_id array

Zong Li (12):
  nds32: Fix empty call trace
  nds32: Fix get_user/put_user macro expand pointer problem
  nds32: Clean up the coding style
  nds32: Extract the checking and getting pointer to a macro
  nds32/ftrace: Support static function tracer
  nds32/ftrace: Support static function graph tracer
  nds32/ftrace: Add RECORD_MCOUNT support
  nds32/ftrace: Support dynamic function tracer
  nds32/ftrace: Support dynamic function graph tracer
  nds32/stack: Get real return address by using ftrace_graph_ret_addr
  nds32: Remove the deprecated ABI implementation
  nds32: Add macro definition for offset of lp register on stack

 arch/nds32/Kconfig   |   4 ++
 arch/nds32/Makefile  |   4 ++
 arch/nds32/include/asm/elf.h |   4 +-
 arch/nds32/include/asm/ftrace.h  |  46 ++
 arch/nds32/include/asm/nds32.h   |   1 +
 arch/nds32/include/asm/uaccess.h | 229
-
 arch/nds32/kernel/Makefile   |   6 +++
 arch/nds32/kernel/atl2c.c|   3 +-
 arch/nds32/kernel/ex-entry.S |   2 +-
 arch/nds32/kernel/ex-exit.S  |   4 +-
 arch/nds32/kernel/ftrace.c   | 309

 arch/nds32/kernel/module.c   |   4 +-
 arch/nds32/kernel/stacktrace.c   |   6 ++-
 arch/nds32/kernel/traps.c|  42 +---
 arch/nds32/kernel/vmlinux.lds.S  |  12 ++
 scripts/recordmcount.pl  |   3 ++
 16 files changed, 527 insertions(+), 152 deletions(-)
 create mode 100644 arch/nds32/include/asm/ftrace.h
 create mode 100644 arch/nds32/kernel/ftrace.c


[GIT PULL] nds32 updates for v4.19

2018-09-04 Thread Greentime Hu
Hi Linus,

The following changes since commit 57361846b52bc686112da6ca5368d11210796804:

  Linux 4.19-rc2 (2018-09-02 14:37:30 -0700)

are available in the Git repository at:

  ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/greentime/linux.git
tags/nds32-for-linus-4.19-tag1

for you to fetch changes up to 3350139c0ff3c95724b784f7109987d533cb3ecd:

  nds32: linker script: GCOV kernel may refers data in __exit
(2018-09-05 10:16:26 +0800)


nds32 patches for 4.19

Here is the nds32 patch set based on 4.19-rc2.
Contained in here are the bug fixes, building error fixes and ftrace support
for nds32.

These are the LTP20170427 testing results.

Total Tests: 1902
Total Skipped Tests: 592
Total Failures: 420
Kernel Version: 4.19.0-rc2-00018-g2c9d30cc16f0-dirty
Machine Architecture: nds32

Signed-off-by: Greentime Hu 


Greentime Hu (5):
  nds32: fix logic for module
  nds32: Only print one page of stack when die to prevent printing
too much information.
  nds32: Fix a kernel panic issue because of wrong frame pointer access.
  nds32: fix build error because of wrong semicolon
  nds32: linker script: GCOV kernel may refers data in __exit

YueHaibing (1):
  nds32: add NULL entry to the end of_device_id array

Zong Li (12):
  nds32: Fix empty call trace
  nds32: Fix get_user/put_user macro expand pointer problem
  nds32: Clean up the coding style
  nds32: Extract the checking and getting pointer to a macro
  nds32/ftrace: Support static function tracer
  nds32/ftrace: Support static function graph tracer
  nds32/ftrace: Add RECORD_MCOUNT support
  nds32/ftrace: Support dynamic function tracer
  nds32/ftrace: Support dynamic function graph tracer
  nds32/stack: Get real return address by using ftrace_graph_ret_addr
  nds32: Remove the deprecated ABI implementation
  nds32: Add macro definition for offset of lp register on stack

 arch/nds32/Kconfig   |   4 ++
 arch/nds32/Makefile  |   4 ++
 arch/nds32/include/asm/elf.h |   4 +-
 arch/nds32/include/asm/ftrace.h  |  46 ++
 arch/nds32/include/asm/nds32.h   |   1 +
 arch/nds32/include/asm/uaccess.h | 229
-
 arch/nds32/kernel/Makefile   |   6 +++
 arch/nds32/kernel/atl2c.c|   3 +-
 arch/nds32/kernel/ex-entry.S |   2 +-
 arch/nds32/kernel/ex-exit.S  |   4 +-
 arch/nds32/kernel/ftrace.c   | 309

 arch/nds32/kernel/module.c   |   4 +-
 arch/nds32/kernel/stacktrace.c   |   6 ++-
 arch/nds32/kernel/traps.c|  42 +---
 arch/nds32/kernel/vmlinux.lds.S  |  12 ++
 scripts/recordmcount.pl  |   3 ++
 16 files changed, 527 insertions(+), 152 deletions(-)
 create mode 100644 arch/nds32/include/asm/ftrace.h
 create mode 100644 arch/nds32/kernel/ftrace.c


Re: [PATCH] of: add node name compare helper functions

2018-09-04 Thread Frank Rowand
On 08/30/18 11:52, Rob Herring wrote:
> In preparation to remove device_node.name pointer, add helper functions
> for node name comparisons which are a common pattern throughout the kernel.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/base.c  | 22 ++
>  include/linux/of.h | 13 +
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 466e3c8582f0..185bfe077d0a 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -54,6 +54,28 @@ DEFINE_MUTEX(of_mutex);
>   */
>  DEFINE_RAW_SPINLOCK(devtree_lock);
>  
> +bool of_node_name_eq(const struct device_node *np, const char *name)
> +{
> + const char *node_name;
> + size_t len;
> +
> + if (!np)
> + return false;
> +
> + node_name = kbasename(np->full_name);
> + len = strchrnul(node_name, '@') - node_name;
> +
> + return (strlen(name) == len) && (strncmp(node_name, name, len) == 0);
> +}
> +
> +bool of_node_name_prefix(const struct device_node *np, const char *prefix)

of_node_name_prefix_eq() would be more clear and consistent.

of_node_name_prefix() sounds like it would return the prefix of *np.


> +{
> + if (!np)
> + return false;
> +
> + return strncmp(kbasename(np->full_name), prefix, strlen(prefix)) == 0;
> +}
> +
>  int of_n_addr_cells(struct device_node *np)
>  {
>   u32 cells;
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 4d25e4f952d9..a40f63a36afa 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -256,6 +256,9 @@ static inline unsigned long of_read_ulong(const __be32 
> *cell, int size)
>  #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags)
>  #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags)
>  
> +extern bool of_node_name_eq(const struct device_node *np, const char *name);
> +extern bool of_node_name_prefix(const struct device_node *np, const char 
> *prefix);
> +
>  static inline const char *of_node_full_name(const struct device_node *np)
>  {
>   return np ? np->full_name : "";
> @@ -561,6 +564,16 @@ static inline struct device_node *to_of_node(const 
> struct fwnode_handle *fwnode)
>   return NULL;
>  }
>  
> +static inline bool of_node_name_eq(const struct device_node *np, const char 
> *name)
> +{
> + return false;
> +}
> +
> +static inline bool of_node_name_prefix(const struct device_node *np, const 
> char *prefix)
> +{
> + return false;
> +}
> +
>  static inline const char* of_node_full_name(const struct device_node *np)
>  {
>   return "";
> 



Re: [PATCH] of: add node name compare helper functions

2018-09-04 Thread Frank Rowand
On 08/30/18 11:52, Rob Herring wrote:
> In preparation to remove device_node.name pointer, add helper functions
> for node name comparisons which are a common pattern throughout the kernel.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/base.c  | 22 ++
>  include/linux/of.h | 13 +
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 466e3c8582f0..185bfe077d0a 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -54,6 +54,28 @@ DEFINE_MUTEX(of_mutex);
>   */
>  DEFINE_RAW_SPINLOCK(devtree_lock);
>  
> +bool of_node_name_eq(const struct device_node *np, const char *name)
> +{
> + const char *node_name;
> + size_t len;
> +
> + if (!np)
> + return false;
> +
> + node_name = kbasename(np->full_name);
> + len = strchrnul(node_name, '@') - node_name;
> +
> + return (strlen(name) == len) && (strncmp(node_name, name, len) == 0);
> +}
> +
> +bool of_node_name_prefix(const struct device_node *np, const char *prefix)

of_node_name_prefix_eq() would be more clear and consistent.

of_node_name_prefix() sounds like it would return the prefix of *np.


> +{
> + if (!np)
> + return false;
> +
> + return strncmp(kbasename(np->full_name), prefix, strlen(prefix)) == 0;
> +}
> +
>  int of_n_addr_cells(struct device_node *np)
>  {
>   u32 cells;
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 4d25e4f952d9..a40f63a36afa 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -256,6 +256,9 @@ static inline unsigned long of_read_ulong(const __be32 
> *cell, int size)
>  #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags)
>  #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags)
>  
> +extern bool of_node_name_eq(const struct device_node *np, const char *name);
> +extern bool of_node_name_prefix(const struct device_node *np, const char 
> *prefix);
> +
>  static inline const char *of_node_full_name(const struct device_node *np)
>  {
>   return np ? np->full_name : "";
> @@ -561,6 +564,16 @@ static inline struct device_node *to_of_node(const 
> struct fwnode_handle *fwnode)
>   return NULL;
>  }
>  
> +static inline bool of_node_name_eq(const struct device_node *np, const char 
> *name)
> +{
> + return false;
> +}
> +
> +static inline bool of_node_name_prefix(const struct device_node *np, const 
> char *prefix)
> +{
> + return false;
> +}
> +
>  static inline const char* of_node_full_name(const struct device_node *np)
>  {
>   return "";
> 



[PATCH] selftests/efivarfs: add required kernel configs

2018-09-04 Thread Lei Yang
add config file

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/efivarfs/config | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/testing/selftests/efivarfs/config

diff --git a/tools/testing/selftests/efivarfs/config 
b/tools/testing/selftests/efivarfs/config
new file mode 100644
index 000..4e151f1
--- /dev/null
+++ b/tools/testing/selftests/efivarfs/config
@@ -0,0 +1 @@
+CONFIG_EFIVAR_FS=y
-- 
1.9.1



[PATCH] selftests/efivarfs: add required kernel configs

2018-09-04 Thread Lei Yang
add config file

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/efivarfs/config | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/testing/selftests/efivarfs/config

diff --git a/tools/testing/selftests/efivarfs/config 
b/tools/testing/selftests/efivarfs/config
new file mode 100644
index 000..4e151f1
--- /dev/null
+++ b/tools/testing/selftests/efivarfs/config
@@ -0,0 +1 @@
+CONFIG_EFIVAR_FS=y
-- 
1.9.1



Re: [PATCH] sysctl: kselftests: add test_core to .gitignore

2018-09-04 Thread lei yang
sorry, the title should be :"cgroup: kselftests: add test_core to 
.gitignore"


Lei



On 2018年09月05日 10:39, Lei Yang wrote:

Update .gitignore files.

Signed-off-by: Lei Yang 
---
  tools/testing/selftests/cgroup/.gitignore | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cgroup/.gitignore 
b/tools/testing/selftests/cgroup/.gitignore
index 95eb3a5..adacda5 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1 +1,2 @@
  test_memcontrol
+test_core




Re: [PATCH] sysctl: kselftests: add test_core to .gitignore

2018-09-04 Thread lei yang
sorry, the title should be :"cgroup: kselftests: add test_core to 
.gitignore"


Lei



On 2018年09月05日 10:39, Lei Yang wrote:

Update .gitignore files.

Signed-off-by: Lei Yang 
---
  tools/testing/selftests/cgroup/.gitignore | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cgroup/.gitignore 
b/tools/testing/selftests/cgroup/.gitignore
index 95eb3a5..adacda5 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1 +1,2 @@
  test_memcontrol
+test_core




Re: [PATCH v8 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller

2018-09-04 Thread Baolin Wang
Hi Jacek,

On 5 September 2018 at 04:19, Jacek Anaszewski
 wrote:
> Hi Baolin,
>
> On 09/04/2018 01:01 PM, Baolin Wang wrote:
>> This patch implements the 'pattern_set'and 'pattern_clear'
>> interfaces to support SC27XX LED breathing mode.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>> Changes from v7:
>>  - Add its own ABI documentation file.
>>
>> Changes from v6:
>>  - None.
>>
>> Changes from v5:
>>  - None.
>>
>> Changes from v4:
>>  - None.
>>
>> Changes from v3:
>>  - None.
>>
>> Changes from v2:
>>  - None.
>>
>> Changes from v1:
>>  - Remove pattern_get interface.
>> ---
>>  .../ABI/testing/sysfs-class-led-driver-sc27xx  |   11 +++
>>  drivers/leds/leds-sc27xx-bltc.c|   94 
>> 
>>  2 files changed, 105 insertions(+)
>>  create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>>
>> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx 
>> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>> new file mode 100644
>> index 000..ecef3ba
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>> @@ -0,0 +1,11 @@
>> +What:/sys/class/leds//hw_pattern
>> +Date:September 2018
>> +KernelVersion:   4.20
>> +Description:
>> + Specify a hardware pattern for the SC27XX LED. For the SC27XX
>> + LED controller, it only supports 4 hardware patterns to 
>> configure
>
> If I understand it correctly then the four components: low, rise, high,
> and fall time make a single pattern. So calling it "4 hardware patterns"
> can be misleading. Below you're using "stage" notion - it would be good
> to use it consequently on the whole span of this document.

Sure. I will modify the documentation to avoid misleading words.

>
>> + the low time, rise time, high time and fall time for the 
>> breathing
>> + mode, and each stage duration unit is 125ms. So the format of
>> + the hardware pattern values should be:
>
> I'd be more precise here:
>
> Min stage duration: ???
> Max stage duration: ???
> Stage duration step: 125 ms

OK.

>
>> + "brightness_1 duration_1 brightness_2 duration_2 brightness_3
>> + duration_3 brightness_4 duration_4".
>
> Looking at sc27xx_led_pattern_set() it doesn't seem like you would
> use brightnesses other then brightness_1. I assume that the low
> brightness is fixed to 0 and the high brightness is the brightness_1.
> If yes, than we don't need brightnesses in this pattern definition,
> since the current brightness will suffice.
>
> You'd need to alter the hw_pattern description to say that the
> brightness currently set is to be used as a high brightness, and the
> hw_pattern for this driver consists only of the four delta_t components.

Sorry for confusing here. For SC27XX LED, the 4 stages use one same
brightness. To be compatible with the hardware pattern format, so we
force to set one brightness for each stage. I will add some comments
to explain the LED expects the same brightness for each stage instead
of using the current brightness file which is not used for our
breathing mode.

>
> This is clear example of what I had on mind when having doubts
> about using tuples for pattern_set.
>
> Alternatively, if we want to enforce tuples format for hw_pattern,
> and if we want to be consistent - we'd need to require defining target
> brightness for each stage properly, i.e.
>
> echo "100 500 100 500 0 500 0 500" > pattern
>
> Which would mean:
>
> 1. Rise from brightness 0 to 100 for 500 ms.
> 2. Keep brightness 100 for 500 ms.
> 3. Fall from brightness 100 to 0 for 500 ms.
> 4. Keep brightness 0 for 500 ms.
>
> --
> Best regards,
> Jacek Anaszewski



-- 
Baolin Wang
Best Regards


Re: [PATCH v8 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller

2018-09-04 Thread Baolin Wang
Hi Jacek,

On 5 September 2018 at 04:19, Jacek Anaszewski
 wrote:
> Hi Baolin,
>
> On 09/04/2018 01:01 PM, Baolin Wang wrote:
>> This patch implements the 'pattern_set'and 'pattern_clear'
>> interfaces to support SC27XX LED breathing mode.
>>
>> Signed-off-by: Baolin Wang 
>> ---
>> Changes from v7:
>>  - Add its own ABI documentation file.
>>
>> Changes from v6:
>>  - None.
>>
>> Changes from v5:
>>  - None.
>>
>> Changes from v4:
>>  - None.
>>
>> Changes from v3:
>>  - None.
>>
>> Changes from v2:
>>  - None.
>>
>> Changes from v1:
>>  - Remove pattern_get interface.
>> ---
>>  .../ABI/testing/sysfs-class-led-driver-sc27xx  |   11 +++
>>  drivers/leds/leds-sc27xx-bltc.c|   94 
>> 
>>  2 files changed, 105 insertions(+)
>>  create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>>
>> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx 
>> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>> new file mode 100644
>> index 000..ecef3ba
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx
>> @@ -0,0 +1,11 @@
>> +What:/sys/class/leds//hw_pattern
>> +Date:September 2018
>> +KernelVersion:   4.20
>> +Description:
>> + Specify a hardware pattern for the SC27XX LED. For the SC27XX
>> + LED controller, it only supports 4 hardware patterns to 
>> configure
>
> If I understand it correctly then the four components: low, rise, high,
> and fall time make a single pattern. So calling it "4 hardware patterns"
> can be misleading. Below you're using "stage" notion - it would be good
> to use it consequently on the whole span of this document.

Sure. I will modify the documentation to avoid misleading words.

>
>> + the low time, rise time, high time and fall time for the 
>> breathing
>> + mode, and each stage duration unit is 125ms. So the format of
>> + the hardware pattern values should be:
>
> I'd be more precise here:
>
> Min stage duration: ???
> Max stage duration: ???
> Stage duration step: 125 ms

OK.

>
>> + "brightness_1 duration_1 brightness_2 duration_2 brightness_3
>> + duration_3 brightness_4 duration_4".
>
> Looking at sc27xx_led_pattern_set() it doesn't seem like you would
> use brightnesses other then brightness_1. I assume that the low
> brightness is fixed to 0 and the high brightness is the brightness_1.
> If yes, than we don't need brightnesses in this pattern definition,
> since the current brightness will suffice.
>
> You'd need to alter the hw_pattern description to say that the
> brightness currently set is to be used as a high brightness, and the
> hw_pattern for this driver consists only of the four delta_t components.

Sorry for confusing here. For SC27XX LED, the 4 stages use one same
brightness. To be compatible with the hardware pattern format, so we
force to set one brightness for each stage. I will add some comments
to explain the LED expects the same brightness for each stage instead
of using the current brightness file which is not used for our
breathing mode.

>
> This is clear example of what I had on mind when having doubts
> about using tuples for pattern_set.
>
> Alternatively, if we want to enforce tuples format for hw_pattern,
> and if we want to be consistent - we'd need to require defining target
> brightness for each stage properly, i.e.
>
> echo "100 500 100 500 0 500 0 500" > pattern
>
> Which would mean:
>
> 1. Rise from brightness 0 to 100 for 500 ms.
> 2. Keep brightness 100 for 500 ms.
> 3. Fall from brightness 100 to 0 for 500 ms.
> 4. Keep brightness 0 for 500 ms.
>
> --
> Best regards,
> Jacek Anaszewski



-- 
Baolin Wang
Best Regards


Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 01:00:37AM +, Schaufler, Casey wrote:
> Sorry, I've been working in security too long for my
> optimistic streak to be very wide.

Eheh. So I was simply trying to follow in context, but it wasn't
entirely clear, so I tried to take it out of context and then it was
even less clear, never mind I was only looking for some more
explanation.

> Not especially, no. I have gotten considerable feedback that
> it should be configurable, and while I may not see the value myself,
> I do respect the people who've requested the configurability.

Correct me if I'm wrong, but LSM as last word "module" implies to make
this logic modular.

My wondering is because:

1) why not just a single version of this logic in the scheduler?
   (i.e. can only be on or off or perhaps a only-ibpb-dumpable tweak
   to retain current slightly higher perf but less secure behavior)

2) even if you want a myriad of versions of this IBPB logic, how do
   the different versions can possibly fit in a whole "module" whose
   semantics have pretty much nothing to do with the other methods in
   the module like LSM_HOOK_INIT(capable, cap_capable),
   LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory), and
   LSM_HOOK_INIT(mmap_addr, cap_mmap_addr), or even
   LSM_HOOK_INIT(inode_follow_link, selinux_inode_follow_link),
   LSM_HOOK_INIT(inode_permission, selinux_inode_permission). I mean
   it looks an apple to oranges monolithic link in the same module. Or
   are you going to load this method in a separate module with only a
   single method implemented and then load multiple LSM modules at the
   same time?

3) if you need so much tweaking that not even a single off/on switch
   independent of any module loaded through LSM is not enough, how is
   it ok that all the rest shouldn't be configurable? All the rest has
   more performance impact than this one so it'd start from there if
   something.

I understand how configurablity is valuable (this is why we kept
everything dynamic tunable at runtime, including the PTI stop machine
to allow even PTI TLB flushes to be turned off).

I'm just trying to understand how IBPB fits in a LSM module
implementation.

Even if you build with CONFIG_SECURITY=n PTI won't go away, retpoline
won't go away, the l1flush in vmenter won't go away, the
pte/transhugepmd inversion won't go away, why only the runtime
tunability or tweaking of IBPB fits in a LSM module?

> If they provide different answers there should be different
> functions. It's a question of viewpoint. If you don't care about
> the SELinux viewpoint you shouldn't have to include it in your
> checks, any more than you care about x86 issues when you're
> running on a MIPS processor.

I don't see how selinux fits into this. Selinux doesn't affect virtual
memory protection. Not having selinux even built in doesn't mean you
are ok with virtual memory protection not being provided by the CPU.

Or in other words selinux fits into this as much as seccomp or
apparmor fits into this.

This is just like a memory barrier mb() to be executed in the context
switch code. Security issues can emerge with the lack of it regardless
if selinux is built in and enabled or CONFIG_SECURITY=n.

selinux matters after an exploit already happened, this as opposed is
needed to prevent the exploit in the first place.

Also the correct fully secure version provides a single answer
(i.e. in theory assuming a perfect implementation that didn't forget
anything so having a single implementation will also increase the
chances that we get as close as possible to the theoretical correct
implementation).

If it provides different answers in this case it is because somebody
wants not perfect security to provide higher performance, i.e. the
"configurablity" which is valuable and fine feature to provide. Just a
LSM module doesn't seem the most flexibile way to provide
configurability by far.

> Yes, even security people have to worry about locking.

Yes it was some lock that if contended would lockup if used from
inside the scheduler.

> What *is* the most robust way?

The below pretty much explains it.

> Yes, there are locking issues. The code in the LSM infrastructure and in
> the security modules needs to be aware of that and deal with it. The SELinux
> code I proposed is more complex than it could be because the audit code
> requires locking that doesn't work in the switching context.

Or in other words having multiple versions of this called from within
the scheduler is a bit like making the kernel modular and then because
the locking is subtle you may then have scheduler deadlocks only
happening with some kernel module loaded instead of others, but the
real question is what is the payoff compared to just allowing the
scheduler code to be tuned with x86 debugfs or sysctl the normal
way.

Also how would a new common code method in LSM fit for CPUs that are
so slow that they can't possibly need anything like IBPB and
flush_RSB()?

Thanks!
Andrea


Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 01:00:37AM +, Schaufler, Casey wrote:
> Sorry, I've been working in security too long for my
> optimistic streak to be very wide.

Eheh. So I was simply trying to follow in context, but it wasn't
entirely clear, so I tried to take it out of context and then it was
even less clear, never mind I was only looking for some more
explanation.

> Not especially, no. I have gotten considerable feedback that
> it should be configurable, and while I may not see the value myself,
> I do respect the people who've requested the configurability.

Correct me if I'm wrong, but LSM as last word "module" implies to make
this logic modular.

My wondering is because:

1) why not just a single version of this logic in the scheduler?
   (i.e. can only be on or off or perhaps a only-ibpb-dumpable tweak
   to retain current slightly higher perf but less secure behavior)

2) even if you want a myriad of versions of this IBPB logic, how do
   the different versions can possibly fit in a whole "module" whose
   semantics have pretty much nothing to do with the other methods in
   the module like LSM_HOOK_INIT(capable, cap_capable),
   LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory), and
   LSM_HOOK_INIT(mmap_addr, cap_mmap_addr), or even
   LSM_HOOK_INIT(inode_follow_link, selinux_inode_follow_link),
   LSM_HOOK_INIT(inode_permission, selinux_inode_permission). I mean
   it looks an apple to oranges monolithic link in the same module. Or
   are you going to load this method in a separate module with only a
   single method implemented and then load multiple LSM modules at the
   same time?

3) if you need so much tweaking that not even a single off/on switch
   independent of any module loaded through LSM is not enough, how is
   it ok that all the rest shouldn't be configurable? All the rest has
   more performance impact than this one so it'd start from there if
   something.

I understand how configurablity is valuable (this is why we kept
everything dynamic tunable at runtime, including the PTI stop machine
to allow even PTI TLB flushes to be turned off).

I'm just trying to understand how IBPB fits in a LSM module
implementation.

Even if you build with CONFIG_SECURITY=n PTI won't go away, retpoline
won't go away, the l1flush in vmenter won't go away, the
pte/transhugepmd inversion won't go away, why only the runtime
tunability or tweaking of IBPB fits in a LSM module?

> If they provide different answers there should be different
> functions. It's a question of viewpoint. If you don't care about
> the SELinux viewpoint you shouldn't have to include it in your
> checks, any more than you care about x86 issues when you're
> running on a MIPS processor.

I don't see how selinux fits into this. Selinux doesn't affect virtual
memory protection. Not having selinux even built in doesn't mean you
are ok with virtual memory protection not being provided by the CPU.

Or in other words selinux fits into this as much as seccomp or
apparmor fits into this.

This is just like a memory barrier mb() to be executed in the context
switch code. Security issues can emerge with the lack of it regardless
if selinux is built in and enabled or CONFIG_SECURITY=n.

selinux matters after an exploit already happened, this as opposed is
needed to prevent the exploit in the first place.

Also the correct fully secure version provides a single answer
(i.e. in theory assuming a perfect implementation that didn't forget
anything so having a single implementation will also increase the
chances that we get as close as possible to the theoretical correct
implementation).

If it provides different answers in this case it is because somebody
wants not perfect security to provide higher performance, i.e. the
"configurablity" which is valuable and fine feature to provide. Just a
LSM module doesn't seem the most flexibile way to provide
configurability by far.

> Yes, even security people have to worry about locking.

Yes it was some lock that if contended would lockup if used from
inside the scheduler.

> What *is* the most robust way?

The below pretty much explains it.

> Yes, there are locking issues. The code in the LSM infrastructure and in
> the security modules needs to be aware of that and deal with it. The SELinux
> code I proposed is more complex than it could be because the audit code
> requires locking that doesn't work in the switching context.

Or in other words having multiple versions of this called from within
the scheduler is a bit like making the kernel modular and then because
the locking is subtle you may then have scheduler deadlocks only
happening with some kernel module loaded instead of others, but the
real question is what is the payoff compared to just allowing the
scheduler code to be tuned with x86 debugfs or sysctl the normal
way.

Also how would a new common code method in LSM fit for CPUs that are
so slow that they can't possibly need anything like IBPB and
flush_RSB()?

Thanks!
Andrea


[PATCH] sysctl: kselftests: add test_core to .gitignore

2018-09-04 Thread Lei Yang
Update .gitignore files.

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/cgroup/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cgroup/.gitignore 
b/tools/testing/selftests/cgroup/.gitignore
index 95eb3a5..adacda5 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1 +1,2 @@
 test_memcontrol
+test_core
-- 
1.9.1



[PATCH] sysctl: kselftests: add test_core to .gitignore

2018-09-04 Thread Lei Yang
Update .gitignore files.

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/cgroup/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/cgroup/.gitignore 
b/tools/testing/selftests/cgroup/.gitignore
index 95eb3a5..adacda5 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1 +1,2 @@
 test_memcontrol
+test_core
-- 
1.9.1



Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
Hi,

On (09/04/18 20:01), Hans de Goede wrote:
> Commit 375899cddcbb ("printk: make sure to print log on console."), moved
> the checking of the loglevel of messages from flush time to the actual
> log time.
> 
> This introduces one problem, some early boot messages are printed before
> parse_early_param() gets called and thus before kernel commandline options
> such as quiet, loglevel and ignore_loglevel are parsed.

Do you use earlycon?

> This causes e.g. the following messages to get printed on x86 systems,
> despite the presence of the "quiet" option:
> 
> [0.00] BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x00057fff] usable
> ...
> [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable
> 
> This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
> messages and doing the loglevel check for these while flushing as before.
> 

Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
the case that we sort of broke CON_PRINTBUFFER handling.

console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
register CON_PRINTBUFFER console
-> no debug output

So I think that when console_unlock() re-flushes already seen logbuf
messages to a newly registered exclusive [CON_PRINTBUFFER] console we
probably need to look at the current console_loglevel in console_unlock()
loop.

-ss


Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer

2018-09-04 Thread Sergey Senozhatsky
Hi,

On (09/04/18 20:01), Hans de Goede wrote:
> Commit 375899cddcbb ("printk: make sure to print log on console."), moved
> the checking of the loglevel of messages from flush time to the actual
> log time.
> 
> This introduces one problem, some early boot messages are printed before
> parse_early_param() gets called and thus before kernel commandline options
> such as quiet, loglevel and ignore_loglevel are parsed.

Do you use earlycon?

> This causes e.g. the following messages to get printed on x86 systems,
> despite the presence of the "quiet" option:
> 
> [0.00] BIOS-provided physical RAM map:
> [0.00] BIOS-e820: [mem 0x-0x00057fff] usable
> ...
> [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable
> 
> This commit fixes this by setting a new LOG_CHK_LEVEL on early boot
> messages and doing the loglevel check for these while flushing as before.
> 

Hmm, OK, chances are we need to re-think 375899cddcbb. It might be
the case that we sort of broke CON_PRINTBUFFER handling.

console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH
register CON_PRINTBUFFER console
-> no debug output

So I think that when console_unlock() re-flushes already seen logbuf
messages to a newly registered exclusive [CON_PRINTBUFFER] console we
probably need to look at the current console_loglevel in console_unlock()
loop.

-ss


[PATCH] leds: leds-gpio: Add a condition check for active low leds.

2018-09-04 Thread Song Qiang
Some leds on our board are active low leds, which means these leds
are lighted when the corresponding gpio line is low, while the
original leds-gpio driver default all leds are active high leds.
This patch adds a devicetree node "light-state", whose value should
be "high" for active high leds and "low" for active low leds.
The default value is "high" for compatible for the original driver.

Signed-off-by: Song Qiang 
---
 .../devicetree/bindings/leds/leds-gpio.txt| 15 +++
 drivers/leds/leds-gpio.c  | 25 +--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/leds/leds-gpio.txt 
b/Documentation/devicetree/bindings/leds/leds-gpio.txt
index a48dda268f81..0a8fad75c704 100644
--- a/Documentation/devicetree/bindings/leds/leds-gpio.txt
+++ b/Documentation/devicetree/bindings/leds/leds-gpio.txt
@@ -23,6 +23,9 @@ LED sub-node properties:
   remains up.
 - panic-indicator : (optional)
   see Documentation/devicetree/bindings/leds/common.txt
+- light-state: (optional) Values should be "high" or "low", which indicates
+   the state of the GPIO pin when the led is on.
+  see Documentation/devicetree/bindings/leds/common.txt
 
 Examples:
 
@@ -64,3 +67,15 @@ leds {
retain-state-suspended;
};
 };
+
+leds {
+   compatible = "gpio-leds";
+
+   led0 {
+   label = "led0";
+   gpios = < 19 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "heartbeat";
+   default-state = "off";
+   light-state = "low";
+   };
+}
diff --git a/drivers/leds/leds-gpio.c b/drivers/leds/leds-gpio.c
index 764c31301f90..463ded0c71ac 100644
--- a/drivers/leds/leds-gpio.c
+++ b/drivers/leds/leds-gpio.c
@@ -26,9 +26,13 @@ struct gpio_led_data {
struct gpio_desc *gpiod;
u8 can_sleep;
u8 blinking;
+   u8 light_state;
gpio_blink_set_t platform_gpio_blink_set;
 };
 
+#define LEDS_GPIO_LIGHTSTATE_HIGH  0
+#define LEDS_GPIO_LIGHTSTATE_LOW   1
+
 static inline struct gpio_led_data *
cdev_to_gpio_led_data(struct led_classdev *led_cdev)
 {
@@ -42,9 +46,15 @@ static void gpio_led_set(struct led_classdev *led_cdev,
int level;
 
if (value == LED_OFF)
-   level = 0;
+   if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH)
+   level = 0;
+   else
+   level = 1;
else
-   level = 1;
+   if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH)
+   level = 1;
+   else
+   level = 0;
 
if (led_dat->blinking) {
led_dat->platform_gpio_blink_set(led_dat->gpiod, level,
@@ -205,6 +215,17 @@ static struct gpio_leds_priv *gpio_leds_create(struct 
platform_device *pdev)
led.default_state = LEDS_GPIO_DEFSTATE_OFF;
}
 
+   if (!fwnode_property_read_string(child, "light-state",
+)) {
+   if (!strcmp(state, "low"))
+   led_dat->light_state =
+   LEDS_GPIO_LIGHTSTATE_LOW;
+   else{
+   led_dat->light_state =
+   LEDS_GPIO_LIGHTSTATE_HIGH;
+   }
+   }
+
if (fwnode_property_present(child, "retain-state-suspended"))
led.retain_state_suspended = 1;
if (fwnode_property_present(child, "retain-state-shutdown"))
-- 
2.17.1



[PATCH] leds: leds-gpio: Add a condition check for active low leds.

2018-09-04 Thread Song Qiang
Some leds on our board are active low leds, which means these leds
are lighted when the corresponding gpio line is low, while the
original leds-gpio driver default all leds are active high leds.
This patch adds a devicetree node "light-state", whose value should
be "high" for active high leds and "low" for active low leds.
The default value is "high" for compatible for the original driver.

Signed-off-by: Song Qiang 
---
 .../devicetree/bindings/leds/leds-gpio.txt| 15 +++
 drivers/leds/leds-gpio.c  | 25 +--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/leds/leds-gpio.txt 
b/Documentation/devicetree/bindings/leds/leds-gpio.txt
index a48dda268f81..0a8fad75c704 100644
--- a/Documentation/devicetree/bindings/leds/leds-gpio.txt
+++ b/Documentation/devicetree/bindings/leds/leds-gpio.txt
@@ -23,6 +23,9 @@ LED sub-node properties:
   remains up.
 - panic-indicator : (optional)
   see Documentation/devicetree/bindings/leds/common.txt
+- light-state: (optional) Values should be "high" or "low", which indicates
+   the state of the GPIO pin when the led is on.
+  see Documentation/devicetree/bindings/leds/common.txt
 
 Examples:
 
@@ -64,3 +67,15 @@ leds {
retain-state-suspended;
};
 };
+
+leds {
+   compatible = "gpio-leds";
+
+   led0 {
+   label = "led0";
+   gpios = < 19 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "heartbeat";
+   default-state = "off";
+   light-state = "low";
+   };
+}
diff --git a/drivers/leds/leds-gpio.c b/drivers/leds/leds-gpio.c
index 764c31301f90..463ded0c71ac 100644
--- a/drivers/leds/leds-gpio.c
+++ b/drivers/leds/leds-gpio.c
@@ -26,9 +26,13 @@ struct gpio_led_data {
struct gpio_desc *gpiod;
u8 can_sleep;
u8 blinking;
+   u8 light_state;
gpio_blink_set_t platform_gpio_blink_set;
 };
 
+#define LEDS_GPIO_LIGHTSTATE_HIGH  0
+#define LEDS_GPIO_LIGHTSTATE_LOW   1
+
 static inline struct gpio_led_data *
cdev_to_gpio_led_data(struct led_classdev *led_cdev)
 {
@@ -42,9 +46,15 @@ static void gpio_led_set(struct led_classdev *led_cdev,
int level;
 
if (value == LED_OFF)
-   level = 0;
+   if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH)
+   level = 0;
+   else
+   level = 1;
else
-   level = 1;
+   if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH)
+   level = 1;
+   else
+   level = 0;
 
if (led_dat->blinking) {
led_dat->platform_gpio_blink_set(led_dat->gpiod, level,
@@ -205,6 +215,17 @@ static struct gpio_leds_priv *gpio_leds_create(struct 
platform_device *pdev)
led.default_state = LEDS_GPIO_DEFSTATE_OFF;
}
 
+   if (!fwnode_property_read_string(child, "light-state",
+)) {
+   if (!strcmp(state, "low"))
+   led_dat->light_state =
+   LEDS_GPIO_LIGHTSTATE_LOW;
+   else{
+   led_dat->light_state =
+   LEDS_GPIO_LIGHTSTATE_HIGH;
+   }
+   }
+
if (fwnode_property_present(child, "retain-state-suspended"))
led.retain_state_suspended = 1;
if (fwnode_property_present(child, "retain-state-shutdown"))
-- 
2.17.1



[PATCH 0/2] add new UniPhier PCIe host driver

2018-09-04 Thread Kunihiko Hayashi
This series adds PCIe host controller driver for Socionext UniPhier SoCs.
This controller is based on the DesignWare PCIe core. This driver
supports LD20 and PXs3 SoCs.

Kunihiko Hayashi (2):
  dt-bindings: pci: add UniPhier PCIe host controller description
  pci: dwc: add UniPhier PCIe host controller support

 .../devicetree/bindings/pci/uniphier-pcie.txt  |  78 
 drivers/pci/controller/dwc/Kconfig |   9 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pcie-uniphier.c | 464 +
 4 files changed, 552 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt
 create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c

-- 
2.7.4



[PATCH 2/2] pci: dwc: add UniPhier PCIe host controller support

2018-09-04 Thread Kunihiko Hayashi
This introduces specific glue layer for UniPhier platform to support
PCIe host controller that is based on the Designware PCIe Core, and
this driver supports Root Complex (host) mode.

Signed-off-by: Kunihiko Hayashi 
---
 drivers/pci/controller/dwc/Kconfig |   9 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pcie-uniphier.c | 464 +
 3 files changed, 474 insertions(+)
 create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c

diff --git a/drivers/pci/controller/dwc/Kconfig 
b/drivers/pci/controller/dwc/Kconfig
index 91b0194..d8fdb02 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -193,4 +193,13 @@ config PCIE_HISI_STB
help
   Say Y here if you want PCIe controller support on HiSilicon STB SoCs
 
+config PCIE_UNIPHIER
+   bool "Socionext UniPhier PCIe controllers"
+   depends on OF && (ARCH_UNIPHIER || COMPILE_TEST)
+   depends on PCI_MSI_IRQ_DOMAIN
+   select PCIE_DW_HOST
+   help
+ Say Y here if you want PCIe controller support on UniPhier SoCs.
+ This driver supports LD20 and PXs3 SoCs.
+
 endmenu
diff --git a/drivers/pci/controller/dwc/Makefile 
b/drivers/pci/controller/dwc/Makefile
index 5d2ce72..cbde733 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
 obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o
 obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o
+obj-$(CONFIG_PCIE_UNIPHIER) += pcie-uniphier.o
 
 # The following drivers are for devices that use the generic ACPI
 # pci_root.c driver but don't support standard ECAM config access.
diff --git a/drivers/pci/controller/dwc/pcie-uniphier.c 
b/drivers/pci/controller/dwc/pcie-uniphier.c
new file mode 100644
index 000..13ce02e
--- /dev/null
+++ b/drivers/pci/controller/dwc/pcie-uniphier.c
@@ -0,0 +1,464 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// PCI-express host controller driver for UniPhier SoCs
+// Copyright 2018 Socionext Inc.
+// Author: Kunihiko Hayashi 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define PCL_PINCTRL0   0x002c
+#define PCL_PERST_PLDN_REGEN   BIT(12)
+#define PCL_PERST_NOE_REGENBIT(11)
+#define PCL_PERST_OUT_REGENBIT(8)
+#define PCL_PERST_PLDN_REGVAL  BIT(4)
+#define PCL_PERST_NOE_REGVAL   BIT(3)
+#define PCL_PERST_OUT_REGVAL   BIT(0)
+
+#define PCL_PIPEMON0x0044
+#define PCL_PCLK_ALIVE BIT(15)
+
+#define PCL_APP_READY_CTRL 0x8008
+#define PCL_APP_LTSSM_ENABLE   BIT(0)
+
+#define PCL_APP_PM00x8078
+#define PCL_SYS_AUX_PWR_DETBIT(8)
+
+#define PCL_RCV_INT0x8108
+#define PCL_CFG_BW_MGT_ENABLE  BIT(20)
+#define PCL_CFG_LINK_AUTO_BW_ENABLEBIT(19)
+#define PCL_CFG_AER_RC_ERR_MSI_ENABLE  BIT(18)
+#define PCL_CFG_PME_MSI_ENABLE BIT(17)
+#define PCL_CFG_BW_MGT_STATUS  BIT(4)
+#define PCL_CFG_LINK_AUTO_BW_STATUSBIT(3)
+#define PCL_CFG_AER_RC_ERR_MSI_STATUS  BIT(2)
+#define PCL_CFG_PME_MSI_STATUS BIT(1)
+#define PCL_RCV_INT_ALL_ENABLE \
+   (PCL_CFG_BW_MGT_ENABLE | PCL_CFG_LINK_AUTO_BW_ENABLE \
+| PCL_CFG_AER_RC_ERR_MSI_ENABLE | PCL_CFG_PME_MSI_ENABLE)
+
+#define PCL_RCV_INTX   0x810c
+#define PCL_RADM_INTD_ENABLE   BIT(19)
+#define PCL_RADM_INTC_ENABLE   BIT(18)
+#define PCL_RADM_INTB_ENABLE   BIT(17)
+#define PCL_RADM_INTA_ENABLE   BIT(16)
+#define PCL_RADM_INTD_STATUS   BIT(3)
+#define PCL_RADM_INTC_STATUS   BIT(2)
+#define PCL_RADM_INTB_STATUS   BIT(1)
+#define PCL_RADM_INTA_STATUS   BIT(0)
+#define PCL_RCV_INTX_ALL_ENABLE\
+   (PCL_RADM_INTD_ENABLE | PCL_RADM_INTC_ENABLE \
+| PCL_RADM_INTB_ENABLE | PCL_RADM_INTA_ENABLE)
+
+#define PCL_STATUS_LINK0x8140
+#define PCL_RDLH_LINK_UP   BIT(1)
+#define PCL_XMLH_LINK_UP   BIT(0)
+
+struct uniphier_pcie_priv {
+   void __iomem *base;
+   struct dw_pcie pci;
+   struct clk *clk;
+   struct reset_control *rst;
+   struct phy *phy;
+   struct irq_domain *irq_domain;
+};
+
+#define to_uniphier_pcie(x)dev_get_drvdata((x)->dev)
+
+static void uniphier_pcie_ltssm_enable(struct uniphier_pcie_priv *priv)
+{
+   u32 val;
+
+   val = readl(priv->base + PCL_APP_READY_CTRL);
+   val |= PCL_APP_LTSSM_ENABLE;
+   writel(val, priv->base + PCL_APP_READY_CTRL);
+}
+
+static void uniphier_pcie_ltssm_disable(struct uniphier_pcie_priv *priv)
+{
+   u32 val;
+
+   val = readl(priv->base + PCL_APP_READY_CTRL);
+   val &= 

[PATCH 1/2] dt-bindings: pci: add UniPhier PCIe host controller description

2018-09-04 Thread Kunihiko Hayashi
Add DT bindings for PCIe controller implemented in UniPhier SoCs when
configured in Root Complex (host) mode. This controller is based on
the Designware PCIe Core.

Signed-off-by: Kunihiko Hayashi 
---
 .../devicetree/bindings/pci/uniphier-pcie.txt  | 78 ++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt

diff --git a/Documentation/devicetree/bindings/pci/uniphier-pcie.txt 
b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt
new file mode 100644
index 000..ea63f78
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt
@@ -0,0 +1,78 @@
+Socionext UniPhier PCI-express host controller bindings
+
+This describes the devicetree bindings for PCI-express host controller
+implemented on Socionext UniPhier SoCs.
+
+UniPhier PCI-express host controller is based on the Synopsys DesignWare
+PCI core. It shares common functions with the PCIe DesignWare core driver
+and inherits common properties defined in
+Documentation/devicetree/bindings/pci/designware-pcie.txt.
+
+Required properties:
+- compatible: Should be "socionext,uniphier-pcie".
+- reg: Specifies offset and length of the register set for the device.
+   According to the reg-names, appropriate register sets are required.
+- reg-names: Must include the following entries:
+"dbi"- controller configuration registers
+"link"   - SoC-specific glue layer registers
+"config" - PCIe configuration space
+- clocks: A phandle to the clock gate for PCIe glue layer including
+   the host controller.
+- resets: A phandle to the reset line for PCIe glue layer including
+   the host controller.
+- interrupts: A list of interrupt specifiers. According to the
+   interrupt-names, appropriate interrupts are required.
+- interrupt-names: Must include the following entries:
+"dma"- DMA interrupt
+"msi"- MSI interrupt
+"intx"   - Legacy INTA/B/C/D interrupt
+
+Optional properties:
+- phys: A phandle to generic PCIe PHY. According to the phy-names, appropriate
+   phys are required.
+- phy-names: Must be "pcie-phy".
+
+Required sub-node:
+- interrupt-controller: Specifies interrupt controller for legacy PCI
+   interrupts. The node name isn't important.
+
+Required properties for interrupt-controller:
+- interrupt-controller: identifies the node as an interrupt controller.
+- #interrupt-cells: specifies the number of cells needed to encode an
+   interrupt source. The value must be 1.
+
+Example:
+
+   pcie: pcie@6600 {
+   compatible = "socionext,uniphier-pcie", "snps,dw-pcie";
+   status = "disabled";
+   reg-names = "dbi", "link", "config";
+   reg = <0x6600 0x1000>, <0x6601 0x1>,
+ <0x2fff 0x1>;
+   #address-cells = <3>;
+   #size-cells = <2>;
+   clocks = <_clk 24>;
+   resets = <_rst 24>;
+   num-lanes = <1>;
+   num-viewport = <1>;
+   bus-range = <0x0 0xff>;
+   device_type = "pci";
+   ranges =
+   /* downstream I/O */
+   <0x8100 0 0x  0x2ffe  0 0x0001
+   /* non-prefetchable memory */
+0x8200 0 0x  0x2000  0 0x0ffe>;
+   #interrupt-cells = <1>;
+   interrupt-names = "dma", "msi", "intx";
+   interrupts = <0 224 4>, <0 225 4>, <0 226 4>;
+   interrupt-map-mask = <0 0 0  7>;
+   interrupt-map = <0 0 0  1  _intc 0>,   /* INTA */
+   <0 0 0  2  _intc 1>,   /* INTB */
+   <0 0 0  3  _intc 2>,   /* INTC */
+   <0 0 0  4  _intc 3>;   /* INTD */
+
+   pcie_intc: interrupt-controller {
+   interrupt-controller;
+   #interrupt-cells = <1>;
+   };
+   };
-- 
2.7.4



[PATCH 0/2] add new UniPhier PCIe host driver

2018-09-04 Thread Kunihiko Hayashi
This series adds PCIe host controller driver for Socionext UniPhier SoCs.
This controller is based on the DesignWare PCIe core. This driver
supports LD20 and PXs3 SoCs.

Kunihiko Hayashi (2):
  dt-bindings: pci: add UniPhier PCIe host controller description
  pci: dwc: add UniPhier PCIe host controller support

 .../devicetree/bindings/pci/uniphier-pcie.txt  |  78 
 drivers/pci/controller/dwc/Kconfig |   9 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pcie-uniphier.c | 464 +
 4 files changed, 552 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt
 create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c

-- 
2.7.4



[PATCH 2/2] pci: dwc: add UniPhier PCIe host controller support

2018-09-04 Thread Kunihiko Hayashi
This introduces specific glue layer for UniPhier platform to support
PCIe host controller that is based on the Designware PCIe Core, and
this driver supports Root Complex (host) mode.

Signed-off-by: Kunihiko Hayashi 
---
 drivers/pci/controller/dwc/Kconfig |   9 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pcie-uniphier.c | 464 +
 3 files changed, 474 insertions(+)
 create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c

diff --git a/drivers/pci/controller/dwc/Kconfig 
b/drivers/pci/controller/dwc/Kconfig
index 91b0194..d8fdb02 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -193,4 +193,13 @@ config PCIE_HISI_STB
help
   Say Y here if you want PCIe controller support on HiSilicon STB SoCs
 
+config PCIE_UNIPHIER
+   bool "Socionext UniPhier PCIe controllers"
+   depends on OF && (ARCH_UNIPHIER || COMPILE_TEST)
+   depends on PCI_MSI_IRQ_DOMAIN
+   select PCIE_DW_HOST
+   help
+ Say Y here if you want PCIe controller support on UniPhier SoCs.
+ This driver supports LD20 and PXs3 SoCs.
+
 endmenu
diff --git a/drivers/pci/controller/dwc/Makefile 
b/drivers/pci/controller/dwc/Makefile
index 5d2ce72..cbde733 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
 obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o
 obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o
+obj-$(CONFIG_PCIE_UNIPHIER) += pcie-uniphier.o
 
 # The following drivers are for devices that use the generic ACPI
 # pci_root.c driver but don't support standard ECAM config access.
diff --git a/drivers/pci/controller/dwc/pcie-uniphier.c 
b/drivers/pci/controller/dwc/pcie-uniphier.c
new file mode 100644
index 000..13ce02e
--- /dev/null
+++ b/drivers/pci/controller/dwc/pcie-uniphier.c
@@ -0,0 +1,464 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// PCI-express host controller driver for UniPhier SoCs
+// Copyright 2018 Socionext Inc.
+// Author: Kunihiko Hayashi 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define PCL_PINCTRL0   0x002c
+#define PCL_PERST_PLDN_REGEN   BIT(12)
+#define PCL_PERST_NOE_REGENBIT(11)
+#define PCL_PERST_OUT_REGENBIT(8)
+#define PCL_PERST_PLDN_REGVAL  BIT(4)
+#define PCL_PERST_NOE_REGVAL   BIT(3)
+#define PCL_PERST_OUT_REGVAL   BIT(0)
+
+#define PCL_PIPEMON0x0044
+#define PCL_PCLK_ALIVE BIT(15)
+
+#define PCL_APP_READY_CTRL 0x8008
+#define PCL_APP_LTSSM_ENABLE   BIT(0)
+
+#define PCL_APP_PM00x8078
+#define PCL_SYS_AUX_PWR_DETBIT(8)
+
+#define PCL_RCV_INT0x8108
+#define PCL_CFG_BW_MGT_ENABLE  BIT(20)
+#define PCL_CFG_LINK_AUTO_BW_ENABLEBIT(19)
+#define PCL_CFG_AER_RC_ERR_MSI_ENABLE  BIT(18)
+#define PCL_CFG_PME_MSI_ENABLE BIT(17)
+#define PCL_CFG_BW_MGT_STATUS  BIT(4)
+#define PCL_CFG_LINK_AUTO_BW_STATUSBIT(3)
+#define PCL_CFG_AER_RC_ERR_MSI_STATUS  BIT(2)
+#define PCL_CFG_PME_MSI_STATUS BIT(1)
+#define PCL_RCV_INT_ALL_ENABLE \
+   (PCL_CFG_BW_MGT_ENABLE | PCL_CFG_LINK_AUTO_BW_ENABLE \
+| PCL_CFG_AER_RC_ERR_MSI_ENABLE | PCL_CFG_PME_MSI_ENABLE)
+
+#define PCL_RCV_INTX   0x810c
+#define PCL_RADM_INTD_ENABLE   BIT(19)
+#define PCL_RADM_INTC_ENABLE   BIT(18)
+#define PCL_RADM_INTB_ENABLE   BIT(17)
+#define PCL_RADM_INTA_ENABLE   BIT(16)
+#define PCL_RADM_INTD_STATUS   BIT(3)
+#define PCL_RADM_INTC_STATUS   BIT(2)
+#define PCL_RADM_INTB_STATUS   BIT(1)
+#define PCL_RADM_INTA_STATUS   BIT(0)
+#define PCL_RCV_INTX_ALL_ENABLE\
+   (PCL_RADM_INTD_ENABLE | PCL_RADM_INTC_ENABLE \
+| PCL_RADM_INTB_ENABLE | PCL_RADM_INTA_ENABLE)
+
+#define PCL_STATUS_LINK0x8140
+#define PCL_RDLH_LINK_UP   BIT(1)
+#define PCL_XMLH_LINK_UP   BIT(0)
+
+struct uniphier_pcie_priv {
+   void __iomem *base;
+   struct dw_pcie pci;
+   struct clk *clk;
+   struct reset_control *rst;
+   struct phy *phy;
+   struct irq_domain *irq_domain;
+};
+
+#define to_uniphier_pcie(x)dev_get_drvdata((x)->dev)
+
+static void uniphier_pcie_ltssm_enable(struct uniphier_pcie_priv *priv)
+{
+   u32 val;
+
+   val = readl(priv->base + PCL_APP_READY_CTRL);
+   val |= PCL_APP_LTSSM_ENABLE;
+   writel(val, priv->base + PCL_APP_READY_CTRL);
+}
+
+static void uniphier_pcie_ltssm_disable(struct uniphier_pcie_priv *priv)
+{
+   u32 val;
+
+   val = readl(priv->base + PCL_APP_READY_CTRL);
+   val &= 

[PATCH 1/2] dt-bindings: pci: add UniPhier PCIe host controller description

2018-09-04 Thread Kunihiko Hayashi
Add DT bindings for PCIe controller implemented in UniPhier SoCs when
configured in Root Complex (host) mode. This controller is based on
the Designware PCIe Core.

Signed-off-by: Kunihiko Hayashi 
---
 .../devicetree/bindings/pci/uniphier-pcie.txt  | 78 ++
 1 file changed, 78 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt

diff --git a/Documentation/devicetree/bindings/pci/uniphier-pcie.txt 
b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt
new file mode 100644
index 000..ea63f78
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt
@@ -0,0 +1,78 @@
+Socionext UniPhier PCI-express host controller bindings
+
+This describes the devicetree bindings for PCI-express host controller
+implemented on Socionext UniPhier SoCs.
+
+UniPhier PCI-express host controller is based on the Synopsys DesignWare
+PCI core. It shares common functions with the PCIe DesignWare core driver
+and inherits common properties defined in
+Documentation/devicetree/bindings/pci/designware-pcie.txt.
+
+Required properties:
+- compatible: Should be "socionext,uniphier-pcie".
+- reg: Specifies offset and length of the register set for the device.
+   According to the reg-names, appropriate register sets are required.
+- reg-names: Must include the following entries:
+"dbi"- controller configuration registers
+"link"   - SoC-specific glue layer registers
+"config" - PCIe configuration space
+- clocks: A phandle to the clock gate for PCIe glue layer including
+   the host controller.
+- resets: A phandle to the reset line for PCIe glue layer including
+   the host controller.
+- interrupts: A list of interrupt specifiers. According to the
+   interrupt-names, appropriate interrupts are required.
+- interrupt-names: Must include the following entries:
+"dma"- DMA interrupt
+"msi"- MSI interrupt
+"intx"   - Legacy INTA/B/C/D interrupt
+
+Optional properties:
+- phys: A phandle to generic PCIe PHY. According to the phy-names, appropriate
+   phys are required.
+- phy-names: Must be "pcie-phy".
+
+Required sub-node:
+- interrupt-controller: Specifies interrupt controller for legacy PCI
+   interrupts. The node name isn't important.
+
+Required properties for interrupt-controller:
+- interrupt-controller: identifies the node as an interrupt controller.
+- #interrupt-cells: specifies the number of cells needed to encode an
+   interrupt source. The value must be 1.
+
+Example:
+
+   pcie: pcie@6600 {
+   compatible = "socionext,uniphier-pcie", "snps,dw-pcie";
+   status = "disabled";
+   reg-names = "dbi", "link", "config";
+   reg = <0x6600 0x1000>, <0x6601 0x1>,
+ <0x2fff 0x1>;
+   #address-cells = <3>;
+   #size-cells = <2>;
+   clocks = <_clk 24>;
+   resets = <_rst 24>;
+   num-lanes = <1>;
+   num-viewport = <1>;
+   bus-range = <0x0 0xff>;
+   device_type = "pci";
+   ranges =
+   /* downstream I/O */
+   <0x8100 0 0x  0x2ffe  0 0x0001
+   /* non-prefetchable memory */
+0x8200 0 0x  0x2000  0 0x0ffe>;
+   #interrupt-cells = <1>;
+   interrupt-names = "dma", "msi", "intx";
+   interrupts = <0 224 4>, <0 225 4>, <0 226 4>;
+   interrupt-map-mask = <0 0 0  7>;
+   interrupt-map = <0 0 0  1  _intc 0>,   /* INTA */
+   <0 0 0  2  _intc 1>,   /* INTB */
+   <0 0 0  3  _intc 2>,   /* INTC */
+   <0 0 0  4  _intc 3>;   /* INTD */
+
+   pcie_intc: interrupt-controller {
+   interrupt-controller;
+   #interrupt-cells = <1>;
+   };
+   };
-- 
2.7.4



[PATCH] sysctl: kselftests: use kernel module instead of built-in

2018-09-04 Thread Lei Yang
It uses modprobe $TEST_DRIVER in sysctl.sh, so update
config to use "m" instead

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/sysctl/config | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/sysctl/config 
b/tools/testing/selftests/sysctl/config
index 6ca1480..fc263ef 100644
--- a/tools/testing/selftests/sysctl/config
+++ b/tools/testing/selftests/sysctl/config
@@ -1 +1 @@
-CONFIG_TEST_SYSCTL=y
+CONFIG_TEST_SYSCTL=m
-- 
1.9.1



[PATCH] sysctl: kselftests: use kernel module instead of built-in

2018-09-04 Thread Lei Yang
It uses modprobe $TEST_DRIVER in sysctl.sh, so update
config to use "m" instead

Signed-off-by: Lei Yang 
---
 tools/testing/selftests/sysctl/config | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/sysctl/config 
b/tools/testing/selftests/sysctl/config
index 6ca1480..fc263ef 100644
--- a/tools/testing/selftests/sysctl/config
+++ b/tools/testing/selftests/sysctl/config
@@ -1 +1 @@
-CONFIG_TEST_SYSCTL=y
+CONFIG_TEST_SYSCTL=m
-- 
1.9.1



Re: [PATCH 0/3] of: root #{size,address}-cells clean-ups

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> This is a small restructuring and clean-up of handling root node
> #size-cells and #address-cells (or lack of). As only Sparc needs a 
> different default value and only for #address-cells, we can handle that 
> locally and remove one more dependency on asm/prom.h.
> 
> Rob
> 
> Rob Herring (3):
>   of/fdt: Scan the root node properties earlier
>   of/fdt: avoid re-parsing '#{address,size}-cells' in
> of_fdt_limit_memory
>   of: make default address and size cells sizes private
> 
>  arch/sparc/include/asm/prom.h |  3 ---
>  drivers/of/fdt.c  | 30 +-
>  drivers/of/of_private.h   |  8 
>  include/linux/of.h|  6 --
>  4 files changed, 13 insertions(+), 34 deletions(-)
> 

Nice cleanup!

-Frank


Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

2018-09-04 Thread Francisco Jerez
Srinivas Pandruvada  writes:

> On Mon, 2018-09-03 at 23:53 -0700, Francisco Jerez wrote:
>> Eero Tamminen  writes:
>> 
>> > Hi,
>> > 
>> > On 31.08.2018 20:28, Srinivas Pandruvada wrote:
>> > ...
>> > > As per testing Eero Tamminen, the results are comparable to the
>> > > patchset
>> > > https://patchwork.kernel.org/patch/10312259/
>> > > But he has to watch results for several days to check trend.
>> > 
>> > It's close, but there is some gap compared to Francisco's version.
>> > 
>> > (Because of the large variance on TDP limited devices, and factors 
>> > causing extra perf differences e.g. between boots, it's hard to
>> > give 
>> > exact number without having trends from several days / weeks.  I
>> > would 
>> > also need new version of Fransisco's patch set that applies to
>> > latest 
>> > kernel like yours does.)
>> > 
>> > 
>> > > Since here boost is getting limited to turbo and non turbo, we
>> > > need some
>> > > ways to adjust the fractions corresponding to max non turbo as
>> > > well. It
>> > > is much easier to use the actual P-state limits for boost instead
>> > > of
>> > > fractions. So here P-state io boost limit is applied on top of
>> > > the
>> > > P-state limit calculated via current algorithm by removing
>> > > current
>> > > io_wait boost calculation using fractions.
>> > > 
>> > > Since we prefer to use common algorithm for all processor
>> > > platforms, this
>> > > change was tested on other client and sever platforms as well.
>> > > All results
>> > > were within the margin of errors. Results:
>> > > https://bugzilla.kernel.org/attachment.cgi?id=278149
>> > 
>> > Good.
>> > 
>> > Francisco, how well the listed PTS tests cover latency bound cases
>> > you 
>> > were concerned about?  [1]
>> > 
>> > 
>> >- Eero
>> > 
>> > [1] Fransisco was concerned that the patch:
>> > * trade-off might regress latency bound cases (which I haven't
>> > tested, I 
>> > tested only 3D throughput), and
>> > * that it addressed only other of the sources of energy
>> > inefficiencies 
>> > he had identified (which could explain slightly better 3D results
>> > from 
>> > his more complex patch set).
>> 
>> This patch causes a number of statistically significant regressions
>> (with significance of 1%) on the two systems I've tested it on.  On
>> my
> Sure. These patches are targeted to Atom clients where some of these
> server like workload may have some minor regression on few watts TDP
> parts.

Neither the 36% regression of fs-mark, the 21% regression of sqlite, nor
the 10% regression of warsaw qualify as small.  And most of the test
cases on the list of regressions aren't exclusively server-like, if at
all.  Warsaw, gtkperf, jxrendermark and lightsmark are all graphics
benchmarks -- Latency is as important if not more for interactive
workloads than it is for server workloads.  In the case of a conflict
like the one we're dealing with right now between optimizing for
throughput (e.g. for the maximum number of requests per second) and
optimizing for latency (e.g. for the minimum request duration), you are
more likely to be concerned about the former than about the latter in a
server setup.

> But weighing against reduced TURBO usage (which is enough trigger) and
> improvement in tests done by Eero (which was primary complaint to us).
>
> It is trivial patch. Yes, the patch is not perfect and doesn't close
> doors for any improvements.
>

It's sort of self-contained because it's awfully incomplete.

> I see your idea, but how to implement in acceptable way is a challenge.

Main challenge was getting the code to work without regressions in
latency-sensitive workloads, which I did, because you told me that it
wasn't acceptable for it to cause any regressions on the Phoronix
daily-system-tracker, whether latency-bound or not.  What is left in
order to address Peter's concerns is for the most part plumbing, that's
guaranteed not to have any functional impact on the heuristic.  The fact
that we don't expect it to change the performance of the system makes it
substantially less time-consuming to validate than altering the
performance trade-offs of the heuristic dynamically in order to avoid
regressions (which is what has kept my systems busy most of the time
lately).  Seems like my series, even in its current state without the
changes requested by Peter is closer to being ready for production than
this patch is.

> So ultimately we will get there, but will require some time compared
> with other high priority items.
>
>
> Thanks
> Srinivas
>
>> CHV N3050:
>> 
>> > phoronix/fs-
>> > mark/test=0:   
>> > XXX ±7.25% x34 -> XXX ±7.00% x39  d=-36.85% ±5.91%  p=0.00%
>> > phoronix/sqlite:   
>> > XXX ±1.86% x34 -> XXX ±1.88% x39  d=-21.73%
>> > ±1.66%  p=0.00%
>> > warsow/benchsow:   
>> > XXX ±1.25% x34 -> XXX ±1.95% x39  d=-10.83%
>> > 

Re: [PATCH 0/3] of: root #{size,address}-cells clean-ups

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> This is a small restructuring and clean-up of handling root node
> #size-cells and #address-cells (or lack of). As only Sparc needs a 
> different default value and only for #address-cells, we can handle that 
> locally and remove one more dependency on asm/prom.h.
> 
> Rob
> 
> Rob Herring (3):
>   of/fdt: Scan the root node properties earlier
>   of/fdt: avoid re-parsing '#{address,size}-cells' in
> of_fdt_limit_memory
>   of: make default address and size cells sizes private
> 
>  arch/sparc/include/asm/prom.h |  3 ---
>  drivers/of/fdt.c  | 30 +-
>  drivers/of/of_private.h   |  8 
>  include/linux/of.h|  6 --
>  4 files changed, 13 insertions(+), 34 deletions(-)
> 

Nice cleanup!

-Frank


Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

2018-09-04 Thread Francisco Jerez
Srinivas Pandruvada  writes:

> On Mon, 2018-09-03 at 23:53 -0700, Francisco Jerez wrote:
>> Eero Tamminen  writes:
>> 
>> > Hi,
>> > 
>> > On 31.08.2018 20:28, Srinivas Pandruvada wrote:
>> > ...
>> > > As per testing Eero Tamminen, the results are comparable to the
>> > > patchset
>> > > https://patchwork.kernel.org/patch/10312259/
>> > > But he has to watch results for several days to check trend.
>> > 
>> > It's close, but there is some gap compared to Francisco's version.
>> > 
>> > (Because of the large variance on TDP limited devices, and factors 
>> > causing extra perf differences e.g. between boots, it's hard to
>> > give 
>> > exact number without having trends from several days / weeks.  I
>> > would 
>> > also need new version of Fransisco's patch set that applies to
>> > latest 
>> > kernel like yours does.)
>> > 
>> > 
>> > > Since here boost is getting limited to turbo and non turbo, we
>> > > need some
>> > > ways to adjust the fractions corresponding to max non turbo as
>> > > well. It
>> > > is much easier to use the actual P-state limits for boost instead
>> > > of
>> > > fractions. So here P-state io boost limit is applied on top of
>> > > the
>> > > P-state limit calculated via current algorithm by removing
>> > > current
>> > > io_wait boost calculation using fractions.
>> > > 
>> > > Since we prefer to use common algorithm for all processor
>> > > platforms, this
>> > > change was tested on other client and sever platforms as well.
>> > > All results
>> > > were within the margin of errors. Results:
>> > > https://bugzilla.kernel.org/attachment.cgi?id=278149
>> > 
>> > Good.
>> > 
>> > Francisco, how well the listed PTS tests cover latency bound cases
>> > you 
>> > were concerned about?  [1]
>> > 
>> > 
>> >- Eero
>> > 
>> > [1] Fransisco was concerned that the patch:
>> > * trade-off might regress latency bound cases (which I haven't
>> > tested, I 
>> > tested only 3D throughput), and
>> > * that it addressed only other of the sources of energy
>> > inefficiencies 
>> > he had identified (which could explain slightly better 3D results
>> > from 
>> > his more complex patch set).
>> 
>> This patch causes a number of statistically significant regressions
>> (with significance of 1%) on the two systems I've tested it on.  On
>> my
> Sure. These patches are targeted to Atom clients where some of these
> server like workload may have some minor regression on few watts TDP
> parts.

Neither the 36% regression of fs-mark, the 21% regression of sqlite, nor
the 10% regression of warsaw qualify as small.  And most of the test
cases on the list of regressions aren't exclusively server-like, if at
all.  Warsaw, gtkperf, jxrendermark and lightsmark are all graphics
benchmarks -- Latency is as important if not more for interactive
workloads than it is for server workloads.  In the case of a conflict
like the one we're dealing with right now between optimizing for
throughput (e.g. for the maximum number of requests per second) and
optimizing for latency (e.g. for the minimum request duration), you are
more likely to be concerned about the former than about the latter in a
server setup.

> But weighing against reduced TURBO usage (which is enough trigger) and
> improvement in tests done by Eero (which was primary complaint to us).
>
> It is trivial patch. Yes, the patch is not perfect and doesn't close
> doors for any improvements.
>

It's sort of self-contained because it's awfully incomplete.

> I see your idea, but how to implement in acceptable way is a challenge.

Main challenge was getting the code to work without regressions in
latency-sensitive workloads, which I did, because you told me that it
wasn't acceptable for it to cause any regressions on the Phoronix
daily-system-tracker, whether latency-bound or not.  What is left in
order to address Peter's concerns is for the most part plumbing, that's
guaranteed not to have any functional impact on the heuristic.  The fact
that we don't expect it to change the performance of the system makes it
substantially less time-consuming to validate than altering the
performance trade-offs of the heuristic dynamically in order to avoid
regressions (which is what has kept my systems busy most of the time
lately).  Seems like my series, even in its current state without the
changes requested by Peter is closer to being ready for production than
this patch is.

> So ultimately we will get there, but will require some time compared
> with other high priority items.
>
>
> Thanks
> Srinivas
>
>> CHV N3050:
>> 
>> > phoronix/fs-
>> > mark/test=0:   
>> > XXX ±7.25% x34 -> XXX ±7.00% x39  d=-36.85% ±5.91%  p=0.00%
>> > phoronix/sqlite:   
>> > XXX ±1.86% x34 -> XXX ±1.88% x39  d=-21.73%
>> > ±1.66%  p=0.00%
>> > warsow/benchsow:   
>> > XXX ±1.25% x34 -> XXX ±1.95% x39  d=-10.83%
>> > 

Re: [PATCH 3/3] of: make default address and size cells sizes private

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Only some old OpenFirmware implementations rely on default sizes. Any
> FDT and modern implementation should have explicit properties. Make the
> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside
> users.
> 
> This also gets us one step closer to removing the asm/prom.h dependency on
> Sparc.
> 
> Cc: "David S. Miller" 
> Cc: Frank Rowand 
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  arch/sparc/include/asm/prom.h | 3 ---
>  drivers/of/of_private.h   | 8 
>  include/linux/of.h| 6 --
>  3 files changed, 8 insertions(+), 9 deletions(-)

Reviewed-by: Frank Rowand 


> diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h
> index d955c8df62d6..1902db27ff4b 100644
> --- a/arch/sparc/include/asm/prom.h
> +++ b/arch/sparc/include/asm/prom.h
> @@ -24,9 +24,6 @@
>  #include 
>  #include 
>  
> -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT  2
> -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT  1
> -
>  #define of_compat_cmp(s1, s2, l) strncmp((s1), (s2), (l))
>  #define of_prop_cmp(s1, s2)  strcasecmp((s1), (s2))
>  #define of_node_cmp(s1, s2)  strcmp((s1), (s2))
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index 216175d11d3d..5d1567025358 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -27,6 +27,14 @@ struct alias_prop {
>   char stem[0];
>  };
>  
> +#if defined(CONFIG_SPARC)
> +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2
> +#else
> +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1
> +#endif
> +
> +#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1
> +
>  extern struct mutex of_mutex;
>  extern struct list_head aliases_lookup;
>  extern struct kset *of_kset;
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 506beca9588d..49d85f670c75 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -247,12 +247,6 @@ static inline unsigned long of_read_ulong(const __be32 
> *cell, int size)
>  #include 
>  #endif
>  
> -/* Default #address and #size cells.  Allow arch asm/prom.h to override */
> -#if !defined(OF_ROOT_NODE_ADDR_CELLS_DEFAULT)
> -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1
> -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1
> -#endif
> -
>  #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags)
>  #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags)
>  
> 



Re: [PATCH 3/3] of: make default address and size cells sizes private

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Only some old OpenFirmware implementations rely on default sizes. Any
> FDT and modern implementation should have explicit properties. Make the
> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside
> users.
> 
> This also gets us one step closer to removing the asm/prom.h dependency on
> Sparc.
> 
> Cc: "David S. Miller" 
> Cc: Frank Rowand 
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  arch/sparc/include/asm/prom.h | 3 ---
>  drivers/of/of_private.h   | 8 
>  include/linux/of.h| 6 --
>  3 files changed, 8 insertions(+), 9 deletions(-)

Reviewed-by: Frank Rowand 


> diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h
> index d955c8df62d6..1902db27ff4b 100644
> --- a/arch/sparc/include/asm/prom.h
> +++ b/arch/sparc/include/asm/prom.h
> @@ -24,9 +24,6 @@
>  #include 
>  #include 
>  
> -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT  2
> -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT  1
> -
>  #define of_compat_cmp(s1, s2, l) strncmp((s1), (s2), (l))
>  #define of_prop_cmp(s1, s2)  strcasecmp((s1), (s2))
>  #define of_node_cmp(s1, s2)  strcmp((s1), (s2))
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index 216175d11d3d..5d1567025358 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -27,6 +27,14 @@ struct alias_prop {
>   char stem[0];
>  };
>  
> +#if defined(CONFIG_SPARC)
> +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2
> +#else
> +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1
> +#endif
> +
> +#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1
> +
>  extern struct mutex of_mutex;
>  extern struct list_head aliases_lookup;
>  extern struct kset *of_kset;
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 506beca9588d..49d85f670c75 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -247,12 +247,6 @@ static inline unsigned long of_read_ulong(const __be32 
> *cell, int size)
>  #include 
>  #endif
>  
> -/* Default #address and #size cells.  Allow arch asm/prom.h to override */
> -#if !defined(OF_ROOT_NODE_ADDR_CELLS_DEFAULT)
> -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1
> -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1
> -#endif
> -
>  #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags)
>  #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags)
>  
> 



Re: linux-next: build warnings after merge of the devicetree tree

2018-09-04 Thread Stephen Rothwell
Hi all,

On Wed, 5 Sep 2018 10:39:06 +1000 Stephen Rothwell  
wrote:
>
> After merging the devicetree tree, today's linux-next build (x86_64
> allmodconfig) produced these warnings:
> 
> WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the 
> function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells
> The function of_fdt_limit_memory() references
> the variable __initdata dt_root_addr_cells.
> This is often because of_fdt_limit_memory lacks a __initdata 
> annotation or the annotation of dt_root_addr_cells is wrong.
> 
> WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the 
> function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells
> The function of_fdt_limit_memory() references
> the variable __initdata dt_root_size_cells.
> This is often because of_fdt_limit_memory lacks a __initdata 
> annotation or the annotation of dt_root_size_cells is wrong.
> 
> Introduced by commit
> 
>   bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in 
> of_fdt_limit_memory")

It turns out that section mismatches are fatal errors in some configs (like 
powerpc allnoconfig), so I have added the following patch for today:

From: Stephen Rothwell 
Date: Wed, 5 Sep 2018 11:50:29 +1000
Subject: [PATCH] mark of_fdt_limit_memory() as __init

Signed-off-by: Stephen Rothwell 
---
 drivers/of/fdt.c   | 2 +-
 include/linux/of_fdt.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index fef4b2c8a171..fe78bed8925f 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -39,7 +39,7 @@
  * memory entries in the /memory node. This function may be called
  * any time after initial_boot_param is set.
  */
-void of_fdt_limit_memory(int limit)
+void __init of_fdt_limit_memory(int limit)
 {
int memory;
int len;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index b9cd9ebdf9b9..19ebf22a7862 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -46,7 +46,7 @@ extern char __dtb_end[];
 
 /* Other Prototypes */
 extern u64 of_flat_dt_translate_address(unsigned long node);
-extern void of_fdt_limit_memory(int limit);
+extern void __init of_fdt_limit_memory(int limit);
 #endif /* CONFIG_OF_FLATTREE */
 
 #ifdef CONFIG_OF_EARLY_FLATTREE
-- 
2.18.0

-- 
Cheers,
Stephen Rothwell


pgpNjpkMIxaNL.pgp
Description: OpenPGP digital signature


Re: linux-next: build warnings after merge of the devicetree tree

2018-09-04 Thread Stephen Rothwell
Hi all,

On Wed, 5 Sep 2018 10:39:06 +1000 Stephen Rothwell  
wrote:
>
> After merging the devicetree tree, today's linux-next build (x86_64
> allmodconfig) produced these warnings:
> 
> WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the 
> function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells
> The function of_fdt_limit_memory() references
> the variable __initdata dt_root_addr_cells.
> This is often because of_fdt_limit_memory lacks a __initdata 
> annotation or the annotation of dt_root_addr_cells is wrong.
> 
> WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the 
> function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells
> The function of_fdt_limit_memory() references
> the variable __initdata dt_root_size_cells.
> This is often because of_fdt_limit_memory lacks a __initdata 
> annotation or the annotation of dt_root_size_cells is wrong.
> 
> Introduced by commit
> 
>   bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in 
> of_fdt_limit_memory")

It turns out that section mismatches are fatal errors in some configs (like 
powerpc allnoconfig), so I have added the following patch for today:

From: Stephen Rothwell 
Date: Wed, 5 Sep 2018 11:50:29 +1000
Subject: [PATCH] mark of_fdt_limit_memory() as __init

Signed-off-by: Stephen Rothwell 
---
 drivers/of/fdt.c   | 2 +-
 include/linux/of_fdt.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index fef4b2c8a171..fe78bed8925f 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -39,7 +39,7 @@
  * memory entries in the /memory node. This function may be called
  * any time after initial_boot_param is set.
  */
-void of_fdt_limit_memory(int limit)
+void __init of_fdt_limit_memory(int limit)
 {
int memory;
int len;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index b9cd9ebdf9b9..19ebf22a7862 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -46,7 +46,7 @@ extern char __dtb_end[];
 
 /* Other Prototypes */
 extern u64 of_flat_dt_translate_address(unsigned long node);
-extern void of_fdt_limit_memory(int limit);
+extern void __init of_fdt_limit_memory(int limit);
 #endif /* CONFIG_OF_FLATTREE */
 
 #ifdef CONFIG_OF_EARLY_FLATTREE
-- 
2.18.0

-- 
Cheers,
Stephen Rothwell


pgpNjpkMIxaNL.pgp
Description: OpenPGP digital signature


Re: [PATCH 2/3] of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Now that we initialize dt_root_addr_cells and dt_root_size_cells earlier,
> use them and simplify of_fdt_limit_memory.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/fdt.c | 23 +--
>  1 file changed, 1 insertion(+), 22 deletions(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 49abe18f1bde..fef4b2c8a171 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c

As kbuild test robot noted, of_fdt_limit_memory() will access
__initdata dt_root_addr_cells and __initdata dt_root_size_cells.
A possible (untested) fix would be:

- void of_fdt_limit_memory(int limit)
- {

+ void __init of_fdt_limit_memory(int limit)
+ {


> @@ -44,28 +44,7 @@ void of_fdt_limit_memory(int limit)
>   int memory;
>   int len;
>   const void *val;
> - int nr_address_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT;
> - int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT;
> - const __be32 *addr_prop;
> - const __be32 *size_prop;
> - int root_offset;
> - int cell_size;
> -
> - root_offset = fdt_path_offset(initial_boot_params, "/");
> - if (root_offset < 0)
> - return;
> -
> - addr_prop = fdt_getprop(initial_boot_params, root_offset,
> - "#address-cells", NULL);
> - if (addr_prop)
> - nr_address_cells = fdt32_to_cpu(*addr_prop);
> -
> - size_prop = fdt_getprop(initial_boot_params, root_offset,
> - "#size-cells", NULL);
> - if (size_prop)
> - nr_size_cells = fdt32_to_cpu(*size_prop);
> -
> - cell_size = sizeof(uint32_t)*(nr_address_cells + nr_size_cells);
> + int cell_size = sizeof(uint32_t)*(dt_root_addr_cells + 
> dt_root_size_cells);
>  
>   memory = fdt_path_offset(initial_boot_params, "/memory");
>   if (memory > 0) {
> 



Re: [PATCH 2/3] of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Now that we initialize dt_root_addr_cells and dt_root_size_cells earlier,
> use them and simplify of_fdt_limit_memory.
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/fdt.c | 23 +--
>  1 file changed, 1 insertion(+), 22 deletions(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 49abe18f1bde..fef4b2c8a171 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c

As kbuild test robot noted, of_fdt_limit_memory() will access
__initdata dt_root_addr_cells and __initdata dt_root_size_cells.
A possible (untested) fix would be:

- void of_fdt_limit_memory(int limit)
- {

+ void __init of_fdt_limit_memory(int limit)
+ {


> @@ -44,28 +44,7 @@ void of_fdt_limit_memory(int limit)
>   int memory;
>   int len;
>   const void *val;
> - int nr_address_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT;
> - int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT;
> - const __be32 *addr_prop;
> - const __be32 *size_prop;
> - int root_offset;
> - int cell_size;
> -
> - root_offset = fdt_path_offset(initial_boot_params, "/");
> - if (root_offset < 0)
> - return;
> -
> - addr_prop = fdt_getprop(initial_boot_params, root_offset,
> - "#address-cells", NULL);
> - if (addr_prop)
> - nr_address_cells = fdt32_to_cpu(*addr_prop);
> -
> - size_prop = fdt_getprop(initial_boot_params, root_offset,
> - "#size-cells", NULL);
> - if (size_prop)
> - nr_size_cells = fdt32_to_cpu(*size_prop);
> -
> - cell_size = sizeof(uint32_t)*(nr_address_cells + nr_size_cells);
> + int cell_size = sizeof(uint32_t)*(dt_root_addr_cells + 
> dt_root_size_cells);
>  
>   memory = fdt_path_offset(initial_boot_params, "/memory");
>   if (memory > 0) {
> 



Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Scan the root node properties (#{size,address}-cells) earlier, 

^^^
 before mdesc->dt_fixup() is called

> so that
> the dt_root_addr_cells and dt_root_size_cells variables are initialized
> and can be used.
 by mdesc->dt_fixup()
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/fdt.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
Moving early_init_dt_scan_root() to inside early_init_dt_verify()
puts something that has nothing to do with verifying the fdt
into a function whose purpose is the verify.  It hides the side
effect of initializing the dt_root_addr_cells and dt_root_size_cells
variables.

I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(),
move the chunk of code there instead of to early_init_dt_scan_nodes(),
and call the new function from setup_machine_fdt(), just before
calling  mdesc->dt_fixup().  This would be a little bit more code,
but more clearly showing the intent.

-Frank

> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 800ad252cf9c..49abe18f1bde 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1215,6 +1215,10 @@ bool __init early_init_dt_verify(void *params)
>   initial_boot_params = params;
>   of_fdt_crc32 = crc32_be(~0, initial_boot_params,
>   fdt_totalsize(initial_boot_params));
> +
> + /* Initialize {size,address}-cells info */
> + of_scan_flat_dt(early_init_dt_scan_root, NULL);
> +
>   return true;
>  }
>  
> @@ -1224,9 +1228,6 @@ void __init early_init_dt_scan_nodes(void)
>   /* Retrieve various information from the /chosen node */
>   of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line);
>  
> - /* Initialize {size,address}-cells info */
> - of_scan_flat_dt(early_init_dt_scan_root, NULL);
> -
>   /* Setup memory, calling early_init_dt_add_memory_arch */
>   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>  }
> 



Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier

2018-09-04 Thread Frank Rowand
On 08/30/18 12:05, Rob Herring wrote:
> Scan the root node properties (#{size,address}-cells) earlier, 

^^^
 before mdesc->dt_fixup() is called

> so that
> the dt_root_addr_cells and dt_root_size_cells variables are initialized
> and can be used.
 by mdesc->dt_fixup()
> 
> Cc: Frank Rowand 
> Signed-off-by: Rob Herring 
> ---
>  drivers/of/fdt.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
Moving early_init_dt_scan_root() to inside early_init_dt_verify()
puts something that has nothing to do with verifying the fdt
into a function whose purpose is the verify.  It hides the side
effect of initializing the dt_root_addr_cells and dt_root_size_cells
variables.

I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(),
move the chunk of code there instead of to early_init_dt_scan_nodes(),
and call the new function from setup_machine_fdt(), just before
calling  mdesc->dt_fixup().  This would be a little bit more code,
but more clearly showing the intent.

-Frank

> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 800ad252cf9c..49abe18f1bde 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1215,6 +1215,10 @@ bool __init early_init_dt_verify(void *params)
>   initial_boot_params = params;
>   of_fdt_crc32 = crc32_be(~0, initial_boot_params,
>   fdt_totalsize(initial_boot_params));
> +
> + /* Initialize {size,address}-cells info */
> + of_scan_flat_dt(early_init_dt_scan_root, NULL);
> +
>   return true;
>  }
>  
> @@ -1224,9 +1228,6 @@ void __init early_init_dt_scan_nodes(void)
>   /* Retrieve various information from the /chosen node */
>   of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line);
>  
> - /* Initialize {size,address}-cells info */
> - of_scan_flat_dt(early_init_dt_scan_root, NULL);
> -
>   /* Setup memory, calling early_init_dt_add_memory_arch */
>   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>  }
> 



Re: [PATCH] lib/parser.c: switch match_number() over to use match_strdup()

2018-09-04 Thread Eric Biggers
On Thu, Aug 30, 2018 at 12:47:27PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> This simplifies the code.  No change in behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  lib/parser.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/lib/parser.c b/lib/parser.c
> index 0142ef28f0eb..96656a6dd59b 100644
> --- a/lib/parser.c
> +++ b/lib/parser.c
> @@ -131,13 +131,10 @@ static int match_number(substring_t *s, int *result, 
> int base)
>   char *buf;
>   int ret;
>   long val;
> - size_t len = s->to - s->from;
>  
> - buf = kmalloc(len + 1, GFP_KERNEL);
> + buf = match_strdup(s);
>   if (!buf)
>   return -ENOMEM;
> - memcpy(buf, s->from, len);
> - buf[len] = '\0';
>  
>   ret = 0;
>   val = simple_strtol(buf, , base);
> -- 

Hi Andrew, are you planning to take this patch too?  The similar patch

  lib/parser.c: switch match_u64int() over to use match_strdup()

is in -mm now, but this one is not.

Thanks,

- Eric


Re: [PATCH] lib/parser.c: switch match_number() over to use match_strdup()

2018-09-04 Thread Eric Biggers
On Thu, Aug 30, 2018 at 12:47:27PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> This simplifies the code.  No change in behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  lib/parser.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/lib/parser.c b/lib/parser.c
> index 0142ef28f0eb..96656a6dd59b 100644
> --- a/lib/parser.c
> +++ b/lib/parser.c
> @@ -131,13 +131,10 @@ static int match_number(substring_t *s, int *result, 
> int base)
>   char *buf;
>   int ret;
>   long val;
> - size_t len = s->to - s->from;
>  
> - buf = kmalloc(len + 1, GFP_KERNEL);
> + buf = match_strdup(s);
>   if (!buf)
>   return -ENOMEM;
> - memcpy(buf, s->from, len);
> - buf[len] = '\0';
>  
>   ret = 0;
>   val = simple_strtol(buf, , base);
> -- 

Hi Andrew, are you planning to take this patch too?  The similar patch

  lib/parser.c: switch match_u64int() over to use match_strdup()

is in -mm now, but this one is not.

Thanks,

- Eric


[PATCH] x86/intel_rdt: Show missing resctrl mount options

2018-09-04 Thread Fenghua Yu
In resctrl filesystem, we have some mount options to enable
L3/L2 CDP and MBA Software Controller features if platform supports them:
 mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

But currently only "cdp" option can be displayed. "cdpl2" and "mba_MBps"
options are not showed even when they are active mount options.

Before this patch:
 #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 #grep resctrl /proc/mounts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0
After this patch:
 #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 #grep resctrl /proc/mountts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0

Signed-off-by: Xiaochen Shen 
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b799c00bef09..82459a41fa84 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2760,6 +2760,13 @@ static int rdtgroup_show_options(struct seq_file *seq, 
struct kernfs_root *kf)
 {
if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled)
seq_puts(seq, ",cdp");
+
+   if (rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled)
+   seq_puts(seq, ",cdpl2");
+
+   if (is_mba_sc(_resources_all[RDT_RESOURCE_MBA]))
+   seq_puts(seq, ",mba_MBps");
+
return 0;
 }
 
-- 
2.18.0



[PATCH] x86/intel_rdt: Show missing resctrl mount options

2018-09-04 Thread Fenghua Yu
In resctrl filesystem, we have some mount options to enable
L3/L2 CDP and MBA Software Controller features if platform supports them:
 mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

But currently only "cdp" option can be displayed. "cdpl2" and "mba_MBps"
options are not showed even when they are active mount options.

Before this patch:
 #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 #grep resctrl /proc/mounts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0
After this patch:
 #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 #grep resctrl /proc/mountts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0

Signed-off-by: Xiaochen Shen 
Signed-off-by: Fenghua Yu 
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b799c00bef09..82459a41fa84 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2760,6 +2760,13 @@ static int rdtgroup_show_options(struct seq_file *seq, 
struct kernfs_root *kf)
 {
if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled)
seq_puts(seq, ",cdp");
+
+   if (rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled)
+   seq_puts(seq, ",cdpl2");
+
+   if (is_mba_sc(_resources_all[RDT_RESOURCE_MBA]))
+   seq_puts(seq, ",mba_MBps");
+
return 0;
 }
 
-- 
2.18.0



[PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend

2018-09-04 Thread Anson Huang
On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power
state retention), UART iomux settings will be lost, need to add
pinctrl sleep/default mode switch during suspend/resume to make
sure UART iomux settings are correct after resume.

Signed-off-by: Anson Huang 
---
 drivers/tty/serial/imx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index c15332d..6ccb5c4 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2449,6 +2450,8 @@ static int imx_uart_suspend_noirq(struct device *dev)
 
clk_disable(sport->clk_ipg);
 
+   pinctrl_pm_select_sleep_state(dev);
+
return 0;
 }
 
@@ -2457,6 +2460,8 @@ static int imx_uart_resume_noirq(struct device *dev)
struct imx_port *sport = dev_get_drvdata(dev);
int ret;
 
+   pinctrl_pm_select_default_state(dev);
+
ret = clk_enable(sport->clk_ipg);
if (ret)
return ret;
-- 
2.7.4



[PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend

2018-09-04 Thread Anson Huang
On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power
state retention), UART iomux settings will be lost, need to add
pinctrl sleep/default mode switch during suspend/resume to make
sure UART iomux settings are correct after resume.

Signed-off-by: Anson Huang 
---
 drivers/tty/serial/imx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index c15332d..6ccb5c4 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2449,6 +2450,8 @@ static int imx_uart_suspend_noirq(struct device *dev)
 
clk_disable(sport->clk_ipg);
 
+   pinctrl_pm_select_sleep_state(dev);
+
return 0;
 }
 
@@ -2457,6 +2460,8 @@ static int imx_uart_resume_noirq(struct device *dev)
struct imx_port *sport = dev_get_drvdata(dev);
int ret;
 
+   pinctrl_pm_select_default_state(dev);
+
ret = clk_enable(sport->clk_ipg);
if (ret)
return ret;
-- 
2.7.4



[PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Anson Huang
In noirq suspend/resume stage with no_console_suspend enabled,
imx_uart_console_write() may be called to print out log_buf
message by printk(), so there will be race condition between
imx_uart_console_write() and imx_uart_save/restore_context(),
need to add lock to protect the registers save/restore operations.

Signed-off-by: Anson Huang 
---
 drivers/tty/serial/imx.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 239c0fa..c15332d 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -2376,8 +2376,13 @@ static int imx_uart_remove(struct platform_device *pdev)
 
 static void imx_uart_restore_context(struct imx_port *sport)
 {
-   if (!sport->context_saved)
+   unsigned long flags;
+
+   spin_lock_irqsave(>port.lock, flags);
+   if (!sport->context_saved) {
+   spin_unlock_irqrestore(>port.lock, flags);
return;
+   }
 
imx_uart_writel(sport, sport->saved_reg[4], UFCR);
imx_uart_writel(sport, sport->saved_reg[5], UESC);
@@ -2390,11 +2395,15 @@ static void imx_uart_restore_context(struct imx_port 
*sport)
imx_uart_writel(sport, sport->saved_reg[2], UCR3);
imx_uart_writel(sport, sport->saved_reg[3], UCR4);
sport->context_saved = false;
+   spin_unlock_irqrestore(>port.lock, flags);
 }
 
 static void imx_uart_save_context(struct imx_port *sport)
 {
+   unsigned long flags;
+
/* Save necessary regs */
+   spin_lock_irqsave(>port.lock, flags);
sport->saved_reg[0] = imx_uart_readl(sport, UCR1);
sport->saved_reg[1] = imx_uart_readl(sport, UCR2);
sport->saved_reg[2] = imx_uart_readl(sport, UCR3);
@@ -2406,6 +2415,7 @@ static void imx_uart_save_context(struct imx_port *sport)
sport->saved_reg[8] = imx_uart_readl(sport, UBMR);
sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS);
sport->context_saved = true;
+   spin_unlock_irqrestore(>port.lock, flags);
 }
 
 static void imx_uart_enable_wakeup(struct imx_port *sport, bool on)
-- 
2.7.4



[PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Anson Huang
In noirq suspend/resume stage with no_console_suspend enabled,
imx_uart_console_write() may be called to print out log_buf
message by printk(), so there will be race condition between
imx_uart_console_write() and imx_uart_save/restore_context(),
need to add lock to protect the registers save/restore operations.

Signed-off-by: Anson Huang 
---
 drivers/tty/serial/imx.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 239c0fa..c15332d 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -2376,8 +2376,13 @@ static int imx_uart_remove(struct platform_device *pdev)
 
 static void imx_uart_restore_context(struct imx_port *sport)
 {
-   if (!sport->context_saved)
+   unsigned long flags;
+
+   spin_lock_irqsave(>port.lock, flags);
+   if (!sport->context_saved) {
+   spin_unlock_irqrestore(>port.lock, flags);
return;
+   }
 
imx_uart_writel(sport, sport->saved_reg[4], UFCR);
imx_uart_writel(sport, sport->saved_reg[5], UESC);
@@ -2390,11 +2395,15 @@ static void imx_uart_restore_context(struct imx_port 
*sport)
imx_uart_writel(sport, sport->saved_reg[2], UCR3);
imx_uart_writel(sport, sport->saved_reg[3], UCR4);
sport->context_saved = false;
+   spin_unlock_irqrestore(>port.lock, flags);
 }
 
 static void imx_uart_save_context(struct imx_port *sport)
 {
+   unsigned long flags;
+
/* Save necessary regs */
+   spin_lock_irqsave(>port.lock, flags);
sport->saved_reg[0] = imx_uart_readl(sport, UCR1);
sport->saved_reg[1] = imx_uart_readl(sport, UCR2);
sport->saved_reg[2] = imx_uart_readl(sport, UCR3);
@@ -2406,6 +2415,7 @@ static void imx_uart_save_context(struct imx_port *sport)
sport->saved_reg[8] = imx_uart_readl(sport, UBMR);
sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS);
sport->context_saved = true;
+   spin_unlock_irqrestore(>port.lock, flags);
 }
 
 static void imx_uart_enable_wakeup(struct imx_port *sport, bool on)
-- 
2.7.4



RE: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Anson Huang
Hi, Uwe

Anson Huang
Best Regards!


> -Original Message-
> From: Uwe Kleine-König 
> Sent: Wednesday, September 5, 2018 4:32 AM
> To: Anson Huang 
> Cc: gre...@linuxfoundation.org; jsl...@suse.com;
> linux-ser...@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx
> 
> Subject: Re: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore
> 
> Hello,
> 
> On Thu, Aug 09, 2018 at 06:06:08PM +0800, Anson Huang wrote:
> > In noirq suspend/resume stage with no_console_suspend enabled,
> > .imx_console_write() may be called to print out log_buf message by
> > .printk(), so there will be race condition between
> > .imx_console_write() and .serial_imx_save/restore_context(),
> > need to add lock to protect the registers save/restore operations.
> 
> The function names changes some time ago. Also I'd drop the leading dots in
> the names which I would understand as members in a struct.
> 
> Is this a theoretical issue, or did you see actual breakage?
 
I will update the function name and remove the dots.

And we did see actual breakage during stress test, although the reproduce rate
is NOT that high, but the issue came out and fixed by this patch. 

> 
> > Signed-off-by: Anson Huang 
> > ---
> >  drivers/tty/serial/imx.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index
> > 239c0fa..a09ccef 100644
> > --- a/drivers/tty/serial/imx.c
> > +++ b/drivers/tty/serial/imx.c
> > @@ -2376,9 +2376,12 @@ static int imx_uart_remove(struct
> > platform_device *pdev)
> >
> >  static void imx_uart_restore_context(struct imx_port *sport)  {
> > +   unsigned long flags;
> > +
> > if (!sport->context_saved)
> > return;
> 
> Is it safe to check .context_saved without holding the lock?
 
Ah, my mistake, will put the check inside the lock. Please help
review V2 patch set, thanks.

Anson.

> 
> > +   spin_lock_irqsave(>port.lock, flags);
> > imx_uart_writel(sport, sport->saved_reg[4], UFCR);
> > imx_uart_writel(sport, sport->saved_reg[5], UESC);
> > imx_uart_writel(sport, sport->saved_reg[6], UTIM); @@ -2390,11
> > +2393,15 @@ static void imx_uart_restore_context(struct imx_port *sport)
> > imx_uart_writel(sport, sport->saved_reg[2], UCR3);
> > imx_uart_writel(sport, sport->saved_reg[3], UCR4);
> > sport->context_saved = false;
> > +   spin_unlock_irqrestore(>port.lock, flags);
> >  }
> >
> >  static void imx_uart_save_context(struct imx_port *sport)  {
> > +   unsigned long flags;
> > +
> > /* Save necessary regs */
> > +   spin_lock_irqsave(>port.lock, flags);
> > sport->saved_reg[0] = imx_uart_readl(sport, UCR1);
> > sport->saved_reg[1] = imx_uart_readl(sport, UCR2);
> > sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6
> > +2413,7 @@ static void imx_uart_save_context(struct imx_port *sport)
> > sport->saved_reg[8] = imx_uart_readl(sport, UBMR);
> > sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS);
> > sport->context_saved = true;
> > +   spin_unlock_irqrestore(>port.lock, flags);
> >  }
> >
> >  static void imx_uart_enable_wakeup(struct imx_port *sport, bool on)
> 
> Best regards
> Uwe
> 
> --
> Pengutronix e.K.   | Uwe Kleine-König
> |
> Industrial Linux Solutions |
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.
> pengutronix.de%2Fdata=02%7C01%7Canson.huang%40nxp.com%7C74
> a28e347934451a59bf08d612a57f8c%7C686ea1d3bc2b4c6fa92cd99c5c301635
> %7C0%7C0%7C636716899353510311sdata=lByZDd%2BbQitjEf%2FaiN3
> e26El3ErT0fnmvaKCqckF7Qc%3Dreserved=0  |


RE: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore

2018-09-04 Thread Anson Huang
Hi, Uwe

Anson Huang
Best Regards!


> -Original Message-
> From: Uwe Kleine-König 
> Sent: Wednesday, September 5, 2018 4:32 AM
> To: Anson Huang 
> Cc: gre...@linuxfoundation.org; jsl...@suse.com;
> linux-ser...@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx
> 
> Subject: Re: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore
> 
> Hello,
> 
> On Thu, Aug 09, 2018 at 06:06:08PM +0800, Anson Huang wrote:
> > In noirq suspend/resume stage with no_console_suspend enabled,
> > .imx_console_write() may be called to print out log_buf message by
> > .printk(), so there will be race condition between
> > .imx_console_write() and .serial_imx_save/restore_context(),
> > need to add lock to protect the registers save/restore operations.
> 
> The function names changes some time ago. Also I'd drop the leading dots in
> the names which I would understand as members in a struct.
> 
> Is this a theoretical issue, or did you see actual breakage?
 
I will update the function name and remove the dots.

And we did see actual breakage during stress test, although the reproduce rate
is NOT that high, but the issue came out and fixed by this patch. 

> 
> > Signed-off-by: Anson Huang 
> > ---
> >  drivers/tty/serial/imx.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index
> > 239c0fa..a09ccef 100644
> > --- a/drivers/tty/serial/imx.c
> > +++ b/drivers/tty/serial/imx.c
> > @@ -2376,9 +2376,12 @@ static int imx_uart_remove(struct
> > platform_device *pdev)
> >
> >  static void imx_uart_restore_context(struct imx_port *sport)  {
> > +   unsigned long flags;
> > +
> > if (!sport->context_saved)
> > return;
> 
> Is it safe to check .context_saved without holding the lock?
 
Ah, my mistake, will put the check inside the lock. Please help
review V2 patch set, thanks.

Anson.

> 
> > +   spin_lock_irqsave(>port.lock, flags);
> > imx_uart_writel(sport, sport->saved_reg[4], UFCR);
> > imx_uart_writel(sport, sport->saved_reg[5], UESC);
> > imx_uart_writel(sport, sport->saved_reg[6], UTIM); @@ -2390,11
> > +2393,15 @@ static void imx_uart_restore_context(struct imx_port *sport)
> > imx_uart_writel(sport, sport->saved_reg[2], UCR3);
> > imx_uart_writel(sport, sport->saved_reg[3], UCR4);
> > sport->context_saved = false;
> > +   spin_unlock_irqrestore(>port.lock, flags);
> >  }
> >
> >  static void imx_uart_save_context(struct imx_port *sport)  {
> > +   unsigned long flags;
> > +
> > /* Save necessary regs */
> > +   spin_lock_irqsave(>port.lock, flags);
> > sport->saved_reg[0] = imx_uart_readl(sport, UCR1);
> > sport->saved_reg[1] = imx_uart_readl(sport, UCR2);
> > sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6
> > +2413,7 @@ static void imx_uart_save_context(struct imx_port *sport)
> > sport->saved_reg[8] = imx_uart_readl(sport, UBMR);
> > sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS);
> > sport->context_saved = true;
> > +   spin_unlock_irqrestore(>port.lock, flags);
> >  }
> >
> >  static void imx_uart_enable_wakeup(struct imx_port *sport, bool on)
> 
> Best regards
> Uwe
> 
> --
> Pengutronix e.K.   | Uwe Kleine-König
> |
> Industrial Linux Solutions |
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.
> pengutronix.de%2Fdata=02%7C01%7Canson.huang%40nxp.com%7C74
> a28e347934451a59bf08d612a57f8c%7C686ea1d3bc2b4c6fa92cd99c5c301635
> %7C0%7C0%7C636716899353510311sdata=lByZDd%2BbQitjEf%2FaiN3
> e26El3ErT0fnmvaKCqckF7Qc%3Dreserved=0  |


kselftests for memory.oom.group

2018-09-04 Thread jgkamat
Hi Shuah,

I wrote some tests for the new memory.oom.group feature in cgroups 2.

In the process, I discovered a few small bugs in the cgroups tests, which
I have fixed as well in a separate commit.

This is my first ever patch to Linux, so let me know if you see any issues or
improvements that can be made.

Thanks for all your amazing work on Linux!
-Jay





kselftests for memory.oom.group

2018-09-04 Thread jgkamat
Hi Shuah,

I wrote some tests for the new memory.oom.group feature in cgroups 2.

In the process, I discovered a few small bugs in the cgroups tests, which
I have fixed as well in a separate commit.

This is my first ever patch to Linux, so let me know if you see any issues or
improvements that can be made.

Thanks for all your amazing work on Linux!
-Jay





[PATCH 2/2] Add tests for memory.oom.group

2018-09-04 Thread jgkamat
From: Jay Kamat 

Add tests for memory.oom.group for the following cases:
- Killing all processes in a leaf cgroup, but leaving the
  parent untouched
- Killing all processes in a parent and leaf cgroup
- Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered
  for being killed by the group oom killer.

Signed-off-by: Jay Kamat 
---
 tools/testing/selftests/cgroup/cgroup_util.c  |  21 ++
 tools/testing/selftests/cgroup/cgroup_util.h  |   1 +
 .../selftests/cgroup/test_memcontrol.c| 205 ++
 3 files changed, 227 insertions(+)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c 
b/tools/testing/selftests/cgroup/cgroup_util.c
index 4aadf38bcd5d..6799c69d7f03 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -338,3 +338,24 @@ int is_swap_enabled(void)
 
return cnt > 1;
 }
+
+int set_oom_adj_score(int pid, int score)
+{
+   char path[PATH_MAX];
+   int fd, len;
+
+   sprintf(path, "/proc/%d/oom_score_adj", pid);
+
+   fd = open(path, O_WRONLY | O_APPEND);
+   if (fd < 0)
+   return fd;
+
+   len = dprintf(fd, "%d", score);
+   if (len < 0) {
+   close(fd);
+   return len;
+   }
+
+   close(fd);
+   return 0;
+}
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h 
b/tools/testing/selftests/cgroup/cgroup_util.h
index fe82a297d4e0..cabd43fd137a 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -39,3 +39,4 @@ extern int get_temp_fd(void);
 extern int alloc_pagecache(int fd, size_t size);
 extern int alloc_anon(const char *cgroup, void *arg);
 extern int is_swap_enabled(void);
+extern int set_oom_adj_score(int pid, int score);
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c 
b/tools/testing/selftests/cgroup/test_memcontrol.c
index cf0bddc9d271..017c15a7a935 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -2,6 +2,7 @@
 #define _GNU_SOURCE
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -202,6 +203,36 @@ static int alloc_pagecache_50M_noexit(const char *cgroup, 
void *arg)
return 0;
 }
 
+static int alloc_anon_noexit(const char *cgroup, void *arg)
+{
+   int ppid = getppid();
+
+   if (alloc_anon(cgroup, arg))
+   return -1;
+
+   while (getppid() == ppid)
+   sleep(1);
+
+   return 0;
+}
+
+/*
+ * Wait until processes are killed asynchronously by the OOM killer
+ * If we exceed a timeout, fail.
+ */
+static int cg_test_proc_killed(const char *cgroup)
+{
+   int limit;
+
+   for (limit = 10; limit > 0; limit--) {
+   if (cg_read_strcmp(cgroup, "cgroup.procs", "") == 0)
+   return 0;
+
+   usleep(10);
+   }
+   return -1;
+}
+
 /*
  * First, this test creates the following hierarchy:
  * A   memory.min = 50M,  memory.max = 200M
@@ -964,6 +995,177 @@ static int test_memcg_sock(const char *root)
return ret;
 }
 
+/*
+ * This test disables swapping and tries to allocate anonymous memory
+ * up to OOM with memory.group.oom set. Then it checks that all
+ * processes in the leaf (but not the parent) were killed.
+ */
+static int test_memcg_oom_group_leaf_events(const char *root)
+{
+   int ret = KSFT_FAIL;
+   char *parent, *child;
+
+   parent = cg_name(root, "memcg_test_0");
+   child = cg_name(root, "memcg_test_0/memcg_test_1");
+
+   if (!parent || !child)
+   goto cleanup;
+
+   if (cg_create(parent))
+   goto cleanup;
+
+   if (cg_create(child))
+   goto cleanup;
+
+   if (cg_write(parent, "cgroup.subtree_control", "+memory"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.max", "50M"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.swap.max", "0"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.oom.group", "1"))
+   goto cleanup;
+
+   cg_run_nowait(parent, alloc_anon_noexit, (void *) MB(60));
+   cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1));
+   cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1));
+   if (!cg_run(child, alloc_anon, (void *)MB(100)))
+   goto cleanup;
+
+   if (cg_test_proc_killed(child))
+   goto cleanup;
+
+   if (cg_read_key_long(child, "memory.events", "oom_kill ") <= 0)
+   goto cleanup;
+
+   if (cg_read_key_long(parent, "memory.events", "oom_kill ") != 0)
+   goto cleanup;
+
+   ret = KSFT_PASS;
+
+cleanup:
+   if (child)
+   cg_destroy(child);
+   if (parent)
+   cg_destroy(parent);
+   free(child);
+   free(parent);
+
+   return ret;
+}
+
+/*
+ * This test disables swapping and tries to allocate anonymous memory
+ * up to OOM with 

[PATCH 1/2] Fix cg_read_strcmp()

2018-09-04 Thread jgkamat
From: Jay Kamat 

Fix a couple issues with cg_read_strcmp(), to improve correctness of
cgroup tests
- Fix cg_read_strcmp() always returning 0 for empty "needle" strings
- Fix a memory leak in cg_read_strcmp()

Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests")

Signed-off-by: Jay Kamat 
---
 tools/testing/selftests/cgroup/cgroup_util.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c 
b/tools/testing/selftests/cgroup/cgroup_util.c
index 1e9e3c470561..4aadf38bcd5d 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -91,15 +91,24 @@ int cg_read_strcmp(const char *cgroup, const char *control,
 {
size_t size = strlen(expected) + 1;
char *buf;
+   int ret;
+
+   /* Handle the case of comparing against empty string */
+   if (size == 1)
+   size = 32;
 
buf = malloc(size);
if (!buf)
return -1;
 
-   if (cg_read(cgroup, control, buf, size))
+   if (cg_read(cgroup, control, buf, size)) {
+   free(buf);
return -1;
+   }
 
-   return strcmp(expected, buf);
+   ret = strcmp(expected, buf);
+   free(buf);
+   return ret;
 }
 
 int cg_read_strstr(const char *cgroup, const char *control, const char *needle)
-- 
2.17.1



[PATCH 2/2] Add tests for memory.oom.group

2018-09-04 Thread jgkamat
From: Jay Kamat 

Add tests for memory.oom.group for the following cases:
- Killing all processes in a leaf cgroup, but leaving the
  parent untouched
- Killing all processes in a parent and leaf cgroup
- Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered
  for being killed by the group oom killer.

Signed-off-by: Jay Kamat 
---
 tools/testing/selftests/cgroup/cgroup_util.c  |  21 ++
 tools/testing/selftests/cgroup/cgroup_util.h  |   1 +
 .../selftests/cgroup/test_memcontrol.c| 205 ++
 3 files changed, 227 insertions(+)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c 
b/tools/testing/selftests/cgroup/cgroup_util.c
index 4aadf38bcd5d..6799c69d7f03 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -338,3 +338,24 @@ int is_swap_enabled(void)
 
return cnt > 1;
 }
+
+int set_oom_adj_score(int pid, int score)
+{
+   char path[PATH_MAX];
+   int fd, len;
+
+   sprintf(path, "/proc/%d/oom_score_adj", pid);
+
+   fd = open(path, O_WRONLY | O_APPEND);
+   if (fd < 0)
+   return fd;
+
+   len = dprintf(fd, "%d", score);
+   if (len < 0) {
+   close(fd);
+   return len;
+   }
+
+   close(fd);
+   return 0;
+}
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h 
b/tools/testing/selftests/cgroup/cgroup_util.h
index fe82a297d4e0..cabd43fd137a 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -39,3 +39,4 @@ extern int get_temp_fd(void);
 extern int alloc_pagecache(int fd, size_t size);
 extern int alloc_anon(const char *cgroup, void *arg);
 extern int is_swap_enabled(void);
+extern int set_oom_adj_score(int pid, int score);
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c 
b/tools/testing/selftests/cgroup/test_memcontrol.c
index cf0bddc9d271..017c15a7a935 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -2,6 +2,7 @@
 #define _GNU_SOURCE
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -202,6 +203,36 @@ static int alloc_pagecache_50M_noexit(const char *cgroup, 
void *arg)
return 0;
 }
 
+static int alloc_anon_noexit(const char *cgroup, void *arg)
+{
+   int ppid = getppid();
+
+   if (alloc_anon(cgroup, arg))
+   return -1;
+
+   while (getppid() == ppid)
+   sleep(1);
+
+   return 0;
+}
+
+/*
+ * Wait until processes are killed asynchronously by the OOM killer
+ * If we exceed a timeout, fail.
+ */
+static int cg_test_proc_killed(const char *cgroup)
+{
+   int limit;
+
+   for (limit = 10; limit > 0; limit--) {
+   if (cg_read_strcmp(cgroup, "cgroup.procs", "") == 0)
+   return 0;
+
+   usleep(10);
+   }
+   return -1;
+}
+
 /*
  * First, this test creates the following hierarchy:
  * A   memory.min = 50M,  memory.max = 200M
@@ -964,6 +995,177 @@ static int test_memcg_sock(const char *root)
return ret;
 }
 
+/*
+ * This test disables swapping and tries to allocate anonymous memory
+ * up to OOM with memory.group.oom set. Then it checks that all
+ * processes in the leaf (but not the parent) were killed.
+ */
+static int test_memcg_oom_group_leaf_events(const char *root)
+{
+   int ret = KSFT_FAIL;
+   char *parent, *child;
+
+   parent = cg_name(root, "memcg_test_0");
+   child = cg_name(root, "memcg_test_0/memcg_test_1");
+
+   if (!parent || !child)
+   goto cleanup;
+
+   if (cg_create(parent))
+   goto cleanup;
+
+   if (cg_create(child))
+   goto cleanup;
+
+   if (cg_write(parent, "cgroup.subtree_control", "+memory"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.max", "50M"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.swap.max", "0"))
+   goto cleanup;
+
+   if (cg_write(child, "memory.oom.group", "1"))
+   goto cleanup;
+
+   cg_run_nowait(parent, alloc_anon_noexit, (void *) MB(60));
+   cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1));
+   cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1));
+   if (!cg_run(child, alloc_anon, (void *)MB(100)))
+   goto cleanup;
+
+   if (cg_test_proc_killed(child))
+   goto cleanup;
+
+   if (cg_read_key_long(child, "memory.events", "oom_kill ") <= 0)
+   goto cleanup;
+
+   if (cg_read_key_long(parent, "memory.events", "oom_kill ") != 0)
+   goto cleanup;
+
+   ret = KSFT_PASS;
+
+cleanup:
+   if (child)
+   cg_destroy(child);
+   if (parent)
+   cg_destroy(parent);
+   free(child);
+   free(parent);
+
+   return ret;
+}
+
+/*
+ * This test disables swapping and tries to allocate anonymous memory
+ * up to OOM with 

[PATCH 1/2] Fix cg_read_strcmp()

2018-09-04 Thread jgkamat
From: Jay Kamat 

Fix a couple issues with cg_read_strcmp(), to improve correctness of
cgroup tests
- Fix cg_read_strcmp() always returning 0 for empty "needle" strings
- Fix a memory leak in cg_read_strcmp()

Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests")

Signed-off-by: Jay Kamat 
---
 tools/testing/selftests/cgroup/cgroup_util.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c 
b/tools/testing/selftests/cgroup/cgroup_util.c
index 1e9e3c470561..4aadf38bcd5d 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -91,15 +91,24 @@ int cg_read_strcmp(const char *cgroup, const char *control,
 {
size_t size = strlen(expected) + 1;
char *buf;
+   int ret;
+
+   /* Handle the case of comparing against empty string */
+   if (size == 1)
+   size = 32;
 
buf = malloc(size);
if (!buf)
return -1;
 
-   if (cg_read(cgroup, control, buf, size))
+   if (cg_read(cgroup, control, buf, size)) {
+   free(buf);
return -1;
+   }
 
-   return strcmp(expected, buf);
+   ret = strcmp(expected, buf);
+   free(buf);
+   return ret;
 }
 
 int cg_read_strstr(const char *cgroup, const char *control, const char *needle)
-- 
2.17.1



RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Schaufler, Casey
> -Original Message-
> From: Andrea Arcangeli [mailto:aarca...@redhat.com]
> Sent: Tuesday, September 04, 2018 4:37 PM
> To: Schaufler, Casey 
> Cc: Jiri Kosina ; Tim Chen ;
> Thomas Gleixner ; Ingo Molnar ;
> Peter Zijlstra ; Josh Poimboeuf
> ; Woodhouse, David ; Oleg
> Nesterov ; linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> Hello,
> 
> On Tue, Sep 04, 2018 at 06:10:47PM +, Schaufler, Casey wrote:
> > The real reason to use an LSM based approach is that overloading ptrace
> > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
> > processor interface. Even if ptrace_may_access() does exactly what you
> 
> "Side channel is a processor interface" doesn't make me optimistic,
> but I assume you're not speaking for Intel.

Sorry, I've been working in security too long for my
optimistic streak to be very wide.

> > want it to for side-channel mitigation today it would be incredibly
> > inappropriate to tie the two together for eternity. You don't want to 
> > restrict
> > the ptrace checks to only those things that are also good for side-channel,
> > and vice versa.
> 
> It seems like you want to make this more configurable,

Not especially, no. I have gotten considerable feedback that
it should be configurable, and while I may not see the value myself,
I do respect the people who've requested the configurability.

> we have all
> debugfs x86 specific tweaks to disable IBPB at runtime and we don't
> allow a runtime opt-out of IBPB alone.
> 
> If you shutdown either IBRS or retpolines at runtime with debugfs,
> then IBPB goes away too.
> 
> Giving a finegrined way to disable only IBPB we found was overkill
> because IBPB has never been measurable if done only when the prev task
> cannot ptrace the next task (which is a superset of the too weak
> upstream not dumpable check, but still not a performance issue).
> 
> Allowing IBPB runtime opt-out doesn't make much sense if you don't
> allow to disable retpolines too still at runtime. And disabling
> retpolines from LSM doesn't sounds the right place, it's an x86
> temporary quirk only relevant for current buggy CPUs.
> 
> There should be a function that decides when IBPB and flush_RSB should
> be run (flush_RSB has to be run if SMEP because with SMEP there's no
> need to run flush_RSB at every kernel entry anymore), and that
> function happens to check ptrace today, but it would check other stuff
> too if we had other interfaces besides ptrace that would allow the
> prev task to read the memory of the next task. So it's not so much
> about ptrace nor about IBPB, it's about "IBPB_RSB" vs "prev task
> can read memory of next task". Then each arch can implement
> "IBPB_RSB" method its own way but the check is for the common
> code and it should be in the scheduler and there's just 1 version of
> this check needed.

Right, I get that. There's more to ptrace access than "I can read the
other task". There's more to what the processor is up to than IBPB.

> 
> I don't think there should be a ton of different versions of this
> function each providing a different answer, which is what LSM would
> provide.

If they provide different answers there should be different
functions. It's a question of viewpoint. If you don't care about
the SELinux viewpoint you shouldn't have to include it in your
checks, any more than you care about x86 issues when you're
running on a MIPS processor.

> At most you can add a x86 debugfs tweak to shut off the logic but
> again it would make more sense if retpolines could be shut off at
> runtime too, doing it just for IBPB sounds backwards because it has
> such an unmeasurable overhead.
> 
> > Yes. That would be me.
> 
> Even the attempt to make an innocuous call like
> ptrace_has_cap(tcred->user_ns, mode) will eventually
> deadlock there.

Which is yet another reason there needs to be separate logic.

> I used a PTRACE_MODE_ check that the ptrace check code uses to filter
> out specific calls that may eventually enter LSM and hard lockup in
> non reproducible workloads (some false positive IBPB is ok, especially
> if it avoids a deadlock).

Yes, even security people have to worry about locking.

> Everything can be fixed in any way, but calling LSM from scheduler
> code doesn't sound the most robust thing to do in general because what
> works outside the scheduler doesn't work from within those semi atomic
> regions (it tends to work initially until it eventually
> deadlocks). The original code of such check, had all sort of deadlocks
> because of that.

What *is* the most robust way?
Yes, there are locking issues. The code in the LSM infrastructure and in
the security modules needs to be aware of that and deal with it. The SELinux
code I proposed is more complex than it could be because the audit code
requires locking that doesn't work in the switching context.

> Thanks,
> Andrea


RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Schaufler, Casey
> -Original Message-
> From: Andrea Arcangeli [mailto:aarca...@redhat.com]
> Sent: Tuesday, September 04, 2018 4:37 PM
> To: Schaufler, Casey 
> Cc: Jiri Kosina ; Tim Chen ;
> Thomas Gleixner ; Ingo Molnar ;
> Peter Zijlstra ; Josh Poimboeuf
> ; Woodhouse, David ; Oleg
> Nesterov ; linux-kernel@vger.kernel.org; x...@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> Hello,
> 
> On Tue, Sep 04, 2018 at 06:10:47PM +, Schaufler, Casey wrote:
> > The real reason to use an LSM based approach is that overloading ptrace
> > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
> > processor interface. Even if ptrace_may_access() does exactly what you
> 
> "Side channel is a processor interface" doesn't make me optimistic,
> but I assume you're not speaking for Intel.

Sorry, I've been working in security too long for my
optimistic streak to be very wide.

> > want it to for side-channel mitigation today it would be incredibly
> > inappropriate to tie the two together for eternity. You don't want to 
> > restrict
> > the ptrace checks to only those things that are also good for side-channel,
> > and vice versa.
> 
> It seems like you want to make this more configurable,

Not especially, no. I have gotten considerable feedback that
it should be configurable, and while I may not see the value myself,
I do respect the people who've requested the configurability.

> we have all
> debugfs x86 specific tweaks to disable IBPB at runtime and we don't
> allow a runtime opt-out of IBPB alone.
> 
> If you shutdown either IBRS or retpolines at runtime with debugfs,
> then IBPB goes away too.
> 
> Giving a finegrined way to disable only IBPB we found was overkill
> because IBPB has never been measurable if done only when the prev task
> cannot ptrace the next task (which is a superset of the too weak
> upstream not dumpable check, but still not a performance issue).
> 
> Allowing IBPB runtime opt-out doesn't make much sense if you don't
> allow to disable retpolines too still at runtime. And disabling
> retpolines from LSM doesn't sounds the right place, it's an x86
> temporary quirk only relevant for current buggy CPUs.
> 
> There should be a function that decides when IBPB and flush_RSB should
> be run (flush_RSB has to be run if SMEP because with SMEP there's no
> need to run flush_RSB at every kernel entry anymore), and that
> function happens to check ptrace today, but it would check other stuff
> too if we had other interfaces besides ptrace that would allow the
> prev task to read the memory of the next task. So it's not so much
> about ptrace nor about IBPB, it's about "IBPB_RSB" vs "prev task
> can read memory of next task". Then each arch can implement
> "IBPB_RSB" method its own way but the check is for the common
> code and it should be in the scheduler and there's just 1 version of
> this check needed.

Right, I get that. There's more to ptrace access than "I can read the
other task". There's more to what the processor is up to than IBPB.

> 
> I don't think there should be a ton of different versions of this
> function each providing a different answer, which is what LSM would
> provide.

If they provide different answers there should be different
functions. It's a question of viewpoint. If you don't care about
the SELinux viewpoint you shouldn't have to include it in your
checks, any more than you care about x86 issues when you're
running on a MIPS processor.

> At most you can add a x86 debugfs tweak to shut off the logic but
> again it would make more sense if retpolines could be shut off at
> runtime too, doing it just for IBPB sounds backwards because it has
> such an unmeasurable overhead.
> 
> > Yes. That would be me.
> 
> Even the attempt to make an innocuous call like
> ptrace_has_cap(tcred->user_ns, mode) will eventually
> deadlock there.

Which is yet another reason there needs to be separate logic.

> I used a PTRACE_MODE_ check that the ptrace check code uses to filter
> out specific calls that may eventually enter LSM and hard lockup in
> non reproducible workloads (some false positive IBPB is ok, especially
> if it avoids a deadlock).

Yes, even security people have to worry about locking.

> Everything can be fixed in any way, but calling LSM from scheduler
> code doesn't sound the most robust thing to do in general because what
> works outside the scheduler doesn't work from within those semi atomic
> regions (it tends to work initially until it eventually
> deadlocks). The original code of such check, had all sort of deadlocks
> because of that.

What *is* the most robust way?
Yes, there are locking issues. The code in the LSM infrastructure and in
the security modules needs to be aware of that and deal with it. The SELinux
code I proposed is more complex than it could be because the audit code
requires locking that doesn't work in the switching context.

> Thanks,
> Andrea


Re: POSIX violation by writeback error

2018-09-04 Thread Dave Chinner
On Tue, Sep 04, 2018 at 01:35:34PM -0700, Vito Caputo wrote:
> On Tue, Sep 04, 2018 at 04:18:18PM -0400, Jeff Layton wrote:
> > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote:
> > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote:
> > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote:
> > > > > Well, I think the point was that in the above examples you'd prefer 
> > > > > that
> > > > > the read just fail--no need to keep the data.  A bit marking the file
> > > > > (or even the entire filesystem) unreadable would satisfy posix, I 
> > > > > guess.
> > > > > Whether that's practical, I don't know.
> > > > 
> > > > When you would do it like that (mark the whole filesystem as "in
> > > > error") things go from bad to worse even faster. The Linux kernel 
> > > > tries to keep the system up even in the face of errors. 
> > > > 
> > > > With that suggestion, having one application run into a writeback
> > > > error would effectively crash the whole system because the filesystem
> > > > may be the root filesystem and stuff like "sshd" that you need to
> > > > diagnose the problem needs to be read from the disk 
> > > 
> > > Well, the absolutist position on posix compliance here would be that a
> > > crash is still preferable to returning the wrong data.  And for the
> > > cases 焦晓冬 gives, that sounds right?  Maybe it's the wrong balance in
> > > general, I don't know.  And we do already have filesystems with
> > > panic-on-error options, so if they aren't used maybe then maybe users
> > > have already voted against that level of strictness.
> > > 
> > 
> > Yeah, idk. The problem here is that this is squarely in the domain of
> > implementation defined behavior. I do think that the current "policy"
> > (if you call it that) of what to do after a wb error is weird and wrong.
> > What we probably ought to do is start considering how we'd like it to
> > behave.
> > 
> > How about something like this?
> > 
> > Mark the pages as "uncleanable" after a writeback error. We'll satisfy
> > reads from the cached data until someone calls fsync, at which point
> > we'd return the error and invalidate the uncleanable pages.
> > 
> > If no one calls fsync and scrapes the error, we'll hold on to it for as
> > long as we can (or up to some predefined limit) and then after that
> > we'll invalidate the uncleanable pages and start returning errors on
> > reads. If someone eventually calls fsync afterward, we can return to
> > normal operation.
> > 
> > As always though...what about mmap? Would we need to SIGBUS at the point
> > where we'd start returning errors on read()?
> > 
> > Would that approximate the current behavior enough and make sense?
> > Implementing it all sounds non-trivial though...
> > 
> 
> Here's a crazy and potentially stupid idea:
> 
> Implement a new class of swap space for backing dirty pages which fail
> to write back.  Pages in this space survive reboots, essentially backing
> the implicit commitment POSIX establishes in the face of asynchronous
> writeback errors.  Rather than evicting these pages as clean, they are
> swapped out to the persistent swap.

And when that "swap" area gets write errors, too? What then? We're
straight back to the same "what the hell do we do with the error"
problem.

Adding more turtles doesn't help solve this issue.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: POSIX violation by writeback error

2018-09-04 Thread Dave Chinner
On Tue, Sep 04, 2018 at 01:35:34PM -0700, Vito Caputo wrote:
> On Tue, Sep 04, 2018 at 04:18:18PM -0400, Jeff Layton wrote:
> > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote:
> > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote:
> > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote:
> > > > > Well, I think the point was that in the above examples you'd prefer 
> > > > > that
> > > > > the read just fail--no need to keep the data.  A bit marking the file
> > > > > (or even the entire filesystem) unreadable would satisfy posix, I 
> > > > > guess.
> > > > > Whether that's practical, I don't know.
> > > > 
> > > > When you would do it like that (mark the whole filesystem as "in
> > > > error") things go from bad to worse even faster. The Linux kernel 
> > > > tries to keep the system up even in the face of errors. 
> > > > 
> > > > With that suggestion, having one application run into a writeback
> > > > error would effectively crash the whole system because the filesystem
> > > > may be the root filesystem and stuff like "sshd" that you need to
> > > > diagnose the problem needs to be read from the disk 
> > > 
> > > Well, the absolutist position on posix compliance here would be that a
> > > crash is still preferable to returning the wrong data.  And for the
> > > cases 焦晓冬 gives, that sounds right?  Maybe it's the wrong balance in
> > > general, I don't know.  And we do already have filesystems with
> > > panic-on-error options, so if they aren't used maybe then maybe users
> > > have already voted against that level of strictness.
> > > 
> > 
> > Yeah, idk. The problem here is that this is squarely in the domain of
> > implementation defined behavior. I do think that the current "policy"
> > (if you call it that) of what to do after a wb error is weird and wrong.
> > What we probably ought to do is start considering how we'd like it to
> > behave.
> > 
> > How about something like this?
> > 
> > Mark the pages as "uncleanable" after a writeback error. We'll satisfy
> > reads from the cached data until someone calls fsync, at which point
> > we'd return the error and invalidate the uncleanable pages.
> > 
> > If no one calls fsync and scrapes the error, we'll hold on to it for as
> > long as we can (or up to some predefined limit) and then after that
> > we'll invalidate the uncleanable pages and start returning errors on
> > reads. If someone eventually calls fsync afterward, we can return to
> > normal operation.
> > 
> > As always though...what about mmap? Would we need to SIGBUS at the point
> > where we'd start returning errors on read()?
> > 
> > Would that approximate the current behavior enough and make sense?
> > Implementing it all sounds non-trivial though...
> > 
> 
> Here's a crazy and potentially stupid idea:
> 
> Implement a new class of swap space for backing dirty pages which fail
> to write back.  Pages in this space survive reboots, essentially backing
> the implicit commitment POSIX establishes in the face of asynchronous
> writeback errors.  Rather than evicting these pages as clean, they are
> swapped out to the persistent swap.

And when that "swap" area gets write errors, too? What then? We're
straight back to the same "what the hell do we do with the error"
problem.

Adding more turtles doesn't help solve this issue.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC PATCH 1/2] pipe: introduce busy wait for pipe

2018-09-04 Thread Subhra Mazumdar




On 08/31/2018 09:09 AM, Steven Sistare wrote:

On 8/30/2018 4:24 PM, subhra mazumdar wrote:

Introduce pipe_ll_usec field for pipes that indicates the amount of micro
seconds a thread should spin if pipe is empty or full before sleeping. This
is similar to network sockets. Workloads like hackbench in pipe mode
benefits significantly from this by avoiding the sleep and wakeup overhead.
Other similar usecases can benefit. pipe_wait_flag is used to signal any
thread busy waiting. pipe_busy_loop_timeout checks if spin time is over.

Signed-off-by: subhra mazumdar 
---
  include/linux/pipe_fs_i.h | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e7497c9..fdfd2a2 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -1,6 +1,8 @@
  #ifndef _LINUX_PIPE_FS_I_H
  #define _LINUX_PIPE_FS_I_H
  
+#include 

+
  #define PIPE_DEF_BUFFERS  16
  
  #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */

@@ -54,6 +56,8 @@ struct pipe_inode_info {
unsigned int waiting_writers;
unsigned int r_counter;
unsigned int w_counter;
+   unsigned int pipe_ll_usec;
+   unsigned long pipe_wait_flag;
struct page *tmp_page;
struct fasync_struct *fasync_readers;
struct fasync_struct *fasync_writers;
@@ -157,6 +161,21 @@ static inline int pipe_buf_steal(struct pipe_inode_info 
*pipe,
return buf->ops->steal(pipe, buf);
  }
  
+static inline unsigned long pipe_busy_loop_current_time(void)

+{
+   return (unsigned long)(local_clock() >> 10);

Why ">> 10" ? local_lock() has nanosec units, and you compare to the tunable
pipe_llc_sec which has microsec units.  Should be ">> 3".  Better yet, redefine
the tunable to have nanosec units.  I suspect you will need very large values
of the tunable to show similar results.

It's 2^10. I don't think using nanosec units is necessary. It is unlikely
data will be read or written in nano seconds. sk_busy_loop_timeout for
sockets uses micro seconds too.


Also, since this type of optimization consumes CPU extra cycles that could
be used by other tasks, show the overall CPU utilization before and after
the optimization, such as by using "time hackbench ...".

OK.

Thanks,
Subhra


- Steve


+}
+
+static inline bool pipe_busy_loop_timeout(struct pipe_inode_info *pipe,
+ unsigned long start_time)
+{
+   unsigned long bp_usec = READ_ONCE(pipe->pipe_ll_usec);
+   unsigned long end_time = start_time + bp_usec;
+   unsigned long now = pipe_busy_loop_current_time();
+
+   return time_after(now, end_time);
+}
+
  /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
 memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
  #define PIPE_SIZE PAGE_SIZE





Re: [RFC PATCH 1/2] pipe: introduce busy wait for pipe

2018-09-04 Thread Subhra Mazumdar




On 08/31/2018 09:09 AM, Steven Sistare wrote:

On 8/30/2018 4:24 PM, subhra mazumdar wrote:

Introduce pipe_ll_usec field for pipes that indicates the amount of micro
seconds a thread should spin if pipe is empty or full before sleeping. This
is similar to network sockets. Workloads like hackbench in pipe mode
benefits significantly from this by avoiding the sleep and wakeup overhead.
Other similar usecases can benefit. pipe_wait_flag is used to signal any
thread busy waiting. pipe_busy_loop_timeout checks if spin time is over.

Signed-off-by: subhra mazumdar 
---
  include/linux/pipe_fs_i.h | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e7497c9..fdfd2a2 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -1,6 +1,8 @@
  #ifndef _LINUX_PIPE_FS_I_H
  #define _LINUX_PIPE_FS_I_H
  
+#include 

+
  #define PIPE_DEF_BUFFERS  16
  
  #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */

@@ -54,6 +56,8 @@ struct pipe_inode_info {
unsigned int waiting_writers;
unsigned int r_counter;
unsigned int w_counter;
+   unsigned int pipe_ll_usec;
+   unsigned long pipe_wait_flag;
struct page *tmp_page;
struct fasync_struct *fasync_readers;
struct fasync_struct *fasync_writers;
@@ -157,6 +161,21 @@ static inline int pipe_buf_steal(struct pipe_inode_info 
*pipe,
return buf->ops->steal(pipe, buf);
  }
  
+static inline unsigned long pipe_busy_loop_current_time(void)

+{
+   return (unsigned long)(local_clock() >> 10);

Why ">> 10" ? local_lock() has nanosec units, and you compare to the tunable
pipe_llc_sec which has microsec units.  Should be ">> 3".  Better yet, redefine
the tunable to have nanosec units.  I suspect you will need very large values
of the tunable to show similar results.

It's 2^10. I don't think using nanosec units is necessary. It is unlikely
data will be read or written in nano seconds. sk_busy_loop_timeout for
sockets uses micro seconds too.


Also, since this type of optimization consumes CPU extra cycles that could
be used by other tasks, show the overall CPU utilization before and after
the optimization, such as by using "time hackbench ...".

OK.

Thanks,
Subhra


- Steve


+}
+
+static inline bool pipe_busy_loop_timeout(struct pipe_inode_info *pipe,
+ unsigned long start_time)
+{
+   unsigned long bp_usec = READ_ONCE(pipe->pipe_ll_usec);
+   unsigned long end_time = start_time + bp_usec;
+   unsigned long now = pipe_busy_loop_current_time();
+
+   return time_after(now, end_time);
+}
+
  /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
 memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
  #define PIPE_SIZE PAGE_SIZE





linux-next: build warnings after merge of the devicetree tree

2018-09-04 Thread Stephen Rothwell
Hi Rob,

After merging the devicetree tree, today's linux-next build (x86_64
allmodconfig) produced these warnings:

WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the 
function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells
The function of_fdt_limit_memory() references
the variable __initdata dt_root_addr_cells.
This is often because of_fdt_limit_memory lacks a __initdata 
annotation or the annotation of dt_root_addr_cells is wrong.

WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the 
function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells
The function of_fdt_limit_memory() references
the variable __initdata dt_root_size_cells.
This is often because of_fdt_limit_memory lacks a __initdata 
annotation or the annotation of dt_root_size_cells is wrong.

Introduced by commit

  bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in 
of_fdt_limit_memory")

-- 
Cheers,
Stephen Rothwell


pgphx2tNF83ur.pgp
Description: OpenPGP digital signature


linux-next: build warnings after merge of the devicetree tree

2018-09-04 Thread Stephen Rothwell
Hi Rob,

After merging the devicetree tree, today's linux-next build (x86_64
allmodconfig) produced these warnings:

WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the 
function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells
The function of_fdt_limit_memory() references
the variable __initdata dt_root_addr_cells.
This is often because of_fdt_limit_memory lacks a __initdata 
annotation or the annotation of dt_root_addr_cells is wrong.

WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the 
function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells
The function of_fdt_limit_memory() references
the variable __initdata dt_root_size_cells.
This is often because of_fdt_limit_memory lacks a __initdata 
annotation or the annotation of dt_root_size_cells is wrong.

Introduced by commit

  bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in 
of_fdt_limit_memory")

-- 
Cheers,
Stephen Rothwell


pgphx2tNF83ur.pgp
Description: OpenPGP digital signature


Re: [RFC PATCH 2/2] pipe: use pipe busy wait

2018-09-04 Thread Subhra Mazumdar




On 09/04/2018 02:54 PM, Thomas Gleixner wrote:

On Thu, 30 Aug 2018, subhra mazumdar wrote:
  
+void pipe_busy_wait(struct pipe_inode_info *pipe)

+{
+   unsigned long wait_flag = pipe->pipe_wait_flag;
+   unsigned long start_time = pipe_busy_loop_current_time();
+
+   pipe_unlock(pipe);
+   preempt_disable();
+   for (;;) {
+   if (pipe->pipe_wait_flag > wait_flag) {
+   preempt_enable();
+   pipe_lock(pipe);
+   return;
+   }
+   if (pipe_busy_loop_timeout(pipe, start_time))
+   break;
+   cpu_relax();
+   }
+   preempt_enable();

You are not really serious about busy looping with preemption disabled?

That's just wrong. Why do you want to block others from getting on the CPU
if there is nothing in the pipe?

There is no point in doing so, really. If the wait loop is preempted
because there is more important work to do, then it will come back and
either see new data, or leave due to wait time reached.

Makes sense. I will remove it.

Thanks,
Subhra


Thanks,

tglx





Re: [RFC PATCH 2/2] pipe: use pipe busy wait

2018-09-04 Thread Subhra Mazumdar




On 09/04/2018 02:54 PM, Thomas Gleixner wrote:

On Thu, 30 Aug 2018, subhra mazumdar wrote:
  
+void pipe_busy_wait(struct pipe_inode_info *pipe)

+{
+   unsigned long wait_flag = pipe->pipe_wait_flag;
+   unsigned long start_time = pipe_busy_loop_current_time();
+
+   pipe_unlock(pipe);
+   preempt_disable();
+   for (;;) {
+   if (pipe->pipe_wait_flag > wait_flag) {
+   preempt_enable();
+   pipe_lock(pipe);
+   return;
+   }
+   if (pipe_busy_loop_timeout(pipe, start_time))
+   break;
+   cpu_relax();
+   }
+   preempt_enable();

You are not really serious about busy looping with preemption disabled?

That's just wrong. Why do you want to block others from getting on the CPU
if there is nothing in the pipe?

There is no point in doing so, really. If the wait loop is preempted
because there is more important work to do, then it will come back and
either see new data, or leave due to wait time reached.

Makes sense. I will remove it.

Thanks,
Subhra


Thanks,

tglx





Re: [GIT PULL] s390 patches for the 4.19 merge window #2

2018-09-04 Thread Kees Cook
On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky
 wrote:
> Harald Freudenberger (5):
>   s390/zcrypt: hex string mask improvements for apmask and aqmask.

This (and an earlier 2017 commit) adds VLAs, which are being
removed[1] from the kernel:

drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable
length array ‘clrm’ [-Wvla]
drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable
length array ‘setm’ [-Wvla]
drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable
length array ‘setm’ [-Wvla]

static int process_mask_arg(const char *str,
unsigned long *bitmap, int bits,
struct mutex *lock)
...
DECLARE_BITMAP(clrm, bits);
DECLARE_BITMAP(setm, bits);

Can someone please adjust this to make these fixed size again?

Thanks!

-Kees

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

-- 
Kees Cook
Pixel Security


Re: [GIT PULL] s390 patches for the 4.19 merge window #2

2018-09-04 Thread Kees Cook
On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky
 wrote:
> Harald Freudenberger (5):
>   s390/zcrypt: hex string mask improvements for apmask and aqmask.

This (and an earlier 2017 commit) adds VLAs, which are being
removed[1] from the kernel:

drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable
length array ‘clrm’ [-Wvla]
drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable
length array ‘setm’ [-Wvla]
drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable
length array ‘setm’ [-Wvla]

static int process_mask_arg(const char *str,
unsigned long *bitmap, int bits,
struct mutex *lock)
...
DECLARE_BITMAP(clrm, bits);
DECLARE_BITMAP(setm, bits);

Can someone please adjust this to make these fixed size again?

Thanks!

-Kees

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

-- 
Kees Cook
Pixel Security


Re: linux-next: error when fetching the bcm2835 tree

2018-09-04 Thread Stephen Rothwell
Hi Stefan,

On Tue, 4 Sep 2018 22:07:15 +0200 (CEST) Stefan Wahren  
wrote:
>
> > Eric Anholt  hat am 4. September 2018 um 19:41 geschrieben:
> > 
> > Stephen Rothwell  writes:
> >   
> > > Fetching the bcm2835 tree (git://github.com/anholt/linux.git#for-next)
> > > produces this error:
> > >
> > > fatal: Couldn't find remote ref refs/heads/for-next  
> > 
> > I'd passed responsibility for the git tree off to Stefan while I'm
> > mostly-unavailable due to new baby.  Maybe Stefan had deleted the branch
> > for a bit?  
> 
> sorry for that. I deleted the branch and i manually setup the new one later.
> 
> I hope, everything is fine now?

Yes, all good.

-- 
Cheers,
Stephen Rothwell


pgpZ5h6V9RzV5.pgp
Description: OpenPGP digital signature


Re: linux-next: error when fetching the bcm2835 tree

2018-09-04 Thread Stephen Rothwell
Hi Stefan,

On Tue, 4 Sep 2018 22:07:15 +0200 (CEST) Stefan Wahren  
wrote:
>
> > Eric Anholt  hat am 4. September 2018 um 19:41 geschrieben:
> > 
> > Stephen Rothwell  writes:
> >   
> > > Fetching the bcm2835 tree (git://github.com/anholt/linux.git#for-next)
> > > produces this error:
> > >
> > > fatal: Couldn't find remote ref refs/heads/for-next  
> > 
> > I'd passed responsibility for the git tree off to Stefan while I'm
> > mostly-unavailable due to new baby.  Maybe Stefan had deleted the branch
> > for a bit?  
> 
> sorry for that. I deleted the branch and i manually setup the new one later.
> 
> I hope, everything is fine now?

Yes, all good.

-- 
Cheers,
Stephen Rothwell


pgpZ5h6V9RzV5.pgp
Description: OpenPGP digital signature


  1   2   3   4   5   6   7   8   9   10   >