Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode
[...] > > > > > > This patch causes a number of statistically significant > > > regressions > > > (with significance of 1%) on the two systems I've tested it > > > on. On > > > my > > > > Sure. These patches are targeted to Atom clients where some of > > these > > server like workload may have some minor regression on few watts > > TDP > > parts. > > Neither the 36% regression of fs-mark, the 21% regression of sqlite, > nor > the 10% regression of warsaw qualify as small. And most of the test > cases on the list of regressions aren't exclusively server-like, if > at > all. Warsaw, gtkperf, jxrendermark and lightsmark are all graphics > benchmarks -- Latency is as important if not more for interactive > workloads than it is for server workloads. In the case of a conflict > like the one we're dealing with right now between optimizing for > throughput (e.g. for the maximum number of requests per second) and > optimizing for latency (e.g. for the minimum request duration), you > are > more likely to be concerned about the former than about the latter in > a > server setup. Eero, Please add your test results here. No matter which algorithm you do, there will be variations. So you have to look at the platforms which you are targeting. For this platform number one item is use of less turbo and hope you know why? On this platform GFX patch caused this issue as it was submitted after io boost patchset. So rather that should be reverted first before optimizing anything. > > > But weighing against reduced TURBO usage (which is enough trigger) > > and > > improvement in tests done by Eero (which was primary complaint to > > us). > > > > It is trivial patch. Yes, the patch is not perfect and doesn't > > close > > doors for any improvements. > > > > It's sort of self-contained because it's awfully incomplete.Don't > agtr > > > I see your idea, but how to implement in acceptable way is a > > challenge. > > Main challenge was getting the code to work without regressions in > latency-sensitive workloads, which I did, because you told me that it > wasn't acceptable for it to cause any regressions on the Phoronix > daily-system-tracker, whether latency-bound or not. Yes, because your intention was to have a global change not a low power Atom specific, > What is left in > order to address Peter's concerns is for the most part plumbing, > that's > guaranteed not to have any functional impact on the heuristic. The > fact > that we don't expect it to change the performance of the system makes > it > substantially less time-consuming to validate than altering the > performance trade-offs of the heuristic dynamically in order to avoid > regressions (which is what has kept my systems busy most of the time > lately). Seems like my series, even in its current state without the > changes requested by Peter is closer to being ready for production > than > this patch is. Sorry, Not at all. We call such patches as experimental series. You caused 100% regression to idle power. There is no version 2 after that, even if you fixed locally even to look. Thanks, Srinivas
Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode
[...] > > > > > > This patch causes a number of statistically significant > > > regressions > > > (with significance of 1%) on the two systems I've tested it > > > on. On > > > my > > > > Sure. These patches are targeted to Atom clients where some of > > these > > server like workload may have some minor regression on few watts > > TDP > > parts. > > Neither the 36% regression of fs-mark, the 21% regression of sqlite, > nor > the 10% regression of warsaw qualify as small. And most of the test > cases on the list of regressions aren't exclusively server-like, if > at > all. Warsaw, gtkperf, jxrendermark and lightsmark are all graphics > benchmarks -- Latency is as important if not more for interactive > workloads than it is for server workloads. In the case of a conflict > like the one we're dealing with right now between optimizing for > throughput (e.g. for the maximum number of requests per second) and > optimizing for latency (e.g. for the minimum request duration), you > are > more likely to be concerned about the former than about the latter in > a > server setup. Eero, Please add your test results here. No matter which algorithm you do, there will be variations. So you have to look at the platforms which you are targeting. For this platform number one item is use of less turbo and hope you know why? On this platform GFX patch caused this issue as it was submitted after io boost patchset. So rather that should be reverted first before optimizing anything. > > > But weighing against reduced TURBO usage (which is enough trigger) > > and > > improvement in tests done by Eero (which was primary complaint to > > us). > > > > It is trivial patch. Yes, the patch is not perfect and doesn't > > close > > doors for any improvements. > > > > It's sort of self-contained because it's awfully incomplete.Don't > agtr > > > I see your idea, but how to implement in acceptable way is a > > challenge. > > Main challenge was getting the code to work without regressions in > latency-sensitive workloads, which I did, because you told me that it > wasn't acceptable for it to cause any regressions on the Phoronix > daily-system-tracker, whether latency-bound or not. Yes, because your intention was to have a global change not a low power Atom specific, > What is left in > order to address Peter's concerns is for the most part plumbing, > that's > guaranteed not to have any functional impact on the heuristic. The > fact > that we don't expect it to change the performance of the system makes > it > substantially less time-consuming to validate than altering the > performance trade-offs of the heuristic dynamically in order to avoid > regressions (which is what has kept my systems busy most of the time > lately). Seems like my series, even in its current state without the > changes requested by Peter is closer to being ready for production > than > this patch is. Sorry, Not at all. We call such patches as experimental series. You caused 100% regression to idle power. There is no version 2 after that, even if you fixed locally even to look. Thanks, Srinivas
Re: [GIT PULL] s390 patches for the 4.19 merge window #2
On Tue, 4 Sep 2018 17:16:31 -0700 Kees Cook wrote: > On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky > wrote: > > Harald Freudenberger (5): > > s390/zcrypt: hex string mask improvements for apmask and aqmask. > > This (and an earlier 2017 commit) adds VLAs, which are being > removed[1] from the kernel: > > drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable > length array ‘clrm’ [-Wvla] > drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable > length array ‘setm’ [-Wvla] > drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable > length array ‘setm’ [-Wvla] > > static int process_mask_arg(const char *str, > unsigned long *bitmap, int bits, > struct mutex *lock) > ... > DECLARE_BITMAP(clrm, bits); > DECLARE_BITMAP(setm, bits); > > Can someone please adjust this to make these fixed size again? Oops, sorry about that. We will provide a solution for this. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin.
Re: [GIT PULL] s390 patches for the 4.19 merge window #2
On Tue, 4 Sep 2018 17:16:31 -0700 Kees Cook wrote: > On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky > wrote: > > Harald Freudenberger (5): > > s390/zcrypt: hex string mask improvements for apmask and aqmask. > > This (and an earlier 2017 commit) adds VLAs, which are being > removed[1] from the kernel: > > drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable > length array ‘clrm’ [-Wvla] > drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable > length array ‘setm’ [-Wvla] > drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable > length array ‘setm’ [-Wvla] > > static int process_mask_arg(const char *str, > unsigned long *bitmap, int bits, > struct mutex *lock) > ... > DECLARE_BITMAP(clrm, bits); > DECLARE_BITMAP(setm, bits); > > Can someone please adjust this to make these fixed size again? Oops, sorry about that. We will provide a solution for this. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin.
Re: [PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore
Hello, On Wed, Sep 05, 2018 at 09:24:26AM +0800, Anson Huang wrote: > In noirq suspend/resume stage with no_console_suspend enabled, > imx_uart_console_write() may be called to print out log_buf > message by printk(), so there will be race condition between > imx_uart_console_write() and imx_uart_save/restore_context(), > need to add lock to protect the registers save/restore operations. > > Signed-off-by: Anson Huang > --- > drivers/tty/serial/imx.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c > index 239c0fa..c15332d 100644 Acked-by: Uwe Kleine-König Thanks Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore
Hello, On Wed, Sep 05, 2018 at 09:24:26AM +0800, Anson Huang wrote: > In noirq suspend/resume stage with no_console_suspend enabled, > imx_uart_console_write() may be called to print out log_buf > message by printk(), so there will be race condition between > imx_uart_console_write() and imx_uart_save/restore_context(), > need to add lock to protect the registers save/restore operations. > > Signed-off-by: Anson Huang > --- > drivers/tty/serial/imx.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c > index 239c0fa..c15332d 100644 Acked-by: Uwe Kleine-König Thanks Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend
On Wed, Sep 05, 2018 at 09:24:27AM +0800, Anson Huang wrote: > On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power > state retention), UART iomux settings will be lost, need to add > pinctrl sleep/default mode switch during suspend/resume to make > sure UART iomux settings are correct after resume. > > Signed-off-by: Anson Huang Acked-by: Uwe Kleine-König Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend
On Wed, Sep 05, 2018 at 09:24:27AM +0800, Anson Huang wrote: > On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power > state retention), UART iomux settings will be lost, need to add > pinctrl sleep/default mode switch during suspend/resume to make > sure UART iomux settings are correct after resume. > > Signed-off-by: Anson Huang Acked-by: Uwe Kleine-König Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
[PATCH] cpu/hotplug: Fix acquire of st->should_run in cpuhp_thread_fun()
The smp_mb() in cpuhp_thread_fun() appears to be misplaced, and need to be after the load of st->should_run, to prevent reordering of the later load/stores w.r.t. the load of st->should_run. Signed-off-by: Neeraj Upadhyay --- kernel/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index aa7fe85..eb4041f 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu) bool bringup = st->bringup; enum cpuhp_state state; + if (WARN_ON_ONCE(!st->should_run)) + return; + /* * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures * that if we see ->should_run we also see the rest of the state. */ smp_mb(); - if (WARN_ON_ONCE(!st->should_run)) - return; - cpuhp_lock_acquire(bringup); if (st->single) { -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
[PATCH] cpu/hotplug: Fix acquire of st->should_run in cpuhp_thread_fun()
The smp_mb() in cpuhp_thread_fun() appears to be misplaced, and need to be after the load of st->should_run, to prevent reordering of the later load/stores w.r.t. the load of st->should_run. Signed-off-by: Neeraj Upadhyay --- kernel/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index aa7fe85..eb4041f 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -607,15 +607,15 @@ static void cpuhp_thread_fun(unsigned int cpu) bool bringup = st->bringup; enum cpuhp_state state; + if (WARN_ON_ONCE(!st->should_run)) + return; + /* * ACQUIRE for the cpuhp_should_run() load of ->should_run. Ensures * that if we see ->should_run we also see the rest of the state. */ smp_mb(); - if (WARN_ON_ONCE(!st->should_run)) - return; - cpuhp_lock_acquire(bringup); if (st->single) { -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
On (09/05/18 14:36), Sergey Senozhatsky wrote: > > OK, thanks for the report! > So I think that your case is CON_PRINTBUFFER related as well. We have > a number of logbuf messages before we parse quiet and console_loglevel. > Those messages pass the suppress_message() test. Then when we finally > register a CON_PRINTBUFFER console we update console_loglevel and flush > all logbuf messages, but console_loglevel does not matter anymore. > Hmm, earlycon can also be affected, probably. But I'm not sure if people do "earlycon" and "quiet" at the same time. May be, who knows... -ss
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
On (09/05/18 14:36), Sergey Senozhatsky wrote: > > OK, thanks for the report! > So I think that your case is CON_PRINTBUFFER related as well. We have > a number of logbuf messages before we parse quiet and console_loglevel. > Those messages pass the suppress_message() test. Then when we finally > register a CON_PRINTBUFFER console we update console_loglevel and flush > all logbuf messages, but console_loglevel does not matter anymore. > Hmm, earlycon can also be affected, probably. But I'm not sure if people do "earlycon" and "quiet" at the same time. May be, who knows... -ss
[PATCH] x86/paravirt: fix compile warning
Fix a compile warning "SAVE_FLAGS redefined" introduced recently by tip.git commit: 6da63eb241a05b0e676d6 ("x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella"). Signed-off-by: Juergen Gross --- arch/x86/include/asm/paravirt.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 969343c72a9b..4bf42f9e4eea 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -914,7 +914,6 @@ extern void default_banner(void); PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64), \ ANNOTATE_RETPOLINE_SAFE; \ jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);) -#endif #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(clobbers)\ @@ -924,6 +923,7 @@ extern void default_banner(void); call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);\ PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) #endif +#endif #endif /* CONFIG_X86_32 */ -- 2.16.4
[PATCH] x86/paravirt: fix compile warning
Fix a compile warning "SAVE_FLAGS redefined" introduced recently by tip.git commit: 6da63eb241a05b0e676d6 ("x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella"). Signed-off-by: Juergen Gross --- arch/x86/include/asm/paravirt.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 969343c72a9b..4bf42f9e4eea 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -914,7 +914,6 @@ extern void default_banner(void); PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64), \ ANNOTATE_RETPOLINE_SAFE; \ jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);) -#endif #ifdef CONFIG_DEBUG_ENTRY #define SAVE_FLAGS(clobbers)\ @@ -924,6 +923,7 @@ extern void default_banner(void); call PARA_INDIRECT(pv_ops+PV_IRQ_save_fl);\ PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) #endif +#endif #endif /* CONFIG_X86_32 */ -- 2.16.4
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
On (09/05/18 06:53), Hans de Goede wrote: > > > > Do you use earlycon? > > No, I'm seeing this with the regular/normal console stuff. OK, thanks for the report! So I think that your case is CON_PRINTBUFFER related as well. We have a number of logbuf messages before we parse quiet and console_loglevel. Those messages pass the suppress_message() test. Then when we finally register a CON_PRINTBUFFER console we update console_loglevel and flush all logbuf messages, but console_loglevel does not matter anymore. > > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot > > > messages and doing the loglevel check for these while flushing as before. > > > > > > > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be > > the case that we sort of broke CON_PRINTBUFFER handling. > > > > console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH > > register CON_PRINTBUFFER console > > -> no debug output > > > > So I think that when console_unlock() re-flushes already seen logbuf > > messages to a newly registered exclusive [CON_PRINTBUFFER] console we > > probably need to look at the current console_loglevel in console_unlock() > > loop. > > So if it breaks quiet and the above use-case maybe we should revert > 375899cddcbb for now? Maybe all we need to do is to re-introduce suppress_message() check to console_unlock() when we have exclusive console set (IOW, when we re-flush logbuf messages to a CON_PRINTBUFFER console). Just a demonstration of the idea. It does not look very good, tho. I'd rather have just one suppress_message_printing() in printk code. // This is not a proposed patch, hence the 80-cols violation. --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index c036f128cdc3..231ac18423e1 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2416,7 +2416,7 @@ void console_unlock(void) break; msg = log_from_idx(console_idx); - if (msg->flags & LOG_NOCONS) { + if (msg->flags & LOG_NOCONS || (exclusive_console && suppress_message_printing(msg->level))) { /* * Skip record if !ignore_loglevel, and * record has level above the console loglevel. --- We are still early in 4.19 -rcs, let's wait a bit and hear from Steven and Petr, they have bright ideas oftentimes. -ss
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
On (09/05/18 06:53), Hans de Goede wrote: > > > > Do you use earlycon? > > No, I'm seeing this with the regular/normal console stuff. OK, thanks for the report! So I think that your case is CON_PRINTBUFFER related as well. We have a number of logbuf messages before we parse quiet and console_loglevel. Those messages pass the suppress_message() test. Then when we finally register a CON_PRINTBUFFER console we update console_loglevel and flush all logbuf messages, but console_loglevel does not matter anymore. > > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot > > > messages and doing the loglevel check for these while flushing as before. > > > > > > > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be > > the case that we sort of broke CON_PRINTBUFFER handling. > > > > console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH > > register CON_PRINTBUFFER console > > -> no debug output > > > > So I think that when console_unlock() re-flushes already seen logbuf > > messages to a newly registered exclusive [CON_PRINTBUFFER] console we > > probably need to look at the current console_loglevel in console_unlock() > > loop. > > So if it breaks quiet and the above use-case maybe we should revert > 375899cddcbb for now? Maybe all we need to do is to re-introduce suppress_message() check to console_unlock() when we have exclusive console set (IOW, when we re-flush logbuf messages to a CON_PRINTBUFFER console). Just a demonstration of the idea. It does not look very good, tho. I'd rather have just one suppress_message_printing() in printk code. // This is not a proposed patch, hence the 80-cols violation. --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index c036f128cdc3..231ac18423e1 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2416,7 +2416,7 @@ void console_unlock(void) break; msg = log_from_idx(console_idx); - if (msg->flags & LOG_NOCONS) { + if (msg->flags & LOG_NOCONS || (exclusive_console && suppress_message_printing(msg->level))) { /* * Skip record if !ignore_loglevel, and * record has level above the console loglevel. --- We are still early in 4.19 -rcs, let's wait a bit and hear from Steven and Petr, they have bright ideas oftentimes. -ss
Re: [PATCH] HID: core: fix NULL pointer dereference
On 29.08.2018 08:22, Gustavo A. R. Silva wrote: > There is a NULL pointer dereference in case memory resources > for *parse* are not successfully allocated. > > Fix this by adding a new goto label and make the execution > path jump to it in case vzalloc() fails. > > Addresses-Coverity-ID: 1473081 ("Dereference after null check") > Fixes: b2dd9f2e5a8a ("HID: core: fix memory leak on probe") > Signed-off-by: Gustavo A. R. Silva Reviewed-by: Stefan Agner -- Stefan > --- > drivers/hid/hid-core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c > index 4548dae..5bec924 100644 > --- a/drivers/hid/hid-core.c > +++ b/drivers/hid/hid-core.c > @@ -1000,7 +1000,7 @@ int hid_open_report(struct hid_device *device) > parser = vzalloc(sizeof(struct hid_parser)); > if (!parser) { > ret = -ENOMEM; > - goto err; > + goto alloc_err; > } > > parser->device = device; > @@ -1049,6 +1049,7 @@ int hid_open_report(struct hid_device *device) > hid_err(device, "item fetching failed at offset %d\n", (int)(end - > start)); > err: > kfree(parser->collection_stack); > +alloc_err: > vfree(parser); > hid_close_report(device); > return ret;
Re: [PATCH] HID: core: fix NULL pointer dereference
On 29.08.2018 08:22, Gustavo A. R. Silva wrote: > There is a NULL pointer dereference in case memory resources > for *parse* are not successfully allocated. > > Fix this by adding a new goto label and make the execution > path jump to it in case vzalloc() fails. > > Addresses-Coverity-ID: 1473081 ("Dereference after null check") > Fixes: b2dd9f2e5a8a ("HID: core: fix memory leak on probe") > Signed-off-by: Gustavo A. R. Silva Reviewed-by: Stefan Agner -- Stefan > --- > drivers/hid/hid-core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c > index 4548dae..5bec924 100644 > --- a/drivers/hid/hid-core.c > +++ b/drivers/hid/hid-core.c > @@ -1000,7 +1000,7 @@ int hid_open_report(struct hid_device *device) > parser = vzalloc(sizeof(struct hid_parser)); > if (!parser) { > ret = -ENOMEM; > - goto err; > + goto alloc_err; > } > > parser->device = device; > @@ -1049,6 +1049,7 @@ int hid_open_report(struct hid_device *device) > hid_err(device, "item fetching failed at offset %d\n", (int)(end - > start)); > err: > kfree(parser->collection_stack); > +alloc_err: > vfree(parser); > hid_close_report(device); > return ret;
Re: [PATCH 3/3] of: make default address and size cells sizes private
On 05/09/18 02:55, Frank Rowand wrote: > On 08/30/18 12:05, Rob Herring wrote: >> Only some old OpenFirmware implementations rely on default sizes. Any >> FDT and modern implementation should have explicit properties. Make the >> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside >> users. >> >> This also gets us one step closer to removing the asm/prom.h dependency on >> Sparc. Just for the record: you say "any FDT and modern implementation should have explicit properties", however the default values of these properties when missing are clearly defined in the IEEE-1275 specification (Annex A): "#address-cells" Standard property name to define the package’s address format. ... In a package with a "decode-unit" method, a missing "#address-cells" property signifies that the number of address cells is two. "#size-cells" Standard property name to define the package’s address size format. ... A missing "#size-cells" property signifies the default value of one. I can't speak for FDT but it isn't completely unreasonable for a guest parsing a DT without these properties to assume these default values. ATB, Mark.
Re: [PATCH 3/3] of: make default address and size cells sizes private
On 05/09/18 02:55, Frank Rowand wrote: > On 08/30/18 12:05, Rob Herring wrote: >> Only some old OpenFirmware implementations rely on default sizes. Any >> FDT and modern implementation should have explicit properties. Make the >> OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside >> users. >> >> This also gets us one step closer to removing the asm/prom.h dependency on >> Sparc. Just for the record: you say "any FDT and modern implementation should have explicit properties", however the default values of these properties when missing are clearly defined in the IEEE-1275 specification (Annex A): "#address-cells" Standard property name to define the package’s address format. ... In a package with a "decode-unit" method, a missing "#address-cells" property signifies that the number of address cells is two. "#size-cells" Standard property name to define the package’s address size format. ... A missing "#size-cells" property signifies the default value of one. I can't speak for FDT but it isn't completely unreasonable for a guest parsing a DT without these properties to assume these default values. ATB, Mark.
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
Hi, On 05-09-18 04:35, Sergey Senozhatsky wrote: Hi, On (09/04/18 20:01), Hans de Goede wrote: Commit 375899cddcbb ("printk: make sure to print log on console."), moved the checking of the loglevel of messages from flush time to the actual log time. This introduces one problem, some early boot messages are printed before parse_early_param() gets called and thus before kernel commandline options such as quiet, loglevel and ignore_loglevel are parsed. Do you use earlycon? No, I'm seeing this with the regular/normal console stuff. This causes e.g. the following messages to get printed on x86 systems, despite the presence of the "quiet" option: [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x00057fff] usable ... [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable This commit fixes this by setting a new LOG_CHK_LEVEL on early boot messages and doing the loglevel check for these while flushing as before. Hmm, OK, chances are we need to re-think 375899cddcbb. It might be the case that we sort of broke CON_PRINTBUFFER handling. console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH register CON_PRINTBUFFER console -> no debug output So I think that when console_unlock() re-flushes already seen logbuf messages to a newly registered exclusive [CON_PRINTBUFFER] console we probably need to look at the current console_loglevel in console_unlock() loop. So if it breaks quiet and the above use-case maybe we should revert 375899cddcbb for now? Regards, Hans
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
Hi, On 05-09-18 04:35, Sergey Senozhatsky wrote: Hi, On (09/04/18 20:01), Hans de Goede wrote: Commit 375899cddcbb ("printk: make sure to print log on console."), moved the checking of the loglevel of messages from flush time to the actual log time. This introduces one problem, some early boot messages are printed before parse_early_param() gets called and thus before kernel commandline options such as quiet, loglevel and ignore_loglevel are parsed. Do you use earlycon? No, I'm seeing this with the regular/normal console stuff. This causes e.g. the following messages to get printed on x86 systems, despite the presence of the "quiet" option: [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x00057fff] usable ... [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable This commit fixes this by setting a new LOG_CHK_LEVEL on early boot messages and doing the loglevel check for these while flushing as before. Hmm, OK, chances are we need to re-think 375899cddcbb. It might be the case that we sort of broke CON_PRINTBUFFER handling. console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH register CON_PRINTBUFFER console -> no debug output So I think that when console_unlock() re-flushes already seen logbuf messages to a newly registered exclusive [CON_PRINTBUFFER] console we probably need to look at the current console_loglevel in console_unlock() loop. So if it breaks quiet and the above use-case maybe we should revert 375899cddcbb for now? Regards, Hans
Re: [RFC PATCH 3/5] RISC-V: Select useful GENERIC_IRQ kconfig options
On Wed, Sep 5, 2018 at 12:26 AM, Christoph Hellwig wrote: > On Tue, Sep 04, 2018 at 06:15:12PM +0530, Anup Patel wrote: >> This patch selects following GENERIC_IRQ kconfig options: >> GENERIC_IRQ_MULTI_HANDLER > > This is already selected by arch/riscv/Kconfig. > >> GENERIC_IRQ_PROBE > > This is something only used by ISA drivers. Why would we want that > on RISC-V? Yes, thanks for pointing. GENERIC_IRQ_PROBE is not required at this time. I will drop this selection. May be will re-consider later. > >> GENERIC_IRQ_SHOW_LEVEL > > We don't really have any special level triggerd irq handling in > RISC-V. That being said this is trivial and I don't see why it > even is a Kconfig option. Please have a discussion with Thomas > and Marc on why we have this option instead of a default. Most of MMIO device interrupts are level-interrupts. In fact, all HW interrupt lines coming to PLIC will be level-interrupts. It's just that PLIC does not implement state machine to sample Level-IRQs and Edge-IRQs differently. Even the interrupt-controller virtualization in hypervisors deal with Level and Edge interrupts differently. I am sure we will see both Level and Edge triggered interrupts in RISC-V system. The MMIO device interrupts will be mostly Level triggered and PCI MSIs will be mapped as Edge triggered by MSI controller. We should definitely select GENERIC_IRQ_SHOW_LEVEL so that nature of IRQ interrupt line is evident in /proc/interrupts. > >> HANDLE_DOMAIN_IRQ > > We aren't using handle_domain_irq anywhere in RISC-V, no need to > build this. The new RISC-V local interrupt controller driver introduced by this patchset uses handle_domain_irq(). The main advantage of handle_domain_irq() is that it helps reduce few lines of code which is otherwise common across interrupt-controller drivers (mostly code related to irq_enter(), irq_exit(), and set_irq_regs()). Regards, Anup
Re: [RFC PATCH 3/5] RISC-V: Select useful GENERIC_IRQ kconfig options
On Wed, Sep 5, 2018 at 12:26 AM, Christoph Hellwig wrote: > On Tue, Sep 04, 2018 at 06:15:12PM +0530, Anup Patel wrote: >> This patch selects following GENERIC_IRQ kconfig options: >> GENERIC_IRQ_MULTI_HANDLER > > This is already selected by arch/riscv/Kconfig. > >> GENERIC_IRQ_PROBE > > This is something only used by ISA drivers. Why would we want that > on RISC-V? Yes, thanks for pointing. GENERIC_IRQ_PROBE is not required at this time. I will drop this selection. May be will re-consider later. > >> GENERIC_IRQ_SHOW_LEVEL > > We don't really have any special level triggerd irq handling in > RISC-V. That being said this is trivial and I don't see why it > even is a Kconfig option. Please have a discussion with Thomas > and Marc on why we have this option instead of a default. Most of MMIO device interrupts are level-interrupts. In fact, all HW interrupt lines coming to PLIC will be level-interrupts. It's just that PLIC does not implement state machine to sample Level-IRQs and Edge-IRQs differently. Even the interrupt-controller virtualization in hypervisors deal with Level and Edge interrupts differently. I am sure we will see both Level and Edge triggered interrupts in RISC-V system. The MMIO device interrupts will be mostly Level triggered and PCI MSIs will be mapped as Edge triggered by MSI controller. We should definitely select GENERIC_IRQ_SHOW_LEVEL so that nature of IRQ interrupt line is evident in /proc/interrupts. > >> HANDLE_DOMAIN_IRQ > > We aren't using handle_domain_irq anywhere in RISC-V, no need to > build this. The new RISC-V local interrupt controller driver introduced by this patchset uses handle_domain_irq(). The main advantage of handle_domain_irq() is that it helps reduce few lines of code which is otherwise common across interrupt-controller drivers (mostly code related to irq_enter(), irq_exit(), and set_irq_regs()). Regards, Anup
Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible
On Wed, Sep 5, 2018 at 12:20 AM, Christoph Hellwig wrote: > On Tue, Sep 04, 2018 at 06:15:10PM +0530, Anup Patel wrote: >> The mechanism to trigger IPI is generally part of interrupt-controller >> driver for various architectures. On RISC-V, we have an option to trigger >> IPI using SBI or SOC vendor can implement RISC-V CPU where IPI will be >> triggered using SOC interrupt-controller (e.g. custom PLIC). > > Which is exactly what we want to avoid, and should not make it easy. > > The last thing we need is non-standard whacky IPI mechanisms, and > that is why we habe SBI calls for it. I think we should simply > stat that if an RISC-V cpu design bypasse the SBI for no good reason > we'll simply not support it. It's outrageous to call IPI mechanisms using interrupt-controller as "wacky". Lot of architectures have well thought-out interrupt-controller designs with IPI support. In fact having IPIs through interrupt-controller drivers is much faster because SBI call will have it's own overhead and M-mode code with eventually write to some platform-specific/interrupt-controller register. The SBI call only makes sense for very simple interrupt-controller (such as PLIC) which do not provide IPI mechanism. It totally seems like SBI call for triggering IPIs was added as workaround to address limitations of current PLIC. RISC-V systems require a more mature and feature complete interrupt-controllers which supports IPIs, PCI MSI, and Virtualization Extensions. I am sure will see a much better interrupt controller (PLIC++ or something else) soon. > > So NAK for this patch. I think you jumped the gun to quickly here. This patch does two things: 1. Adds a pluggable IPI triggering mechanism 2. Make IPI handling mechanism more generic so that we can call IPI handler from interrupt-controller driver. Your primary objection seems to be for point1 above. I will drop that part only keep changes related to point2 above. Regards, Anup
Re: [RFC PATCH 1/5] RISC-V: Make IPI triggering flexible
On Wed, Sep 5, 2018 at 12:20 AM, Christoph Hellwig wrote: > On Tue, Sep 04, 2018 at 06:15:10PM +0530, Anup Patel wrote: >> The mechanism to trigger IPI is generally part of interrupt-controller >> driver for various architectures. On RISC-V, we have an option to trigger >> IPI using SBI or SOC vendor can implement RISC-V CPU where IPI will be >> triggered using SOC interrupt-controller (e.g. custom PLIC). > > Which is exactly what we want to avoid, and should not make it easy. > > The last thing we need is non-standard whacky IPI mechanisms, and > that is why we habe SBI calls for it. I think we should simply > stat that if an RISC-V cpu design bypasse the SBI for no good reason > we'll simply not support it. It's outrageous to call IPI mechanisms using interrupt-controller as "wacky". Lot of architectures have well thought-out interrupt-controller designs with IPI support. In fact having IPIs through interrupt-controller drivers is much faster because SBI call will have it's own overhead and M-mode code with eventually write to some platform-specific/interrupt-controller register. The SBI call only makes sense for very simple interrupt-controller (such as PLIC) which do not provide IPI mechanism. It totally seems like SBI call for triggering IPIs was added as workaround to address limitations of current PLIC. RISC-V systems require a more mature and feature complete interrupt-controllers which supports IPIs, PCI MSI, and Virtualization Extensions. I am sure will see a much better interrupt controller (PLIC++ or something else) soon. > > So NAK for this patch. I think you jumped the gun to quickly here. This patch does two things: 1. Adds a pluggable IPI triggering mechanism 2. Make IPI handling mechanism more generic so that we can call IPI handler from interrupt-controller driver. Your primary objection seems to be for point1 above. I will drop that part only keep changes related to point2 above. Regards, Anup
Re: [PATCH] of: Add device_type access helper functions
On 08/30/18 11:52, Rob Herring wrote: > In preparation to remove direct access to device_node.type, add > of_node_is_type() and of_node_get_device_type() helpers to check and > retrieve the device type. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > include/linux/of.h | 12 > 1 file changed, 12 insertions(+) > > diff --git a/include/linux/of.h b/include/linux/of.h > index a40f63a36afa..506beca9588d 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -980,6 +980,18 @@ static inline struct device_node *of_find_matching_node( > return of_find_matching_node_and_match(from, matches, NULL); > } > > +static inline const char *of_node_get_device_type(const struct device_node > *np) > +{ > + return of_get_property(np, "type", NULL); I'm a little confused. The patch comment says "remove direct access to device_node.type. device_node.type is set to the value of the property "device_type" in populate_node(), but of_node_get_device_type() is looking for property "type". > +}> + > +static inline bool of_node_is_type(const struct device_node *np, const char > *type) > +{ > + const char *match = of_node_get_device_type(np); > + > + return np && match && type && !of_prop_cmp(match, type); > +} > + > /** > * of_property_count_u8_elems - Count the number of u8 elements in a property > * > I'd like to see one or two examples of how you plan to use of_node_get_device_type() (outside of of_node_is_type()) and examples of planned use of of_node_is_type() so I can better understand anticipated usage. Complete, clean patches not needed for the examples, just enough text for me to see the resulting code change where the helpers are used. -Frank
Re: [PATCH] of: Add device_type access helper functions
On 08/30/18 11:52, Rob Herring wrote: > In preparation to remove direct access to device_node.type, add > of_node_is_type() and of_node_get_device_type() helpers to check and > retrieve the device type. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > include/linux/of.h | 12 > 1 file changed, 12 insertions(+) > > diff --git a/include/linux/of.h b/include/linux/of.h > index a40f63a36afa..506beca9588d 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -980,6 +980,18 @@ static inline struct device_node *of_find_matching_node( > return of_find_matching_node_and_match(from, matches, NULL); > } > > +static inline const char *of_node_get_device_type(const struct device_node > *np) > +{ > + return of_get_property(np, "type", NULL); I'm a little confused. The patch comment says "remove direct access to device_node.type. device_node.type is set to the value of the property "device_type" in populate_node(), but of_node_get_device_type() is looking for property "type". > +}> + > +static inline bool of_node_is_type(const struct device_node *np, const char > *type) > +{ > + const char *match = of_node_get_device_type(np); > + > + return np && match && type && !of_prop_cmp(match, type); > +} > + > /** > * of_property_count_u8_elems - Count the number of u8 elements in a property > * > I'd like to see one or two examples of how you plan to use of_node_get_device_type() (outside of of_node_is_type()) and examples of planned use of of_node_is_type() so I can better understand anticipated usage. Complete, clean patches not needed for the examples, just enough text for me to see the resulting code change where the helpers are used. -Frank
Re: VirtIO console hangs
On Fri, 31 Aug 2018 15:17:44 + Matteo Croce wrote: > On Fri, Aug 31, 2018 at 3:49 AM Nicholas Piggin wrote: > > > > On Tue, 28 Aug 2018 15:00:14 + > > Matteo Croce wrote: > > > > > On Tue, Aug 28, 2018 at 2:35 PM Nicholas Piggin > > > wrote: > > > > > > > > On Tue, 28 Aug 2018 12:54:08 + > > > > Matteo Croce wrote: > > > > > > > > > With kernel 4.19.0-rc1 virtio_console hangs very often. > > > > > I can always trigger the bug by pasting some characters in the > > > > > terminal window, the console will stop receiving keypresses, but I can > > > > > still see output from the console. > > > > > Stangely, logging in the VM via SSH and sending lot of data to hvc0, > > > > > like 'dmesg >/dev/hvc0' will fix the issue until the next paste. > > > > > > > > > > I did a git bisect and I've found that this is the offending commit, > > > > > reverting it fixes it. > > > > > > > > > > Cheers, > > > > > > > > > > commit ec97eaad1383ab2500fcf9a07ade6044fbcc67f5 > > > > > Author: Nicholas Piggin > > > > > Date: Tue May 1 00:55:54 2018 +1000 > > > > > > > > > > tty: hvc: hvc_poll() break hv read loop > > > > > > > > Thanks for the report. I can't immediately see what the problem > > > > is. Can you try get a stack trace of where it is stuck? > > > > > > > > > > I tried but didn't get one. > > > > > > > Perhaps try this patch if you have time (it's a bit of a shot > > > > in the dark). > > > > > > > > > > Yes it seems to fix. Thanks! > > > > Okay sorry for the delay, I can reproduce it here and found a better > > fix, if I could trouble you to test again. > > > > [PATCH] tty: hvc: hvc_poll() fix read loop hang > > > > Hi Nicholas, > > the patch works, but now pasting text inside the terminal is extremely > slow, it feels worse than a 9600 baud serial line. > > Btw, I had to apply the patch by hand as it was corrupted, some lines > were collapsed into one. Not sure why that happened, I seem to be able to send myself uncorrupted patches... Can you try this patch for performance? Hopefully it applies. -- diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index 5414c4a87bea..f5fc3ba49130 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -49,6 +49,8 @@ #define N_OUTBUF 16 #define N_INBUF16 +#define HVC_ATOMIC_READ_MAX128 + #define __ALIGNED__ __attribute__((__aligned__(sizeof(long static struct tty_driver *hvc_driver; @@ -522,6 +524,8 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count return -EIO; while (count > 0) { + int ret; + spin_lock_irqsave(>lock, flags); rsize = hp->outbuf_size - hp->n_outbuf; @@ -537,10 +541,13 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count } if (hp->n_outbuf > 0) - hvc_push(hp); + ret = hvc_push(hp); spin_unlock_irqrestore(>lock, flags); + if (!ret) + break; + if (count) { if (hp->n_outbuf > 0) hvc_flush(hp); @@ -669,8 +676,8 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) if (!hp->irq_requested) poll_mask |= HVC_POLL_READ; + read_again: /* Read data if any */ - count = tty_buffer_request_room(>port, N_INBUF); /* If flip is full, just reschedule a later read */ @@ -717,9 +724,23 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) #endif /* CONFIG_MAGIC_SYSRQ */ tty_insert_flip_char(>port, buf[i], 0); } - if (n == count) - poll_mask |= HVC_POLL_READ; - read_total = n; + read_total += n; + + if (may_sleep) { + /* Keep going until the flip is full */ + spin_unlock_irqrestore(>lock, flags); + cond_resched(); + spin_lock_irqsave(>lock, flags); + goto read_again; + } else if (read_total < HVC_ATOMIC_READ_MAX) { + /* Break and defer if it's a large read in atomic */ + goto read_again; + } + + /* +* Latency break, schedule another poll immediately. +*/ + poll_mask |= HVC_POLL_READ; out: /* Wakeup write queue if necessary */
Re: VirtIO console hangs
On Fri, 31 Aug 2018 15:17:44 + Matteo Croce wrote: > On Fri, Aug 31, 2018 at 3:49 AM Nicholas Piggin wrote: > > > > On Tue, 28 Aug 2018 15:00:14 + > > Matteo Croce wrote: > > > > > On Tue, Aug 28, 2018 at 2:35 PM Nicholas Piggin > > > wrote: > > > > > > > > On Tue, 28 Aug 2018 12:54:08 + > > > > Matteo Croce wrote: > > > > > > > > > With kernel 4.19.0-rc1 virtio_console hangs very often. > > > > > I can always trigger the bug by pasting some characters in the > > > > > terminal window, the console will stop receiving keypresses, but I can > > > > > still see output from the console. > > > > > Stangely, logging in the VM via SSH and sending lot of data to hvc0, > > > > > like 'dmesg >/dev/hvc0' will fix the issue until the next paste. > > > > > > > > > > I did a git bisect and I've found that this is the offending commit, > > > > > reverting it fixes it. > > > > > > > > > > Cheers, > > > > > > > > > > commit ec97eaad1383ab2500fcf9a07ade6044fbcc67f5 > > > > > Author: Nicholas Piggin > > > > > Date: Tue May 1 00:55:54 2018 +1000 > > > > > > > > > > tty: hvc: hvc_poll() break hv read loop > > > > > > > > Thanks for the report. I can't immediately see what the problem > > > > is. Can you try get a stack trace of where it is stuck? > > > > > > > > > > I tried but didn't get one. > > > > > > > Perhaps try this patch if you have time (it's a bit of a shot > > > > in the dark). > > > > > > > > > > Yes it seems to fix. Thanks! > > > > Okay sorry for the delay, I can reproduce it here and found a better > > fix, if I could trouble you to test again. > > > > [PATCH] tty: hvc: hvc_poll() fix read loop hang > > > > Hi Nicholas, > > the patch works, but now pasting text inside the terminal is extremely > slow, it feels worse than a 9600 baud serial line. > > Btw, I had to apply the patch by hand as it was corrupted, some lines > were collapsed into one. Not sure why that happened, I seem to be able to send myself uncorrupted patches... Can you try this patch for performance? Hopefully it applies. -- diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c index 5414c4a87bea..f5fc3ba49130 100644 --- a/drivers/tty/hvc/hvc_console.c +++ b/drivers/tty/hvc/hvc_console.c @@ -49,6 +49,8 @@ #define N_OUTBUF 16 #define N_INBUF16 +#define HVC_ATOMIC_READ_MAX128 + #define __ALIGNED__ __attribute__((__aligned__(sizeof(long static struct tty_driver *hvc_driver; @@ -522,6 +524,8 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count return -EIO; while (count > 0) { + int ret; + spin_lock_irqsave(>lock, flags); rsize = hp->outbuf_size - hp->n_outbuf; @@ -537,10 +541,13 @@ static int hvc_write(struct tty_struct *tty, const unsigned char *buf, int count } if (hp->n_outbuf > 0) - hvc_push(hp); + ret = hvc_push(hp); spin_unlock_irqrestore(>lock, flags); + if (!ret) + break; + if (count) { if (hp->n_outbuf > 0) hvc_flush(hp); @@ -669,8 +676,8 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) if (!hp->irq_requested) poll_mask |= HVC_POLL_READ; + read_again: /* Read data if any */ - count = tty_buffer_request_room(>port, N_INBUF); /* If flip is full, just reschedule a later read */ @@ -717,9 +724,23 @@ static int __hvc_poll(struct hvc_struct *hp, bool may_sleep) #endif /* CONFIG_MAGIC_SYSRQ */ tty_insert_flip_char(>port, buf[i], 0); } - if (n == count) - poll_mask |= HVC_POLL_READ; - read_total = n; + read_total += n; + + if (may_sleep) { + /* Keep going until the flip is full */ + spin_unlock_irqrestore(>lock, flags); + cond_resched(); + spin_lock_irqsave(>lock, flags); + goto read_again; + } else if (read_total < HVC_ATOMIC_READ_MAX) { + /* Break and defer if it's a large read in atomic */ + goto read_again; + } + + /* +* Latency break, schedule another poll immediately. +*/ + poll_mask |= HVC_POLL_READ; out: /* Wakeup write queue if necessary */
linux-next: Tree for Sep 5
Hi all, Changes since 20180904: Dropped trees: xarray, ida (temporarily) The net-next tree still had its build failure for which I reverted a commit. The devicetree tree gained a build failure for which I applied a patch. Non-merge commits (relative to Linus' tree): 1853 2252 files changed, 64880 insertions(+), 38423 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 286 trees (counting Linus' and 66 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (28619527b8a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) Merging fixes/master (217c3e019675 disable stringop truncation warnings for now) Merging kbuild-current/fixes (914b087ff9e0 kbuild: make missing $DEPMOD a Warning instead of an Error) Merging arc-current/for-curr (dd45210b6dd4 ARC: don't check for HIGHMEM pages in arch_dma_alloc) Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+) Merging arm64-fixes/for-next/fixes (f52bb98f5ade arm64: mm: always enable CONFIG_HOLES_IN_ZONE) Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response format) Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro) Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (a33710bdb6b2 net: phy: sfp: Handle unimplemented hwmon limits and alarms) Merging bpf/master (597222f72a94 bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP) Merging ipsec/master (215ab0f021c9 xfrm6: call kfree_skb when skb is toobig) Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in flushing set) Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set) Merging wireless-drivers/master (53ae914d898e net/rds: Use rdma_read_gids to get connection SGID/DGID in IPv6) Merging mac80211/master (fc3e3bf55f48 Merge tag 'mac80211-for-davem-2018-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211) Merging rdma-fixes/for-rc (5b394b2ddf03 Linux 4.19-rc1) Merging sound-current/for-linus (f7c50fa636f7 ALSA: hda: Fix several mismatch for register mask and value) Merging sound-asoc-fixes/for-linus (9c9b8f2dc626 Merge branch 'asoc-4.19' into asoc-linus) Merging regmap-fixes/for-linus (5b394b2ddf03 Linux 4.19-rc1) Merging regulator-fixes/for-linus (25be6316a30a Merge branch 'regulator-4.19' into regulator-linus) Merging spi-fixes/for-linus (051e00694c64 Merge branch 'spi-4.19' into spi-linus) Merging pci-current/for-linus (5b394b2ddf03 Linux 4.19-rc1) Merging driver-core.current/driver-core-linus (57361846b52b Linux 4.19-rc2) Merging tty.current/tty-linus (57361846b52b Linux 4.19-rc2) Merging usb.current/usb-linus (57361846b52b Linux 4.19-rc2) Merging usb-gadget-fixes/fixes (5b394b2ddf03 Linux 4.19-rc1) Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: fix array underflow in completion handler) Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: Always build ULPI code) Merging phy/fixes (5b394b2ddf03 Linux 4.19-rc1) Merging sta
linux-next: Tree for Sep 5
Hi all, Changes since 20180904: Dropped trees: xarray, ida (temporarily) The net-next tree still had its build failure for which I reverted a commit. The devicetree tree gained a build failure for which I applied a patch. Non-merge commits (relative to Linus' tree): 1853 2252 files changed, 64880 insertions(+), 38423 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 286 trees (counting Linus' and 66 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (28619527b8a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) Merging fixes/master (217c3e019675 disable stringop truncation warnings for now) Merging kbuild-current/fixes (914b087ff9e0 kbuild: make missing $DEPMOD a Warning instead of an Error) Merging arc-current/for-curr (dd45210b6dd4 ARC: don't check for HIGHMEM pages in arch_dma_alloc) Merging arm-current/fixes (afc9f65e01cd ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+) Merging arm64-fixes/for-next/fixes (f52bb98f5ade arm64: mm: always enable CONFIG_HOLES_IN_ZONE) Merging m68k-current/for-linus (0986b16ab49b m68k/mac: Use correct PMU response format) Merging powerpc-fixes/fixes (cca19f0b684f powerpc/64s/radix: Fix missing global invalidations when removing copro) Merging sparc/master (df2def49c57b Merge tag 'acpi-4.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (a33710bdb6b2 net: phy: sfp: Handle unimplemented hwmon limits and alarms) Merging bpf/master (597222f72a94 bpf: avoid misuse of psock when TCP_ULP_BPF collides with another ULP) Merging ipsec/master (215ab0f021c9 xfrm6: call kfree_skb when skb is toobig) Merging netfilter/master (7acfda539c0b netfilter: nf_tables: release chain in flushing set) Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set) Merging wireless-drivers/master (53ae914d898e net/rds: Use rdma_read_gids to get connection SGID/DGID in IPv6) Merging mac80211/master (fc3e3bf55f48 Merge tag 'mac80211-for-davem-2018-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211) Merging rdma-fixes/for-rc (5b394b2ddf03 Linux 4.19-rc1) Merging sound-current/for-linus (f7c50fa636f7 ALSA: hda: Fix several mismatch for register mask and value) Merging sound-asoc-fixes/for-linus (9c9b8f2dc626 Merge branch 'asoc-4.19' into asoc-linus) Merging regmap-fixes/for-linus (5b394b2ddf03 Linux 4.19-rc1) Merging regulator-fixes/for-linus (25be6316a30a Merge branch 'regulator-4.19' into regulator-linus) Merging spi-fixes/for-linus (051e00694c64 Merge branch 'spi-4.19' into spi-linus) Merging pci-current/for-linus (5b394b2ddf03 Linux 4.19-rc1) Merging driver-core.current/driver-core-linus (57361846b52b Linux 4.19-rc2) Merging tty.current/tty-linus (57361846b52b Linux 4.19-rc2) Merging usb.current/usb-linus (57361846b52b Linux 4.19-rc2) Merging usb-gadget-fixes/fixes (5b394b2ddf03 Linux 4.19-rc1) Merging usb-serial-fixes/usb-linus (5dfdd24eb3d3 USB: serial: ti_usb_3410_5052: fix array underflow in completion handler) Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: Always build ULPI code) Merging phy/fixes (5b394b2ddf03 Linux 4.19-rc1) Merging sta
[GIT PULL] nds32 updates for v4.19
Hi Linus, The following changes since commit 57361846b52bc686112da6ca5368d11210796804: Linux 4.19-rc2 (2018-09-02 14:37:30 -0700) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/greentime/linux.git tags/nds32-for-linus-4.19-tag1 for you to fetch changes up to 3350139c0ff3c95724b784f7109987d533cb3ecd: nds32: linker script: GCOV kernel may refers data in __exit (2018-09-05 10:16:26 +0800) nds32 patches for 4.19 Here is the nds32 patch set based on 4.19-rc2. Contained in here are the bug fixes, building error fixes and ftrace support for nds32. These are the LTP20170427 testing results. Total Tests: 1902 Total Skipped Tests: 592 Total Failures: 420 Kernel Version: 4.19.0-rc2-00018-g2c9d30cc16f0-dirty Machine Architecture: nds32 Signed-off-by: Greentime Hu Greentime Hu (5): nds32: fix logic for module nds32: Only print one page of stack when die to prevent printing too much information. nds32: Fix a kernel panic issue because of wrong frame pointer access. nds32: fix build error because of wrong semicolon nds32: linker script: GCOV kernel may refers data in __exit YueHaibing (1): nds32: add NULL entry to the end of_device_id array Zong Li (12): nds32: Fix empty call trace nds32: Fix get_user/put_user macro expand pointer problem nds32: Clean up the coding style nds32: Extract the checking and getting pointer to a macro nds32/ftrace: Support static function tracer nds32/ftrace: Support static function graph tracer nds32/ftrace: Add RECORD_MCOUNT support nds32/ftrace: Support dynamic function tracer nds32/ftrace: Support dynamic function graph tracer nds32/stack: Get real return address by using ftrace_graph_ret_addr nds32: Remove the deprecated ABI implementation nds32: Add macro definition for offset of lp register on stack arch/nds32/Kconfig | 4 ++ arch/nds32/Makefile | 4 ++ arch/nds32/include/asm/elf.h | 4 +- arch/nds32/include/asm/ftrace.h | 46 ++ arch/nds32/include/asm/nds32.h | 1 + arch/nds32/include/asm/uaccess.h | 229 - arch/nds32/kernel/Makefile | 6 +++ arch/nds32/kernel/atl2c.c| 3 +- arch/nds32/kernel/ex-entry.S | 2 +- arch/nds32/kernel/ex-exit.S | 4 +- arch/nds32/kernel/ftrace.c | 309 arch/nds32/kernel/module.c | 4 +- arch/nds32/kernel/stacktrace.c | 6 ++- arch/nds32/kernel/traps.c| 42 +--- arch/nds32/kernel/vmlinux.lds.S | 12 ++ scripts/recordmcount.pl | 3 ++ 16 files changed, 527 insertions(+), 152 deletions(-) create mode 100644 arch/nds32/include/asm/ftrace.h create mode 100644 arch/nds32/kernel/ftrace.c
[GIT PULL] nds32 updates for v4.19
Hi Linus, The following changes since commit 57361846b52bc686112da6ca5368d11210796804: Linux 4.19-rc2 (2018-09-02 14:37:30 -0700) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/greentime/linux.git tags/nds32-for-linus-4.19-tag1 for you to fetch changes up to 3350139c0ff3c95724b784f7109987d533cb3ecd: nds32: linker script: GCOV kernel may refers data in __exit (2018-09-05 10:16:26 +0800) nds32 patches for 4.19 Here is the nds32 patch set based on 4.19-rc2. Contained in here are the bug fixes, building error fixes and ftrace support for nds32. These are the LTP20170427 testing results. Total Tests: 1902 Total Skipped Tests: 592 Total Failures: 420 Kernel Version: 4.19.0-rc2-00018-g2c9d30cc16f0-dirty Machine Architecture: nds32 Signed-off-by: Greentime Hu Greentime Hu (5): nds32: fix logic for module nds32: Only print one page of stack when die to prevent printing too much information. nds32: Fix a kernel panic issue because of wrong frame pointer access. nds32: fix build error because of wrong semicolon nds32: linker script: GCOV kernel may refers data in __exit YueHaibing (1): nds32: add NULL entry to the end of_device_id array Zong Li (12): nds32: Fix empty call trace nds32: Fix get_user/put_user macro expand pointer problem nds32: Clean up the coding style nds32: Extract the checking and getting pointer to a macro nds32/ftrace: Support static function tracer nds32/ftrace: Support static function graph tracer nds32/ftrace: Add RECORD_MCOUNT support nds32/ftrace: Support dynamic function tracer nds32/ftrace: Support dynamic function graph tracer nds32/stack: Get real return address by using ftrace_graph_ret_addr nds32: Remove the deprecated ABI implementation nds32: Add macro definition for offset of lp register on stack arch/nds32/Kconfig | 4 ++ arch/nds32/Makefile | 4 ++ arch/nds32/include/asm/elf.h | 4 +- arch/nds32/include/asm/ftrace.h | 46 ++ arch/nds32/include/asm/nds32.h | 1 + arch/nds32/include/asm/uaccess.h | 229 - arch/nds32/kernel/Makefile | 6 +++ arch/nds32/kernel/atl2c.c| 3 +- arch/nds32/kernel/ex-entry.S | 2 +- arch/nds32/kernel/ex-exit.S | 4 +- arch/nds32/kernel/ftrace.c | 309 arch/nds32/kernel/module.c | 4 +- arch/nds32/kernel/stacktrace.c | 6 ++- arch/nds32/kernel/traps.c| 42 +--- arch/nds32/kernel/vmlinux.lds.S | 12 ++ scripts/recordmcount.pl | 3 ++ 16 files changed, 527 insertions(+), 152 deletions(-) create mode 100644 arch/nds32/include/asm/ftrace.h create mode 100644 arch/nds32/kernel/ftrace.c
Re: [PATCH] of: add node name compare helper functions
On 08/30/18 11:52, Rob Herring wrote: > In preparation to remove device_node.name pointer, add helper functions > for node name comparisons which are a common pattern throughout the kernel. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/base.c | 22 ++ > include/linux/of.h | 13 + > 2 files changed, 35 insertions(+) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 466e3c8582f0..185bfe077d0a 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -54,6 +54,28 @@ DEFINE_MUTEX(of_mutex); > */ > DEFINE_RAW_SPINLOCK(devtree_lock); > > +bool of_node_name_eq(const struct device_node *np, const char *name) > +{ > + const char *node_name; > + size_t len; > + > + if (!np) > + return false; > + > + node_name = kbasename(np->full_name); > + len = strchrnul(node_name, '@') - node_name; > + > + return (strlen(name) == len) && (strncmp(node_name, name, len) == 0); > +} > + > +bool of_node_name_prefix(const struct device_node *np, const char *prefix) of_node_name_prefix_eq() would be more clear and consistent. of_node_name_prefix() sounds like it would return the prefix of *np. > +{ > + if (!np) > + return false; > + > + return strncmp(kbasename(np->full_name), prefix, strlen(prefix)) == 0; > +} > + > int of_n_addr_cells(struct device_node *np) > { > u32 cells; > diff --git a/include/linux/of.h b/include/linux/of.h > index 4d25e4f952d9..a40f63a36afa 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -256,6 +256,9 @@ static inline unsigned long of_read_ulong(const __be32 > *cell, int size) > #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags) > #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags) > > +extern bool of_node_name_eq(const struct device_node *np, const char *name); > +extern bool of_node_name_prefix(const struct device_node *np, const char > *prefix); > + > static inline const char *of_node_full_name(const struct device_node *np) > { > return np ? np->full_name : ""; > @@ -561,6 +564,16 @@ static inline struct device_node *to_of_node(const > struct fwnode_handle *fwnode) > return NULL; > } > > +static inline bool of_node_name_eq(const struct device_node *np, const char > *name) > +{ > + return false; > +} > + > +static inline bool of_node_name_prefix(const struct device_node *np, const > char *prefix) > +{ > + return false; > +} > + > static inline const char* of_node_full_name(const struct device_node *np) > { > return ""; >
Re: [PATCH] of: add node name compare helper functions
On 08/30/18 11:52, Rob Herring wrote: > In preparation to remove device_node.name pointer, add helper functions > for node name comparisons which are a common pattern throughout the kernel. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/base.c | 22 ++ > include/linux/of.h | 13 + > 2 files changed, 35 insertions(+) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 466e3c8582f0..185bfe077d0a 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -54,6 +54,28 @@ DEFINE_MUTEX(of_mutex); > */ > DEFINE_RAW_SPINLOCK(devtree_lock); > > +bool of_node_name_eq(const struct device_node *np, const char *name) > +{ > + const char *node_name; > + size_t len; > + > + if (!np) > + return false; > + > + node_name = kbasename(np->full_name); > + len = strchrnul(node_name, '@') - node_name; > + > + return (strlen(name) == len) && (strncmp(node_name, name, len) == 0); > +} > + > +bool of_node_name_prefix(const struct device_node *np, const char *prefix) of_node_name_prefix_eq() would be more clear and consistent. of_node_name_prefix() sounds like it would return the prefix of *np. > +{ > + if (!np) > + return false; > + > + return strncmp(kbasename(np->full_name), prefix, strlen(prefix)) == 0; > +} > + > int of_n_addr_cells(struct device_node *np) > { > u32 cells; > diff --git a/include/linux/of.h b/include/linux/of.h > index 4d25e4f952d9..a40f63a36afa 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -256,6 +256,9 @@ static inline unsigned long of_read_ulong(const __be32 > *cell, int size) > #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags) > #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags) > > +extern bool of_node_name_eq(const struct device_node *np, const char *name); > +extern bool of_node_name_prefix(const struct device_node *np, const char > *prefix); > + > static inline const char *of_node_full_name(const struct device_node *np) > { > return np ? np->full_name : ""; > @@ -561,6 +564,16 @@ static inline struct device_node *to_of_node(const > struct fwnode_handle *fwnode) > return NULL; > } > > +static inline bool of_node_name_eq(const struct device_node *np, const char > *name) > +{ > + return false; > +} > + > +static inline bool of_node_name_prefix(const struct device_node *np, const > char *prefix) > +{ > + return false; > +} > + > static inline const char* of_node_full_name(const struct device_node *np) > { > return ""; >
[PATCH] selftests/efivarfs: add required kernel configs
add config file Signed-off-by: Lei Yang --- tools/testing/selftests/efivarfs/config | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/efivarfs/config diff --git a/tools/testing/selftests/efivarfs/config b/tools/testing/selftests/efivarfs/config new file mode 100644 index 000..4e151f1 --- /dev/null +++ b/tools/testing/selftests/efivarfs/config @@ -0,0 +1 @@ +CONFIG_EFIVAR_FS=y -- 1.9.1
[PATCH] selftests/efivarfs: add required kernel configs
add config file Signed-off-by: Lei Yang --- tools/testing/selftests/efivarfs/config | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/efivarfs/config diff --git a/tools/testing/selftests/efivarfs/config b/tools/testing/selftests/efivarfs/config new file mode 100644 index 000..4e151f1 --- /dev/null +++ b/tools/testing/selftests/efivarfs/config @@ -0,0 +1 @@ +CONFIG_EFIVAR_FS=y -- 1.9.1
Re: [PATCH] sysctl: kselftests: add test_core to .gitignore
sorry, the title should be :"cgroup: kselftests: add test_core to .gitignore" Lei On 2018年09月05日 10:39, Lei Yang wrote: Update .gitignore files. Signed-off-by: Lei Yang --- tools/testing/selftests/cgroup/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore index 95eb3a5..adacda5 100644 --- a/tools/testing/selftests/cgroup/.gitignore +++ b/tools/testing/selftests/cgroup/.gitignore @@ -1 +1,2 @@ test_memcontrol +test_core
Re: [PATCH] sysctl: kselftests: add test_core to .gitignore
sorry, the title should be :"cgroup: kselftests: add test_core to .gitignore" Lei On 2018年09月05日 10:39, Lei Yang wrote: Update .gitignore files. Signed-off-by: Lei Yang --- tools/testing/selftests/cgroup/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore index 95eb3a5..adacda5 100644 --- a/tools/testing/selftests/cgroup/.gitignore +++ b/tools/testing/selftests/cgroup/.gitignore @@ -1 +1,2 @@ test_memcontrol +test_core
Re: [PATCH v8 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller
Hi Jacek, On 5 September 2018 at 04:19, Jacek Anaszewski wrote: > Hi Baolin, > > On 09/04/2018 01:01 PM, Baolin Wang wrote: >> This patch implements the 'pattern_set'and 'pattern_clear' >> interfaces to support SC27XX LED breathing mode. >> >> Signed-off-by: Baolin Wang >> --- >> Changes from v7: >> - Add its own ABI documentation file. >> >> Changes from v6: >> - None. >> >> Changes from v5: >> - None. >> >> Changes from v4: >> - None. >> >> Changes from v3: >> - None. >> >> Changes from v2: >> - None. >> >> Changes from v1: >> - Remove pattern_get interface. >> --- >> .../ABI/testing/sysfs-class-led-driver-sc27xx | 11 +++ >> drivers/leds/leds-sc27xx-bltc.c| 94 >> >> 2 files changed, 105 insertions(+) >> create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> >> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> new file mode 100644 >> index 000..ecef3ba >> --- /dev/null >> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> @@ -0,0 +1,11 @@ >> +What:/sys/class/leds//hw_pattern >> +Date:September 2018 >> +KernelVersion: 4.20 >> +Description: >> + Specify a hardware pattern for the SC27XX LED. For the SC27XX >> + LED controller, it only supports 4 hardware patterns to >> configure > > If I understand it correctly then the four components: low, rise, high, > and fall time make a single pattern. So calling it "4 hardware patterns" > can be misleading. Below you're using "stage" notion - it would be good > to use it consequently on the whole span of this document. Sure. I will modify the documentation to avoid misleading words. > >> + the low time, rise time, high time and fall time for the >> breathing >> + mode, and each stage duration unit is 125ms. So the format of >> + the hardware pattern values should be: > > I'd be more precise here: > > Min stage duration: ??? > Max stage duration: ??? > Stage duration step: 125 ms OK. > >> + "brightness_1 duration_1 brightness_2 duration_2 brightness_3 >> + duration_3 brightness_4 duration_4". > > Looking at sc27xx_led_pattern_set() it doesn't seem like you would > use brightnesses other then brightness_1. I assume that the low > brightness is fixed to 0 and the high brightness is the brightness_1. > If yes, than we don't need brightnesses in this pattern definition, > since the current brightness will suffice. > > You'd need to alter the hw_pattern description to say that the > brightness currently set is to be used as a high brightness, and the > hw_pattern for this driver consists only of the four delta_t components. Sorry for confusing here. For SC27XX LED, the 4 stages use one same brightness. To be compatible with the hardware pattern format, so we force to set one brightness for each stage. I will add some comments to explain the LED expects the same brightness for each stage instead of using the current brightness file which is not used for our breathing mode. > > This is clear example of what I had on mind when having doubts > about using tuples for pattern_set. > > Alternatively, if we want to enforce tuples format for hw_pattern, > and if we want to be consistent - we'd need to require defining target > brightness for each stage properly, i.e. > > echo "100 500 100 500 0 500 0 500" > pattern > > Which would mean: > > 1. Rise from brightness 0 to 100 for 500 ms. > 2. Keep brightness 100 for 500 ms. > 3. Fall from brightness 100 to 0 for 500 ms. > 4. Keep brightness 0 for 500 ms. > > -- > Best regards, > Jacek Anaszewski -- Baolin Wang Best Regards
Re: [PATCH v8 2/2] leds: sc27xx: Add pattern_set/clear interfaces for LED controller
Hi Jacek, On 5 September 2018 at 04:19, Jacek Anaszewski wrote: > Hi Baolin, > > On 09/04/2018 01:01 PM, Baolin Wang wrote: >> This patch implements the 'pattern_set'and 'pattern_clear' >> interfaces to support SC27XX LED breathing mode. >> >> Signed-off-by: Baolin Wang >> --- >> Changes from v7: >> - Add its own ABI documentation file. >> >> Changes from v6: >> - None. >> >> Changes from v5: >> - None. >> >> Changes from v4: >> - None. >> >> Changes from v3: >> - None. >> >> Changes from v2: >> - None. >> >> Changes from v1: >> - Remove pattern_get interface. >> --- >> .../ABI/testing/sysfs-class-led-driver-sc27xx | 11 +++ >> drivers/leds/leds-sc27xx-bltc.c| 94 >> >> 2 files changed, 105 insertions(+) >> create mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> >> diff --git a/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> new file mode 100644 >> index 000..ecef3ba >> --- /dev/null >> +++ b/Documentation/ABI/testing/sysfs-class-led-driver-sc27xx >> @@ -0,0 +1,11 @@ >> +What:/sys/class/leds//hw_pattern >> +Date:September 2018 >> +KernelVersion: 4.20 >> +Description: >> + Specify a hardware pattern for the SC27XX LED. For the SC27XX >> + LED controller, it only supports 4 hardware patterns to >> configure > > If I understand it correctly then the four components: low, rise, high, > and fall time make a single pattern. So calling it "4 hardware patterns" > can be misleading. Below you're using "stage" notion - it would be good > to use it consequently on the whole span of this document. Sure. I will modify the documentation to avoid misleading words. > >> + the low time, rise time, high time and fall time for the >> breathing >> + mode, and each stage duration unit is 125ms. So the format of >> + the hardware pattern values should be: > > I'd be more precise here: > > Min stage duration: ??? > Max stage duration: ??? > Stage duration step: 125 ms OK. > >> + "brightness_1 duration_1 brightness_2 duration_2 brightness_3 >> + duration_3 brightness_4 duration_4". > > Looking at sc27xx_led_pattern_set() it doesn't seem like you would > use brightnesses other then brightness_1. I assume that the low > brightness is fixed to 0 and the high brightness is the brightness_1. > If yes, than we don't need brightnesses in this pattern definition, > since the current brightness will suffice. > > You'd need to alter the hw_pattern description to say that the > brightness currently set is to be used as a high brightness, and the > hw_pattern for this driver consists only of the four delta_t components. Sorry for confusing here. For SC27XX LED, the 4 stages use one same brightness. To be compatible with the hardware pattern format, so we force to set one brightness for each stage. I will add some comments to explain the LED expects the same brightness for each stage instead of using the current brightness file which is not used for our breathing mode. > > This is clear example of what I had on mind when having doubts > about using tuples for pattern_set. > > Alternatively, if we want to enforce tuples format for hw_pattern, > and if we want to be consistent - we'd need to require defining target > brightness for each stage properly, i.e. > > echo "100 500 100 500 0 500 0 500" > pattern > > Which would mean: > > 1. Rise from brightness 0 to 100 for 500 ms. > 2. Keep brightness 100 for 500 ms. > 3. Fall from brightness 100 to 0 for 500 ms. > 4. Keep brightness 0 for 500 ms. > > -- > Best regards, > Jacek Anaszewski -- Baolin Wang Best Regards
Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
On Wed, Sep 05, 2018 at 01:00:37AM +, Schaufler, Casey wrote: > Sorry, I've been working in security too long for my > optimistic streak to be very wide. Eheh. So I was simply trying to follow in context, but it wasn't entirely clear, so I tried to take it out of context and then it was even less clear, never mind I was only looking for some more explanation. > Not especially, no. I have gotten considerable feedback that > it should be configurable, and while I may not see the value myself, > I do respect the people who've requested the configurability. Correct me if I'm wrong, but LSM as last word "module" implies to make this logic modular. My wondering is because: 1) why not just a single version of this logic in the scheduler? (i.e. can only be on or off or perhaps a only-ibpb-dumpable tweak to retain current slightly higher perf but less secure behavior) 2) even if you want a myriad of versions of this IBPB logic, how do the different versions can possibly fit in a whole "module" whose semantics have pretty much nothing to do with the other methods in the module like LSM_HOOK_INIT(capable, cap_capable), LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory), and LSM_HOOK_INIT(mmap_addr, cap_mmap_addr), or even LSM_HOOK_INIT(inode_follow_link, selinux_inode_follow_link), LSM_HOOK_INIT(inode_permission, selinux_inode_permission). I mean it looks an apple to oranges monolithic link in the same module. Or are you going to load this method in a separate module with only a single method implemented and then load multiple LSM modules at the same time? 3) if you need so much tweaking that not even a single off/on switch independent of any module loaded through LSM is not enough, how is it ok that all the rest shouldn't be configurable? All the rest has more performance impact than this one so it'd start from there if something. I understand how configurablity is valuable (this is why we kept everything dynamic tunable at runtime, including the PTI stop machine to allow even PTI TLB flushes to be turned off). I'm just trying to understand how IBPB fits in a LSM module implementation. Even if you build with CONFIG_SECURITY=n PTI won't go away, retpoline won't go away, the l1flush in vmenter won't go away, the pte/transhugepmd inversion won't go away, why only the runtime tunability or tweaking of IBPB fits in a LSM module? > If they provide different answers there should be different > functions. It's a question of viewpoint. If you don't care about > the SELinux viewpoint you shouldn't have to include it in your > checks, any more than you care about x86 issues when you're > running on a MIPS processor. I don't see how selinux fits into this. Selinux doesn't affect virtual memory protection. Not having selinux even built in doesn't mean you are ok with virtual memory protection not being provided by the CPU. Or in other words selinux fits into this as much as seccomp or apparmor fits into this. This is just like a memory barrier mb() to be executed in the context switch code. Security issues can emerge with the lack of it regardless if selinux is built in and enabled or CONFIG_SECURITY=n. selinux matters after an exploit already happened, this as opposed is needed to prevent the exploit in the first place. Also the correct fully secure version provides a single answer (i.e. in theory assuming a perfect implementation that didn't forget anything so having a single implementation will also increase the chances that we get as close as possible to the theoretical correct implementation). If it provides different answers in this case it is because somebody wants not perfect security to provide higher performance, i.e. the "configurablity" which is valuable and fine feature to provide. Just a LSM module doesn't seem the most flexibile way to provide configurability by far. > Yes, even security people have to worry about locking. Yes it was some lock that if contended would lockup if used from inside the scheduler. > What *is* the most robust way? The below pretty much explains it. > Yes, there are locking issues. The code in the LSM infrastructure and in > the security modules needs to be aware of that and deal with it. The SELinux > code I proposed is more complex than it could be because the audit code > requires locking that doesn't work in the switching context. Or in other words having multiple versions of this called from within the scheduler is a bit like making the kernel modular and then because the locking is subtle you may then have scheduler deadlocks only happening with some kernel module loaded instead of others, but the real question is what is the payoff compared to just allowing the scheduler code to be tuned with x86 debugfs or sysctl the normal way. Also how would a new common code method in LSM fit for CPUs that are so slow that they can't possibly need anything like IBPB and flush_RSB()? Thanks! Andrea
Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
On Wed, Sep 05, 2018 at 01:00:37AM +, Schaufler, Casey wrote: > Sorry, I've been working in security too long for my > optimistic streak to be very wide. Eheh. So I was simply trying to follow in context, but it wasn't entirely clear, so I tried to take it out of context and then it was even less clear, never mind I was only looking for some more explanation. > Not especially, no. I have gotten considerable feedback that > it should be configurable, and while I may not see the value myself, > I do respect the people who've requested the configurability. Correct me if I'm wrong, but LSM as last word "module" implies to make this logic modular. My wondering is because: 1) why not just a single version of this logic in the scheduler? (i.e. can only be on or off or perhaps a only-ibpb-dumpable tweak to retain current slightly higher perf but less secure behavior) 2) even if you want a myriad of versions of this IBPB logic, how do the different versions can possibly fit in a whole "module" whose semantics have pretty much nothing to do with the other methods in the module like LSM_HOOK_INIT(capable, cap_capable), LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory), and LSM_HOOK_INIT(mmap_addr, cap_mmap_addr), or even LSM_HOOK_INIT(inode_follow_link, selinux_inode_follow_link), LSM_HOOK_INIT(inode_permission, selinux_inode_permission). I mean it looks an apple to oranges monolithic link in the same module. Or are you going to load this method in a separate module with only a single method implemented and then load multiple LSM modules at the same time? 3) if you need so much tweaking that not even a single off/on switch independent of any module loaded through LSM is not enough, how is it ok that all the rest shouldn't be configurable? All the rest has more performance impact than this one so it'd start from there if something. I understand how configurablity is valuable (this is why we kept everything dynamic tunable at runtime, including the PTI stop machine to allow even PTI TLB flushes to be turned off). I'm just trying to understand how IBPB fits in a LSM module implementation. Even if you build with CONFIG_SECURITY=n PTI won't go away, retpoline won't go away, the l1flush in vmenter won't go away, the pte/transhugepmd inversion won't go away, why only the runtime tunability or tweaking of IBPB fits in a LSM module? > If they provide different answers there should be different > functions. It's a question of viewpoint. If you don't care about > the SELinux viewpoint you shouldn't have to include it in your > checks, any more than you care about x86 issues when you're > running on a MIPS processor. I don't see how selinux fits into this. Selinux doesn't affect virtual memory protection. Not having selinux even built in doesn't mean you are ok with virtual memory protection not being provided by the CPU. Or in other words selinux fits into this as much as seccomp or apparmor fits into this. This is just like a memory barrier mb() to be executed in the context switch code. Security issues can emerge with the lack of it regardless if selinux is built in and enabled or CONFIG_SECURITY=n. selinux matters after an exploit already happened, this as opposed is needed to prevent the exploit in the first place. Also the correct fully secure version provides a single answer (i.e. in theory assuming a perfect implementation that didn't forget anything so having a single implementation will also increase the chances that we get as close as possible to the theoretical correct implementation). If it provides different answers in this case it is because somebody wants not perfect security to provide higher performance, i.e. the "configurablity" which is valuable and fine feature to provide. Just a LSM module doesn't seem the most flexibile way to provide configurability by far. > Yes, even security people have to worry about locking. Yes it was some lock that if contended would lockup if used from inside the scheduler. > What *is* the most robust way? The below pretty much explains it. > Yes, there are locking issues. The code in the LSM infrastructure and in > the security modules needs to be aware of that and deal with it. The SELinux > code I proposed is more complex than it could be because the audit code > requires locking that doesn't work in the switching context. Or in other words having multiple versions of this called from within the scheduler is a bit like making the kernel modular and then because the locking is subtle you may then have scheduler deadlocks only happening with some kernel module loaded instead of others, but the real question is what is the payoff compared to just allowing the scheduler code to be tuned with x86 debugfs or sysctl the normal way. Also how would a new common code method in LSM fit for CPUs that are so slow that they can't possibly need anything like IBPB and flush_RSB()? Thanks! Andrea
[PATCH] sysctl: kselftests: add test_core to .gitignore
Update .gitignore files. Signed-off-by: Lei Yang --- tools/testing/selftests/cgroup/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore index 95eb3a5..adacda5 100644 --- a/tools/testing/selftests/cgroup/.gitignore +++ b/tools/testing/selftests/cgroup/.gitignore @@ -1 +1,2 @@ test_memcontrol +test_core -- 1.9.1
[PATCH] sysctl: kselftests: add test_core to .gitignore
Update .gitignore files. Signed-off-by: Lei Yang --- tools/testing/selftests/cgroup/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore index 95eb3a5..adacda5 100644 --- a/tools/testing/selftests/cgroup/.gitignore +++ b/tools/testing/selftests/cgroup/.gitignore @@ -1 +1,2 @@ test_memcontrol +test_core -- 1.9.1
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
Hi, On (09/04/18 20:01), Hans de Goede wrote: > Commit 375899cddcbb ("printk: make sure to print log on console."), moved > the checking of the loglevel of messages from flush time to the actual > log time. > > This introduces one problem, some early boot messages are printed before > parse_early_param() gets called and thus before kernel commandline options > such as quiet, loglevel and ignore_loglevel are parsed. Do you use earlycon? > This causes e.g. the following messages to get printed on x86 systems, > despite the presence of the "quiet" option: > > [0.00] BIOS-provided physical RAM map: > [0.00] BIOS-e820: [mem 0x-0x00057fff] usable > ... > [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot > messages and doing the loglevel check for these while flushing as before. > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be the case that we sort of broke CON_PRINTBUFFER handling. console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH register CON_PRINTBUFFER console -> no debug output So I think that when console_unlock() re-flushes already seen logbuf messages to a newly registered exclusive [CON_PRINTBUFFER] console we probably need to look at the current console_loglevel in console_unlock() loop. -ss
Re: [PATCH 4.19 regression fix] printk: For early boot messages check loglevel when flushing the buffer
Hi, On (09/04/18 20:01), Hans de Goede wrote: > Commit 375899cddcbb ("printk: make sure to print log on console."), moved > the checking of the loglevel of messages from flush time to the actual > log time. > > This introduces one problem, some early boot messages are printed before > parse_early_param() gets called and thus before kernel commandline options > such as quiet, loglevel and ignore_loglevel are parsed. Do you use earlycon? > This causes e.g. the following messages to get printed on x86 systems, > despite the presence of the "quiet" option: > > [0.00] BIOS-provided physical RAM map: > [0.00] BIOS-e820: [mem 0x-0x00057fff] usable > ... > [0.00] BIOS-e820: [mem 0x0001-0x000874ff] usable > > This commit fixes this by setting a new LOG_CHK_LEVEL on early boot > messages and doing the loglevel check for these while flushing as before. > Hmm, OK, chances are we need to re-think 375899cddcbb. It might be the case that we sort of broke CON_PRINTBUFFER handling. console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH register CON_PRINTBUFFER console -> no debug output So I think that when console_unlock() re-flushes already seen logbuf messages to a newly registered exclusive [CON_PRINTBUFFER] console we probably need to look at the current console_loglevel in console_unlock() loop. -ss
[PATCH] leds: leds-gpio: Add a condition check for active low leds.
Some leds on our board are active low leds, which means these leds are lighted when the corresponding gpio line is low, while the original leds-gpio driver default all leds are active high leds. This patch adds a devicetree node "light-state", whose value should be "high" for active high leds and "low" for active low leds. The default value is "high" for compatible for the original driver. Signed-off-by: Song Qiang --- .../devicetree/bindings/leds/leds-gpio.txt| 15 +++ drivers/leds/leds-gpio.c | 25 +-- 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/leds/leds-gpio.txt b/Documentation/devicetree/bindings/leds/leds-gpio.txt index a48dda268f81..0a8fad75c704 100644 --- a/Documentation/devicetree/bindings/leds/leds-gpio.txt +++ b/Documentation/devicetree/bindings/leds/leds-gpio.txt @@ -23,6 +23,9 @@ LED sub-node properties: remains up. - panic-indicator : (optional) see Documentation/devicetree/bindings/leds/common.txt +- light-state: (optional) Values should be "high" or "low", which indicates + the state of the GPIO pin when the led is on. + see Documentation/devicetree/bindings/leds/common.txt Examples: @@ -64,3 +67,15 @@ leds { retain-state-suspended; }; }; + +leds { + compatible = "gpio-leds"; + + led0 { + label = "led0"; + gpios = < 19 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "heartbeat"; + default-state = "off"; + light-state = "low"; + }; +} diff --git a/drivers/leds/leds-gpio.c b/drivers/leds/leds-gpio.c index 764c31301f90..463ded0c71ac 100644 --- a/drivers/leds/leds-gpio.c +++ b/drivers/leds/leds-gpio.c @@ -26,9 +26,13 @@ struct gpio_led_data { struct gpio_desc *gpiod; u8 can_sleep; u8 blinking; + u8 light_state; gpio_blink_set_t platform_gpio_blink_set; }; +#define LEDS_GPIO_LIGHTSTATE_HIGH 0 +#define LEDS_GPIO_LIGHTSTATE_LOW 1 + static inline struct gpio_led_data * cdev_to_gpio_led_data(struct led_classdev *led_cdev) { @@ -42,9 +46,15 @@ static void gpio_led_set(struct led_classdev *led_cdev, int level; if (value == LED_OFF) - level = 0; + if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH) + level = 0; + else + level = 1; else - level = 1; + if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH) + level = 1; + else + level = 0; if (led_dat->blinking) { led_dat->platform_gpio_blink_set(led_dat->gpiod, level, @@ -205,6 +215,17 @@ static struct gpio_leds_priv *gpio_leds_create(struct platform_device *pdev) led.default_state = LEDS_GPIO_DEFSTATE_OFF; } + if (!fwnode_property_read_string(child, "light-state", +)) { + if (!strcmp(state, "low")) + led_dat->light_state = + LEDS_GPIO_LIGHTSTATE_LOW; + else{ + led_dat->light_state = + LEDS_GPIO_LIGHTSTATE_HIGH; + } + } + if (fwnode_property_present(child, "retain-state-suspended")) led.retain_state_suspended = 1; if (fwnode_property_present(child, "retain-state-shutdown")) -- 2.17.1
[PATCH] leds: leds-gpio: Add a condition check for active low leds.
Some leds on our board are active low leds, which means these leds are lighted when the corresponding gpio line is low, while the original leds-gpio driver default all leds are active high leds. This patch adds a devicetree node "light-state", whose value should be "high" for active high leds and "low" for active low leds. The default value is "high" for compatible for the original driver. Signed-off-by: Song Qiang --- .../devicetree/bindings/leds/leds-gpio.txt| 15 +++ drivers/leds/leds-gpio.c | 25 +-- 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/leds/leds-gpio.txt b/Documentation/devicetree/bindings/leds/leds-gpio.txt index a48dda268f81..0a8fad75c704 100644 --- a/Documentation/devicetree/bindings/leds/leds-gpio.txt +++ b/Documentation/devicetree/bindings/leds/leds-gpio.txt @@ -23,6 +23,9 @@ LED sub-node properties: remains up. - panic-indicator : (optional) see Documentation/devicetree/bindings/leds/common.txt +- light-state: (optional) Values should be "high" or "low", which indicates + the state of the GPIO pin when the led is on. + see Documentation/devicetree/bindings/leds/common.txt Examples: @@ -64,3 +67,15 @@ leds { retain-state-suspended; }; }; + +leds { + compatible = "gpio-leds"; + + led0 { + label = "led0"; + gpios = < 19 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "heartbeat"; + default-state = "off"; + light-state = "low"; + }; +} diff --git a/drivers/leds/leds-gpio.c b/drivers/leds/leds-gpio.c index 764c31301f90..463ded0c71ac 100644 --- a/drivers/leds/leds-gpio.c +++ b/drivers/leds/leds-gpio.c @@ -26,9 +26,13 @@ struct gpio_led_data { struct gpio_desc *gpiod; u8 can_sleep; u8 blinking; + u8 light_state; gpio_blink_set_t platform_gpio_blink_set; }; +#define LEDS_GPIO_LIGHTSTATE_HIGH 0 +#define LEDS_GPIO_LIGHTSTATE_LOW 1 + static inline struct gpio_led_data * cdev_to_gpio_led_data(struct led_classdev *led_cdev) { @@ -42,9 +46,15 @@ static void gpio_led_set(struct led_classdev *led_cdev, int level; if (value == LED_OFF) - level = 0; + if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH) + level = 0; + else + level = 1; else - level = 1; + if (led_dat->light_state == LEDS_GPIO_LIGHTSTATE_HIGH) + level = 1; + else + level = 0; if (led_dat->blinking) { led_dat->platform_gpio_blink_set(led_dat->gpiod, level, @@ -205,6 +215,17 @@ static struct gpio_leds_priv *gpio_leds_create(struct platform_device *pdev) led.default_state = LEDS_GPIO_DEFSTATE_OFF; } + if (!fwnode_property_read_string(child, "light-state", +)) { + if (!strcmp(state, "low")) + led_dat->light_state = + LEDS_GPIO_LIGHTSTATE_LOW; + else{ + led_dat->light_state = + LEDS_GPIO_LIGHTSTATE_HIGH; + } + } + if (fwnode_property_present(child, "retain-state-suspended")) led.retain_state_suspended = 1; if (fwnode_property_present(child, "retain-state-shutdown")) -- 2.17.1
[PATCH 0/2] add new UniPhier PCIe host driver
This series adds PCIe host controller driver for Socionext UniPhier SoCs. This controller is based on the DesignWare PCIe core. This driver supports LD20 and PXs3 SoCs. Kunihiko Hayashi (2): dt-bindings: pci: add UniPhier PCIe host controller description pci: dwc: add UniPhier PCIe host controller support .../devicetree/bindings/pci/uniphier-pcie.txt | 78 drivers/pci/controller/dwc/Kconfig | 9 + drivers/pci/controller/dwc/Makefile| 1 + drivers/pci/controller/dwc/pcie-uniphier.c | 464 + 4 files changed, 552 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c -- 2.7.4
[PATCH 2/2] pci: dwc: add UniPhier PCIe host controller support
This introduces specific glue layer for UniPhier platform to support PCIe host controller that is based on the Designware PCIe Core, and this driver supports Root Complex (host) mode. Signed-off-by: Kunihiko Hayashi --- drivers/pci/controller/dwc/Kconfig | 9 + drivers/pci/controller/dwc/Makefile| 1 + drivers/pci/controller/dwc/pcie-uniphier.c | 464 + 3 files changed, 474 insertions(+) create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig index 91b0194..d8fdb02 100644 --- a/drivers/pci/controller/dwc/Kconfig +++ b/drivers/pci/controller/dwc/Kconfig @@ -193,4 +193,13 @@ config PCIE_HISI_STB help Say Y here if you want PCIe controller support on HiSilicon STB SoCs +config PCIE_UNIPHIER + bool "Socionext UniPhier PCIe controllers" + depends on OF && (ARCH_UNIPHIER || COMPILE_TEST) + depends on PCI_MSI_IRQ_DOMAIN + select PCIE_DW_HOST + help + Say Y here if you want PCIe controller support on UniPhier SoCs. + This driver supports LD20 and PXs3 SoCs. + endmenu diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile index 5d2ce72..cbde733 100644 --- a/drivers/pci/controller/dwc/Makefile +++ b/drivers/pci/controller/dwc/Makefile @@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o +obj-$(CONFIG_PCIE_UNIPHIER) += pcie-uniphier.o # The following drivers are for devices that use the generic ACPI # pci_root.c driver but don't support standard ECAM config access. diff --git a/drivers/pci/controller/dwc/pcie-uniphier.c b/drivers/pci/controller/dwc/pcie-uniphier.c new file mode 100644 index 000..13ce02e --- /dev/null +++ b/drivers/pci/controller/dwc/pcie-uniphier.c @@ -0,0 +1,464 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// PCI-express host controller driver for UniPhier SoCs +// Copyright 2018 Socionext Inc. +// Author: Kunihiko Hayashi + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "pcie-designware.h" + +#define PCL_PINCTRL0 0x002c +#define PCL_PERST_PLDN_REGEN BIT(12) +#define PCL_PERST_NOE_REGENBIT(11) +#define PCL_PERST_OUT_REGENBIT(8) +#define PCL_PERST_PLDN_REGVAL BIT(4) +#define PCL_PERST_NOE_REGVAL BIT(3) +#define PCL_PERST_OUT_REGVAL BIT(0) + +#define PCL_PIPEMON0x0044 +#define PCL_PCLK_ALIVE BIT(15) + +#define PCL_APP_READY_CTRL 0x8008 +#define PCL_APP_LTSSM_ENABLE BIT(0) + +#define PCL_APP_PM00x8078 +#define PCL_SYS_AUX_PWR_DETBIT(8) + +#define PCL_RCV_INT0x8108 +#define PCL_CFG_BW_MGT_ENABLE BIT(20) +#define PCL_CFG_LINK_AUTO_BW_ENABLEBIT(19) +#define PCL_CFG_AER_RC_ERR_MSI_ENABLE BIT(18) +#define PCL_CFG_PME_MSI_ENABLE BIT(17) +#define PCL_CFG_BW_MGT_STATUS BIT(4) +#define PCL_CFG_LINK_AUTO_BW_STATUSBIT(3) +#define PCL_CFG_AER_RC_ERR_MSI_STATUS BIT(2) +#define PCL_CFG_PME_MSI_STATUS BIT(1) +#define PCL_RCV_INT_ALL_ENABLE \ + (PCL_CFG_BW_MGT_ENABLE | PCL_CFG_LINK_AUTO_BW_ENABLE \ +| PCL_CFG_AER_RC_ERR_MSI_ENABLE | PCL_CFG_PME_MSI_ENABLE) + +#define PCL_RCV_INTX 0x810c +#define PCL_RADM_INTD_ENABLE BIT(19) +#define PCL_RADM_INTC_ENABLE BIT(18) +#define PCL_RADM_INTB_ENABLE BIT(17) +#define PCL_RADM_INTA_ENABLE BIT(16) +#define PCL_RADM_INTD_STATUS BIT(3) +#define PCL_RADM_INTC_STATUS BIT(2) +#define PCL_RADM_INTB_STATUS BIT(1) +#define PCL_RADM_INTA_STATUS BIT(0) +#define PCL_RCV_INTX_ALL_ENABLE\ + (PCL_RADM_INTD_ENABLE | PCL_RADM_INTC_ENABLE \ +| PCL_RADM_INTB_ENABLE | PCL_RADM_INTA_ENABLE) + +#define PCL_STATUS_LINK0x8140 +#define PCL_RDLH_LINK_UP BIT(1) +#define PCL_XMLH_LINK_UP BIT(0) + +struct uniphier_pcie_priv { + void __iomem *base; + struct dw_pcie pci; + struct clk *clk; + struct reset_control *rst; + struct phy *phy; + struct irq_domain *irq_domain; +}; + +#define to_uniphier_pcie(x)dev_get_drvdata((x)->dev) + +static void uniphier_pcie_ltssm_enable(struct uniphier_pcie_priv *priv) +{ + u32 val; + + val = readl(priv->base + PCL_APP_READY_CTRL); + val |= PCL_APP_LTSSM_ENABLE; + writel(val, priv->base + PCL_APP_READY_CTRL); +} + +static void uniphier_pcie_ltssm_disable(struct uniphier_pcie_priv *priv) +{ + u32 val; + + val = readl(priv->base + PCL_APP_READY_CTRL); + val &=
[PATCH 1/2] dt-bindings: pci: add UniPhier PCIe host controller description
Add DT bindings for PCIe controller implemented in UniPhier SoCs when configured in Root Complex (host) mode. This controller is based on the Designware PCIe Core. Signed-off-by: Kunihiko Hayashi --- .../devicetree/bindings/pci/uniphier-pcie.txt | 78 ++ 1 file changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt diff --git a/Documentation/devicetree/bindings/pci/uniphier-pcie.txt b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt new file mode 100644 index 000..ea63f78 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt @@ -0,0 +1,78 @@ +Socionext UniPhier PCI-express host controller bindings + +This describes the devicetree bindings for PCI-express host controller +implemented on Socionext UniPhier SoCs. + +UniPhier PCI-express host controller is based on the Synopsys DesignWare +PCI core. It shares common functions with the PCIe DesignWare core driver +and inherits common properties defined in +Documentation/devicetree/bindings/pci/designware-pcie.txt. + +Required properties: +- compatible: Should be "socionext,uniphier-pcie". +- reg: Specifies offset and length of the register set for the device. + According to the reg-names, appropriate register sets are required. +- reg-names: Must include the following entries: +"dbi"- controller configuration registers +"link" - SoC-specific glue layer registers +"config" - PCIe configuration space +- clocks: A phandle to the clock gate for PCIe glue layer including + the host controller. +- resets: A phandle to the reset line for PCIe glue layer including + the host controller. +- interrupts: A list of interrupt specifiers. According to the + interrupt-names, appropriate interrupts are required. +- interrupt-names: Must include the following entries: +"dma"- DMA interrupt +"msi"- MSI interrupt +"intx" - Legacy INTA/B/C/D interrupt + +Optional properties: +- phys: A phandle to generic PCIe PHY. According to the phy-names, appropriate + phys are required. +- phy-names: Must be "pcie-phy". + +Required sub-node: +- interrupt-controller: Specifies interrupt controller for legacy PCI + interrupts. The node name isn't important. + +Required properties for interrupt-controller: +- interrupt-controller: identifies the node as an interrupt controller. +- #interrupt-cells: specifies the number of cells needed to encode an + interrupt source. The value must be 1. + +Example: + + pcie: pcie@6600 { + compatible = "socionext,uniphier-pcie", "snps,dw-pcie"; + status = "disabled"; + reg-names = "dbi", "link", "config"; + reg = <0x6600 0x1000>, <0x6601 0x1>, + <0x2fff 0x1>; + #address-cells = <3>; + #size-cells = <2>; + clocks = <_clk 24>; + resets = <_rst 24>; + num-lanes = <1>; + num-viewport = <1>; + bus-range = <0x0 0xff>; + device_type = "pci"; + ranges = + /* downstream I/O */ + <0x8100 0 0x 0x2ffe 0 0x0001 + /* non-prefetchable memory */ +0x8200 0 0x 0x2000 0 0x0ffe>; + #interrupt-cells = <1>; + interrupt-names = "dma", "msi", "intx"; + interrupts = <0 224 4>, <0 225 4>, <0 226 4>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 _intc 0>, /* INTA */ + <0 0 0 2 _intc 1>, /* INTB */ + <0 0 0 3 _intc 2>, /* INTC */ + <0 0 0 4 _intc 3>; /* INTD */ + + pcie_intc: interrupt-controller { + interrupt-controller; + #interrupt-cells = <1>; + }; + }; -- 2.7.4
[PATCH 0/2] add new UniPhier PCIe host driver
This series adds PCIe host controller driver for Socionext UniPhier SoCs. This controller is based on the DesignWare PCIe core. This driver supports LD20 and PXs3 SoCs. Kunihiko Hayashi (2): dt-bindings: pci: add UniPhier PCIe host controller description pci: dwc: add UniPhier PCIe host controller support .../devicetree/bindings/pci/uniphier-pcie.txt | 78 drivers/pci/controller/dwc/Kconfig | 9 + drivers/pci/controller/dwc/Makefile| 1 + drivers/pci/controller/dwc/pcie-uniphier.c | 464 + 4 files changed, 552 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c -- 2.7.4
[PATCH 2/2] pci: dwc: add UniPhier PCIe host controller support
This introduces specific glue layer for UniPhier platform to support PCIe host controller that is based on the Designware PCIe Core, and this driver supports Root Complex (host) mode. Signed-off-by: Kunihiko Hayashi --- drivers/pci/controller/dwc/Kconfig | 9 + drivers/pci/controller/dwc/Makefile| 1 + drivers/pci/controller/dwc/pcie-uniphier.c | 464 + 3 files changed, 474 insertions(+) create mode 100644 drivers/pci/controller/dwc/pcie-uniphier.c diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig index 91b0194..d8fdb02 100644 --- a/drivers/pci/controller/dwc/Kconfig +++ b/drivers/pci/controller/dwc/Kconfig @@ -193,4 +193,13 @@ config PCIE_HISI_STB help Say Y here if you want PCIe controller support on HiSilicon STB SoCs +config PCIE_UNIPHIER + bool "Socionext UniPhier PCIe controllers" + depends on OF && (ARCH_UNIPHIER || COMPILE_TEST) + depends on PCI_MSI_IRQ_DOMAIN + select PCIE_DW_HOST + help + Say Y here if you want PCIe controller support on UniPhier SoCs. + This driver supports LD20 and PXs3 SoCs. + endmenu diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile index 5d2ce72..cbde733 100644 --- a/drivers/pci/controller/dwc/Makefile +++ b/drivers/pci/controller/dwc/Makefile @@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o +obj-$(CONFIG_PCIE_UNIPHIER) += pcie-uniphier.o # The following drivers are for devices that use the generic ACPI # pci_root.c driver but don't support standard ECAM config access. diff --git a/drivers/pci/controller/dwc/pcie-uniphier.c b/drivers/pci/controller/dwc/pcie-uniphier.c new file mode 100644 index 000..13ce02e --- /dev/null +++ b/drivers/pci/controller/dwc/pcie-uniphier.c @@ -0,0 +1,464 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// PCI-express host controller driver for UniPhier SoCs +// Copyright 2018 Socionext Inc. +// Author: Kunihiko Hayashi + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "pcie-designware.h" + +#define PCL_PINCTRL0 0x002c +#define PCL_PERST_PLDN_REGEN BIT(12) +#define PCL_PERST_NOE_REGENBIT(11) +#define PCL_PERST_OUT_REGENBIT(8) +#define PCL_PERST_PLDN_REGVAL BIT(4) +#define PCL_PERST_NOE_REGVAL BIT(3) +#define PCL_PERST_OUT_REGVAL BIT(0) + +#define PCL_PIPEMON0x0044 +#define PCL_PCLK_ALIVE BIT(15) + +#define PCL_APP_READY_CTRL 0x8008 +#define PCL_APP_LTSSM_ENABLE BIT(0) + +#define PCL_APP_PM00x8078 +#define PCL_SYS_AUX_PWR_DETBIT(8) + +#define PCL_RCV_INT0x8108 +#define PCL_CFG_BW_MGT_ENABLE BIT(20) +#define PCL_CFG_LINK_AUTO_BW_ENABLEBIT(19) +#define PCL_CFG_AER_RC_ERR_MSI_ENABLE BIT(18) +#define PCL_CFG_PME_MSI_ENABLE BIT(17) +#define PCL_CFG_BW_MGT_STATUS BIT(4) +#define PCL_CFG_LINK_AUTO_BW_STATUSBIT(3) +#define PCL_CFG_AER_RC_ERR_MSI_STATUS BIT(2) +#define PCL_CFG_PME_MSI_STATUS BIT(1) +#define PCL_RCV_INT_ALL_ENABLE \ + (PCL_CFG_BW_MGT_ENABLE | PCL_CFG_LINK_AUTO_BW_ENABLE \ +| PCL_CFG_AER_RC_ERR_MSI_ENABLE | PCL_CFG_PME_MSI_ENABLE) + +#define PCL_RCV_INTX 0x810c +#define PCL_RADM_INTD_ENABLE BIT(19) +#define PCL_RADM_INTC_ENABLE BIT(18) +#define PCL_RADM_INTB_ENABLE BIT(17) +#define PCL_RADM_INTA_ENABLE BIT(16) +#define PCL_RADM_INTD_STATUS BIT(3) +#define PCL_RADM_INTC_STATUS BIT(2) +#define PCL_RADM_INTB_STATUS BIT(1) +#define PCL_RADM_INTA_STATUS BIT(0) +#define PCL_RCV_INTX_ALL_ENABLE\ + (PCL_RADM_INTD_ENABLE | PCL_RADM_INTC_ENABLE \ +| PCL_RADM_INTB_ENABLE | PCL_RADM_INTA_ENABLE) + +#define PCL_STATUS_LINK0x8140 +#define PCL_RDLH_LINK_UP BIT(1) +#define PCL_XMLH_LINK_UP BIT(0) + +struct uniphier_pcie_priv { + void __iomem *base; + struct dw_pcie pci; + struct clk *clk; + struct reset_control *rst; + struct phy *phy; + struct irq_domain *irq_domain; +}; + +#define to_uniphier_pcie(x)dev_get_drvdata((x)->dev) + +static void uniphier_pcie_ltssm_enable(struct uniphier_pcie_priv *priv) +{ + u32 val; + + val = readl(priv->base + PCL_APP_READY_CTRL); + val |= PCL_APP_LTSSM_ENABLE; + writel(val, priv->base + PCL_APP_READY_CTRL); +} + +static void uniphier_pcie_ltssm_disable(struct uniphier_pcie_priv *priv) +{ + u32 val; + + val = readl(priv->base + PCL_APP_READY_CTRL); + val &=
[PATCH 1/2] dt-bindings: pci: add UniPhier PCIe host controller description
Add DT bindings for PCIe controller implemented in UniPhier SoCs when configured in Root Complex (host) mode. This controller is based on the Designware PCIe Core. Signed-off-by: Kunihiko Hayashi --- .../devicetree/bindings/pci/uniphier-pcie.txt | 78 ++ 1 file changed, 78 insertions(+) create mode 100644 Documentation/devicetree/bindings/pci/uniphier-pcie.txt diff --git a/Documentation/devicetree/bindings/pci/uniphier-pcie.txt b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt new file mode 100644 index 000..ea63f78 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/uniphier-pcie.txt @@ -0,0 +1,78 @@ +Socionext UniPhier PCI-express host controller bindings + +This describes the devicetree bindings for PCI-express host controller +implemented on Socionext UniPhier SoCs. + +UniPhier PCI-express host controller is based on the Synopsys DesignWare +PCI core. It shares common functions with the PCIe DesignWare core driver +and inherits common properties defined in +Documentation/devicetree/bindings/pci/designware-pcie.txt. + +Required properties: +- compatible: Should be "socionext,uniphier-pcie". +- reg: Specifies offset and length of the register set for the device. + According to the reg-names, appropriate register sets are required. +- reg-names: Must include the following entries: +"dbi"- controller configuration registers +"link" - SoC-specific glue layer registers +"config" - PCIe configuration space +- clocks: A phandle to the clock gate for PCIe glue layer including + the host controller. +- resets: A phandle to the reset line for PCIe glue layer including + the host controller. +- interrupts: A list of interrupt specifiers. According to the + interrupt-names, appropriate interrupts are required. +- interrupt-names: Must include the following entries: +"dma"- DMA interrupt +"msi"- MSI interrupt +"intx" - Legacy INTA/B/C/D interrupt + +Optional properties: +- phys: A phandle to generic PCIe PHY. According to the phy-names, appropriate + phys are required. +- phy-names: Must be "pcie-phy". + +Required sub-node: +- interrupt-controller: Specifies interrupt controller for legacy PCI + interrupts. The node name isn't important. + +Required properties for interrupt-controller: +- interrupt-controller: identifies the node as an interrupt controller. +- #interrupt-cells: specifies the number of cells needed to encode an + interrupt source. The value must be 1. + +Example: + + pcie: pcie@6600 { + compatible = "socionext,uniphier-pcie", "snps,dw-pcie"; + status = "disabled"; + reg-names = "dbi", "link", "config"; + reg = <0x6600 0x1000>, <0x6601 0x1>, + <0x2fff 0x1>; + #address-cells = <3>; + #size-cells = <2>; + clocks = <_clk 24>; + resets = <_rst 24>; + num-lanes = <1>; + num-viewport = <1>; + bus-range = <0x0 0xff>; + device_type = "pci"; + ranges = + /* downstream I/O */ + <0x8100 0 0x 0x2ffe 0 0x0001 + /* non-prefetchable memory */ +0x8200 0 0x 0x2000 0 0x0ffe>; + #interrupt-cells = <1>; + interrupt-names = "dma", "msi", "intx"; + interrupts = <0 224 4>, <0 225 4>, <0 226 4>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 _intc 0>, /* INTA */ + <0 0 0 2 _intc 1>, /* INTB */ + <0 0 0 3 _intc 2>, /* INTC */ + <0 0 0 4 _intc 3>; /* INTD */ + + pcie_intc: interrupt-controller { + interrupt-controller; + #interrupt-cells = <1>; + }; + }; -- 2.7.4
[PATCH] sysctl: kselftests: use kernel module instead of built-in
It uses modprobe $TEST_DRIVER in sysctl.sh, so update config to use "m" instead Signed-off-by: Lei Yang --- tools/testing/selftests/sysctl/config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/sysctl/config b/tools/testing/selftests/sysctl/config index 6ca1480..fc263ef 100644 --- a/tools/testing/selftests/sysctl/config +++ b/tools/testing/selftests/sysctl/config @@ -1 +1 @@ -CONFIG_TEST_SYSCTL=y +CONFIG_TEST_SYSCTL=m -- 1.9.1
[PATCH] sysctl: kselftests: use kernel module instead of built-in
It uses modprobe $TEST_DRIVER in sysctl.sh, so update config to use "m" instead Signed-off-by: Lei Yang --- tools/testing/selftests/sysctl/config | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/sysctl/config b/tools/testing/selftests/sysctl/config index 6ca1480..fc263ef 100644 --- a/tools/testing/selftests/sysctl/config +++ b/tools/testing/selftests/sysctl/config @@ -1 +1 @@ -CONFIG_TEST_SYSCTL=y +CONFIG_TEST_SYSCTL=m -- 1.9.1
Re: [PATCH 0/3] of: root #{size,address}-cells clean-ups
On 08/30/18 12:05, Rob Herring wrote: > This is a small restructuring and clean-up of handling root node > #size-cells and #address-cells (or lack of). As only Sparc needs a > different default value and only for #address-cells, we can handle that > locally and remove one more dependency on asm/prom.h. > > Rob > > Rob Herring (3): > of/fdt: Scan the root node properties earlier > of/fdt: avoid re-parsing '#{address,size}-cells' in > of_fdt_limit_memory > of: make default address and size cells sizes private > > arch/sparc/include/asm/prom.h | 3 --- > drivers/of/fdt.c | 30 +- > drivers/of/of_private.h | 8 > include/linux/of.h| 6 -- > 4 files changed, 13 insertions(+), 34 deletions(-) > Nice cleanup! -Frank
Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode
Srinivas Pandruvada writes: > On Mon, 2018-09-03 at 23:53 -0700, Francisco Jerez wrote: >> Eero Tamminen writes: >> >> > Hi, >> > >> > On 31.08.2018 20:28, Srinivas Pandruvada wrote: >> > ... >> > > As per testing Eero Tamminen, the results are comparable to the >> > > patchset >> > > https://patchwork.kernel.org/patch/10312259/ >> > > But he has to watch results for several days to check trend. >> > >> > It's close, but there is some gap compared to Francisco's version. >> > >> > (Because of the large variance on TDP limited devices, and factors >> > causing extra perf differences e.g. between boots, it's hard to >> > give >> > exact number without having trends from several days / weeks. I >> > would >> > also need new version of Fransisco's patch set that applies to >> > latest >> > kernel like yours does.) >> > >> > >> > > Since here boost is getting limited to turbo and non turbo, we >> > > need some >> > > ways to adjust the fractions corresponding to max non turbo as >> > > well. It >> > > is much easier to use the actual P-state limits for boost instead >> > > of >> > > fractions. So here P-state io boost limit is applied on top of >> > > the >> > > P-state limit calculated via current algorithm by removing >> > > current >> > > io_wait boost calculation using fractions. >> > > >> > > Since we prefer to use common algorithm for all processor >> > > platforms, this >> > > change was tested on other client and sever platforms as well. >> > > All results >> > > were within the margin of errors. Results: >> > > https://bugzilla.kernel.org/attachment.cgi?id=278149 >> > >> > Good. >> > >> > Francisco, how well the listed PTS tests cover latency bound cases >> > you >> > were concerned about? [1] >> > >> > >> >- Eero >> > >> > [1] Fransisco was concerned that the patch: >> > * trade-off might regress latency bound cases (which I haven't >> > tested, I >> > tested only 3D throughput), and >> > * that it addressed only other of the sources of energy >> > inefficiencies >> > he had identified (which could explain slightly better 3D results >> > from >> > his more complex patch set). >> >> This patch causes a number of statistically significant regressions >> (with significance of 1%) on the two systems I've tested it on. On >> my > Sure. These patches are targeted to Atom clients where some of these > server like workload may have some minor regression on few watts TDP > parts. Neither the 36% regression of fs-mark, the 21% regression of sqlite, nor the 10% regression of warsaw qualify as small. And most of the test cases on the list of regressions aren't exclusively server-like, if at all. Warsaw, gtkperf, jxrendermark and lightsmark are all graphics benchmarks -- Latency is as important if not more for interactive workloads than it is for server workloads. In the case of a conflict like the one we're dealing with right now between optimizing for throughput (e.g. for the maximum number of requests per second) and optimizing for latency (e.g. for the minimum request duration), you are more likely to be concerned about the former than about the latter in a server setup. > But weighing against reduced TURBO usage (which is enough trigger) and > improvement in tests done by Eero (which was primary complaint to us). > > It is trivial patch. Yes, the patch is not perfect and doesn't close > doors for any improvements. > It's sort of self-contained because it's awfully incomplete. > I see your idea, but how to implement in acceptable way is a challenge. Main challenge was getting the code to work without regressions in latency-sensitive workloads, which I did, because you told me that it wasn't acceptable for it to cause any regressions on the Phoronix daily-system-tracker, whether latency-bound or not. What is left in order to address Peter's concerns is for the most part plumbing, that's guaranteed not to have any functional impact on the heuristic. The fact that we don't expect it to change the performance of the system makes it substantially less time-consuming to validate than altering the performance trade-offs of the heuristic dynamically in order to avoid regressions (which is what has kept my systems busy most of the time lately). Seems like my series, even in its current state without the changes requested by Peter is closer to being ready for production than this patch is. > So ultimately we will get there, but will require some time compared > with other high priority items. > > > Thanks > Srinivas > >> CHV N3050: >> >> > phoronix/fs- >> > mark/test=0: >> > XXX ±7.25% x34 -> XXX ±7.00% x39 d=-36.85% ±5.91% p=0.00% >> > phoronix/sqlite: >> > XXX ±1.86% x34 -> XXX ±1.88% x39 d=-21.73% >> > ±1.66% p=0.00% >> > warsow/benchsow: >> > XXX ±1.25% x34 -> XXX ±1.95% x39 d=-10.83% >> >
Re: [PATCH 0/3] of: root #{size,address}-cells clean-ups
On 08/30/18 12:05, Rob Herring wrote: > This is a small restructuring and clean-up of handling root node > #size-cells and #address-cells (or lack of). As only Sparc needs a > different default value and only for #address-cells, we can handle that > locally and remove one more dependency on asm/prom.h. > > Rob > > Rob Herring (3): > of/fdt: Scan the root node properties earlier > of/fdt: avoid re-parsing '#{address,size}-cells' in > of_fdt_limit_memory > of: make default address and size cells sizes private > > arch/sparc/include/asm/prom.h | 3 --- > drivers/of/fdt.c | 30 +- > drivers/of/of_private.h | 8 > include/linux/of.h| 6 -- > 4 files changed, 13 insertions(+), 34 deletions(-) > Nice cleanup! -Frank
Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode
Srinivas Pandruvada writes: > On Mon, 2018-09-03 at 23:53 -0700, Francisco Jerez wrote: >> Eero Tamminen writes: >> >> > Hi, >> > >> > On 31.08.2018 20:28, Srinivas Pandruvada wrote: >> > ... >> > > As per testing Eero Tamminen, the results are comparable to the >> > > patchset >> > > https://patchwork.kernel.org/patch/10312259/ >> > > But he has to watch results for several days to check trend. >> > >> > It's close, but there is some gap compared to Francisco's version. >> > >> > (Because of the large variance on TDP limited devices, and factors >> > causing extra perf differences e.g. between boots, it's hard to >> > give >> > exact number without having trends from several days / weeks. I >> > would >> > also need new version of Fransisco's patch set that applies to >> > latest >> > kernel like yours does.) >> > >> > >> > > Since here boost is getting limited to turbo and non turbo, we >> > > need some >> > > ways to adjust the fractions corresponding to max non turbo as >> > > well. It >> > > is much easier to use the actual P-state limits for boost instead >> > > of >> > > fractions. So here P-state io boost limit is applied on top of >> > > the >> > > P-state limit calculated via current algorithm by removing >> > > current >> > > io_wait boost calculation using fractions. >> > > >> > > Since we prefer to use common algorithm for all processor >> > > platforms, this >> > > change was tested on other client and sever platforms as well. >> > > All results >> > > were within the margin of errors. Results: >> > > https://bugzilla.kernel.org/attachment.cgi?id=278149 >> > >> > Good. >> > >> > Francisco, how well the listed PTS tests cover latency bound cases >> > you >> > were concerned about? [1] >> > >> > >> >- Eero >> > >> > [1] Fransisco was concerned that the patch: >> > * trade-off might regress latency bound cases (which I haven't >> > tested, I >> > tested only 3D throughput), and >> > * that it addressed only other of the sources of energy >> > inefficiencies >> > he had identified (which could explain slightly better 3D results >> > from >> > his more complex patch set). >> >> This patch causes a number of statistically significant regressions >> (with significance of 1%) on the two systems I've tested it on. On >> my > Sure. These patches are targeted to Atom clients where some of these > server like workload may have some minor regression on few watts TDP > parts. Neither the 36% regression of fs-mark, the 21% regression of sqlite, nor the 10% regression of warsaw qualify as small. And most of the test cases on the list of regressions aren't exclusively server-like, if at all. Warsaw, gtkperf, jxrendermark and lightsmark are all graphics benchmarks -- Latency is as important if not more for interactive workloads than it is for server workloads. In the case of a conflict like the one we're dealing with right now between optimizing for throughput (e.g. for the maximum number of requests per second) and optimizing for latency (e.g. for the minimum request duration), you are more likely to be concerned about the former than about the latter in a server setup. > But weighing against reduced TURBO usage (which is enough trigger) and > improvement in tests done by Eero (which was primary complaint to us). > > It is trivial patch. Yes, the patch is not perfect and doesn't close > doors for any improvements. > It's sort of self-contained because it's awfully incomplete. > I see your idea, but how to implement in acceptable way is a challenge. Main challenge was getting the code to work without regressions in latency-sensitive workloads, which I did, because you told me that it wasn't acceptable for it to cause any regressions on the Phoronix daily-system-tracker, whether latency-bound or not. What is left in order to address Peter's concerns is for the most part plumbing, that's guaranteed not to have any functional impact on the heuristic. The fact that we don't expect it to change the performance of the system makes it substantially less time-consuming to validate than altering the performance trade-offs of the heuristic dynamically in order to avoid regressions (which is what has kept my systems busy most of the time lately). Seems like my series, even in its current state without the changes requested by Peter is closer to being ready for production than this patch is. > So ultimately we will get there, but will require some time compared > with other high priority items. > > > Thanks > Srinivas > >> CHV N3050: >> >> > phoronix/fs- >> > mark/test=0: >> > XXX ±7.25% x34 -> XXX ±7.00% x39 d=-36.85% ±5.91% p=0.00% >> > phoronix/sqlite: >> > XXX ±1.86% x34 -> XXX ±1.88% x39 d=-21.73% >> > ±1.66% p=0.00% >> > warsow/benchsow: >> > XXX ±1.25% x34 -> XXX ±1.95% x39 d=-10.83% >> >
Re: [PATCH 3/3] of: make default address and size cells sizes private
On 08/30/18 12:05, Rob Herring wrote: > Only some old OpenFirmware implementations rely on default sizes. Any > FDT and modern implementation should have explicit properties. Make the > OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside > users. > > This also gets us one step closer to removing the asm/prom.h dependency on > Sparc. > > Cc: "David S. Miller" > Cc: Frank Rowand > Cc: sparcli...@vger.kernel.org > Signed-off-by: Rob Herring > --- > arch/sparc/include/asm/prom.h | 3 --- > drivers/of/of_private.h | 8 > include/linux/of.h| 6 -- > 3 files changed, 8 insertions(+), 9 deletions(-) Reviewed-by: Frank Rowand > diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h > index d955c8df62d6..1902db27ff4b 100644 > --- a/arch/sparc/include/asm/prom.h > +++ b/arch/sparc/include/asm/prom.h > @@ -24,9 +24,6 @@ > #include > #include > > -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2 > -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > - > #define of_compat_cmp(s1, s2, l) strncmp((s1), (s2), (l)) > #define of_prop_cmp(s1, s2) strcasecmp((s1), (s2)) > #define of_node_cmp(s1, s2) strcmp((s1), (s2)) > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h > index 216175d11d3d..5d1567025358 100644 > --- a/drivers/of/of_private.h > +++ b/drivers/of/of_private.h > @@ -27,6 +27,14 @@ struct alias_prop { > char stem[0]; > }; > > +#if defined(CONFIG_SPARC) > +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2 > +#else > +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1 > +#endif > + > +#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > + > extern struct mutex of_mutex; > extern struct list_head aliases_lookup; > extern struct kset *of_kset; > diff --git a/include/linux/of.h b/include/linux/of.h > index 506beca9588d..49d85f670c75 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -247,12 +247,6 @@ static inline unsigned long of_read_ulong(const __be32 > *cell, int size) > #include > #endif > > -/* Default #address and #size cells. Allow arch asm/prom.h to override */ > -#if !defined(OF_ROOT_NODE_ADDR_CELLS_DEFAULT) > -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1 > -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > -#endif > - > #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags) > #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags) > >
Re: [PATCH 3/3] of: make default address and size cells sizes private
On 08/30/18 12:05, Rob Herring wrote: > Only some old OpenFirmware implementations rely on default sizes. Any > FDT and modern implementation should have explicit properties. Make the > OF_ROOT_NODE_*_CELLS_DEFAULT defines private so we don't get any outside > users. > > This also gets us one step closer to removing the asm/prom.h dependency on > Sparc. > > Cc: "David S. Miller" > Cc: Frank Rowand > Cc: sparcli...@vger.kernel.org > Signed-off-by: Rob Herring > --- > arch/sparc/include/asm/prom.h | 3 --- > drivers/of/of_private.h | 8 > include/linux/of.h| 6 -- > 3 files changed, 8 insertions(+), 9 deletions(-) Reviewed-by: Frank Rowand > diff --git a/arch/sparc/include/asm/prom.h b/arch/sparc/include/asm/prom.h > index d955c8df62d6..1902db27ff4b 100644 > --- a/arch/sparc/include/asm/prom.h > +++ b/arch/sparc/include/asm/prom.h > @@ -24,9 +24,6 @@ > #include > #include > > -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2 > -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > - > #define of_compat_cmp(s1, s2, l) strncmp((s1), (s2), (l)) > #define of_prop_cmp(s1, s2) strcasecmp((s1), (s2)) > #define of_node_cmp(s1, s2) strcmp((s1), (s2)) > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h > index 216175d11d3d..5d1567025358 100644 > --- a/drivers/of/of_private.h > +++ b/drivers/of/of_private.h > @@ -27,6 +27,14 @@ struct alias_prop { > char stem[0]; > }; > > +#if defined(CONFIG_SPARC) > +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 2 > +#else > +#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1 > +#endif > + > +#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > + > extern struct mutex of_mutex; > extern struct list_head aliases_lookup; > extern struct kset *of_kset; > diff --git a/include/linux/of.h b/include/linux/of.h > index 506beca9588d..49d85f670c75 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -247,12 +247,6 @@ static inline unsigned long of_read_ulong(const __be32 > *cell, int size) > #include > #endif > > -/* Default #address and #size cells. Allow arch asm/prom.h to override */ > -#if !defined(OF_ROOT_NODE_ADDR_CELLS_DEFAULT) > -#define OF_ROOT_NODE_ADDR_CELLS_DEFAULT 1 > -#define OF_ROOT_NODE_SIZE_CELLS_DEFAULT 1 > -#endif > - > #define OF_IS_DYNAMIC(x) test_bit(OF_DYNAMIC, >_flags) > #define OF_MARK_DYNAMIC(x) set_bit(OF_DYNAMIC, >_flags) > >
Re: linux-next: build warnings after merge of the devicetree tree
Hi all, On Wed, 5 Sep 2018 10:39:06 +1000 Stephen Rothwell wrote: > > After merging the devicetree tree, today's linux-next build (x86_64 > allmodconfig) produced these warnings: > > WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the > function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells > The function of_fdt_limit_memory() references > the variable __initdata dt_root_addr_cells. > This is often because of_fdt_limit_memory lacks a __initdata > annotation or the annotation of dt_root_addr_cells is wrong. > > WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the > function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells > The function of_fdt_limit_memory() references > the variable __initdata dt_root_size_cells. > This is often because of_fdt_limit_memory lacks a __initdata > annotation or the annotation of dt_root_size_cells is wrong. > > Introduced by commit > > bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in > of_fdt_limit_memory") It turns out that section mismatches are fatal errors in some configs (like powerpc allnoconfig), so I have added the following patch for today: From: Stephen Rothwell Date: Wed, 5 Sep 2018 11:50:29 +1000 Subject: [PATCH] mark of_fdt_limit_memory() as __init Signed-off-by: Stephen Rothwell --- drivers/of/fdt.c | 2 +- include/linux/of_fdt.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index fef4b2c8a171..fe78bed8925f 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -39,7 +39,7 @@ * memory entries in the /memory node. This function may be called * any time after initial_boot_param is set. */ -void of_fdt_limit_memory(int limit) +void __init of_fdt_limit_memory(int limit) { int memory; int len; diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h index b9cd9ebdf9b9..19ebf22a7862 100644 --- a/include/linux/of_fdt.h +++ b/include/linux/of_fdt.h @@ -46,7 +46,7 @@ extern char __dtb_end[]; /* Other Prototypes */ extern u64 of_flat_dt_translate_address(unsigned long node); -extern void of_fdt_limit_memory(int limit); +extern void __init of_fdt_limit_memory(int limit); #endif /* CONFIG_OF_FLATTREE */ #ifdef CONFIG_OF_EARLY_FLATTREE -- 2.18.0 -- Cheers, Stephen Rothwell pgpNjpkMIxaNL.pgp Description: OpenPGP digital signature
Re: linux-next: build warnings after merge of the devicetree tree
Hi all, On Wed, 5 Sep 2018 10:39:06 +1000 Stephen Rothwell wrote: > > After merging the devicetree tree, today's linux-next build (x86_64 > allmodconfig) produced these warnings: > > WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the > function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells > The function of_fdt_limit_memory() references > the variable __initdata dt_root_addr_cells. > This is often because of_fdt_limit_memory lacks a __initdata > annotation or the annotation of dt_root_addr_cells is wrong. > > WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the > function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells > The function of_fdt_limit_memory() references > the variable __initdata dt_root_size_cells. > This is often because of_fdt_limit_memory lacks a __initdata > annotation or the annotation of dt_root_size_cells is wrong. > > Introduced by commit > > bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in > of_fdt_limit_memory") It turns out that section mismatches are fatal errors in some configs (like powerpc allnoconfig), so I have added the following patch for today: From: Stephen Rothwell Date: Wed, 5 Sep 2018 11:50:29 +1000 Subject: [PATCH] mark of_fdt_limit_memory() as __init Signed-off-by: Stephen Rothwell --- drivers/of/fdt.c | 2 +- include/linux/of_fdt.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index fef4b2c8a171..fe78bed8925f 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -39,7 +39,7 @@ * memory entries in the /memory node. This function may be called * any time after initial_boot_param is set. */ -void of_fdt_limit_memory(int limit) +void __init of_fdt_limit_memory(int limit) { int memory; int len; diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h index b9cd9ebdf9b9..19ebf22a7862 100644 --- a/include/linux/of_fdt.h +++ b/include/linux/of_fdt.h @@ -46,7 +46,7 @@ extern char __dtb_end[]; /* Other Prototypes */ extern u64 of_flat_dt_translate_address(unsigned long node); -extern void of_fdt_limit_memory(int limit); +extern void __init of_fdt_limit_memory(int limit); #endif /* CONFIG_OF_FLATTREE */ #ifdef CONFIG_OF_EARLY_FLATTREE -- 2.18.0 -- Cheers, Stephen Rothwell pgpNjpkMIxaNL.pgp Description: OpenPGP digital signature
Re: [PATCH 2/3] of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory
On 08/30/18 12:05, Rob Herring wrote: > Now that we initialize dt_root_addr_cells and dt_root_size_cells earlier, > use them and simplify of_fdt_limit_memory. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/fdt.c | 23 +-- > 1 file changed, 1 insertion(+), 22 deletions(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 49abe18f1bde..fef4b2c8a171 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c As kbuild test robot noted, of_fdt_limit_memory() will access __initdata dt_root_addr_cells and __initdata dt_root_size_cells. A possible (untested) fix would be: - void of_fdt_limit_memory(int limit) - { + void __init of_fdt_limit_memory(int limit) + { > @@ -44,28 +44,7 @@ void of_fdt_limit_memory(int limit) > int memory; > int len; > const void *val; > - int nr_address_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT; > - int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT; > - const __be32 *addr_prop; > - const __be32 *size_prop; > - int root_offset; > - int cell_size; > - > - root_offset = fdt_path_offset(initial_boot_params, "/"); > - if (root_offset < 0) > - return; > - > - addr_prop = fdt_getprop(initial_boot_params, root_offset, > - "#address-cells", NULL); > - if (addr_prop) > - nr_address_cells = fdt32_to_cpu(*addr_prop); > - > - size_prop = fdt_getprop(initial_boot_params, root_offset, > - "#size-cells", NULL); > - if (size_prop) > - nr_size_cells = fdt32_to_cpu(*size_prop); > - > - cell_size = sizeof(uint32_t)*(nr_address_cells + nr_size_cells); > + int cell_size = sizeof(uint32_t)*(dt_root_addr_cells + > dt_root_size_cells); > > memory = fdt_path_offset(initial_boot_params, "/memory"); > if (memory > 0) { >
Re: [PATCH 2/3] of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory
On 08/30/18 12:05, Rob Herring wrote: > Now that we initialize dt_root_addr_cells and dt_root_size_cells earlier, > use them and simplify of_fdt_limit_memory. > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/fdt.c | 23 +-- > 1 file changed, 1 insertion(+), 22 deletions(-) > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 49abe18f1bde..fef4b2c8a171 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c As kbuild test robot noted, of_fdt_limit_memory() will access __initdata dt_root_addr_cells and __initdata dt_root_size_cells. A possible (untested) fix would be: - void of_fdt_limit_memory(int limit) - { + void __init of_fdt_limit_memory(int limit) + { > @@ -44,28 +44,7 @@ void of_fdt_limit_memory(int limit) > int memory; > int len; > const void *val; > - int nr_address_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT; > - int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT; > - const __be32 *addr_prop; > - const __be32 *size_prop; > - int root_offset; > - int cell_size; > - > - root_offset = fdt_path_offset(initial_boot_params, "/"); > - if (root_offset < 0) > - return; > - > - addr_prop = fdt_getprop(initial_boot_params, root_offset, > - "#address-cells", NULL); > - if (addr_prop) > - nr_address_cells = fdt32_to_cpu(*addr_prop); > - > - size_prop = fdt_getprop(initial_boot_params, root_offset, > - "#size-cells", NULL); > - if (size_prop) > - nr_size_cells = fdt32_to_cpu(*size_prop); > - > - cell_size = sizeof(uint32_t)*(nr_address_cells + nr_size_cells); > + int cell_size = sizeof(uint32_t)*(dt_root_addr_cells + > dt_root_size_cells); > > memory = fdt_path_offset(initial_boot_params, "/memory"); > if (memory > 0) { >
Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier
On 08/30/18 12:05, Rob Herring wrote: > Scan the root node properties (#{size,address}-cells) earlier, ^^^ before mdesc->dt_fixup() is called > so that > the dt_root_addr_cells and dt_root_size_cells variables are initialized > and can be used. by mdesc->dt_fixup() > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/fdt.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > Moving early_init_dt_scan_root() to inside early_init_dt_verify() puts something that has nothing to do with verifying the fdt into a function whose purpose is the verify. It hides the side effect of initializing the dt_root_addr_cells and dt_root_size_cells variables. I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(), move the chunk of code there instead of to early_init_dt_scan_nodes(), and call the new function from setup_machine_fdt(), just before calling mdesc->dt_fixup(). This would be a little bit more code, but more clearly showing the intent. -Frank > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 800ad252cf9c..49abe18f1bde 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1215,6 +1215,10 @@ bool __init early_init_dt_verify(void *params) > initial_boot_params = params; > of_fdt_crc32 = crc32_be(~0, initial_boot_params, > fdt_totalsize(initial_boot_params)); > + > + /* Initialize {size,address}-cells info */ > + of_scan_flat_dt(early_init_dt_scan_root, NULL); > + > return true; > } > > @@ -1224,9 +1228,6 @@ void __init early_init_dt_scan_nodes(void) > /* Retrieve various information from the /chosen node */ > of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line); > > - /* Initialize {size,address}-cells info */ > - of_scan_flat_dt(early_init_dt_scan_root, NULL); > - > /* Setup memory, calling early_init_dt_add_memory_arch */ > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > } >
Re: [PATCH 1/3] of/fdt: Scan the root node properties earlier
On 08/30/18 12:05, Rob Herring wrote: > Scan the root node properties (#{size,address}-cells) earlier, ^^^ before mdesc->dt_fixup() is called > so that > the dt_root_addr_cells and dt_root_size_cells variables are initialized > and can be used. by mdesc->dt_fixup() > > Cc: Frank Rowand > Signed-off-by: Rob Herring > --- > drivers/of/fdt.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > Moving early_init_dt_scan_root() to inside early_init_dt_verify() puts something that has nothing to do with verifying the fdt into a function whose purpose is the verify. It hides the side effect of initializing the dt_root_addr_cells and dt_root_size_cells variables. I suggest creating a new function early_init_dt_scan_init_pre_dt_fixup(), move the chunk of code there instead of to early_init_dt_scan_nodes(), and call the new function from setup_machine_fdt(), just before calling mdesc->dt_fixup(). This would be a little bit more code, but more clearly showing the intent. -Frank > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 800ad252cf9c..49abe18f1bde 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1215,6 +1215,10 @@ bool __init early_init_dt_verify(void *params) > initial_boot_params = params; > of_fdt_crc32 = crc32_be(~0, initial_boot_params, > fdt_totalsize(initial_boot_params)); > + > + /* Initialize {size,address}-cells info */ > + of_scan_flat_dt(early_init_dt_scan_root, NULL); > + > return true; > } > > @@ -1224,9 +1228,6 @@ void __init early_init_dt_scan_nodes(void) > /* Retrieve various information from the /chosen node */ > of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line); > > - /* Initialize {size,address}-cells info */ > - of_scan_flat_dt(early_init_dt_scan_root, NULL); > - > /* Setup memory, calling early_init_dt_add_memory_arch */ > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > } >
Re: [PATCH] lib/parser.c: switch match_number() over to use match_strdup()
On Thu, Aug 30, 2018 at 12:47:27PM -0700, Eric Biggers wrote: > From: Eric Biggers > > This simplifies the code. No change in behavior. > > Signed-off-by: Eric Biggers > --- > lib/parser.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/lib/parser.c b/lib/parser.c > index 0142ef28f0eb..96656a6dd59b 100644 > --- a/lib/parser.c > +++ b/lib/parser.c > @@ -131,13 +131,10 @@ static int match_number(substring_t *s, int *result, > int base) > char *buf; > int ret; > long val; > - size_t len = s->to - s->from; > > - buf = kmalloc(len + 1, GFP_KERNEL); > + buf = match_strdup(s); > if (!buf) > return -ENOMEM; > - memcpy(buf, s->from, len); > - buf[len] = '\0'; > > ret = 0; > val = simple_strtol(buf, , base); > -- Hi Andrew, are you planning to take this patch too? The similar patch lib/parser.c: switch match_u64int() over to use match_strdup() is in -mm now, but this one is not. Thanks, - Eric
Re: [PATCH] lib/parser.c: switch match_number() over to use match_strdup()
On Thu, Aug 30, 2018 at 12:47:27PM -0700, Eric Biggers wrote: > From: Eric Biggers > > This simplifies the code. No change in behavior. > > Signed-off-by: Eric Biggers > --- > lib/parser.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/lib/parser.c b/lib/parser.c > index 0142ef28f0eb..96656a6dd59b 100644 > --- a/lib/parser.c > +++ b/lib/parser.c > @@ -131,13 +131,10 @@ static int match_number(substring_t *s, int *result, > int base) > char *buf; > int ret; > long val; > - size_t len = s->to - s->from; > > - buf = kmalloc(len + 1, GFP_KERNEL); > + buf = match_strdup(s); > if (!buf) > return -ENOMEM; > - memcpy(buf, s->from, len); > - buf[len] = '\0'; > > ret = 0; > val = simple_strtol(buf, , base); > -- Hi Andrew, are you planning to take this patch too? The similar patch lib/parser.c: switch match_u64int() over to use match_strdup() is in -mm now, but this one is not. Thanks, - Eric
[PATCH] x86/intel_rdt: Show missing resctrl mount options
In resctrl filesystem, we have some mount options to enable L3/L2 CDP and MBA Software Controller features if platform supports them: mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl But currently only "cdp" option can be displayed. "cdpl2" and "mba_MBps" options are not showed even when they are active mount options. Before this patch: #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl #grep resctrl /proc/mounts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0 After this patch: #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl #grep resctrl /proc/mountts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0 Signed-off-by: Xiaochen Shen Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index b799c00bef09..82459a41fa84 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -2760,6 +2760,13 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) { if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled) seq_puts(seq, ",cdp"); + + if (rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled) + seq_puts(seq, ",cdpl2"); + + if (is_mba_sc(_resources_all[RDT_RESOURCE_MBA])) + seq_puts(seq, ",mba_MBps"); + return 0; } -- 2.18.0
[PATCH] x86/intel_rdt: Show missing resctrl mount options
In resctrl filesystem, we have some mount options to enable L3/L2 CDP and MBA Software Controller features if platform supports them: mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl But currently only "cdp" option can be displayed. "cdpl2" and "mba_MBps" options are not showed even when they are active mount options. Before this patch: #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl #grep resctrl /proc/mounts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0 After this patch: #mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl #grep resctrl /proc/mountts /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0 Signed-off-by: Xiaochen Shen Signed-off-by: Fenghua Yu --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index b799c00bef09..82459a41fa84 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -2760,6 +2760,13 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) { if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled) seq_puts(seq, ",cdp"); + + if (rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled) + seq_puts(seq, ",cdpl2"); + + if (is_mba_sc(_resources_all[RDT_RESOURCE_MBA])) + seq_puts(seq, ",mba_MBps"); + return 0; } -- 2.18.0
[PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend
On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power state retention), UART iomux settings will be lost, need to add pinctrl sleep/default mode switch during suspend/resume to make sure UART iomux settings are correct after resume. Signed-off-by: Anson Huang --- drivers/tty/serial/imx.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index c15332d..6ccb5c4 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -2449,6 +2450,8 @@ static int imx_uart_suspend_noirq(struct device *dev) clk_disable(sport->clk_ipg); + pinctrl_pm_select_sleep_state(dev); + return 0; } @@ -2457,6 +2460,8 @@ static int imx_uart_resume_noirq(struct device *dev) struct imx_port *sport = dev_get_drvdata(dev); int ret; + pinctrl_pm_select_default_state(dev); + ret = clk_enable(sport->clk_ipg); if (ret) return ret; -- 2.7.4
[PATCH V2 2/2] tty: serial: imx: add pinctrl sleep/default mode switch for suspend
On some i.MX SoCs' low power mode, such as i.MX7D's LPSR(low power state retention), UART iomux settings will be lost, need to add pinctrl sleep/default mode switch during suspend/resume to make sure UART iomux settings are correct after resume. Signed-off-by: Anson Huang --- drivers/tty/serial/imx.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index c15332d..6ccb5c4 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -2449,6 +2450,8 @@ static int imx_uart_suspend_noirq(struct device *dev) clk_disable(sport->clk_ipg); + pinctrl_pm_select_sleep_state(dev); + return 0; } @@ -2457,6 +2460,8 @@ static int imx_uart_resume_noirq(struct device *dev) struct imx_port *sport = dev_get_drvdata(dev); int ret; + pinctrl_pm_select_default_state(dev); + ret = clk_enable(sport->clk_ipg); if (ret) return ret; -- 2.7.4
[PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore
In noirq suspend/resume stage with no_console_suspend enabled, imx_uart_console_write() may be called to print out log_buf message by printk(), so there will be race condition between imx_uart_console_write() and imx_uart_save/restore_context(), need to add lock to protect the registers save/restore operations. Signed-off-by: Anson Huang --- drivers/tty/serial/imx.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index 239c0fa..c15332d 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -2376,8 +2376,13 @@ static int imx_uart_remove(struct platform_device *pdev) static void imx_uart_restore_context(struct imx_port *sport) { - if (!sport->context_saved) + unsigned long flags; + + spin_lock_irqsave(>port.lock, flags); + if (!sport->context_saved) { + spin_unlock_irqrestore(>port.lock, flags); return; + } imx_uart_writel(sport, sport->saved_reg[4], UFCR); imx_uart_writel(sport, sport->saved_reg[5], UESC); @@ -2390,11 +2395,15 @@ static void imx_uart_restore_context(struct imx_port *sport) imx_uart_writel(sport, sport->saved_reg[2], UCR3); imx_uart_writel(sport, sport->saved_reg[3], UCR4); sport->context_saved = false; + spin_unlock_irqrestore(>port.lock, flags); } static void imx_uart_save_context(struct imx_port *sport) { + unsigned long flags; + /* Save necessary regs */ + spin_lock_irqsave(>port.lock, flags); sport->saved_reg[0] = imx_uart_readl(sport, UCR1); sport->saved_reg[1] = imx_uart_readl(sport, UCR2); sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6 +2415,7 @@ static void imx_uart_save_context(struct imx_port *sport) sport->saved_reg[8] = imx_uart_readl(sport, UBMR); sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS); sport->context_saved = true; + spin_unlock_irqrestore(>port.lock, flags); } static void imx_uart_enable_wakeup(struct imx_port *sport, bool on) -- 2.7.4
[PATCH V2 1/2] tty: serial: imx: add lock for registers save/restore
In noirq suspend/resume stage with no_console_suspend enabled, imx_uart_console_write() may be called to print out log_buf message by printk(), so there will be race condition between imx_uart_console_write() and imx_uart_save/restore_context(), need to add lock to protect the registers save/restore operations. Signed-off-by: Anson Huang --- drivers/tty/serial/imx.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index 239c0fa..c15332d 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -2376,8 +2376,13 @@ static int imx_uart_remove(struct platform_device *pdev) static void imx_uart_restore_context(struct imx_port *sport) { - if (!sport->context_saved) + unsigned long flags; + + spin_lock_irqsave(>port.lock, flags); + if (!sport->context_saved) { + spin_unlock_irqrestore(>port.lock, flags); return; + } imx_uart_writel(sport, sport->saved_reg[4], UFCR); imx_uart_writel(sport, sport->saved_reg[5], UESC); @@ -2390,11 +2395,15 @@ static void imx_uart_restore_context(struct imx_port *sport) imx_uart_writel(sport, sport->saved_reg[2], UCR3); imx_uart_writel(sport, sport->saved_reg[3], UCR4); sport->context_saved = false; + spin_unlock_irqrestore(>port.lock, flags); } static void imx_uart_save_context(struct imx_port *sport) { + unsigned long flags; + /* Save necessary regs */ + spin_lock_irqsave(>port.lock, flags); sport->saved_reg[0] = imx_uart_readl(sport, UCR1); sport->saved_reg[1] = imx_uart_readl(sport, UCR2); sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6 +2415,7 @@ static void imx_uart_save_context(struct imx_port *sport) sport->saved_reg[8] = imx_uart_readl(sport, UBMR); sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS); sport->context_saved = true; + spin_unlock_irqrestore(>port.lock, flags); } static void imx_uart_enable_wakeup(struct imx_port *sport, bool on) -- 2.7.4
RE: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore
Hi, Uwe Anson Huang Best Regards! > -Original Message- > From: Uwe Kleine-König > Sent: Wednesday, September 5, 2018 4:32 AM > To: Anson Huang > Cc: gre...@linuxfoundation.org; jsl...@suse.com; > linux-ser...@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > > Subject: Re: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore > > Hello, > > On Thu, Aug 09, 2018 at 06:06:08PM +0800, Anson Huang wrote: > > In noirq suspend/resume stage with no_console_suspend enabled, > > .imx_console_write() may be called to print out log_buf message by > > .printk(), so there will be race condition between > > .imx_console_write() and .serial_imx_save/restore_context(), > > need to add lock to protect the registers save/restore operations. > > The function names changes some time ago. Also I'd drop the leading dots in > the names which I would understand as members in a struct. > > Is this a theoretical issue, or did you see actual breakage? I will update the function name and remove the dots. And we did see actual breakage during stress test, although the reproduce rate is NOT that high, but the issue came out and fixed by this patch. > > > Signed-off-by: Anson Huang > > --- > > drivers/tty/serial/imx.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index > > 239c0fa..a09ccef 100644 > > --- a/drivers/tty/serial/imx.c > > +++ b/drivers/tty/serial/imx.c > > @@ -2376,9 +2376,12 @@ static int imx_uart_remove(struct > > platform_device *pdev) > > > > static void imx_uart_restore_context(struct imx_port *sport) { > > + unsigned long flags; > > + > > if (!sport->context_saved) > > return; > > Is it safe to check .context_saved without holding the lock? Ah, my mistake, will put the check inside the lock. Please help review V2 patch set, thanks. Anson. > > > + spin_lock_irqsave(>port.lock, flags); > > imx_uart_writel(sport, sport->saved_reg[4], UFCR); > > imx_uart_writel(sport, sport->saved_reg[5], UESC); > > imx_uart_writel(sport, sport->saved_reg[6], UTIM); @@ -2390,11 > > +2393,15 @@ static void imx_uart_restore_context(struct imx_port *sport) > > imx_uart_writel(sport, sport->saved_reg[2], UCR3); > > imx_uart_writel(sport, sport->saved_reg[3], UCR4); > > sport->context_saved = false; > > + spin_unlock_irqrestore(>port.lock, flags); > > } > > > > static void imx_uart_save_context(struct imx_port *sport) { > > + unsigned long flags; > > + > > /* Save necessary regs */ > > + spin_lock_irqsave(>port.lock, flags); > > sport->saved_reg[0] = imx_uart_readl(sport, UCR1); > > sport->saved_reg[1] = imx_uart_readl(sport, UCR2); > > sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6 > > +2413,7 @@ static void imx_uart_save_context(struct imx_port *sport) > > sport->saved_reg[8] = imx_uart_readl(sport, UBMR); > > sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS); > > sport->context_saved = true; > > + spin_unlock_irqrestore(>port.lock, flags); > > } > > > > static void imx_uart_enable_wakeup(struct imx_port *sport, bool on) > > Best regards > Uwe > > -- > Pengutronix e.K. | Uwe Kleine-König > | > Industrial Linux Solutions | > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww. > pengutronix.de%2Fdata=02%7C01%7Canson.huang%40nxp.com%7C74 > a28e347934451a59bf08d612a57f8c%7C686ea1d3bc2b4c6fa92cd99c5c301635 > %7C0%7C0%7C636716899353510311sdata=lByZDd%2BbQitjEf%2FaiN3 > e26El3ErT0fnmvaKCqckF7Qc%3Dreserved=0 |
RE: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore
Hi, Uwe Anson Huang Best Regards! > -Original Message- > From: Uwe Kleine-König > Sent: Wednesday, September 5, 2018 4:32 AM > To: Anson Huang > Cc: gre...@linuxfoundation.org; jsl...@suse.com; > linux-ser...@vger.kernel.org; linux-kernel@vger.kernel.org; dl-linux-imx > > Subject: Re: [PATCH 1/2] tty: serial: imx: add lock for registers save/restore > > Hello, > > On Thu, Aug 09, 2018 at 06:06:08PM +0800, Anson Huang wrote: > > In noirq suspend/resume stage with no_console_suspend enabled, > > .imx_console_write() may be called to print out log_buf message by > > .printk(), so there will be race condition between > > .imx_console_write() and .serial_imx_save/restore_context(), > > need to add lock to protect the registers save/restore operations. > > The function names changes some time ago. Also I'd drop the leading dots in > the names which I would understand as members in a struct. > > Is this a theoretical issue, or did you see actual breakage? I will update the function name and remove the dots. And we did see actual breakage during stress test, although the reproduce rate is NOT that high, but the issue came out and fixed by this patch. > > > Signed-off-by: Anson Huang > > --- > > drivers/tty/serial/imx.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index > > 239c0fa..a09ccef 100644 > > --- a/drivers/tty/serial/imx.c > > +++ b/drivers/tty/serial/imx.c > > @@ -2376,9 +2376,12 @@ static int imx_uart_remove(struct > > platform_device *pdev) > > > > static void imx_uart_restore_context(struct imx_port *sport) { > > + unsigned long flags; > > + > > if (!sport->context_saved) > > return; > > Is it safe to check .context_saved without holding the lock? Ah, my mistake, will put the check inside the lock. Please help review V2 patch set, thanks. Anson. > > > + spin_lock_irqsave(>port.lock, flags); > > imx_uart_writel(sport, sport->saved_reg[4], UFCR); > > imx_uart_writel(sport, sport->saved_reg[5], UESC); > > imx_uart_writel(sport, sport->saved_reg[6], UTIM); @@ -2390,11 > > +2393,15 @@ static void imx_uart_restore_context(struct imx_port *sport) > > imx_uart_writel(sport, sport->saved_reg[2], UCR3); > > imx_uart_writel(sport, sport->saved_reg[3], UCR4); > > sport->context_saved = false; > > + spin_unlock_irqrestore(>port.lock, flags); > > } > > > > static void imx_uart_save_context(struct imx_port *sport) { > > + unsigned long flags; > > + > > /* Save necessary regs */ > > + spin_lock_irqsave(>port.lock, flags); > > sport->saved_reg[0] = imx_uart_readl(sport, UCR1); > > sport->saved_reg[1] = imx_uart_readl(sport, UCR2); > > sport->saved_reg[2] = imx_uart_readl(sport, UCR3); @@ -2406,6 > > +2413,7 @@ static void imx_uart_save_context(struct imx_port *sport) > > sport->saved_reg[8] = imx_uart_readl(sport, UBMR); > > sport->saved_reg[9] = imx_uart_readl(sport, IMX21_UTS); > > sport->context_saved = true; > > + spin_unlock_irqrestore(>port.lock, flags); > > } > > > > static void imx_uart_enable_wakeup(struct imx_port *sport, bool on) > > Best regards > Uwe > > -- > Pengutronix e.K. | Uwe Kleine-König > | > Industrial Linux Solutions | > https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww. > pengutronix.de%2Fdata=02%7C01%7Canson.huang%40nxp.com%7C74 > a28e347934451a59bf08d612a57f8c%7C686ea1d3bc2b4c6fa92cd99c5c301635 > %7C0%7C0%7C636716899353510311sdata=lByZDd%2BbQitjEf%2FaiN3 > e26El3ErT0fnmvaKCqckF7Qc%3Dreserved=0 |
kselftests for memory.oom.group
Hi Shuah, I wrote some tests for the new memory.oom.group feature in cgroups 2. In the process, I discovered a few small bugs in the cgroups tests, which I have fixed as well in a separate commit. This is my first ever patch to Linux, so let me know if you see any issues or improvements that can be made. Thanks for all your amazing work on Linux! -Jay
kselftests for memory.oom.group
Hi Shuah, I wrote some tests for the new memory.oom.group feature in cgroups 2. In the process, I discovered a few small bugs in the cgroups tests, which I have fixed as well in a separate commit. This is my first ever patch to Linux, so let me know if you see any issues or improvements that can be made. Thanks for all your amazing work on Linux! -Jay
[PATCH 2/2] Add tests for memory.oom.group
From: Jay Kamat Add tests for memory.oom.group for the following cases: - Killing all processes in a leaf cgroup, but leaving the parent untouched - Killing all processes in a parent and leaf cgroup - Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered for being killed by the group oom killer. Signed-off-by: Jay Kamat --- tools/testing/selftests/cgroup/cgroup_util.c | 21 ++ tools/testing/selftests/cgroup/cgroup_util.h | 1 + .../selftests/cgroup/test_memcontrol.c| 205 ++ 3 files changed, 227 insertions(+) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index 4aadf38bcd5d..6799c69d7f03 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -338,3 +338,24 @@ int is_swap_enabled(void) return cnt > 1; } + +int set_oom_adj_score(int pid, int score) +{ + char path[PATH_MAX]; + int fd, len; + + sprintf(path, "/proc/%d/oom_score_adj", pid); + + fd = open(path, O_WRONLY | O_APPEND); + if (fd < 0) + return fd; + + len = dprintf(fd, "%d", score); + if (len < 0) { + close(fd); + return len; + } + + close(fd); + return 0; +} diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h index fe82a297d4e0..cabd43fd137a 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.h +++ b/tools/testing/selftests/cgroup/cgroup_util.h @@ -39,3 +39,4 @@ extern int get_temp_fd(void); extern int alloc_pagecache(int fd, size_t size); extern int alloc_anon(const char *cgroup, void *arg); extern int is_swap_enabled(void); +extern int set_oom_adj_score(int pid, int score); diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index cf0bddc9d271..017c15a7a935 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -2,6 +2,7 @@ #define _GNU_SOURCE #include +#include #include #include #include @@ -202,6 +203,36 @@ static int alloc_pagecache_50M_noexit(const char *cgroup, void *arg) return 0; } +static int alloc_anon_noexit(const char *cgroup, void *arg) +{ + int ppid = getppid(); + + if (alloc_anon(cgroup, arg)) + return -1; + + while (getppid() == ppid) + sleep(1); + + return 0; +} + +/* + * Wait until processes are killed asynchronously by the OOM killer + * If we exceed a timeout, fail. + */ +static int cg_test_proc_killed(const char *cgroup) +{ + int limit; + + for (limit = 10; limit > 0; limit--) { + if (cg_read_strcmp(cgroup, "cgroup.procs", "") == 0) + return 0; + + usleep(10); + } + return -1; +} + /* * First, this test creates the following hierarchy: * A memory.min = 50M, memory.max = 200M @@ -964,6 +995,177 @@ static int test_memcg_sock(const char *root) return ret; } +/* + * This test disables swapping and tries to allocate anonymous memory + * up to OOM with memory.group.oom set. Then it checks that all + * processes in the leaf (but not the parent) were killed. + */ +static int test_memcg_oom_group_leaf_events(const char *root) +{ + int ret = KSFT_FAIL; + char *parent, *child; + + parent = cg_name(root, "memcg_test_0"); + child = cg_name(root, "memcg_test_0/memcg_test_1"); + + if (!parent || !child) + goto cleanup; + + if (cg_create(parent)) + goto cleanup; + + if (cg_create(child)) + goto cleanup; + + if (cg_write(parent, "cgroup.subtree_control", "+memory")) + goto cleanup; + + if (cg_write(child, "memory.max", "50M")) + goto cleanup; + + if (cg_write(child, "memory.swap.max", "0")) + goto cleanup; + + if (cg_write(child, "memory.oom.group", "1")) + goto cleanup; + + cg_run_nowait(parent, alloc_anon_noexit, (void *) MB(60)); + cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1)); + cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1)); + if (!cg_run(child, alloc_anon, (void *)MB(100))) + goto cleanup; + + if (cg_test_proc_killed(child)) + goto cleanup; + + if (cg_read_key_long(child, "memory.events", "oom_kill ") <= 0) + goto cleanup; + + if (cg_read_key_long(parent, "memory.events", "oom_kill ") != 0) + goto cleanup; + + ret = KSFT_PASS; + +cleanup: + if (child) + cg_destroy(child); + if (parent) + cg_destroy(parent); + free(child); + free(parent); + + return ret; +} + +/* + * This test disables swapping and tries to allocate anonymous memory + * up to OOM with
[PATCH 1/2] Fix cg_read_strcmp()
From: Jay Kamat Fix a couple issues with cg_read_strcmp(), to improve correctness of cgroup tests - Fix cg_read_strcmp() always returning 0 for empty "needle" strings - Fix a memory leak in cg_read_strcmp() Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests") Signed-off-by: Jay Kamat --- tools/testing/selftests/cgroup/cgroup_util.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index 1e9e3c470561..4aadf38bcd5d 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -91,15 +91,24 @@ int cg_read_strcmp(const char *cgroup, const char *control, { size_t size = strlen(expected) + 1; char *buf; + int ret; + + /* Handle the case of comparing against empty string */ + if (size == 1) + size = 32; buf = malloc(size); if (!buf) return -1; - if (cg_read(cgroup, control, buf, size)) + if (cg_read(cgroup, control, buf, size)) { + free(buf); return -1; + } - return strcmp(expected, buf); + ret = strcmp(expected, buf); + free(buf); + return ret; } int cg_read_strstr(const char *cgroup, const char *control, const char *needle) -- 2.17.1
[PATCH 2/2] Add tests for memory.oom.group
From: Jay Kamat Add tests for memory.oom.group for the following cases: - Killing all processes in a leaf cgroup, but leaving the parent untouched - Killing all processes in a parent and leaf cgroup - Keeping processes marked by OOM_SCORE_ADJ_MIN alive when considered for being killed by the group oom killer. Signed-off-by: Jay Kamat --- tools/testing/selftests/cgroup/cgroup_util.c | 21 ++ tools/testing/selftests/cgroup/cgroup_util.h | 1 + .../selftests/cgroup/test_memcontrol.c| 205 ++ 3 files changed, 227 insertions(+) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index 4aadf38bcd5d..6799c69d7f03 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -338,3 +338,24 @@ int is_swap_enabled(void) return cnt > 1; } + +int set_oom_adj_score(int pid, int score) +{ + char path[PATH_MAX]; + int fd, len; + + sprintf(path, "/proc/%d/oom_score_adj", pid); + + fd = open(path, O_WRONLY | O_APPEND); + if (fd < 0) + return fd; + + len = dprintf(fd, "%d", score); + if (len < 0) { + close(fd); + return len; + } + + close(fd); + return 0; +} diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h index fe82a297d4e0..cabd43fd137a 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.h +++ b/tools/testing/selftests/cgroup/cgroup_util.h @@ -39,3 +39,4 @@ extern int get_temp_fd(void); extern int alloc_pagecache(int fd, size_t size); extern int alloc_anon(const char *cgroup, void *arg); extern int is_swap_enabled(void); +extern int set_oom_adj_score(int pid, int score); diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index cf0bddc9d271..017c15a7a935 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -2,6 +2,7 @@ #define _GNU_SOURCE #include +#include #include #include #include @@ -202,6 +203,36 @@ static int alloc_pagecache_50M_noexit(const char *cgroup, void *arg) return 0; } +static int alloc_anon_noexit(const char *cgroup, void *arg) +{ + int ppid = getppid(); + + if (alloc_anon(cgroup, arg)) + return -1; + + while (getppid() == ppid) + sleep(1); + + return 0; +} + +/* + * Wait until processes are killed asynchronously by the OOM killer + * If we exceed a timeout, fail. + */ +static int cg_test_proc_killed(const char *cgroup) +{ + int limit; + + for (limit = 10; limit > 0; limit--) { + if (cg_read_strcmp(cgroup, "cgroup.procs", "") == 0) + return 0; + + usleep(10); + } + return -1; +} + /* * First, this test creates the following hierarchy: * A memory.min = 50M, memory.max = 200M @@ -964,6 +995,177 @@ static int test_memcg_sock(const char *root) return ret; } +/* + * This test disables swapping and tries to allocate anonymous memory + * up to OOM with memory.group.oom set. Then it checks that all + * processes in the leaf (but not the parent) were killed. + */ +static int test_memcg_oom_group_leaf_events(const char *root) +{ + int ret = KSFT_FAIL; + char *parent, *child; + + parent = cg_name(root, "memcg_test_0"); + child = cg_name(root, "memcg_test_0/memcg_test_1"); + + if (!parent || !child) + goto cleanup; + + if (cg_create(parent)) + goto cleanup; + + if (cg_create(child)) + goto cleanup; + + if (cg_write(parent, "cgroup.subtree_control", "+memory")) + goto cleanup; + + if (cg_write(child, "memory.max", "50M")) + goto cleanup; + + if (cg_write(child, "memory.swap.max", "0")) + goto cleanup; + + if (cg_write(child, "memory.oom.group", "1")) + goto cleanup; + + cg_run_nowait(parent, alloc_anon_noexit, (void *) MB(60)); + cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1)); + cg_run_nowait(child, alloc_anon_noexit, (void *) MB(1)); + if (!cg_run(child, alloc_anon, (void *)MB(100))) + goto cleanup; + + if (cg_test_proc_killed(child)) + goto cleanup; + + if (cg_read_key_long(child, "memory.events", "oom_kill ") <= 0) + goto cleanup; + + if (cg_read_key_long(parent, "memory.events", "oom_kill ") != 0) + goto cleanup; + + ret = KSFT_PASS; + +cleanup: + if (child) + cg_destroy(child); + if (parent) + cg_destroy(parent); + free(child); + free(parent); + + return ret; +} + +/* + * This test disables swapping and tries to allocate anonymous memory + * up to OOM with
[PATCH 1/2] Fix cg_read_strcmp()
From: Jay Kamat Fix a couple issues with cg_read_strcmp(), to improve correctness of cgroup tests - Fix cg_read_strcmp() always returning 0 for empty "needle" strings - Fix a memory leak in cg_read_strcmp() Fixes: 84092dbcf901 ("selftests: cgroup: add memory controller self-tests") Signed-off-by: Jay Kamat --- tools/testing/selftests/cgroup/cgroup_util.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index 1e9e3c470561..4aadf38bcd5d 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -91,15 +91,24 @@ int cg_read_strcmp(const char *cgroup, const char *control, { size_t size = strlen(expected) + 1; char *buf; + int ret; + + /* Handle the case of comparing against empty string */ + if (size == 1) + size = 32; buf = malloc(size); if (!buf) return -1; - if (cg_read(cgroup, control, buf, size)) + if (cg_read(cgroup, control, buf, size)) { + free(buf); return -1; + } - return strcmp(expected, buf); + ret = strcmp(expected, buf); + free(buf); + return ret; } int cg_read_strstr(const char *cgroup, const char *control, const char *needle) -- 2.17.1
RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
> -Original Message- > From: Andrea Arcangeli [mailto:aarca...@redhat.com] > Sent: Tuesday, September 04, 2018 4:37 PM > To: Schaufler, Casey > Cc: Jiri Kosina ; Tim Chen ; > Thomas Gleixner ; Ingo Molnar ; > Peter Zijlstra ; Josh Poimboeuf > ; Woodhouse, David ; Oleg > Nesterov ; linux-kernel@vger.kernel.org; x...@kernel.org > Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can > be applied on arbitrary tasks > > Hello, > > On Tue, Sep 04, 2018 at 06:10:47PM +, Schaufler, Casey wrote: > > The real reason to use an LSM based approach is that overloading ptrace > > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a > > processor interface. Even if ptrace_may_access() does exactly what you > > "Side channel is a processor interface" doesn't make me optimistic, > but I assume you're not speaking for Intel. Sorry, I've been working in security too long for my optimistic streak to be very wide. > > want it to for side-channel mitigation today it would be incredibly > > inappropriate to tie the two together for eternity. You don't want to > > restrict > > the ptrace checks to only those things that are also good for side-channel, > > and vice versa. > > It seems like you want to make this more configurable, Not especially, no. I have gotten considerable feedback that it should be configurable, and while I may not see the value myself, I do respect the people who've requested the configurability. > we have all > debugfs x86 specific tweaks to disable IBPB at runtime and we don't > allow a runtime opt-out of IBPB alone. > > If you shutdown either IBRS or retpolines at runtime with debugfs, > then IBPB goes away too. > > Giving a finegrined way to disable only IBPB we found was overkill > because IBPB has never been measurable if done only when the prev task > cannot ptrace the next task (which is a superset of the too weak > upstream not dumpable check, but still not a performance issue). > > Allowing IBPB runtime opt-out doesn't make much sense if you don't > allow to disable retpolines too still at runtime. And disabling > retpolines from LSM doesn't sounds the right place, it's an x86 > temporary quirk only relevant for current buggy CPUs. > > There should be a function that decides when IBPB and flush_RSB should > be run (flush_RSB has to be run if SMEP because with SMEP there's no > need to run flush_RSB at every kernel entry anymore), and that > function happens to check ptrace today, but it would check other stuff > too if we had other interfaces besides ptrace that would allow the > prev task to read the memory of the next task. So it's not so much > about ptrace nor about IBPB, it's about "IBPB_RSB" vs "prev task > can read memory of next task". Then each arch can implement > "IBPB_RSB" method its own way but the check is for the common > code and it should be in the scheduler and there's just 1 version of > this check needed. Right, I get that. There's more to ptrace access than "I can read the other task". There's more to what the processor is up to than IBPB. > > I don't think there should be a ton of different versions of this > function each providing a different answer, which is what LSM would > provide. If they provide different answers there should be different functions. It's a question of viewpoint. If you don't care about the SELinux viewpoint you shouldn't have to include it in your checks, any more than you care about x86 issues when you're running on a MIPS processor. > At most you can add a x86 debugfs tweak to shut off the logic but > again it would make more sense if retpolines could be shut off at > runtime too, doing it just for IBPB sounds backwards because it has > such an unmeasurable overhead. > > > Yes. That would be me. > > Even the attempt to make an innocuous call like > ptrace_has_cap(tcred->user_ns, mode) will eventually > deadlock there. Which is yet another reason there needs to be separate logic. > I used a PTRACE_MODE_ check that the ptrace check code uses to filter > out specific calls that may eventually enter LSM and hard lockup in > non reproducible workloads (some false positive IBPB is ok, especially > if it avoids a deadlock). Yes, even security people have to worry about locking. > Everything can be fixed in any way, but calling LSM from scheduler > code doesn't sound the most robust thing to do in general because what > works outside the scheduler doesn't work from within those semi atomic > regions (it tends to work initially until it eventually > deadlocks). The original code of such check, had all sort of deadlocks > because of that. What *is* the most robust way? Yes, there are locking issues. The code in the LSM infrastructure and in the security modules needs to be aware of that and deal with it. The SELinux code I proposed is more complex than it could be because the audit code requires locking that doesn't work in the switching context. > Thanks, > Andrea
RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
> -Original Message- > From: Andrea Arcangeli [mailto:aarca...@redhat.com] > Sent: Tuesday, September 04, 2018 4:37 PM > To: Schaufler, Casey > Cc: Jiri Kosina ; Tim Chen ; > Thomas Gleixner ; Ingo Molnar ; > Peter Zijlstra ; Josh Poimboeuf > ; Woodhouse, David ; Oleg > Nesterov ; linux-kernel@vger.kernel.org; x...@kernel.org > Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can > be applied on arbitrary tasks > > Hello, > > On Tue, Sep 04, 2018 at 06:10:47PM +, Schaufler, Casey wrote: > > The real reason to use an LSM based approach is that overloading ptrace > > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a > > processor interface. Even if ptrace_may_access() does exactly what you > > "Side channel is a processor interface" doesn't make me optimistic, > but I assume you're not speaking for Intel. Sorry, I've been working in security too long for my optimistic streak to be very wide. > > want it to for side-channel mitigation today it would be incredibly > > inappropriate to tie the two together for eternity. You don't want to > > restrict > > the ptrace checks to only those things that are also good for side-channel, > > and vice versa. > > It seems like you want to make this more configurable, Not especially, no. I have gotten considerable feedback that it should be configurable, and while I may not see the value myself, I do respect the people who've requested the configurability. > we have all > debugfs x86 specific tweaks to disable IBPB at runtime and we don't > allow a runtime opt-out of IBPB alone. > > If you shutdown either IBRS or retpolines at runtime with debugfs, > then IBPB goes away too. > > Giving a finegrined way to disable only IBPB we found was overkill > because IBPB has never been measurable if done only when the prev task > cannot ptrace the next task (which is a superset of the too weak > upstream not dumpable check, but still not a performance issue). > > Allowing IBPB runtime opt-out doesn't make much sense if you don't > allow to disable retpolines too still at runtime. And disabling > retpolines from LSM doesn't sounds the right place, it's an x86 > temporary quirk only relevant for current buggy CPUs. > > There should be a function that decides when IBPB and flush_RSB should > be run (flush_RSB has to be run if SMEP because with SMEP there's no > need to run flush_RSB at every kernel entry anymore), and that > function happens to check ptrace today, but it would check other stuff > too if we had other interfaces besides ptrace that would allow the > prev task to read the memory of the next task. So it's not so much > about ptrace nor about IBPB, it's about "IBPB_RSB" vs "prev task > can read memory of next task". Then each arch can implement > "IBPB_RSB" method its own way but the check is for the common > code and it should be in the scheduler and there's just 1 version of > this check needed. Right, I get that. There's more to ptrace access than "I can read the other task". There's more to what the processor is up to than IBPB. > > I don't think there should be a ton of different versions of this > function each providing a different answer, which is what LSM would > provide. If they provide different answers there should be different functions. It's a question of viewpoint. If you don't care about the SELinux viewpoint you shouldn't have to include it in your checks, any more than you care about x86 issues when you're running on a MIPS processor. > At most you can add a x86 debugfs tweak to shut off the logic but > again it would make more sense if retpolines could be shut off at > runtime too, doing it just for IBPB sounds backwards because it has > such an unmeasurable overhead. > > > Yes. That would be me. > > Even the attempt to make an innocuous call like > ptrace_has_cap(tcred->user_ns, mode) will eventually > deadlock there. Which is yet another reason there needs to be separate logic. > I used a PTRACE_MODE_ check that the ptrace check code uses to filter > out specific calls that may eventually enter LSM and hard lockup in > non reproducible workloads (some false positive IBPB is ok, especially > if it avoids a deadlock). Yes, even security people have to worry about locking. > Everything can be fixed in any way, but calling LSM from scheduler > code doesn't sound the most robust thing to do in general because what > works outside the scheduler doesn't work from within those semi atomic > regions (it tends to work initially until it eventually > deadlocks). The original code of such check, had all sort of deadlocks > because of that. What *is* the most robust way? Yes, there are locking issues. The code in the LSM infrastructure and in the security modules needs to be aware of that and deal with it. The SELinux code I proposed is more complex than it could be because the audit code requires locking that doesn't work in the switching context. > Thanks, > Andrea
Re: POSIX violation by writeback error
On Tue, Sep 04, 2018 at 01:35:34PM -0700, Vito Caputo wrote: > On Tue, Sep 04, 2018 at 04:18:18PM -0400, Jeff Layton wrote: > > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote: > > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote: > > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote: > > > > > Well, I think the point was that in the above examples you'd prefer > > > > > that > > > > > the read just fail--no need to keep the data. A bit marking the file > > > > > (or even the entire filesystem) unreadable would satisfy posix, I > > > > > guess. > > > > > Whether that's practical, I don't know. > > > > > > > > When you would do it like that (mark the whole filesystem as "in > > > > error") things go from bad to worse even faster. The Linux kernel > > > > tries to keep the system up even in the face of errors. > > > > > > > > With that suggestion, having one application run into a writeback > > > > error would effectively crash the whole system because the filesystem > > > > may be the root filesystem and stuff like "sshd" that you need to > > > > diagnose the problem needs to be read from the disk > > > > > > Well, the absolutist position on posix compliance here would be that a > > > crash is still preferable to returning the wrong data. And for the > > > cases 焦晓冬 gives, that sounds right? Maybe it's the wrong balance in > > > general, I don't know. And we do already have filesystems with > > > panic-on-error options, so if they aren't used maybe then maybe users > > > have already voted against that level of strictness. > > > > > > > Yeah, idk. The problem here is that this is squarely in the domain of > > implementation defined behavior. I do think that the current "policy" > > (if you call it that) of what to do after a wb error is weird and wrong. > > What we probably ought to do is start considering how we'd like it to > > behave. > > > > How about something like this? > > > > Mark the pages as "uncleanable" after a writeback error. We'll satisfy > > reads from the cached data until someone calls fsync, at which point > > we'd return the error and invalidate the uncleanable pages. > > > > If no one calls fsync and scrapes the error, we'll hold on to it for as > > long as we can (or up to some predefined limit) and then after that > > we'll invalidate the uncleanable pages and start returning errors on > > reads. If someone eventually calls fsync afterward, we can return to > > normal operation. > > > > As always though...what about mmap? Would we need to SIGBUS at the point > > where we'd start returning errors on read()? > > > > Would that approximate the current behavior enough and make sense? > > Implementing it all sounds non-trivial though... > > > > Here's a crazy and potentially stupid idea: > > Implement a new class of swap space for backing dirty pages which fail > to write back. Pages in this space survive reboots, essentially backing > the implicit commitment POSIX establishes in the face of asynchronous > writeback errors. Rather than evicting these pages as clean, they are > swapped out to the persistent swap. And when that "swap" area gets write errors, too? What then? We're straight back to the same "what the hell do we do with the error" problem. Adding more turtles doesn't help solve this issue. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: POSIX violation by writeback error
On Tue, Sep 04, 2018 at 01:35:34PM -0700, Vito Caputo wrote: > On Tue, Sep 04, 2018 at 04:18:18PM -0400, Jeff Layton wrote: > > On Tue, 2018-09-04 at 14:54 -0400, J. Bruce Fields wrote: > > > On Tue, Sep 04, 2018 at 06:23:48PM +0200, Rogier Wolff wrote: > > > > On Tue, Sep 04, 2018 at 12:12:03PM -0400, J. Bruce Fields wrote: > > > > > Well, I think the point was that in the above examples you'd prefer > > > > > that > > > > > the read just fail--no need to keep the data. A bit marking the file > > > > > (or even the entire filesystem) unreadable would satisfy posix, I > > > > > guess. > > > > > Whether that's practical, I don't know. > > > > > > > > When you would do it like that (mark the whole filesystem as "in > > > > error") things go from bad to worse even faster. The Linux kernel > > > > tries to keep the system up even in the face of errors. > > > > > > > > With that suggestion, having one application run into a writeback > > > > error would effectively crash the whole system because the filesystem > > > > may be the root filesystem and stuff like "sshd" that you need to > > > > diagnose the problem needs to be read from the disk > > > > > > Well, the absolutist position on posix compliance here would be that a > > > crash is still preferable to returning the wrong data. And for the > > > cases 焦晓冬 gives, that sounds right? Maybe it's the wrong balance in > > > general, I don't know. And we do already have filesystems with > > > panic-on-error options, so if they aren't used maybe then maybe users > > > have already voted against that level of strictness. > > > > > > > Yeah, idk. The problem here is that this is squarely in the domain of > > implementation defined behavior. I do think that the current "policy" > > (if you call it that) of what to do after a wb error is weird and wrong. > > What we probably ought to do is start considering how we'd like it to > > behave. > > > > How about something like this? > > > > Mark the pages as "uncleanable" after a writeback error. We'll satisfy > > reads from the cached data until someone calls fsync, at which point > > we'd return the error and invalidate the uncleanable pages. > > > > If no one calls fsync and scrapes the error, we'll hold on to it for as > > long as we can (or up to some predefined limit) and then after that > > we'll invalidate the uncleanable pages and start returning errors on > > reads. If someone eventually calls fsync afterward, we can return to > > normal operation. > > > > As always though...what about mmap? Would we need to SIGBUS at the point > > where we'd start returning errors on read()? > > > > Would that approximate the current behavior enough and make sense? > > Implementing it all sounds non-trivial though... > > > > Here's a crazy and potentially stupid idea: > > Implement a new class of swap space for backing dirty pages which fail > to write back. Pages in this space survive reboots, essentially backing > the implicit commitment POSIX establishes in the face of asynchronous > writeback errors. Rather than evicting these pages as clean, they are > swapped out to the persistent swap. And when that "swap" area gets write errors, too? What then? We're straight back to the same "what the hell do we do with the error" problem. Adding more turtles doesn't help solve this issue. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
Re: [RFC PATCH 1/2] pipe: introduce busy wait for pipe
On 08/31/2018 09:09 AM, Steven Sistare wrote: On 8/30/2018 4:24 PM, subhra mazumdar wrote: Introduce pipe_ll_usec field for pipes that indicates the amount of micro seconds a thread should spin if pipe is empty or full before sleeping. This is similar to network sockets. Workloads like hackbench in pipe mode benefits significantly from this by avoiding the sleep and wakeup overhead. Other similar usecases can benefit. pipe_wait_flag is used to signal any thread busy waiting. pipe_busy_loop_timeout checks if spin time is over. Signed-off-by: subhra mazumdar --- include/linux/pipe_fs_i.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index e7497c9..fdfd2a2 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -1,6 +1,8 @@ #ifndef _LINUX_PIPE_FS_I_H #define _LINUX_PIPE_FS_I_H +#include + #define PIPE_DEF_BUFFERS 16 #define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */ @@ -54,6 +56,8 @@ struct pipe_inode_info { unsigned int waiting_writers; unsigned int r_counter; unsigned int w_counter; + unsigned int pipe_ll_usec; + unsigned long pipe_wait_flag; struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers; @@ -157,6 +161,21 @@ static inline int pipe_buf_steal(struct pipe_inode_info *pipe, return buf->ops->steal(pipe, buf); } +static inline unsigned long pipe_busy_loop_current_time(void) +{ + return (unsigned long)(local_clock() >> 10); Why ">> 10" ? local_lock() has nanosec units, and you compare to the tunable pipe_llc_sec which has microsec units. Should be ">> 3". Better yet, redefine the tunable to have nanosec units. I suspect you will need very large values of the tunable to show similar results. It's 2^10. I don't think using nanosec units is necessary. It is unlikely data will be read or written in nano seconds. sk_busy_loop_timeout for sockets uses micro seconds too. Also, since this type of optimization consumes CPU extra cycles that could be used by other tasks, show the overall CPU utilization before and after the optimization, such as by using "time hackbench ...". OK. Thanks, Subhra - Steve +} + +static inline bool pipe_busy_loop_timeout(struct pipe_inode_info *pipe, + unsigned long start_time) +{ + unsigned long bp_usec = READ_ONCE(pipe->pipe_ll_usec); + unsigned long end_time = start_time + bp_usec; + unsigned long now = pipe_busy_loop_current_time(); + + return time_after(now, end_time); +} + /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ #define PIPE_SIZE PAGE_SIZE
Re: [RFC PATCH 1/2] pipe: introduce busy wait for pipe
On 08/31/2018 09:09 AM, Steven Sistare wrote: On 8/30/2018 4:24 PM, subhra mazumdar wrote: Introduce pipe_ll_usec field for pipes that indicates the amount of micro seconds a thread should spin if pipe is empty or full before sleeping. This is similar to network sockets. Workloads like hackbench in pipe mode benefits significantly from this by avoiding the sleep and wakeup overhead. Other similar usecases can benefit. pipe_wait_flag is used to signal any thread busy waiting. pipe_busy_loop_timeout checks if spin time is over. Signed-off-by: subhra mazumdar --- include/linux/pipe_fs_i.h | 19 +++ 1 file changed, 19 insertions(+) diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index e7497c9..fdfd2a2 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -1,6 +1,8 @@ #ifndef _LINUX_PIPE_FS_I_H #define _LINUX_PIPE_FS_I_H +#include + #define PIPE_DEF_BUFFERS 16 #define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */ @@ -54,6 +56,8 @@ struct pipe_inode_info { unsigned int waiting_writers; unsigned int r_counter; unsigned int w_counter; + unsigned int pipe_ll_usec; + unsigned long pipe_wait_flag; struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers; @@ -157,6 +161,21 @@ static inline int pipe_buf_steal(struct pipe_inode_info *pipe, return buf->ops->steal(pipe, buf); } +static inline unsigned long pipe_busy_loop_current_time(void) +{ + return (unsigned long)(local_clock() >> 10); Why ">> 10" ? local_lock() has nanosec units, and you compare to the tunable pipe_llc_sec which has microsec units. Should be ">> 3". Better yet, redefine the tunable to have nanosec units. I suspect you will need very large values of the tunable to show similar results. It's 2^10. I don't think using nanosec units is necessary. It is unlikely data will be read or written in nano seconds. sk_busy_loop_timeout for sockets uses micro seconds too. Also, since this type of optimization consumes CPU extra cycles that could be used by other tasks, show the overall CPU utilization before and after the optimization, such as by using "time hackbench ...". OK. Thanks, Subhra - Steve +} + +static inline bool pipe_busy_loop_timeout(struct pipe_inode_info *pipe, + unsigned long start_time) +{ + unsigned long bp_usec = READ_ONCE(pipe->pipe_ll_usec); + unsigned long end_time = start_time + bp_usec; + unsigned long now = pipe_busy_loop_current_time(); + + return time_after(now, end_time); +} + /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ #define PIPE_SIZE PAGE_SIZE
linux-next: build warnings after merge of the devicetree tree
Hi Rob, After merging the devicetree tree, today's linux-next build (x86_64 allmodconfig) produced these warnings: WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells The function of_fdt_limit_memory() references the variable __initdata dt_root_addr_cells. This is often because of_fdt_limit_memory lacks a __initdata annotation or the annotation of dt_root_addr_cells is wrong. WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells The function of_fdt_limit_memory() references the variable __initdata dt_root_size_cells. This is often because of_fdt_limit_memory lacks a __initdata annotation or the annotation of dt_root_size_cells is wrong. Introduced by commit bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory") -- Cheers, Stephen Rothwell pgphx2tNF83ur.pgp Description: OpenPGP digital signature
linux-next: build warnings after merge of the devicetree tree
Hi Rob, After merging the devicetree tree, today's linux-next build (x86_64 allmodconfig) produced these warnings: WARNING: vmlinux.o(.text+0xf40a16): Section mismatch in reference from the function of_fdt_limit_memory() to the variable .init.data:dt_root_addr_cells The function of_fdt_limit_memory() references the variable __initdata dt_root_addr_cells. This is often because of_fdt_limit_memory lacks a __initdata annotation or the annotation of dt_root_addr_cells is wrong. WARNING: vmlinux.o(.text+0xf40a1d): Section mismatch in reference from the function of_fdt_limit_memory() to the variable .init.data:dt_root_size_cells The function of_fdt_limit_memory() references the variable __initdata dt_root_size_cells. This is often because of_fdt_limit_memory lacks a __initdata annotation or the annotation of dt_root_size_cells is wrong. Introduced by commit bb35ea5c7c30 ("of/fdt: avoid re-parsing '#{address,size}-cells' in of_fdt_limit_memory") -- Cheers, Stephen Rothwell pgphx2tNF83ur.pgp Description: OpenPGP digital signature
Re: [RFC PATCH 2/2] pipe: use pipe busy wait
On 09/04/2018 02:54 PM, Thomas Gleixner wrote: On Thu, 30 Aug 2018, subhra mazumdar wrote: +void pipe_busy_wait(struct pipe_inode_info *pipe) +{ + unsigned long wait_flag = pipe->pipe_wait_flag; + unsigned long start_time = pipe_busy_loop_current_time(); + + pipe_unlock(pipe); + preempt_disable(); + for (;;) { + if (pipe->pipe_wait_flag > wait_flag) { + preempt_enable(); + pipe_lock(pipe); + return; + } + if (pipe_busy_loop_timeout(pipe, start_time)) + break; + cpu_relax(); + } + preempt_enable(); You are not really serious about busy looping with preemption disabled? That's just wrong. Why do you want to block others from getting on the CPU if there is nothing in the pipe? There is no point in doing so, really. If the wait loop is preempted because there is more important work to do, then it will come back and either see new data, or leave due to wait time reached. Makes sense. I will remove it. Thanks, Subhra Thanks, tglx
Re: [RFC PATCH 2/2] pipe: use pipe busy wait
On 09/04/2018 02:54 PM, Thomas Gleixner wrote: On Thu, 30 Aug 2018, subhra mazumdar wrote: +void pipe_busy_wait(struct pipe_inode_info *pipe) +{ + unsigned long wait_flag = pipe->pipe_wait_flag; + unsigned long start_time = pipe_busy_loop_current_time(); + + pipe_unlock(pipe); + preempt_disable(); + for (;;) { + if (pipe->pipe_wait_flag > wait_flag) { + preempt_enable(); + pipe_lock(pipe); + return; + } + if (pipe_busy_loop_timeout(pipe, start_time)) + break; + cpu_relax(); + } + preempt_enable(); You are not really serious about busy looping with preemption disabled? That's just wrong. Why do you want to block others from getting on the CPU if there is nothing in the pipe? There is no point in doing so, really. If the wait loop is preempted because there is more important work to do, then it will come back and either see new data, or leave due to wait time reached. Makes sense. I will remove it. Thanks, Subhra Thanks, tglx
Re: [GIT PULL] s390 patches for the 4.19 merge window #2
On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky wrote: > Harald Freudenberger (5): > s390/zcrypt: hex string mask improvements for apmask and aqmask. This (and an earlier 2017 commit) adds VLAs, which are being removed[1] from the kernel: drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable length array ‘clrm’ [-Wvla] drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable length array ‘setm’ [-Wvla] drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable length array ‘setm’ [-Wvla] static int process_mask_arg(const char *str, unsigned long *bitmap, int bits, struct mutex *lock) ... DECLARE_BITMAP(clrm, bits); DECLARE_BITMAP(setm, bits); Can someone please adjust this to make these fixed size again? Thanks! -Kees [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com -- Kees Cook Pixel Security
Re: [GIT PULL] s390 patches for the 4.19 merge window #2
On Fri, Aug 24, 2018 at 12:42 AM, Martin Schwidefsky wrote: > Harald Freudenberger (5): > s390/zcrypt: hex string mask improvements for apmask and aqmask. This (and an earlier 2017 commit) adds VLAs, which are being removed[1] from the kernel: drivers/s390/crypto/ap_bus.c:980:3: warning: ISO C90 forbids variable length array ‘clrm’ [-Wvla] drivers/s390/crypto/ap_bus.c:981:3: warning: ISO C90 forbids variable length array ‘setm’ [-Wvla] drivers/s390/crypto/ap_bus.c:995:3: warning: ISO C90 forbids variable length array ‘setm’ [-Wvla] static int process_mask_arg(const char *str, unsigned long *bitmap, int bits, struct mutex *lock) ... DECLARE_BITMAP(clrm, bits); DECLARE_BITMAP(setm, bits); Can someone please adjust this to make these fixed size again? Thanks! -Kees [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com -- Kees Cook Pixel Security
Re: linux-next: error when fetching the bcm2835 tree
Hi Stefan, On Tue, 4 Sep 2018 22:07:15 +0200 (CEST) Stefan Wahren wrote: > > > Eric Anholt hat am 4. September 2018 um 19:41 geschrieben: > > > > Stephen Rothwell writes: > > > > > Fetching the bcm2835 tree (git://github.com/anholt/linux.git#for-next) > > > produces this error: > > > > > > fatal: Couldn't find remote ref refs/heads/for-next > > > > I'd passed responsibility for the git tree off to Stefan while I'm > > mostly-unavailable due to new baby. Maybe Stefan had deleted the branch > > for a bit? > > sorry for that. I deleted the branch and i manually setup the new one later. > > I hope, everything is fine now? Yes, all good. -- Cheers, Stephen Rothwell pgpZ5h6V9RzV5.pgp Description: OpenPGP digital signature
Re: linux-next: error when fetching the bcm2835 tree
Hi Stefan, On Tue, 4 Sep 2018 22:07:15 +0200 (CEST) Stefan Wahren wrote: > > > Eric Anholt hat am 4. September 2018 um 19:41 geschrieben: > > > > Stephen Rothwell writes: > > > > > Fetching the bcm2835 tree (git://github.com/anholt/linux.git#for-next) > > > produces this error: > > > > > > fatal: Couldn't find remote ref refs/heads/for-next > > > > I'd passed responsibility for the git tree off to Stefan while I'm > > mostly-unavailable due to new baby. Maybe Stefan had deleted the branch > > for a bit? > > sorry for that. I deleted the branch and i manually setup the new one later. > > I hope, everything is fine now? Yes, all good. -- Cheers, Stephen Rothwell pgpZ5h6V9RzV5.pgp Description: OpenPGP digital signature