Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On 7/13/2018 10:50 PM, John Stultz wrote: On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha wrote: Hi John, Thanks for your response Please find my comments inline. On 7/11/2018 1:43 AM, John Stultz wrote: On Fri, Jul 6, 2018 at 6:17 AM, Mukesh Ojha wrote: Currently, there exists a corner case assuming when there is only one clocksource e.g RTC, and system failed to go to suspend mode. While resume rtc_resume() injects the sleeptime as timekeeping_rtc_skipresume() returned 'false' (default value of sleeptime_injected) due to which we can see mismatch in timestamps. This issue can also come in a system where more than one clocksource are present and very first suspend fails. Fix this by handling `sleeptime_injected` flag properly. Success case: {sleeptime_injected=false} rtc_suspend() => timekeeping_suspend() => timekeeping_resume() => (sleeptime injected) rtc_resume() Failure case: {failure in sleep path} {sleeptime_injected=false} rtc_suspend() => rtc_resume() sleeptime injected again which was not required as the suspend failed) Originally-by: Thomas Gleixner Signed-off-by: Mukesh Ojha --- Changes in v3: * Updated commit subject and description. * Updated the patch as per the fix given by Thomas Gleixner. Changes in v2: * Updated the commit text. * Removed extra variable and used the earlier static variable 'sleeptime_injected'. kernel/time/timekeeping.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 4786df9..32ae9ae 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1510,8 +1510,20 @@ void __weak read_boot_clock64(struct timespec64 *ts) ts->tv_nsec = 0; } -/* Flag for if timekeeping_resume() has injected sleeptime */ -static bool sleeptime_injected; +/* + * Flag reflecting whether timekeeping_resume() has injected sleeptime. + * + * The flag starts of true and is only cleared when a suspend reaches + * timekeeping_suspend(), timekeeping_resume() sets it when the timekeeper + * clocksource is not stopping across suspend and has been used to update + * sleep time. If the timekeeper clocksource has stopped then the flag + * stays false and is used by the RTC resume code to decide whether sleep + * time must be injected and if so the flag gets set then. + * + * If a suspend fails before reaching timekeeping_resume() then the flag + * stays true and prevents erroneous sleeptime injection. + */ +static bool sleeptime_injected = true; I worry this upside-down logic is too subtle to be easily reasoned about, and will just lead to future mistakes. Can we instead call this "suspend_timing_needed" and only set it to true when we don't inject any sleep time on resume? I did not get your point "only set it to true when we don't inject any sleep time on resume? " How do we know this ? This question itself depends on the "sleeptime_injected" if it is true means no need to inject else need to inject. Also, we need to make this variable back and forth true, false; suspends path ensures it to make it false. So yea, I'm not saying logically the code is really any different, this is more of a naming nit. So instead of having a variable that is always on that we occasionally turn off, lets invert the naming and have it be a flag that we occasionally turn on. I understand your concern about the name of the variable will be misleading. But the changing Boolean state would not solve the actual issue. If i understand you correctly you meant below code diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 32ae9ae..becc5bd 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts) * If a suspend fails before reaching timekeeping_resume() then the flag * stays true and prevents erroneous sleeptime injection. */ -static bool sleeptime_injected = true; +static bool suspend_timing_needed; /* Flag for if there is a persistent clock on this platform */ static bool persistent_clock_exists; @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 *delta) raw_spin_lock_irqsave(_lock, flags); write_seqcount_begin(_core.seq); - sleeptime_injected = true; + suspend_timing_needed = false; timekeeping_forward_now(tk); @@ -1714,10 +1714,10 @@ void timekeeping_resume(void) tk->tkr_mono.mask); nsec = mul_u64_u32_shr(cyc_delta, clock->mult, clock->shift); ts_delta = ns_to_timespec64(nsec); - sleeptime_injected = true; + suspend_timing_needed = true; } else if (timespec64_compare(_new, _suspend_time) > 0) { ts_delta = timespec64_sub(ts_new,
Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks
On Mon, Jul 16, 2018 at 8:08 AM Eric W. Biederman wrote: > > The change for global init is it will now die if init is a member of the > session or init is using this tty as it's controlling tty. > > Semantically killing init with SAK is completely appropriate. No. Semtnaitcally killing init is completely wrong. Because it will kill the whole system. And I don't mean that in "now init won't spawn new things". I mean that in "now we don't have a child reaper any more, and the system will be dead because we'll panic on exit". So it's not about the controlling tty, it's about fundamental kernel internal consistency guarantees. See write_unlock_irq(_lock); if (unlikely(pid_ns == _pid_ns)) { panic("Attempted to kill init! exitcode=0x%08x\n", father->signal->group_exit_code ?: father->exit_code); } in kernel/exit.c. Linus
Re: [PATCH v5 1/2] leds: core: Introduce generic pattern interface
On 07/16/2018 06:10 AM, Baolin Wang wrote: From: Bjorn Andersson Some LED controllers have support for autonomously controlling brightness over time, according to some preprogrammed pattern or function. This adds a new optional operator that LED class drivers can implement if they support such functionality as well as a new device attribute to configure the pattern for a given LED. Hmm... I was hoping that this would be a generic pattern trigger that could be used by any LED (e.g. gpio LEDs), not just LEDs with hardware support. I guess I cannot really test this after all. I think it looks good for your use case though.
Re: [v9] clk: qcom: clk-rpmh: Add QCOM RPMh clock driver
Quoting Taniya Das (2018-05-08 22:56:07) > Add the RPMh clock driver to control the RPMh managed clock resources on > some of the Qualcomm Technologies, Inc. SoCs. > > Signed-off-by: Taniya Das > --- Applied to clk-next
Re: [PATCH v2 1/2] dt-binding: pinctrl: Add NPCM7xx pinctrl and GPIO documentation
On Fri, Jul 13, 2018 at 12:42:02AM +0300, Tomer Maimon wrote: > Added device tree binding documentation for Nuvoton BMC > NPCM750/730/715/705 pinmux and GPIO controller. > > Signed-off-by: Tomer Maimon > --- > .../bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt | 216 > + > 1 file changed, 216 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt > > diff --git > a/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt > b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt > new file mode 100644 > index ..83f4bbac94bb > --- /dev/null > +++ b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt > @@ -0,0 +1,216 @@ > +Nuvoton NPCM7XX Pin Controllers > + > +The Nuvoton BMC NPCM7XX Pin Controller multi-function routed through > +the multiplexing block, Each pin supports GPIO functionality (GPIOx) > +and multiple functions that directly connect the pin to different > +hardware blocks. > + > +Required properties: > +- #address-cells : should be 1. > +- #size-cells : should be 1. > +- compatible : "nuvoton,npcm750-pinctrl" for Poleg NPCM7XX. > +- ranges : defines mapping ranges between pin controller node (parent) > + to GPIO bank node (children). > + > +=== GPIO Bank Subnode === > + > +The NPCM7XX has 8 GPIO Banks each GPIO bank supports 32 GPIO. > + > +Required GPIO Bank subnode-properties: > +- reg: specifies physical base address and size of > the GPIO > + bank registers. > +- gpio-controller: Marks the device node as a GPIO controller. > +- #gpio-cells: Must be <2>. The first cell is the gpio pin > number > + and the second cell is used for optional > parameters. > +- interrupts : contain the GPIO bank interrupt with flags for > falling edge. > +- gpio-ranges: defines the range of pins managed by the GPIO > bank controller. > + > +For example, GPIO bank subnodes like the following: > + gpio0: gpio@f001 { > + gpio-controller; > + #gpio-cells = <2>; > + reg = <0x0 0x80>; > + interrupts = ; > + gpio-ranges = < 0 0 32>; > + }; > + > +=== Pin Mux Subnode === > + > +- pin: A string containing the name of the pin > + An array of strings, each string containing the name of a pin. > + These pin are used for selecting pin configuration. > + > +The following are the list of pins available: > + "GPIO0/IOX1DI", "GPIO1/IOX1LD", "GPIO2/IOX1CK", "GPIO3/IOX1D0", > + "GPIO4/IOX2DI/SMB1DSDA", "GPIO5/IOX2LD/SMB1DSCL", > "GPIO6/IOX2CK/SMB2DSDA", > + "GPIO7/IOX2D0/SMB2DSCL", "GPIO8/LKGPO1", "GPIO9/LKGPO2", > "GPIO10/IOXHLD", > + "GPIO11/IOXHCK", "GPIO12/GSPICK/SMB5BSCL", "GPIO13/GSPIDO/SMB5BSDA", > + "GPIO14/GSPIDI/SMB5CSCL", "GPIO15/GSPICS/SMB5CSDA", "GPIO16/LKGPO0", > + "GPIO17/PSPI2DI/SMB4DEN","GPIO18/PSPI2D0/SMB4BSDA", > "GPIO19/PSPI2CK/SMB4BSCL", > + "GPIO20/SMB4CSDA/SMB15SDA", "GPIO21/SMB4CSCL/SMB15SCL", > "GPIO22/SMB4DSDA/SMB14SDA", > + "GPIO23/SMB4DSCL/SMB14SCL", "GPIO24/IOXHDO", "GPIO25/IOXHDI", > "GPIO26/SMB5SDA", > + "GPIO27/SMB5SCL", "GPIO28/SMB4SDA", "GPIO29/SMB4SCL", "GPIO30/SMB3SDA", > + "GPIO31/SMB3SCL", "GPIO32/nSPI0CS1","SPI0D2", "SPI0D3", > "GPIO37/SMB3CSDA", > + "GPIO38/SMB3CSCL", "GPIO39/SMB3BSDA", "GPIO40/SMB3BSCL", > "GPIO41/BSPRXD", > + "GPO42/BSPTXD/STRAP11", "GPIO43/RXD1/JTMS2/BU1RXD", > "GPIO44/nCTS1/JTDI2/BU1CTS", > + "GPIO45/nDCD1/JTDO2", "GPIO46/nDSR1/JTCK2", "GPIO47/nRI1/JCP_RDY2", > + "GPIO48/TXD2/BSPTXD", "GPIO49/RXD2/BSPRXD", "GPIO50/nCTS2", > "GPO51/nRTS2/STRAP2", > + "GPIO52/nDCD2", "GPO53/nDTR2_BOUT2/STRAP1", "GPIO54/nDSR2", > "GPIO55/nRI2", > + "GPIO56/R1RXERR", "GPIO57/R1MDC", "GPIO58/R1MDIO", "GPIO59/SMB3DSDA", > + "GPIO60/SMB3DSCL", "GPO61/nDTR1_BOUT1/STRAP6", "GPO62/nRTST1/STRAP5", > + "GPO63/TXD1/STRAP4", "GPIO64/FANIN0", "GPIO65/FANIN1", "GPIO66/FANIN2", > + "GPIO67/FANIN3", "GPIO68/FANIN4", "GPIO69/FANIN5", "GPIO70/FANIN6", > "GPIO71/FANIN7", > + "GPIO72/FANIN8", "GPIO73/FANIN9", "GPIO74/FANIN10", "GPIO75/FANIN11", > + "GPIO76/FANIN12", "GPIO77/FANIN13","GPIO78/FANIN14", "GPIO79/FANIN15", > + "GPIO80/PWM0", "GPIO81/PWM1", "GPIO82/PWM2", "GPIO83/PWM3", > "GPIO84/R2TXD0", > + "GPIO85/R2TXD1", "GPIO86/R2TXEN", "GPIO87/R2RXD0", "GPIO88/R2RXD1", > "GPIO89/R2CRSDV", > + "GPIO90/R2RXERR", "GPIO91/R2MDC", "GPIO92/R2MDIO", > "GPIO93/GA20/SMB5DSCL", > + "GPIO94/nKBRST/SMB5DSDA", "GPIO95/nLRESET/nESPIRST", "GPIO96/RG1TXD0", > + "GPIO97/RG1TXD1", "GPIO98/RG1TXD2", "GPIO99/RG1TXD3","GPIO100/RG1TXC", > + "GPIO101/RG1TXCTL", "GPIO102/RG1RXD0", "GPIO103/RG1RXD1", > "GPIO104/RG1RXD2", > + "GPIO105/RG1RXD3", "GPIO106/RG1RXC", "GPIO107/RG1RXCTL", > "GPIO108/RG1MDC", > + "GPIO109/RG1MDIO",
[PATCH memory-model 0/14] Updates to the formal memory model
Hello! This series contains updates to the Linux kernel's formal memory model in tools/memory-model, along with corresponding changes in documentation and Linux-kernel code. These patches are ready for inclusion into -tip. 1. Add a litmus test for full multi-copy atomicity. 2. Fix the name of the ISA2+pooncelock+pooncelock+pombonce litmus test. 3. Add Daniel Lustig as an LKMM reviewer, with focus on RISC-V, courtesy of Palmer Dabbelt. 4. Update Korean translation to fix broken DMA vs. MMIO ordering example, courtesy of SeongJae Park with edits suggested by Byungchul Park. 5-6.Remove the now-obsolete ACCESS_ONCE() from recipes and the model itself, courtesy of Mark Rutland. 7. Make regression-test scripts executable. 8. Describe atomic_set as a write operation, courtesy of Jonathan Neuschäfer. 9. Add informal LKMM documentation to MAINTAINERS. 10. Use smp_mb() in wake_woken_function(), courtesy of Andrea Parri. 11. Clarify requirements for smp_mb__after_spinlock(), courtesy of Andrea Parri. 12. Update wake_up() and friends' memory-barrier guarantees, courtesy of Andrea Parri. 13. Fix "smb_wmb()" typo (should be "smp_wmb()"), courtesy of Yauheni Kaliuta. 14. Rename litmus tests to comply to norm7, courtesy of Andrea Parri. Thanx, Paul Documentation/core-api/atomic_ops.rst |2 Documentation/memory-barriers.txt | 43 +++--- Documentation/translations/ko_KR/memory-barriers.txt | 22 +-- MAINTAINERS |6 include/linux/sched.h |4 include/linux/spinlock.h | 53 +-- kernel/sched/completion.c |8 - kernel/sched/core.c | 71 -- kernel/sched/wait.c | 55 +++ tools/memory-model/Documentation/explanation.txt |2 tools/memory-model/Documentation/recipes.txt | 12 - tools/memory-model/README | 20 +- tools/memory-model/linux-kernel.bell |2 tools/memory-model/litmus-tests/IRIW+fencembonceonces+OnceOnce.litmus |2 tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus |2 tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus |2 tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus |2 tools/memory-model/litmus-tests/R+fencembonceonces.litmus |2 tools/memory-model/litmus-tests/README | 25 ++- tools/memory-model/litmus-tests/S+fencewmbonceonce+poacquireonce.litmus |2 tools/memory-model/litmus-tests/SB+fencembonceonces.litmus |2 tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus | 32 tools/memory-model/litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus |2 tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus |2 tools/memory-model/scripts/checkalllitmus.sh |2 tools/memory-model/scripts/checklitmus.sh |2 26 files changed, 223 insertions(+), 156 deletions(-)
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote: > On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote: > > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote: > > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote: > > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > > > > > This is the start of the stable review cycle for the 4.9.113 release. > > > > > There are 32 patches in this series, all will be posted as a response > > > > > to this one. If anyone has any issues with these being applied, > > > > > please > > > > > let me know. > > > > > > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > > > > > Anything received after that time might be too late. > > > > > > > > > > > > > Build results: > > > > total: 148 pass: 133 fail: 15 > > > > Failed builds: > > > > mips:defconfig > > > > mips:allnoconfig > > > > mips:defconfig > > > > mips:allmodconfig > > > > mips:allnoconfig > > > > mips:bcm47xx_defconfig > > > > mips:bcm63xx_defconfig > > > > mips:nlm_xlp_defconfig > > > > mips:ath79_defconfig > > > > mips:ar7_defconfig > > > > mips:e55_defconfig > > > > mips:cavium_octeon_defconfig > > > > mips:malta_defconfig > > > > mips:rt305x_defconfig > > > > mips:defconfig > > > > Qemu test results: > > > > total: 166 pass: 154 fail: 12 > > > > Failed tests: > > > > mips:malta_defconfig:nosmp > > > > mips:malta_defconfig:smp > > > > mips64:malta_defconfig:nosmp > > > > mips64:malta_defconfig:smp > > > > mipsel:24Kf:malta_defconfig:nosmp:initrd > > > > mipsel:24Kf:malta_defconfig:smp:initrd > > > > mipsel:24Kf:malta_defconfig:smp:rootfs > > > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs > > > > mipsel64:malta_defconfig:nosmp:rootfs > > > > mipsel64:malta_defconfig:smp:initrd > > > > mipsel64:malta_defconfig:smp:rootfs > > > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs > > > > > > > > Error is always the same. > > > > > > > > arch/mips/kernel/process.c:637:8: > > > > error: 'call_single_data_t' undeclared here (not in a function) > > > > arch/mips/kernel/process.c: In function 'raise_backtrace': > > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: > > > > error: 'csd' undeclared (first use in this function) > > > > > > mips should now be fixed with the updated tree I have pushed out. > > > > > > > The above is with v4.9.112-33-g7fb1f5e, which is the latest version > > available from the git repository (as of right now). My builders > > pulled it at 4:07am this morning, and there was no subsequent update. > > Odd. Ok, I've bumped the -rc number to -rc2 now, and pushed it all out > again. Let me know if you don't see it show up within an hour or so. > Version is now v4.9.112-33-gb44db2b, so something changed, but I still get the same build error. Comparing the old and the new version, the only difference is the updated rc. diff --git a/Makefile b/Makefile index 57f315d00a94..986470ef6f6e 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 4 PATCHLEVEL = 9 SUBLEVEL = 113 -EXTRAVERSION = -rc1 +EXTRAVERSION = -rc2 NAME = Roaring Lionus The error itself is that single_data_t was replaced in the backport with call_single_data. It should have been "struct call_single_data". Guenter
Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID
Linus Torvalds writes: > On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman > wrote: >> >> In practice since glibc does not make thread id's available I don't >> expect anyone relies on this behavior. Since no one relies on it we >> can change it without creating a regression. > > Maybe. > > However, possibly not. > > The thing is, glibc wasn't the original or only use of our threads. In > fact, there are people out there that use clone() directly, without > using it for posix threading. And Oleg was right to notice this, > because the traditional model was literally to just use "kill()" on > the pid returned from clone(). I completely agree that Oleg was right to notice this, and I was definitely not right to overlook. In my description and otherwise. I also think the semantic change needs to happen in it's own separate patch so things can be tracked down. I really don't think anyone uses this but it is not smart to hold the rest of the changes hostage to my belief. So I am thinking about how to rework this. > So the semantics of Linux kill() really is to kill the thread, not the > group leader. glibc's implementation of pthreads is not the only model > out there. There are two questions. a) Can we use the pid of a thread to find the thread group? b) Will the signal be queued in the thread group? > Now, it is possible that at none of the legacy uses use CLONE_THREAD > and thus aren't affected (because tgid will always be pid). So maybe > nobody notices. That is what I expect. I don't know think legacy is a good description. Calling other uses of CLONE_THREAD non-glibc seems better. The old LinuxThreads did not use CLONE_THREAD because it did not exist. > > But we really have three different 'kill' system calls: > > - the original 'kill' system call (#37 on x86-32). > >This looks up the thread ID, but signals the *group*. > > - tkill (#238) > >This looks up the thread, and signals the specific thread. > > - tgkill (#270) > >This looks up the tgid, and signals the group. No. tgkill is a less racy version of tkill and verifies that the thread it signals is in the proper thread group. > Modern glibc will not even use the original 'kill()' at all, I think. > But it's the legacy behavior. No. Modern glibc definitely still uses kill. As kill is the only one exporting the posix kill API. > So I do think Oleg is right. We should be careful. You'll not notice > breakage on a modern distro, but you might easily break old code. Yes. We definitely need to be careful. At the same time since this isn't something the old LinuxThreads had to cope with we can probably clean it up. But as that is not my focus it should probably be pushed out. Eric
[PATCH memory-model 02/14] tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
The names on the first line of the litmus tests are arbitrary, but the convention is that they be the filename without the trailing ".litmus". This commit therefore removes the stray trailing ".litmus" from ISA2+pooncelock+pooncelock+pombonce.litmus's name. Reported-by: Andrea Parri Signed-off-by: Paul E. McKenney Acked-by: Alan Stern --- .../litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus index 7a39a0aaa976..0f749e419b34 100644 --- a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus +++ b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus @@ -1,4 +1,4 @@ -C ISA2+pooncelock+pooncelock+pombonce.litmus +C ISA2+pooncelock+pooncelock+pombonce (* * Result: Sometimes -- 2.17.1
Re: [PATCH v2 1/2] documentation: pinctrl: Add compatibles for Amlogic Meson G12A pin controllers
On Sat, Jul 14, 2018 at 11:27:53PM +, Yixun Lan wrote: > Add new compatible name for Amlogic's Meson-G12A pin controllers, > add a dt-binding header file which document the detail pin names. > > Acked-by: Martin Blumenstingl > Signed-off-by: Xingyu Chen > Signed-off-by: Yixun Lan > --- > .../bindings/pinctrl/meson,pinctrl.txt| 2 + > include/dt-bindings/gpio/meson-g12a-gpio.h| 114 ++ > 2 files changed, 116 insertions(+) > create mode 100644 include/dt-bindings/gpio/meson-g12a-gpio.h Reviewed-by: Rob Herring
Re: [PATCH] checkpatch: Require commit text and warn on long commit text lines
On 2018-07-13 17:08, Joe Perches wrote: On Fri, 2018-07-13 at 16:28 -0700, pher...@codeaurora.org wrote: On 2018-07-13 14:46, Joe Perches wrote: > On Fri, 2018-07-13 at 14:40 -0700, Prakruthi Deepak Heragu wrote: > > Commit text is almost always necessary to explain why a change is > > needed. > > This bit seems sensible, but perhaps it should just count the > number of lines after the end of email headers and before any > Signed-off-by:/Signature line > While committing the changes, one can just write the subject and not write the commit text at all. So, if we just count the lines between email headers and signed-off, we still do count lines which form the subject, but the commit text is still absent. Also, subject can be longer than one line. So, just counting lines doesn't really guarantee the presence of commit text. Not true. Look at $in_header_lines and $in_commit_log. > > Also, warn on commit text lines longer than 75 characters. The commit > > text > > are indented and may wrap on a terminal if they are longer than 75 > > characters. > > This is already exists via > > # Check for line lengths > 75 in commit log, warn once >if ($in_commit_log && !$commit_log_long_line && >length($line) > 75 && > True, but this patch points out every line of the commit text that is exceeding the limit. Which is bad because things like dump_stack() are added in commit logs and those are already allowed to be > 75 chars. Anyway, something like this probably works. Please test. --- scripts/checkpatch.pl | 13 + 1 file changed, 13 insertions(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index b5c875d7132b..8b5f3dae31c9 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -2240,6 +2240,7 @@ sub process { my $in_header_lines = $file ? 0 : 1; my $in_commit_log = 0; #Scanning lines before patch my $has_commit_log = 0; #Encountered lines before patch + my $commit_log_lines = 0; #Number of commit log lines my $commit_log_possible_stack_dump = 0; my $commit_log_long_line = 0; my $commit_log_has_diff = 0; @@ -2497,6 +2498,18 @@ sub process { $cnt_lines++ if ($realcnt != 0); +# Verify the existence of a commit log if appropriate +# 2 is used because a $signature is counted in $commit_log_lines + if ($in_commit_log) { + if ($line !~ /^\s*$/) { + $commit_log_lines++;#could be a $signature + } + } else if ($has_commit_log && $commit_log_lines < 2) { + WARN("COMMIT_MESSAGE", +"Missing commit description - Add an appropriate one\n"); + $commit_log_lines = 2; #warn only once + } + # Check if the commit log has what seems like a diff which can confuse patch if ($in_commit_log && !$commit_log_has_diff && (($line =~ m@^\s+diff\b.*a/[\w/]+@ && I checked all the cases that I mentioned before. The change you suggested works for every case. Would you take care of merging this fix?
Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID
On Mon, Jul 16, 2018 at 11:02 AM Eric W. Biederman wrote: > > There are two questions. > a) Can we use the pid of a thread to find the thread group? Yes. Just find the thread, and then use p->tgid. However, that's not what the code used to do. It used to just find the thread, and then do "do_send_sig_info()" on it. And it's actually *slightly* different than "find the thread group based on the thread". At least the permission checks are different. The permission checks are done on the thread. > b) Will the signal be queued in the thread group? Yes. pending = group ? >signal->shared_pending : >pending; and "group" is true. > > Now, it is possible that at none of the legacy uses use CLONE_THREAD > > and thus aren't affected (because tgid will always be pid). So maybe > > nobody notices. > > That is what I expect. I don't know think legacy is a good description. > Calling other uses of CLONE_THREAD non-glibc seems better. The old > LinuxThreads did not use CLONE_THREAD because it did not exist. Again, don't get hung up about different libc implementations. People have literally used clone() directly. And some of them use CLONE_THREAD. Just google it. I guarantee you'll find examples of it, because I found examples. So stop the whole "libc" argument. That's not the point, and as long as you make that argument, your argument is simply not valid. People use clone() directly. Really. Really really. Linus
[RFC V2] kvm: Adding skelaton for Memory ROE
This is an attempt to implement Memory ROE discussed by me earlier as a way to prevent Rootkits. I have already explained in details in this thread: https://www.mail-archive.com/kernelnewbies@kernelnewbies.org/msg18826.html So I think there is no need for saying the exact same thing again. The problem is that the code isn't working and I can't figure out why I tried implementing the protection to follow similar behavior to that of KVM_MEM_READONLY but to be on page (SPTE) level The current problem I am facing is that when handling the hypercall vcpu->mode turns to be OUTSIDE_GUEST_MODE but KVM_REQ_TLB_FLUSH doesn't seem to be handled correctly. KVM documentation promised that when VCPU is not in GUEST_MODE VCPU are handled asap and kvm_vcpu_kick(vcpu); will even force that, but it doesn't seem to be the case for me. This is the kind of logging I am getting: [9073.753306] I came here kvm_mmu_slot_apply_flags [9073.753311] kvm_mmu_slot_apply_write_access: Flush = false [9073.992536] gfn_is_readonly: test_bit = 0 [9073.992543] gfn_is_readonly: test_bit = 0 [9073.992545] gfn_is_readonly: test_bit = 0 [9073.992703] Hypercall received, page address 0x0 [9073.992705] gfn_is_readonly: test_bit = 0 [9073.992708] kvm_mroe: flush state = Done [9073.992709] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [9073.992711] Setting page number 0 in slot number 0 [9073.992715] slot_rmap_apply_protection: The 0th page is readonly, Flush = True [9073.992717] kvm_mmu_slot_apply_write_access: Flush = true [9073.992719] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [9073.992720] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [9073.992721] kvm_mroe: flush state = Waiting For some reason kvm_vcpu_kick() didn't force the KVM_REQ_TLB_FLUSH to kick into the virtual cpu (I am talking about the last 2 lines). I am aware that there is still alot missing (like dealing with malicious guest remappings) and the code quality sucks, but any ideas about what I could be doing wrong (or ideas in general) would be apprciated. I am already planning to do everything cleanly once it works. Thansk. Edits for V2: Unfortunately I did few mistakes that lead do kernel not compiling on valnilla .config file because of silly mistakes not doing #ifdef..#endif in some places so this lead to using symbols only available when CONFIG_KVM_MROE is set in .conf even if it wasn't anyway it is fixed. and I should not that CONFIG_KVM_MROE should be used when testing my code and trying to figure out what went wrong Signed-off-by: Ahmed Abd El Mawgood --- arch/x86/include/asm/kvm_host.h | 7 +- arch/x86/kvm/Kconfig| 7 ++ arch/x86/kvm/mmu.c | 158 ++-- arch/x86/kvm/x86.c | 83 - include/linux/kvm_host.h| 17 + include/uapi/linux/kvm_para.h | 4 +- virt/kvm/kvm_main.c | 36 +++-- 7 files changed, 257 insertions(+), 55 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c13cd28d9d1b..c66e9245f750 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -235,7 +235,10 @@ struct kvm_mmu_memory_cache { int nobjs; void *objects[KVM_NR_MEM_OBJS]; }; - +struct kvm_write_access_data { + int i; + struct kvm_memory_slot *memslot; +}; /* * the pages used as guest page table on soft mmu are tracked by * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used @@ -1130,7 +1133,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, u64 acc_track_mask, u64 me_mask); void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); -void kvm_mmu_slot_remove_write_access(struct kvm *kvm, +void kvm_mmu_slot_apply_write_access(struct kvm *kvm, struct kvm_memory_slot *memslot); void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *memslot); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 92fd433c50b9..8ae822a8dc7a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -96,6 +96,13 @@ config KVM_MMU_AUDIT This option adds a R/W kVM module parameter 'mmu_audit', which allows auditing of KVM MMU events at runtime. +config KVM_MROE + bool "Hypercall Memory Read-Only Enforcement" + depends on KVM && X86 + help + This option add KVM_HC_HMROE hypercall to kvm which as hardening + mechanism to protect memory pages from being edited. + # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. source drivers/vhost/Kconfig diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index d594690d8b95..e06e923f90aa 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -70,7 +70,7 @@ enum { #undef MMU_DEBUG #ifdef MMU_DEBUG -static bool dbg = 0; +static bool dbg = 1; module_param(dbg, bool, 0644); #define
Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha wrote: > On 7/13/2018 10:50 PM, John Stultz wrote: >> On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha >>> On 7/11/2018 1:43 AM, John Stultz wrote: I worry this upside-down logic is too subtle to be easily reasoned about, and will just lead to future mistakes. Can we instead call this "suspend_timing_needed" and only set it to true when we don't inject any sleep time on resume? >>> >>> >>> I did not get your point "only set it to true when we don't inject any >>> sleep >>> time on resume? " >>> How do we know this ? >>> This question itself depends on the "sleeptime_injected" if it is true >>> means >>> no need to inject else need to inject. >>> >>> Also, we need to make this variable back and forth true, false; suspends >>> path ensures it to make it false. >> >> So yea, I'm not saying logically the code is really any different, >> this is more of a naming nit. So instead of having a variable that is >> always on that we occasionally turn off, lets invert the naming and >> have it be a flag that we occasionally turn on. > > > I understand your concern about the name of the variable will be misleading. > But the changing Boolean state would not solve the actual issue. > > If i understand you correctly you meant below code > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 32ae9ae..becc5bd 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts) > * If a suspend fails before reaching timekeeping_resume() then the flag > * stays true and prevents erroneous sleeptime injection. > */ > -static bool sleeptime_injected = true; > +static bool suspend_timing_needed; > > /* Flag for if there is a persistent clock on this platform */ > static bool persistent_clock_exists; > @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 > *delta) > raw_spin_lock_irqsave(_lock, flags); > write_seqcount_begin(_core.seq); > > - sleeptime_injected = true; > + suspend_timing_needed = false; > > timekeeping_forward_now(tk); > > @@ -1714,10 +1714,10 @@ void timekeeping_resume(void) > tk->tkr_mono.mask); > nsec = mul_u64_u32_shr(cyc_delta, clock->mult, > clock->shift); > ts_delta = ns_to_timespec64(nsec); > - sleeptime_injected = true; > + suspend_timing_needed = true; > } else if (timespec64_compare(_new, _suspend_time) > > 0) { > ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time); > - sleeptime_injected = true; > + suspend_timing_needed = true; > } No no... This part is wrong. We only set suspend_timing_needed if we *didn't* calculate the suspend time in timekeeping_resume. You have to invert all the boolean logic for it to be equivalent. > if (sleeptime_injected) > @@ -1756,7 +1756,7 @@ int timekeeping_suspend(void) > if (timekeeping_suspend_time.tv_sec || > timekeeping_suspend_time.tv_nsec) > persistent_clock_exists = true; > > - sleeptime_injected = false; > + suspend_timing_needed = false; > > raw_spin_lock_irqsave(_lock, flags); > > > This has a problem.. > > >> >> Just the name sleeptime_injected is read a statement, which if we say >> is defaults to true, becomes confusing to think about when the >> timekeeping_suspend/resume code hasn't yet run (which is the case >> where your error cropped up) - and no sleeptime has actually been >> injected. > > > Yes, when very first suspend fails and timekeeping_suspend/resume did not > run ; That is the exact issue. > So, exact solution is no need to inject any sleeptime here. > > If we set the default value to false then we will see timekeeping_resume > will inject sleeptime by below code which was not intended. > > static int rtc_resume(struct device *dev) > { > struct rtc_device *rtc = to_rtc_device(dev); > struct rtc_time tm; > struct timespec64 new_system, new_rtc; > struct timespec64 sleep_time; > int err; > > if (timekeeping_rtc_skipresume()) // it will return the value false > as sleep failed and timekeeping_resume() did not get called. > return 0; > > So, I think with the logic bug above it will work out properly, but let me know if I'm still missing something. thanks -john
Re: 4.18-rc* regression: x86-32 troubles (with timers?)
> Everything below here is is 'bad', which can be an indication that you > misclassified one of > the commits above as 'good' when it should have been 'bad'. The most likely > explanations are that you either typed the 'git bisect good' by accident, or > that the failure is not 100% reliable, and it sometimes works fine even on a > broken kernel. > > 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the > variable name in v9fs_get_trans_by_name() comment", which is marked "good", > and can't really be good if 0bc5fe85727413 is bad and you are not using the > 'qed' driver. > > I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and > if it was, test v4.17-rc4, which is what the net-next tree was based on. Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting it. Building v4.17-rc4 now. -- Meelis Roos (mr...@linux.ee)
Re: [PATCH] mm: don't do zero_resv_unavail if memmap is not allocated
On 07/16/2018 12:09 PM, Michal Hocko wrote: > On Mon 16-07-18 11:16:30, Pavel Tatashin wrote: >> Moving zero_resv_unavail before memmap_init_zone(), caused a regression on >> x86-32. >> >> The cause is that we access struct pages before they are allocated when >> CONFIG_FLAT_NODE_MEM_MAP is used. >> >> free_area_init_nodes() >> zero_resv_unavail() >> mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced >> free_area_init_node() >> if CONFIG_FLAT_NODE_MEM_MAP >> alloc_node_mem_map() >> memblock_virt_alloc_node_nopanic() <- struct page alloced here >> >> On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory >> that it returns, so we do not need to do zero_resv_unavail() here. > > This all is subtle as hell and almost impossible to build a sane code on > top. Your patch sounds good as a stop gap fix but we really need > something resembling an actual design rather than ad-hoc hacks piled on > top of each other.z I totally agree, I started working on figuring out how to simply and improve memmap_init_zone(). But part of the mess is that we simply have too many memmap layouts. We must start removing some of them. But, that also requires an analysis of how to unify them. > >> Fixes: e181ae0c5db9 ("mm: zero unavailable pages before memmap init") >> Signed-off-by: Pavel Tatashin > > Acked-by: Michal Hocko > Thank you. Pavel
Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID
On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman wrote: > > In practice since glibc does not make thread id's available I don't > expect anyone relies on this behavior. Since no one relies on it we > can change it without creating a regression. Maybe. However, possibly not. The thing is, glibc wasn't the original or only use of our threads. In fact, there are people out there that use clone() directly, without using it for posix threading. And Oleg was right to notice this, because the traditional model was literally to just use "kill()" on the pid returned from clone(). So the semantics of Linux kill() really is to kill the thread, not the group leader. glibc's implementation of pthreads is not the only model out there. Now, it is possible that at none of the legacy uses use CLONE_THREAD and thus aren't affected (because tgid will always be pid). So maybe nobody notices. But we really have three different 'kill' system calls: - the original 'kill' system call (#37 on x86-32). This looks up the thread ID, but signals the *group*. - tkill (#238) This looks up the thread, and signals the specific thread. - tgkill (#270) This looks up the tgid, and signals the group. Modern glibc will not even use the original 'kill()' at all, I think. But it's the legacy behavior. So I do think Oleg is right. We should be careful. You'll not notice breakage on a modern distro, but you might easily break old code. Linus
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 08:31:20PM +0200, Greg Kroah-Hartman wrote: > On Mon, Jul 16, 2018 at 11:02:19AM -0700, Guenter Roeck wrote: > > On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote: > > > On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote: > > > > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote: > > > > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote: > > > > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > > > > > > > This is the start of the stable review cycle for the 4.9.113 > > > > > > > release. > > > > > > > There are 32 patches in this series, all will be posted as a > > > > > > > response > > > > > > > to this one. If anyone has any issues with these being applied, > > > > > > > please > > > > > > > let me know. > > > > > > > > > > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > > > > > > > Anything received after that time might be too late. > > > > > > > > > > > > > > > > > > > Build results: > > > > > > total: 148 pass: 133 fail: 15 > > > > > > Failed builds: > > > > > > mips:defconfig > > > > > > mips:allnoconfig > > > > > > mips:defconfig > > > > > > mips:allmodconfig > > > > > > mips:allnoconfig > > > > > > mips:bcm47xx_defconfig > > > > > > mips:bcm63xx_defconfig > > > > > > mips:nlm_xlp_defconfig > > > > > > mips:ath79_defconfig > > > > > > mips:ar7_defconfig > > > > > > mips:e55_defconfig > > > > > > mips:cavium_octeon_defconfig > > > > > > mips:malta_defconfig > > > > > > mips:rt305x_defconfig > > > > > > mips:defconfig > > > > > > Qemu test results: > > > > > > total: 166 pass: 154 fail: 12 > > > > > > Failed tests: > > > > > > mips:malta_defconfig:nosmp > > > > > > mips:malta_defconfig:smp > > > > > > mips64:malta_defconfig:nosmp > > > > > > mips64:malta_defconfig:smp > > > > > > mipsel:24Kf:malta_defconfig:nosmp:initrd > > > > > > mipsel:24Kf:malta_defconfig:smp:initrd > > > > > > mipsel:24Kf:malta_defconfig:smp:rootfs > > > > > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs > > > > > > mipsel64:malta_defconfig:nosmp:rootfs > > > > > > mipsel64:malta_defconfig:smp:initrd > > > > > > mipsel64:malta_defconfig:smp:rootfs > > > > > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs > > > > > > > > > > > > Error is always the same. > > > > > > > > > > > > arch/mips/kernel/process.c:637:8: > > > > > > error: 'call_single_data_t' undeclared here (not in a function) > > > > > > arch/mips/kernel/process.c: In function 'raise_backtrace': > > > > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: > > > > > > error: 'csd' undeclared (first use in this function) > > > > > > > > > > mips should now be fixed with the updated tree I have pushed out. > > > > > > > > > > > > > The above is with v4.9.112-33-g7fb1f5e, which is the latest version > > > > available from the git repository (as of right now). My builders > > > > pulled it at 4:07am this morning, and there was no subsequent update. > > > > > > Odd. Ok, I've bumped the -rc number to -rc2 now, and pushed it all out > > > again. Let me know if you don't see it show up within an hour or so. > > > > > Version is now v4.9.112-33-gb44db2b, so something changed, but I still get > > the same build error. > > > > Comparing the old and the new version, the only difference is the updated > > rc. > > > > diff --git a/Makefile b/Makefile > > index 57f315d00a94..986470ef6f6e 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -1,7 +1,7 @@ > > VERSION = 4 > > PATCHLEVEL = 9 > > SUBLEVEL = 113 > > -EXTRAVERSION = -rc1 > > +EXTRAVERSION = -rc2 > > NAME = Roaring Lionus > > > > The error itself is that single_data_t was replaced in the backport with > > call_single_data. It should have been "struct call_single_data". > > {sigh} This is why I asked for a fixup patch. Let me just go rip that > thing out now... Ok, -rc3 is out now to hopefully resolve this. Sorry for the noise. greg k-h
Re: [PATCH v2 3/3] dt-bindings: mfd: max8998: Add charger subnode binding
On Monday, July 16, 2018 12:00:03 PM CEST Rob Herring wrote: > On Sat, Jul 14, 2018 at 02:26:53PM +0200, Paweł Chmiel wrote: > > This patch adds devicetree bindings documentation for > > battery charging controller as the subnode of MAX8998 PMIC. > > > > Signed-off-by: Paweł Chmiel > > --- > > Changes from v1: > > - Removed unneeded Fixes tag > > - Correct description of all charger values > > - Added missing property unit > > --- > > Documentation/devicetree/bindings/mfd/max8998.txt | 24 > > +++ > > 1 file changed, 24 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt > > b/Documentation/devicetree/bindings/mfd/max8998.txt > > index 23a3650ff2a2..196e50097a36 100644 > > --- a/Documentation/devicetree/bindings/mfd/max8998.txt > > +++ b/Documentation/devicetree/bindings/mfd/max8998.txt > > @@ -50,6 +50,23 @@ Additional properties required if > > max8998,pmic-buck2-dvs-gpio is defined: > > - max8998,pmic-buck2-dvs-voltage: An array of 2 voltage values in > > microvolts > >for buck2 regulator that can be selected using dvs gpio. > > > > +Charger: Configuration for battery charging controller should be added > > +inside a child node named 'charger'. > > + Required properties: > > + - max8998,charge-eoc-percent: Setup End of Charge Level. > > +If value equals 0, leave it unchanged. > > +Otherwise it should be value from 10 to 45 by 5 step. > > + > > + - max8998,charge-restart-level-microvolt: Setup Charge Restart Level. > > +If value equals 0, leave it unchanged. > > +If value equals -1, it will be disabled. > > -1 is a bit strange. Why not 'not present' is disabled? I wanted to make it work the same way, as it was in case of using it from platform_data. But this is good idea, to change it a bit. So we would have: - 0 -> leave unchanged - property not present in devicetree -> disabled (and set to -1 - driver is expecting it) - value from 10-45 or (5,6,7 in case of second property) I'll update both documentation and driver parsing code (for both properties - restart level and full timeout) and send v3 version. Thanks for suggestion. > > > +Otherwise it should be one of following values: 100, 150, 200. > > + > > + - max8998,charge-timeout-hours: Setup Charge Full Timeout. > > +If value equals 0, leave it unchanged. > > +If value equals -1, it will be disabled. > > +Otherwise it should be one of following values: 5, 6, 7. > > + > > Regulators: All the regulators of MAX8998 to be instantiated shall be > > listed in a child node named 'regulators'. Each regulator is represented > > by a child node of the 'regulators' node. > > @@ -99,6 +116,13 @@ Example: > > max8998,pmic-buck2-dvs-gpio = < 0 3 0 0>; /* SET3 */ > > max8998,pmic-buck2-dvs-voltage = <135>, <130>; > > > > + /* Charger configuration */ > > + charger { > > + max8998,charge-eoc-percent = <0>; > > + max8998,charge-restart-level-microvolt = <(-1)>; > > + max8998,charge-timeout-hours = <7>; > > + }; > > + > > /* Regulators to instantiate */ > > regulators { > > ldo2_reg: LDO2 { >
Re: cpu_no_speculation omissions?
On Mon, 16 Jul 2018, Alan Cox wrote: > On Mon, 2018-07-16 at 10:28 -0700, Dave Hansen wrote: > > On 07/16/2018 09:56 AM, Thomas Gleixner wrote: > > > On Mon, 16 Jul 2018, Rich Felker wrote: > > > > At least the Centerton (late-generation Bonnell uarch) Atom > > > > family is > > > > omitted from the cpu_no_speculation table added by commit > > > > fec9434a12f3 > > > > to arch/x86/kernel/cpu/common.c. Is this intentional? Would a > > > > patch > > > > adding it and possibly other omissions be welcome? > > > > > > Probably. Dave? > > > > IIRC, Alan Cox was compiling a list on what is affected vs. not. He > > would know way better than I. > > The pre Silvermont atom cores are in order. When I did the original > list I didn't bother with all the 32bit cores as we didn't have any > 32bit mitigations then. At least we should give the users that warm and fuzzy feeling that they are not affected. Thanks, tglx
Re: [PATCH] mmc: tegra: Add and use tegra_sdhci_get_max_clock()
On 13/07/18 16:39, Aapo Vienamo wrote: ... >>> that it returns the current clock rate of the host instead of the >>> maximum one, which can lead to unnecessarily small clock rates. >>> >>> This differs from the previous implementation of >>> tegra_sdhci_get_max_clock() in that it doesn't divide the result by two. >> >> Why? > > As far as I can tell the original tegra_sdhci_get_max_clock() was > implemented this way in order to force sdhci_calc_clk() to always set > the SDHCI clock divider to two on sdhci_set_clock(). The requirement to > configure the SDHCI divider to two is specific to DDR50/52 modes on > Tegra. > > The .get_max_clock() callback retuning half of the actual maximum will > result in HS200 and HS400 modes not being able to run at full speed. > Another mechanism to enforce the divider requirement has to be figured > out in order to enable DDR50/52 modes on Tegra SoCs. OK, thanks. Ah I see Stefan's patch for doubling the clock. OK so that's fine. Cheers Jon -- nvpublic
cpu_no_speculation omissions?
At least the Centerton (late-generation Bonnell uarch) Atom family is omitted from the cpu_no_speculation table added by commit fec9434a12f3 to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch adding it and possibly other omissions be welcome? Rich
Re: [PATCH 4.14 00/54] 4.14.56-stable review
On Mon, Jul 16, 2018 at 09:34:57AM +0200, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.14.56 release. > There are 54 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed Jul 18 07:34:24 UTC 2018. > Anything received after that time might be too late. > Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 173 pass: 173 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH] [RESEND] perf/x86/intel/uncore: Fix the index of PCU.3 Broadwell CPUs
On 7/16/2018 11:07 AM, Masayoshi Mizuma wrote: On 07/16/2018 10:29 AM, Liang, Kan wrote: On 7/15/2018 6:34 PM, Ingo Molnar wrote: * Masayoshi Mizuma wrote: From: Masayoshi Mizuma commit 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") introduced PCU.3 for Broadwell CPU. Unfortunately, the driver_data of PCU.3 conflicts to QPI Port 2 filter. { /* QPI Port 2 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46), .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2), { /* PCU.3 (for Capability registers) */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fc0), .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, HSWEP_PCI_PCU_3), // HSWEP_PCI_PCU_3 == 2 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -1030,6 +1030,7 @@ enum { SNBEP_PCI_QPI_PORT0_FILTER, SNBEP_PCI_QPI_PORT1_FILTER, HSWEP_PCI_PCU_3, + BDX_PCI_PCU_3, }; So we use a magic '2' enumerator in the 'QPI Port 2 filter', and that overlaps with HSWEP_PCI_PCU_3, right? Shouldn't we clean up all the enumerators and not use magic numbers, and this fix the conflict? Yes, it should fix the conflict. I will clean up the code. Thanks a lot! I would appreciate if you could add CC to me when you post the patch. Here is the patch. Masa, could you please give it a try? Thanks, Kan From 688378a4003ec33156958a52dc822105c18075af Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Mon, 16 Jul 2018 04:57:51 -0400 Subject: [PATCH] perf/x86/intel/uncore: Fix hardcode index of Broadwell extra PCI DEV Masa reports that a warning message is shown while CPU hot-removing on Broadwell server. WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0x10b/0x150 Call Trace: pci_device_remove+0x42/0xd0 device_release_driver_internal+0x148/0x220 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x57/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x174/0x3a0 worker_thread+0x4c/0x3d0 kthread+0xf8/0x130 This bug was introduced in: commit 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") The index of "QPI Port 2 filter" was hardcode to 2. The index of "PCU.3" used enumerator "HSWEP_PCI_PCU_3", which equals to 2 as well. To fix the conflict, the hardcode index needs to be cleaned up. Introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2 filter" on Broadwell, and increase the UNCORE_EXTRA_PCI_DEV_MAX. Clean up hardcode index. Reported-by: Masayoshi Mizuma Suggested-by: Ingo Molnar Signed-off-by: Kan Liang Fixes: 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") --- arch/x86/events/intel/uncore.h | 2 +- arch/x86/events/intel/uncore_snbep.c | 10 +++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h index c9e1e0b..e17ab88 100644 --- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -28,7 +28,7 @@ #define UNCORE_PCI_DEV_TYPE(data) ((data >> 8) & 0xff) #define UNCORE_PCI_DEV_IDX(data) (data & 0xff) #define UNCORE_EXTRA_PCI_DEV 0xff -#define UNCORE_EXTRA_PCI_DEV_MAX 3 +#define UNCORE_EXTRA_PCI_DEV_MAX 4 #define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff) diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 87dc026..51d7c11 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -1029,6 +1029,7 @@ void snbep_uncore_cpu_init(void) enum { SNBEP_PCI_QPI_PORT0_FILTER, SNBEP_PCI_QPI_PORT1_FILTER, + BDX_PCI_QPI_PORT2_FILTER, HSWEP_PCI_PCU_3, }; @@ -3286,15 +3287,18 @@ static const struct pci_device_id bdx_uncore_pci_ids[] = { }, { /* QPI Port 0 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f86), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 0), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, + SNBEP_PCI_QPI_PORT0_FILTER), }, { /* QPI Port 1 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f96), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 1), + .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, + SNBEP_PCI_QPI_PORT1_FILTER), }, { /* QPI Port 2 filter */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46), - .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2), + .driver_data =
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote: > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.9.113 release. > > There are 32 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > > Anything received after that time might be too late. > > > > Build results: > total: 148 pass: 133 fail: 15 > Failed builds: > mips:defconfig > mips:allnoconfig > mips:defconfig > mips:allmodconfig > mips:allnoconfig > mips:bcm47xx_defconfig > mips:bcm63xx_defconfig > mips:nlm_xlp_defconfig > mips:ath79_defconfig > mips:ar7_defconfig > mips:e55_defconfig > mips:cavium_octeon_defconfig > mips:malta_defconfig > mips:rt305x_defconfig > mips:defconfig > Qemu test results: > total: 166 pass: 154 fail: 12 > Failed tests: > mips:malta_defconfig:nosmp > mips:malta_defconfig:smp > mips64:malta_defconfig:nosmp > mips64:malta_defconfig:smp > mipsel:24Kf:malta_defconfig:nosmp:initrd > mipsel:24Kf:malta_defconfig:smp:initrd > mipsel:24Kf:malta_defconfig:smp:rootfs > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs > mipsel64:malta_defconfig:nosmp:rootfs > mipsel64:malta_defconfig:smp:initrd > mipsel64:malta_defconfig:smp:rootfs > mipsel64:fuloong2e_defconfig:fulong2e:rootfs > > Error is always the same. > > arch/mips/kernel/process.c:637:8: > error: 'call_single_data_t' undeclared here (not in a function) > arch/mips/kernel/process.c: In function 'raise_backtrace': > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: > error: 'csd' undeclared (first use in this function) mips should now be fixed with the updated tree I have pushed out. thanks, greg k-h
Re: [PATCH] arm64: cpuinfo: Include cleartext implementer and part strings
Hi Olof, On Mon, Jul 16, 2018 at 07:34:05AM -0700, Olof Johansson wrote: > On Mon, Jul 16, 2018 at 2:17 AM, Marc Zyngier wrote: > > On 15/07/18 04:53, Olof Johansson wrote: > >> There's some use in printing out what the implementer and part numbers > >> decode to for cases where they're known. > >> > >> I filled in the table based on public information; mostly from ARM TRMs > >> and other tools (and some of the SSBD tables in the kernel, etc). > >> > >> Apple IDs came from > >> https://github.com/apple/darwin-xnu/blob/master/osfmk/arm/cpuid.h I appreciate that the format of /proc/cpuinfo isn't as nice as one might like as a human reader. For better or worse, the vast majority of readers of /proc/cpuinfo are applications (be they scripts or binaries), and we must treat the format and contents of /proc/cpuinfo as ABI. We've been burned by /proc/cpuinfo format changes in the past, so we want to minimize the potential for ABI issues. This change is only useful to human readers of /proc/cpuinfo, mandates perpetual maintenance work, and comes with a very strong risk of subtle ABI breakage (both for this patch and future patches it implies will be necessary). So sorry, but I must NAK this patch, as with prior patches changing the format of /proc/cpuinfo. > > The same thing pops up every so often. And the answer is the same each time: > > Ever thought about why it pops up? In some cases because the person in question isn't sure what hardware they're on. In most cases because people are used to x86, and haven't considered what exactly they're trying to achieve. We've repeatedly been asked for strings *by application developers*, despite the MIDR being sufficient and already available. Generally, developers are happy to use MIDRs once they realise this. Having the string leads people to try to parse it, even though it cannot be reliable. > > - it breaks an existing userspace ABI by changing the format of cpuinfo > > Most tools likely rely on a per-line format, and in this case they're > likely interpreting the contents after the ':' as an integer. Adding > something to the line might or might not break them, agreed. > > How about introducing a "CPU model" line similar to x86's cleartext one? > > > - it is unmaintainable in the long run > > False. Chances are you already need details more fine-grained than > this for errata and quirks, we already do for SSBD. For SSBD, we don't look at the MIDR, and rely solely on the FW to identify the presence of a workaround. Going forwards, we'll do likewise as this is the only way to be future-proof and generic. > And if it lacks an entry it's not the end of the world. I don't think we can be that relaxed about this; it's quite easy to see that people *will* write software that relies on the string (or properties thereof) in ways that we cannot manage. People will do silly things like try to read the string into a fixed-size buffer. That works when the string is "Unknown", but falls apart on the next kernel release when it's "awesome super CPU 2000xx-super-mega-long-name-edition". We don't discover this problem for months, whereupon other applications are reliant upon the new string, and now we're stuck with a horrible ABI bug that we can't fix. > > - userspace already has this information by simply running "lscpu" > > > > What has changed? > > What has changed is that ARM(64) is moving from a > custom-kernel-custom-userspace world to one where you might be > upreving kernels and userspace separately, and you might be using an > older userspace with a newer machine and a newer kernel. Sure, and this is exactly why we're trying to ensure that the behaviour is *guaranteed* to be consistent across kernels -- so whichever kernel is used, userspace should behave the same. If userspace *needs* to know about a CPU, it can identify the CPU based on the MIDR (and REVIDR), and this will work regardless of kernel version. GCC does this today. If userspace *wants* to know about a CPU (e.g. to expose a human-friendly name), then a userspace database is fine. In the majority of cases it's *vastly* easier to update a userspace application than it is to update the kernel, and it's always possible to build the latest binary yourself. There are a number of cases where using strings is going to be problematic in practice, e.g. * Every out-of-tree SoC port is going to invent values that may not match the values mainline settles on. Then people will complain that their software only works on one or the other, because it makes some decision base on the model name. * Every distro is going to differ in the set of IDs they backport, if any. If your software relies on the MIDR, it will work everywhere. If it tries to parse the string, it will behave erratically across distros. * We may get the matching logic for wrong, and be overly permissive when matching MIDR -> string. e.g. we ignore the low 4 bits for some range of MIDRs, but the vendor only uses
Re: cpu_no_speculation omissions?
On 07/16/2018 09:56 AM, Thomas Gleixner wrote: > On Mon, 16 Jul 2018, Rich Felker wrote: >> At least the Centerton (late-generation Bonnell uarch) Atom family is >> omitted from the cpu_no_speculation table added by commit fec9434a12f3 >> to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch >> adding it and possibly other omissions be welcome? > > Probably. Dave? IIRC, Alan Cox was compiling a list on what is affected vs. not. He would know way better than I. Rich, did you have a specific set of family/model combinations that you thought were missed?
[PATCH v6 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()
sparse_init() requires to temporary allocate two large buffers: usemap_map and map_map. Baoquan He has identified that these buffers are so large that Linux is not bootable on small memory machines, such as a kdump boot. The buffers are especially large when CONFIG_X86_5LEVEL is set, as they are scaled to the maximum physical memory size. Baoquan provided a fix, which reduces these sizes of these buffers, but it is much better to get rid of them entirely. Add a new way to initialize sparse memory: sparse_init_nid(), which only operates within one memory node, and thus allocates memory either in large contiguous block or allocates section by section. This eliminates the need for use of temporary buffers. For simplified bisecting and review temporarly call sparse_init() new_sparse_init(), the new interface is going to be enabled as well as old code removed in the next patch. Signed-off-by: Pavel Tatashin Reviewed-by: Oscar Salvador --- mm/sparse.c | 85 + 1 file changed, 85 insertions(+) diff --git a/mm/sparse.c b/mm/sparse.c index 20ca292d8f11..248d5d7bbf55 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -200,6 +200,11 @@ static inline int next_present_section_nr(int section_nr) (section_nr <= __highest_present_section_nr));\ section_nr = next_present_section_nr(section_nr)) +static inline unsigned long first_present_section_nr(void) +{ + return next_present_section_nr(-1); +} + /* * Record how many memory sections are marked as present * during system bootup. @@ -668,6 +673,86 @@ void __init sparse_init(void) memblock_free_early(__pa(usemap_map), size); } +/* + * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end) + * And number of present sections in this node is map_count. + */ +static void __init sparse_init_nid(int nid, unsigned long pnum_begin, + unsigned long pnum_end, + unsigned long map_count) +{ + unsigned long pnum, usemap_longs, *usemap; + struct page *map; + + usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS); + usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid), + usemap_size() * + map_count); + if (!usemap) { + pr_err("%s: node[%d] usemap allocation failed", __func__, nid); + goto failed; + } + sparse_buffer_init(map_count * section_map_size(), nid); + for_each_present_section_nr(pnum_begin, pnum) { + if (pnum >= pnum_end) + break; + + map = sparse_mem_map_populate(pnum, nid, NULL); + if (!map) { + pr_err("%s: node[%d] memory map backing failed. Some memory will not be available.", + __func__, nid); + pnum_begin = pnum; + goto failed; + } + check_usemap_section_nr(nid, usemap); + sparse_init_one_section(__nr_to_section(pnum), pnum, map, usemap); + usemap += usemap_longs; + } + sparse_buffer_fini(); + return; +failed: + /* We failed to allocate, mark all the following pnums as not present */ + for_each_present_section_nr(pnum_begin, pnum) { + struct mem_section *ms; + + if (pnum >= pnum_end) + break; + ms = __nr_to_section(pnum); + ms->section_mem_map = 0; + } +} + +/* + * Allocate the accumulated non-linear sections, allocate a mem_map + * for each and record the physical to section mapping. + */ +void __init new_sparse_init(void) +{ + unsigned long pnum_begin = first_present_section_nr(); + int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin)); + unsigned long pnum_end, map_count = 1; + + /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */ + set_pageblock_order(); + + for_each_present_section_nr(pnum_begin + 1, pnum_end) { + int nid = sparse_early_nid(__nr_to_section(pnum_end)); + + if (nid == nid_begin) { + map_count++; + continue; + } + /* Init node with sections in range [pnum_begin, pnum_end) */ + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count); + nid_begin = nid; + pnum_begin = pnum_end; + map_count = 1; + } + /* cover the last node */ + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count); + vmemmap_populate_print_last(); +} + #ifdef CONFIG_MEMORY_HOTPLUG /* Mark all memory sections within the pfn range as online */ -- 2.18.0
[PATCH v6 5/5] mm/sparse: delete old sparse_init and enable new one
Rename new_sparse_init() to sparse_init() which enables it. Delete old sparse_init() and all the code that became obsolete with. Signed-off-by: Pavel Tatashin Tested-by: Michael Ellerman (powerpc) --- include/linux/mm.h | 6 -- mm/Kconfig | 4 - mm/sparse-vmemmap.c | 21 mm/sparse.c | 237 +--- 4 files changed, 1 insertion(+), 267 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 99d8c50adef6..726e71475144 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2649,12 +2649,6 @@ extern int randomize_va_space; const char * arch_vma_name(struct vm_area_struct *vma); void print_vma_addr(char *prefix, unsigned long rip); -void sparse_mem_maps_populate_node(struct page **map_map, - unsigned long pnum_begin, - unsigned long pnum_end, - unsigned long map_count, - int nodeid); - void *sparse_buffer_alloc(unsigned long size); struct page *sparse_mem_map_populate(unsigned long pnum, int nid, struct vmem_altmap *altmap); diff --git a/mm/Kconfig b/mm/Kconfig index 28fcf54946ea..b78e7cd4e9fe 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -115,10 +115,6 @@ config SPARSEMEM_EXTREME config SPARSEMEM_VMEMMAP_ENABLE bool -config SPARSEMEM_ALLOC_MEM_MAP_TOGETHER - def_bool y - depends on SPARSEMEM && X86_64 - config SPARSEMEM_VMEMMAP bool "Sparse Memory virtual memmap" depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index cd15f3d252c3..8301293331a2 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -261,24 +261,3 @@ struct page * __meminit sparse_mem_map_populate(unsigned long pnum, int nid, return map; } - -void __init sparse_mem_maps_populate_node(struct page **map_map, - unsigned long pnum_begin, - unsigned long pnum_end, - unsigned long map_count, int nodeid) -{ - unsigned long pnum; - int nr_consumed_maps = 0; - - for (pnum = pnum_begin; pnum < pnum_end; pnum++) { - if (!present_section_nr(pnum)) - continue; - - map_map[nr_consumed_maps] = - sparse_mem_map_populate(pnum, nodeid, NULL); - if (map_map[nr_consumed_maps++]) - continue; - pr_err("%s: sparsemem memory map backing failed some memory will not be available\n", - __func__); - } -} diff --git a/mm/sparse.c b/mm/sparse.c index 248d5d7bbf55..10b07eea9a6e 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -205,12 +205,6 @@ static inline unsigned long first_present_section_nr(void) return next_present_section_nr(-1); } -/* - * Record how many memory sections are marked as present - * during system bootup. - */ -static int __initdata nr_present_sections; - /* Record a memory area against a node. */ void __init memory_present(int nid, unsigned long start, unsigned long end) { @@ -240,7 +234,6 @@ void __init memory_present(int nid, unsigned long start, unsigned long end) ms->section_mem_map = sparse_encode_early_nid(nid) | SECTION_IS_ONLINE; section_mark_present(ms); - nr_present_sections++; } } } @@ -377,37 +370,8 @@ static void __init check_usemap_section_nr(int nid, unsigned long *usemap) } #endif /* CONFIG_MEMORY_HOTREMOVE */ -static void __init sparse_early_usemaps_alloc_node(void *data, -unsigned long pnum_begin, -unsigned long pnum_end, -unsigned long usemap_count, int nodeid) -{ - void *usemap; - unsigned long pnum; - unsigned long **usemap_map = (unsigned long **)data; - int size = usemap_size(); - int nr_consumed_maps = 0; - - usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid), - size * usemap_count); - if (!usemap) { - pr_warn("%s: allocation failed\n", __func__); - return; - } - - for (pnum = pnum_begin; pnum < pnum_end; pnum++) { - if (!present_section_nr(pnum)) - continue; - usemap_map[nr_consumed_maps] = usemap; - usemap += size; - check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]); - nr_consumed_maps++; - } -} - #ifdef CONFIG_SPARSEMEM_VMEMMAP static unsigned long __init section_map_size(void) - { return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE); } @@
[PATCH v6 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap
non-vmemmap sparse also allocated large contiguous chunk of memory, and if fails falls back to smaller allocations. Use the same functions to allocate buffer as the vmemmap-sparse Signed-off-by: Pavel Tatashin --- mm/sparse.c | 41 ++--- 1 file changed, 14 insertions(+), 27 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index 9a0a5f598469..db4867b62fff 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -408,13 +408,20 @@ unsigned long __init section_map_size(void) } #else +unsigned long __init section_map_size(void) +{ + return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); +} + struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid, struct vmem_altmap *altmap) { - struct page *map; - unsigned long size; + unsigned long size = section_map_size(); + struct page *map = sparse_buffer_alloc(size); + + if (map) + return map; - size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); map = memblock_virt_alloc_try_nid(size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); @@ -425,42 +432,22 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, unsigned long pnum_end, unsigned long map_count, int nodeid) { - void *map; unsigned long pnum; - unsigned long size = sizeof(struct page) * PAGES_PER_SECTION; - int nr_consumed_maps; - - size = PAGE_ALIGN(size); - map = memblock_virt_alloc_try_nid_raw(size * map_count, - PAGE_SIZE, __pa(MAX_DMA_ADDRESS), - BOOTMEM_ALLOC_ACCESSIBLE, nodeid); - if (map) { - nr_consumed_maps = 0; - for (pnum = pnum_begin; pnum < pnum_end; pnum++) { - if (!present_section_nr(pnum)) - continue; - map_map[nr_consumed_maps] = map; - map += size; - nr_consumed_maps++; - } - return; - } + unsigned long size = section_map_size(); + int nr_consumed_maps = 0; - /* fallback */ - nr_consumed_maps = 0; + sparse_buffer_init(size * map_count, nodeid); for (pnum = pnum_begin; pnum < pnum_end; pnum++) { - struct mem_section *ms; - if (!present_section_nr(pnum)) continue; map_map[nr_consumed_maps] = sparse_mem_map_populate(pnum, nodeid, NULL); if (map_map[nr_consumed_maps++]) continue; - ms = __nr_to_section(pnum); pr_err("%s: sparsemem memory map backing failed some memory will not be available\n", __func__); } + sparse_buffer_fini(); } #endif /* !CONFIG_SPARSEMEM_VMEMMAP */ -- 2.18.0
[PATCH v6 3/5] mm/sparse: move buffer init/fini to the common place
Now, that both variants of sparse memory use the same buffers to populate memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the common place. Signed-off-by: Pavel Tatashin --- include/linux/mm.h | 3 --- mm/sparse-vmemmap.c | 2 -- mm/sparse.c | 14 +++--- 3 files changed, 7 insertions(+), 12 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index a83d3e0e66d4..99d8c50adef6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2655,9 +2655,6 @@ void sparse_mem_maps_populate_node(struct page **map_map, unsigned long map_count, int nodeid); -unsigned long __init section_map_size(void); -void sparse_buffer_init(unsigned long size, int nid); -void sparse_buffer_fini(void); void *sparse_buffer_alloc(unsigned long size); struct page *sparse_mem_map_populate(unsigned long pnum, int nid, struct vmem_altmap *altmap); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index b05c7663c640..cd15f3d252c3 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -270,7 +270,6 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, unsigned long pnum; int nr_consumed_maps = 0; - sparse_buffer_init(section_map_size() * map_count, nodeid); for (pnum = pnum_begin; pnum < pnum_end; pnum++) { if (!present_section_nr(pnum)) continue; @@ -282,5 +281,4 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, pr_err("%s: sparsemem memory map backing failed some memory will not be available\n", __func__); } - sparse_buffer_fini(); } diff --git a/mm/sparse.c b/mm/sparse.c index db4867b62fff..20ca292d8f11 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -401,14 +401,14 @@ static void __init sparse_early_usemaps_alloc_node(void *data, } #ifdef CONFIG_SPARSEMEM_VMEMMAP -unsigned long __init section_map_size(void) +static unsigned long __init section_map_size(void) { return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE); } #else -unsigned long __init section_map_size(void) +static unsigned long __init section_map_size(void) { return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); } @@ -433,10 +433,8 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, unsigned long map_count, int nodeid) { unsigned long pnum; - unsigned long size = section_map_size(); int nr_consumed_maps = 0; - sparse_buffer_init(size * map_count, nodeid); for (pnum = pnum_begin; pnum < pnum_end; pnum++) { if (!present_section_nr(pnum)) continue; @@ -447,14 +445,13 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, pr_err("%s: sparsemem memory map backing failed some memory will not be available\n", __func__); } - sparse_buffer_fini(); } #endif /* !CONFIG_SPARSEMEM_VMEMMAP */ static void *sparsemap_buf __meminitdata; static void *sparsemap_buf_end __meminitdata; -void __init sparse_buffer_init(unsigned long size, int nid) +static void __init sparse_buffer_init(unsigned long size, int nid) { WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */ sparsemap_buf = @@ -464,7 +461,7 @@ void __init sparse_buffer_init(unsigned long size, int nid) sparsemap_buf_end = sparsemap_buf + size; } -void __init sparse_buffer_fini(void) +static void __init sparse_buffer_fini(void) { unsigned long size = sparsemap_buf_end - sparsemap_buf; @@ -494,8 +491,11 @@ static void __init sparse_early_mem_maps_alloc_node(void *data, unsigned long map_count, int nodeid) { struct page **map_map = (struct page **)data; + + sparse_buffer_init(section_map_size() * map_count, nodeid); sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end, map_count, nodeid); + sparse_buffer_fini(); } #else static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum) -- 2.18.0
[PATCH v6 1/5] mm/sparse: abstract sparse buffer allocations
When struct pages are allocated for sparse-vmemmap VA layout, we first try to allocate one large buffer, and than if that fails allocate struct pages for each section as we go. The code that allocates buffer is uses global variables and is spread across several call sites. Cleanup the code by introducing three functions to handle the global buffer: sparse_buffer_init()initialize the buffer sparse_buffer_fini()free the remaining part of the buffer sparse_buffer_alloc() alloc from the buffer, and if buffer is empty return NULL Define these functions in sparse.c instead of sparse-vmemmap.c because later we will use them for non-vmemmap sparse allocations as well. Signed-off-by: Pavel Tatashin --- include/linux/mm.h | 4 mm/sparse-vmemmap.c | 40 ++-- mm/sparse.c | 45 - 3 files changed, 54 insertions(+), 35 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 577e578eb640..a83d3e0e66d4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2655,6 +2655,10 @@ void sparse_mem_maps_populate_node(struct page **map_map, unsigned long map_count, int nodeid); +unsigned long __init section_map_size(void); +void sparse_buffer_init(unsigned long size, int nid); +void sparse_buffer_fini(void); +void *sparse_buffer_alloc(unsigned long size); struct page *sparse_mem_map_populate(unsigned long pnum, int nid, struct vmem_altmap *altmap); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 95e2c7638a5c..b05c7663c640 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -43,12 +43,9 @@ static void * __ref __earlyonly_bootmem_alloc(int node, unsigned long goal) { return memblock_virt_alloc_try_nid_raw(size, align, goal, - BOOTMEM_ALLOC_ACCESSIBLE, node); + BOOTMEM_ALLOC_ACCESSIBLE, node); } -static void *vmemmap_buf; -static void *vmemmap_buf_end; - void * __meminit vmemmap_alloc_block(unsigned long size, int node) { /* If the main allocator is up use that, fallback to bootmem. */ @@ -76,18 +73,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node) /* need to make sure size is all the same during early stage */ void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node) { - void *ptr; - - if (!vmemmap_buf) - return vmemmap_alloc_block(size, node); - - /* take the from buf */ - ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size); - if (ptr + size > vmemmap_buf_end) - return vmemmap_alloc_block(size, node); - - vmemmap_buf = ptr + size; + void *ptr = sparse_buffer_alloc(size); + if (!ptr) + ptr = vmemmap_alloc_block(size, node); return ptr; } @@ -279,19 +268,9 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, unsigned long map_count, int nodeid) { unsigned long pnum; - unsigned long size = sizeof(struct page) * PAGES_PER_SECTION; - void *vmemmap_buf_start; int nr_consumed_maps = 0; - size = ALIGN(size, PMD_SIZE); - vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count, -PMD_SIZE, __pa(MAX_DMA_ADDRESS)); - - if (vmemmap_buf_start) { - vmemmap_buf = vmemmap_buf_start; - vmemmap_buf_end = vmemmap_buf_start + size * map_count; - } - + sparse_buffer_init(section_map_size() * map_count, nodeid); for (pnum = pnum_begin; pnum < pnum_end; pnum++) { if (!present_section_nr(pnum)) continue; @@ -303,12 +282,5 @@ void __init sparse_mem_maps_populate_node(struct page **map_map, pr_err("%s: sparsemem memory map backing failed some memory will not be available\n", __func__); } - - if (vmemmap_buf_start) { - /* need to free left buf */ - memblock_free_early(__pa(vmemmap_buf), - vmemmap_buf_end - vmemmap_buf); - vmemmap_buf = NULL; - vmemmap_buf_end = NULL; - } + sparse_buffer_fini(); } diff --git a/mm/sparse.c b/mm/sparse.c index 2ea8b3dbd0df..9a0a5f598469 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -400,7 +400,14 @@ static void __init sparse_early_usemaps_alloc_node(void *data, } } -#ifndef CONFIG_SPARSEMEM_VMEMMAP +#ifdef CONFIG_SPARSEMEM_VMEMMAP +unsigned long __init section_map_size(void) + +{ + return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE); +} + +#else struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
Re: [PATCH] IPoIB: use kvzalloc to allocate an array of bucket pointers
> On 16 Jul 2018, at 18:19, Jan Dakinevich wrote: > > On Mon, 9 Jul 2018 16:51:03 +0300 > Jan Dakinevich wrote: > >> This table by default takes 32KiB which is 3rd memory order. Only if PAGE_SIZE is 4KiB... >> Meanwhile, this memory is not aimed for DMA operation and could be >> safely allocated by vmalloc. >> >> Signed-off-by: Jan Dakinevich lgtm, Reviewed-by: Håkon Bugge Thxs, Håkon >> --- >> drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c >> b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 26cde95..cb752df >> 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c >> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c >> @@ -1526,7 +1526,7 @@ static int ipoib_neigh_hash_init(struct >> ipoib_dev_priv *priv) return -ENOMEM; >> set_bit(IPOIB_STOP_NEIGH_GC, >flags); >> size = roundup_pow_of_two(arp_tbl.gc_thresh3); >> -buckets = kcalloc(size, sizeof(*buckets), GFP_KERNEL); >> +buckets = kvcalloc(size, sizeof(*buckets), GFP_KERNEL); >> if (!buckets) { >> kfree(htbl); >> return -ENOMEM; >> @@ -1554,7 +1554,7 @@ static void neigh_hash_free_rcu(struct rcu_head >> *head) struct ipoib_neigh __rcu **buckets = htbl->buckets; >> struct ipoib_neigh_table *ntbl = htbl->ntbl; >> >> -kfree(buckets); >> +kvfree(buckets); >> kfree(htbl); >> complete(>deleted); >> } > > ping > > -- > Best regards > Jan Dakinevich > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4.9 03/32] MIPS: Use async IPIs forarch_trigger_cpumask_backtrace()
On Mon, Jul 16, 2018 at 12:46:21PM +0200, Greg Kroah-Hartman wrote: > On Mon, Jul 16, 2018 at 05:46:12PM +0800, 陈华才 wrote: > > Just change "call_single_data_t" to "struct call_single_data" and > > everything is OK. > > Ok, I've done that now, thanks. And I messed it up. So I've just dropped it entirely. Can someone send me a properly backported patch please? thanks, greg k-h
Re: [PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm
On Mon, 2018-07-16 at 03:04 +0200, Ingo Molnar wrote: > * Rik van Riel wrote: > > > On Mon, 2018-07-16 at 01:04 +0200, Ingo Molnar wrote: > > > * Rik van Riel wrote: > > > > > > > + /* > > > > +* Stop remote flushes for the previous mm. > > > > +* Skip the idle task; we never send init_mm > > > > TLB > > > > flushing IPIs, > > > > +* but the bitmap manipulation can cause cache > > > > line contention. > > > > +*/ > > > > + if (real_prev != _mm) { > > > > + VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, > > > > + mm_cpumask(rea > > > > l_pr > > > > ev))); > > > > + cpumask_clear_cpu(cpu, > > > > mm_cpumask(real_prev)); > > > > > > BTW., could this optimization be (safely) extended to all (or > > > most) > > > !task->mm > > > kernel threads? > > > > > > In particular softirq and threaded irq handlers could benefit > > > greatly > > > I suspect in > > > certain networking intense workloads that happen to active them. > > > > Yes, it could. > > > > Are there kernel threads that use something other than > > init_mm today? > > Yeah, I think that's the typical case - so at minimum the comment > should be fixed: > > > > > + * Skip the idle task; we never send init_mm TLB > flushing IPIs, > > and it should say 'skip kernel threads', right? I will send a v6 that improves this comment, and has the S-o-b thing you suggested for patch 1/7. I think that addresses all the comments people had on this patch series. -- All Rights Reversed. signature.asc Description: This is a digitally signed message part
[PATCH v2 09/10] staging:trl8192u: Rename TClasProc > t_clas_proc - Style
Rename the struct TS_COMMON_INFO member variable TClasProc to t_clas_proc. This change clears the checkpatch issue with CamelCase variable names. There should be no impact on runtime execution. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 3bf48a04a68e..a183198afb31 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -22,7 +22,7 @@ struct ts_common_info { u8 addr[6]; TSPEC_BODY t_spec; QOS_TCLAS t_class[TCLAS_NUM]; - u8 TClasProc; + u8 t_clas_proc; u8 TClasNum; }; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index b5fede650b81..8b2bb0a69b01 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -106,7 +106,7 @@ static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo) eth_zero_addr(pTsCommonInfo->addr); memset(>t_spec, 0, sizeof(TSPEC_BODY)); memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); - pTsCommonInfo->TClasProc = 0; + pTsCommonInfo->t_clas_proc = 0; pTsCommonInfo->TClasNum = 0; } @@ -281,7 +281,7 @@ static void MakeTSEntry(struct ts_common_info *pTsCommonInfo, u8 *Addr, for(count = 0; count < TCLAS_Num; count++) memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, sizeof(QOS_TCLAS)); - pTsCommonInfo->TClasProc = TCLAS_Proc; + pTsCommonInfo->t_clas_proc = TCLAS_Proc; pTsCommonInfo->TClasNum = TCLAS_Num; } -- 2.18.0
[PATCH v2 06/10] staging:rtl8192u: Rename Addr > addr - Style
Rename the TX_COMMON_INFO structure's member Addr to addr. This change clears the checkpatch issue with CamelCase naming. This is a coding style change only and should not impact runtime execution. Signed-off-by: John Whitmore --- .../staging/rtl8192u/ieee80211/rtl819x_BAProc.c| 10 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h| 2 +- .../staging/rtl8192u/ieee80211/rtl819x_TSProc.c| 14 +++--- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c index 950f827a1740..297c498aef6a 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c @@ -634,7 +634,7 @@ TsInitAddBA( ActivateBAEntry(ieee, pBA, BA_SETUP_TIMEOUT); - ieee80211_send_ADDBAReq(ieee, pTS->TsCommonInfo.Addr, pBA); + ieee80211_send_ADDBAReq(ieee, pTS->TsCommonInfo.addr, pBA); } void @@ -646,7 +646,7 @@ TsInitDelBA(struct ieee80211_device *ieee, struct ts_common_info *pTsCommonInfo, if (TxTsDeleteBA(ieee, pTxTs)) ieee80211_send_DELBA( ieee, - pTsCommonInfo->Addr, + pTsCommonInfo->addr, (pTxTs->TxAdmittedBARecord.bValid)?(>TxAdmittedBARecord):(>TxPendingBARecord), TxRxSelect, DELBA_REASON_END_BA); @@ -655,7 +655,7 @@ TsInitDelBA(struct ieee80211_device *ieee, struct ts_common_info *pTsCommonInfo, if (RxTsDeleteBA(ieee, pRxTs)) ieee80211_send_DELBA( ieee, - pTsCommonInfo->Addr, + pTsCommonInfo->addr, >RxAdmittedBARecord, TxRxSelect, DELBA_REASON_END_BA); @@ -683,7 +683,7 @@ void TxBaInactTimeout(struct timer_list *t) TxTsDeleteBA(ieee, pTxTs); ieee80211_send_DELBA( ieee, - pTxTs->TsCommonInfo.Addr, + pTxTs->TsCommonInfo.addr, >TxAdmittedBARecord, TX_DIR, DELBA_REASON_TIMEOUT); @@ -697,7 +697,7 @@ void RxBaInactTimeout(struct timer_list *t) RxTsDeleteBA(ieee, pRxTs); ieee80211_send_DELBA( ieee, - pRxTs->TsCommonInfo.Addr, + pRxTs->TsCommonInfo.addr, >RxAdmittedBARecord, RX_DIR, DELBA_REASON_TIMEOUT); diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 1613c3c5d447..62c6fc513540 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -19,7 +19,7 @@ struct ts_common_info { struct list_headlist; struct timer_list setup_timer; struct timer_list inact_timer; - u8 Addr[6]; + u8 addr[6]; TSPEC_BODY TSpec; QOS_TCLAS TClass[TCLAS_NUM]; u8 TClasProc; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index 175d91218af8..72a5fc0bbb47 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -103,7 +103,7 @@ static void TsAddBaProcess(struct timer_list *t) static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo) { - eth_zero_addr(pTsCommonInfo->Addr); + eth_zero_addr(pTsCommonInfo->addr); memset(>TSpec, 0, sizeof(TSPEC_BODY)); memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); pTsCommonInfo->TClasProc = 0; @@ -247,7 +247,7 @@ static struct ts_common_info *SearchAdmitTRStream(struct ieee80211_device *ieee, continue; list_for_each_entry(pRet, psearch_list, list){ // IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, pRet->TSpec.f.TSInfo.field.ucDirection); - if (memcmp(pRet->Addr, Addr, 6) == 0) + if (memcmp(pRet->addr, Addr, 6) == 0) if (pRet->TSpec.f.TSInfo.field.ucTSID == TID) if(pRet->TSpec.f.TSInfo.field.ucDirection == dir) { // printk("Bingo! got it\n"); @@ -273,7 +273,7 @@ static void MakeTSEntry(struct ts_common_info *pTsCommonInfo, u8 *Addr, if(pTsCommonInfo == NULL) return; -
[PATCH v2 10/10] staging:rtl8192u: Rename TClasNum > t_clas_num - Style
Rename the struct TS_COMMON_INFO member variable TClasNum to t_clas_num. This change clears the checkpatch issue with CamelCase naming. There should be no impact on runtime execution. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index a183198afb31..3da1ef6ef105 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -23,7 +23,7 @@ struct ts_common_info { TSPEC_BODY t_spec; QOS_TCLAS t_class[TCLAS_NUM]; u8 t_clas_proc; - u8 TClasNum; + u8 t_clas_num; }; typedef struct _TX_TS_RECORD { diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index 8b2bb0a69b01..4b2da7f31166 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -107,7 +107,7 @@ static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo) memset(>t_spec, 0, sizeof(TSPEC_BODY)); memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); pTsCommonInfo->t_clas_proc = 0; - pTsCommonInfo->TClasNum = 0; + pTsCommonInfo->t_clas_num = 0; } static void ResetTxTsEntry(PTX_TS_RECORD pTS) @@ -282,7 +282,7 @@ static void MakeTSEntry(struct ts_common_info *pTsCommonInfo, u8 *Addr, memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, sizeof(QOS_TCLAS)); pTsCommonInfo->t_clas_proc = TCLAS_Proc; - pTsCommonInfo->TClasNum = TCLAS_Num; + pTsCommonInfo->t_clas_num = TCLAS_Num; } -- 2.18.0
[PATCH v2 00/10] staging:rtl8192u: Coding style changes.
Simple coding style changes to avoid CamelCase. John Whitmore (10): staging:rtl8192u: remove typedef of enumeration TR_SELECT - Style staging:rtl8192u: remove typedef of struct TS_COMMON_INFO - Style staging:rtl8192u: Rename List > list - Coding style staging:rtl8192u: rename SetupTimer > setup_timer - Style staging:rtl8192u: Rename InactTimer > inact_timer - Style staging:rtl8192u: Rename Addr > addr - Style staging:rtl8192u: Rename TSpec > t_spec - Style staging:rtl8192u: Rename TClass > t_class - Style staging:trl8192u: Rename TClasProc > t_clas_proc - Style staging:rtl8192u: Rename TClasNum > t_clas_num - Style .../staging/rtl8192u/ieee80211/ieee80211.h| 6 +- .../staging/rtl8192u/ieee80211/ieee80211_rx.c | 4 +- .../staging/rtl8192u/ieee80211/ieee80211_tx.c | 4 +- .../rtl8192u/ieee80211/rtl819x_BAProc.c | 30 ++-- .../staging/rtl8192u/ieee80211/rtl819x_TS.h | 28 ++-- .../rtl8192u/ieee80211/rtl819x_TSProc.c | 142 +- 6 files changed, 107 insertions(+), 107 deletions(-) -- 2.18.0
[PATCH v2 07/10] staging:rtl8192u: Rename TSpec > t_spec - Style
Rename the TS_COMMON_INFO structure's member TSpec to t_spec. This change clears the checkpatch issue with CamelCase naming of variables. There should be no impact on runtime execution. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c index 297c498aef6a..2c7d3ab1b5f7 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c @@ -626,7 +626,7 @@ TsInitAddBA( pBA->DialogToken++; // DialogToken: Only keep the latest dialog token pBA->BaParamSet.field.AMSDU_Support = 0;// Do not support A-MSDU with A-MPDU now!! pBA->BaParamSet.field.BAPolicy = Policy;// Policy: Delayed or Immediate - pBA->BaParamSet.field.TID = pTS->TsCommonInfo.TSpec.f.TSInfo.field.ucTSID; // TID + pBA->BaParamSet.field.TID = pTS->TsCommonInfo.t_spec.f.TSInfo.field.ucTSID; // TID // BufferSize: This need to be set according to A-MPDU vector pBA->BaParamSet.field.BufferSize = 32; // BufferSize: This need to be set according to A-MPDU vector pBA->BaTimeoutValue = 0;// Timeout value: Set 0 to disable Timer diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 62c6fc513540..4c1b2e27cf94 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -20,7 +20,7 @@ struct ts_common_info { struct timer_list setup_timer; struct timer_list inact_timer; u8 addr[6]; - TSPEC_BODY TSpec; + TSPEC_BODY t_spec; QOS_TCLAS TClass[TCLAS_NUM]; u8 TClasProc; u8 TClasNum; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index 72a5fc0bbb47..f9f4196f43fa 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -104,7 +104,7 @@ static void TsAddBaProcess(struct timer_list *t) static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo) { eth_zero_addr(pTsCommonInfo->addr); - memset(>TSpec, 0, sizeof(TSPEC_BODY)); + memset(>t_spec, 0, sizeof(TSPEC_BODY)); memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); pTsCommonInfo->TClasProc = 0; pTsCommonInfo->TClasNum = 0; @@ -248,8 +248,8 @@ static struct ts_common_info *SearchAdmitTRStream(struct ieee80211_device *ieee, list_for_each_entry(pRet, psearch_list, list){ // IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, pRet->TSpec.f.TSInfo.field.ucDirection); if (memcmp(pRet->addr, Addr, 6) == 0) - if (pRet->TSpec.f.TSInfo.field.ucTSID == TID) - if(pRet->TSpec.f.TSInfo.field.ucDirection == dir) { + if (pRet->t_spec.f.TSInfo.field.ucTSID == TID) + if(pRet->t_spec.f.TSInfo.field.ucDirection == dir) { // printk("Bingo! got it\n"); break; } @@ -276,7 +276,7 @@ static void MakeTSEntry(struct ts_common_info *pTsCommonInfo, u8 *Addr, memcpy(pTsCommonInfo->addr, Addr, 6); if(pTSPEC != NULL) - memcpy((u8 *)(&(pTsCommonInfo->TSpec)), (u8 *)pTSPEC, sizeof(TSPEC_BODY)); + memcpy((u8 *)(&(pTsCommonInfo->t_spec)), (u8 *)pTSPEC, sizeof(TSPEC_BODY)); for(count = 0; count < TCLAS_Num; count++) memcpy((u8 *)(&(pTsCommonInfo->TClass[count])), (u8 *)pTCLAS, sizeof(QOS_TCLAS)); -- 2.18.0
[PATCH v2 08/10] staging:rtl8192u: Rename TClass > t_class - Style
Rename the struct TS_COMMON_INFO member variable from TClass to t_class. This change clears the checkpatch issue with CamelCase Variable names. There should be no impact on runtime execution. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 4c1b2e27cf94..3bf48a04a68e 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -21,7 +21,7 @@ struct ts_common_info { struct timer_list inact_timer; u8 addr[6]; TSPEC_BODY t_spec; - QOS_TCLAS TClass[TCLAS_NUM]; + QOS_TCLAS t_class[TCLAS_NUM]; u8 TClasProc; u8 TClasNum; }; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index f9f4196f43fa..b5fede650b81 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -105,7 +105,7 @@ static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo) { eth_zero_addr(pTsCommonInfo->addr); memset(>t_spec, 0, sizeof(TSPEC_BODY)); - memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); + memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM); pTsCommonInfo->TClasProc = 0; pTsCommonInfo->TClasNum = 0; } @@ -279,7 +279,7 @@ static void MakeTSEntry(struct ts_common_info *pTsCommonInfo, u8 *Addr, memcpy((u8 *)(&(pTsCommonInfo->t_spec)), (u8 *)pTSPEC, sizeof(TSPEC_BODY)); for(count = 0; count < TCLAS_Num; count++) - memcpy((u8 *)(&(pTsCommonInfo->TClass[count])), (u8 *)pTCLAS, sizeof(QOS_TCLAS)); + memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, sizeof(QOS_TCLAS)); pTsCommonInfo->TClasProc = TCLAS_Proc; pTsCommonInfo->TClasNum = TCLAS_Num; -- 2.18.0
[PATCH v2 05/10] staging:rtl8192u: Rename InactTimer > inact_timer - Style
Rename the struct TS_COMMON_INFO member InactTimer to inact_timer. This change clears the checkpatch issue with CamelCase naming. The change should not have any impact on runtime execution. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 10 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 26384566d4f3..1613c3c5d447 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -18,7 +18,7 @@ enum tr_select { struct ts_common_info { struct list_headlist; struct timer_list setup_timer; - struct timer_list InactTimer; + struct timer_list inact_timer; u8 Addr[6]; TSPEC_BODY TSpec; QOS_TCLAS TClass[TCLAS_NUM]; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index efe8392043c1..175d91218af8 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -148,7 +148,7 @@ void TSInitialize(struct ieee80211_device *ieee) // DLS related timer will be add here in the future!! timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut, 0); - timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout, + timer_setup(>TsCommonInfo.inact_timer, TsInactTimeout, 0); timer_setup(>TsAddBaTimer, TsAddBaProcess, 0); timer_setup(>TxPendingBARecord.Timer, BaSetupTimeOut, @@ -169,7 +169,7 @@ void TSInitialize(struct ieee80211_device *ieee) INIT_LIST_HEAD(>RxPendingPktList); timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut, 0); - timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout, + timer_setup(>TsCommonInfo.inact_timer, TsInactTimeout, 0); timer_setup(>RxAdmittedBARecord.Timer, RxBaInactTimeout, 0); @@ -194,10 +194,10 @@ static void AdmitTS(struct ieee80211_device *ieee, struct ts_common_info *pTsCommonInfo, u32 InactTime) { del_timer_sync(>setup_timer); - del_timer_sync(>InactTimer); + del_timer_sync(>inact_timer); if(InactTime!=0) - mod_timer(>InactTimer, + mod_timer(>inact_timer, jiffies + msecs_to_jiffies(InactTime)); } @@ -413,7 +413,7 @@ static void RemoveTsEntry(struct ieee80211_device *ieee, struct ts_common_info * //u32 flags = 0; unsigned long flags = 0; del_timer_sync(>setup_timer); - del_timer_sync(>InactTimer); + del_timer_sync(>inact_timer); TsInitDelBA(ieee, pTs, TxRxSelect); if(TxRxSelect == RX_DIR) { -- 2.18.0
[PATCH v2 04/10] staging:rtl8192u: rename SetupTimer > setup_timer - Style
Rename the struct TS_COMMON_INFO member SetupTimer to setup_timer. This clears the checkpatch issue with CamelCase variable names. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 0a0420aa91cf..26384566d4f3 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -17,7 +17,7 @@ enum tr_select { struct ts_common_info { struct list_headlist; - struct timer_list SetupTimer; + struct timer_list setup_timer; struct timer_list InactTimer; u8 Addr[6]; TSPEC_BODY TSpec; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index 24abdad52ed6..efe8392043c1 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -146,7 +146,7 @@ void TSInitialize(struct ieee80211_device *ieee) pTxTS->num = count; // The timers for the operation of Traffic Stream and Block Ack. // DLS related timer will be add here in the future!! - timer_setup(>TsCommonInfo.SetupTimer, TsSetupTimeOut, + timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut, 0); timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout, 0); @@ -167,7 +167,7 @@ void TSInitialize(struct ieee80211_device *ieee) for(count = 0; count < TOTAL_TS_NUM; count++) { pRxTS->num = count; INIT_LIST_HEAD(>RxPendingPktList); - timer_setup(>TsCommonInfo.SetupTimer, TsSetupTimeOut, + timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut, 0); timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout, 0); @@ -193,7 +193,7 @@ void TSInitialize(struct ieee80211_device *ieee) static void AdmitTS(struct ieee80211_device *ieee, struct ts_common_info *pTsCommonInfo, u32 InactTime) { - del_timer_sync(>SetupTimer); + del_timer_sync(>setup_timer); del_timer_sync(>InactTimer); if(InactTime!=0) @@ -412,7 +412,7 @@ static void RemoveTsEntry(struct ieee80211_device *ieee, struct ts_common_info * { //u32 flags = 0; unsigned long flags = 0; - del_timer_sync(>SetupTimer); + del_timer_sync(>setup_timer); del_timer_sync(>InactTimer); TsInitDelBA(ieee, pTs, TxRxSelect); -- 2.18.0
[PATCH v2 03/10] staging:rtl8192u: Rename List > list - Coding style
In struct TS_COMMON_INFO rename the member List to list. This clears the checkpatch issue concerning CamelCase naming of variables. Signed-off-by: John Whitmore --- .../staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +- .../rtl8192u/ieee80211/rtl819x_TSProc.c | 64 +-- 2 files changed, 33 insertions(+), 33 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 5d2fb52a8072..0a0420aa91cf 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -16,7 +16,7 @@ enum tr_select { }; struct ts_common_info { - struct list_headList; + struct list_headlist; struct timer_list SetupTimer; struct timer_list InactTimer; u8 Addr[6]; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index fd3347dca025..24abdad52ed6 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c @@ -156,7 +156,7 @@ void TSInitialize(struct ieee80211_device *ieee) timer_setup(>TxAdmittedBARecord.Timer, TxBaInactTimeout, 0); ResetTxTsEntry(pTxTS); - list_add_tail(>TsCommonInfo.List, >Tx_TS_Unused_List); + list_add_tail(>TsCommonInfo.list, >Tx_TS_Unused_List); pTxTS++; } @@ -175,7 +175,7 @@ void TSInitialize(struct ieee80211_device *ieee) RxBaInactTimeout, 0); timer_setup(>RxPktPendingTimer, RxPktPendingTimeout, 0); ResetRxTsEntry(pRxTS); - list_add_tail(>TsCommonInfo.List, >Rx_TS_Unused_List); + list_add_tail(>TsCommonInfo.list, >Rx_TS_Unused_List); pRxTS++; } // Initialize unused Rx Reorder List. @@ -245,7 +245,7 @@ static struct ts_common_info *SearchAdmitTRStream(struct ieee80211_device *ieee, for(dir = 0; dir <= DIR_BI_DIR; dir++) { if (!search_dir[dir]) continue; - list_for_each_entry(pRet, psearch_list, List){ + list_for_each_entry(pRet, psearch_list, list){ // IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, pRet->TSpec.f.TSInfo.field.ucDirection); if (memcmp(pRet->Addr, Addr, 6) == 0) if (pRet->TSpec.f.TSInfo.field.ucTSID == TID) @@ -254,11 +254,11 @@ static struct ts_common_info *SearchAdmitTRStream(struct ieee80211_device *ieee, break; } } - if(>List != psearch_list) + if(>list != psearch_list) break; } - if(>List != psearch_list) + if(>list != psearch_list) return pRet ; else return NULL; @@ -371,8 +371,8 @@ bool GetTs( ((TxRxSelect==TX_DIR)?DIR_UP:DIR_DOWN); IEEE80211_DEBUG(IEEE80211_DL_TS, "to add Ts\n"); if(!list_empty(pUnusedList)) { - (*ppTS) = list_entry(pUnusedList->next, struct ts_common_info, List); - list_del_init(&(*ppTS)->List); + (*ppTS) = list_entry(pUnusedList->next, struct ts_common_info, list); + list_del_init(&(*ppTS)->list); if(TxRxSelect==TX_DIR) { PTX_TS_RECORD tmp = container_of(*ppTS, TX_TS_RECORD, TsCommonInfo); ResetTxTsEntry(tmp); @@ -395,7 +395,7 @@ bool GetTs( MakeTSEntry(*ppTS, Addr, , NULL, 0, 0); AdmitTS(ieee, *ppTS, 0); - list_add_tail(&((*ppTS)->List), pAddmitList); + list_add_tail(&((*ppTS)->list), pAddmitList); // if there is DirectLink, we need to do additional operation here!! return true; @@ -457,36 +457,36 @@ void RemovePeerTS(struct ieee80211_device *ieee, u8 *Addr) struct ts_common_info *pTS, *pTmpTS; printk("===>RemovePeerTS,%pM\n", Addr); - list_for_each_entry_safe(pTS, pTmpTS, >Tx_TS_Pending_List, List) { + list_for_each_entry_safe(pTS, pTmpTS, >Tx_TS_Pending_List, list) { if (memcmp(pTS->Addr, Addr, 6) == 0) { RemoveTsEntry(ieee, pTS, TX_DIR); - list_del_init(>List); -
[PATCH v2 02/10] staging:rtl8192u: remove typedef of struct TS_COMMON_INFO - Style
To clear a checkpatch issue removed the typedef of the structure TS_COMMON_INFO. This change removes the previous declaration, which defined two types, both TS_COMMON_INFO and a pointer type PTS_COMMON_INFO: typedef struct _TS_COMMON_INFO { ... } TS_COMMON_INFO, *PTS_COMMON_INFO; The pointer type has been completely removed from the code, as: "(so-called Hungarian notation) is brain damaged" according to the coding standard. Signed-off-by: John Whitmore --- .../staging/rtl8192u/ieee80211/ieee80211.h| 4 ++-- .../staging/rtl8192u/ieee80211/ieee80211_rx.c | 4 ++-- .../staging/rtl8192u/ieee80211/ieee80211_tx.c | 4 ++-- .../rtl8192u/ieee80211/rtl819x_BAProc.c | 10 .../staging/rtl8192u/ieee80211/rtl819x_TS.h | 8 +++ .../rtl8192u/ieee80211/rtl819x_TSProc.c | 24 +-- 6 files changed, 27 insertions(+), 27 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h b/drivers/staging/rtl8192u/ieee80211/ieee80211.h index e65c9967b627..bb4bb68bb3dd 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h @@ -2391,7 +2391,7 @@ int ieee80211_rx_DELBA(struct ieee80211_device *ieee, struct sk_buff *skb); void TsInitAddBA(struct ieee80211_device *ieee, PTX_TS_RECORD pTS, u8 Policy, u8 bOverwritePending); void TsInitDelBA(struct ieee80211_device *ieee, -PTS_COMMON_INFO pTsCommonInfo, enum tr_select TxRxSelect); +struct ts_common_info *pTsCommonInfo, enum tr_select TxRxSelect); void BaSetupTimeOut(struct timer_list *t); void TxBaInactTimeout(struct timer_list *t); void RxBaInactTimeout(struct timer_list *t); @@ -2399,7 +2399,7 @@ void ResetBaEntry(PBA_RECORD pBA); //function in TS.c bool GetTs( struct ieee80211_device *ieee, - PTS_COMMON_INFO *ppTS, + struct ts_common_info **ppTS, u8 *Addr, u8 TID, enum tr_select TxRxSelect, //Rx:1, Tx:0 diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c index 172165f4461a..a84de8c52a2b 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c @@ -1021,7 +1021,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, //IEEE80211_DEBUG(IEEE80211_DL_REORDER,"%s(): QOS ENABLE AND RECEIVE QOS DATA , we will get Ts, tid:%d\n",__func__, tid); if(GetTs( ieee, - (PTS_COMMON_INFO *) , + (struct ts_common_info **) , hdr->addr2, Frame_QoSTID((u8 *)(skb->data)), RX_DIR, @@ -1266,7 +1266,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct sk_buff *skb, { TID = Frame_QoSTID(skb->data); SeqNum = WLAN_GET_SEQ_SEQ(sc); - GetTs(ieee,(PTS_COMMON_INFO *) ,hdr->addr2,TID,RX_DIR,true); + GetTs(ieee,(struct ts_common_info **) ,hdr->addr2,TID,RX_DIR,true); if (TID !=0 && TID !=3) { ieee->bis_any_nonbepkts = true; diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c index 9a1a84548bc6..010b53b45825 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c @@ -330,7 +330,7 @@ static void ieee80211_tx_query_agg_cap(struct ieee80211_device *ieee, } if(pHTInfo->bCurrentAMPDUEnable) { - if (!GetTs(ieee, (PTS_COMMON_INFO *)(), hdr->addr1, skb->priority, TX_DIR, true)) + if (!GetTs(ieee, (struct ts_common_info **)(), hdr->addr1, skb->priority, TX_DIR, true)) { printk("===>can't get TS\n"); return; @@ -585,7 +585,7 @@ static void ieee80211_query_seqnum(struct ieee80211_device *ieee, if (IsQoSDataFrame(skb->data)) //we deal qos data only { PTX_TS_RECORD pTS = NULL; - if (!GetTs(ieee, (PTS_COMMON_INFO *)(), dst, skb->priority, TX_DIR, true)) + if (!GetTs(ieee, (struct ts_common_info **)(), dst, skb->priority, TX_DIR, true)) { return; } diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c index 04779da803e4..950f827a1740 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c @@ -360,7 +360,7 @@ int ieee80211_rx_ADDBAReq(struct ieee80211_device *ieee, struct sk_buff *skb) // If there is no
[PATCH v2 01/10] staging:rtl8192u: remove typedef of enumeration TR_SELECT - Style
To clear a checkpatch issue removed the typedef of the enumeration TR_SELECT this should not impact runtime code as it's only a coding style change. Signed-off-by: John Whitmore --- drivers/staging/rtl8192u/ieee80211/ieee80211.h | 4 ++-- drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c | 10 +- drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 4 ++-- drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 6 +++--- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h b/drivers/staging/rtl8192u/ieee80211/ieee80211.h index 3b7968681f77..e65c9967b627 100644 --- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h +++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h @@ -2391,7 +2391,7 @@ int ieee80211_rx_DELBA(struct ieee80211_device *ieee, struct sk_buff *skb); void TsInitAddBA(struct ieee80211_device *ieee, PTX_TS_RECORD pTS, u8 Policy, u8 bOverwritePending); void TsInitDelBA(struct ieee80211_device *ieee, -PTS_COMMON_INFO pTsCommonInfo, TR_SELECT TxRxSelect); +PTS_COMMON_INFO pTsCommonInfo, enum tr_select TxRxSelect); void BaSetupTimeOut(struct timer_list *t); void TxBaInactTimeout(struct timer_list *t); void RxBaInactTimeout(struct timer_list *t); @@ -2402,7 +2402,7 @@ bool GetTs( PTS_COMMON_INFO *ppTS, u8 *Addr, u8 TID, - TR_SELECT TxRxSelect, //Rx:1, Tx:0 + enum tr_select TxRxSelect, //Rx:1, Tx:0 boolbAddNewTs ); void TSInitialize(struct ieee80211_device *ieee); diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c index 2dc4d0e93948..04779da803e4 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c @@ -174,7 +174,7 @@ static struct sk_buff *ieee80211_ADDBA(struct ieee80211_device *ieee, u8 *Dst, P *function: construct DELBA frame * input: u8* dst //DELBA frame's destination * PBA_RECORD pBA //BA_RECORD entry which stores the necessary information for BA - * TR_SELECT TxRxSelect //TX RX direction + * enum tr_select TxRxSelect //TX RX direction * u16ReasonCode //status code. * output: none * return: sk_buff* skb //return constructed skb to xmit @@ -183,7 +183,7 @@ static struct sk_buff *ieee80211_DELBA( struct ieee80211_device *ieee, u8 *dst, PBA_RECORD pBA, - TR_SELECTTxRxSelect, + enum tr_select TxRxSelect, u16 ReasonCode ) { @@ -290,14 +290,14 @@ static void ieee80211_send_ADDBARsp(struct ieee80211_device *ieee, u8 *dst, *function: send ADDBARSP frame out * input: u8* dst //DELBA frame's destination * PBA_RECORD pBA //BA_RECORD entry which stores the necessary information for BA - * TR_SELECT TxRxSelect //TX or RX + * enum tr_select TxRxSelect //TX or RX * u16ReasonCode //DEL ReasonCode * output: none * notice: If any possible, please hide pBA in ieee. And temporarily use Manage Queue as softmac_mgmt_xmit() usually does / static void ieee80211_send_DELBA(struct ieee80211_device *ieee, u8 *dst, -PBA_RECORD pBA, TR_SELECT TxRxSelect, +PBA_RECORD pBA, enum tr_select TxRxSelect, u16 ReasonCode) { struct sk_buff *skb; @@ -638,7 +638,7 @@ TsInitAddBA( } void -TsInitDelBA(struct ieee80211_device *ieee, PTS_COMMON_INFO pTsCommonInfo, TR_SELECT TxRxSelect) +TsInitDelBA(struct ieee80211_device *ieee, PTS_COMMON_INFO pTsCommonInfo, enum tr_select TxRxSelect) { if (TxRxSelect == TX_DIR) { PTX_TS_RECORD pTxTs = (PTX_TS_RECORD)pTsCommonInfo; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h index 3a0ff08c687a..c6a5ac1e647c 100644 --- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h +++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h @@ -10,10 +10,10 @@ #define TCLAS_NUM 4 /* This define the Tx/Rx directions */ -typedef enum _TR_SELECT { +enum tr_select { TX_DIR = 0, RX_DIR = 1, -} TR_SELECT, *PTR_SELECT; +}; typedef struct _TS_COMMON_INFO { struct list_headList; diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c index
Due Process Unit-July 16 2018
Our Ref:FGN-GOV/IMF/2018 Attention:Beneficiary Instruction was given by the Office of Presidency and United Nations(UN) and also (IMF)to transfer Sum of $10m through ATM Debit Card to you which you can use it in near cash point,shopping mall or banking. You can withdrawal money from your ATM Debit Card from any ATM MACHINE location or center of your choice/nearest to you, in any part of the world.You should reconfirm your address and full name and also telephone number for delivery of your original (A.T.M) Debit Card to you within 48hrs. Note: this compensation includes, Victims of fraudulent activities and payment of inheritance,Contracts and also lottery winning, which was due to you etc Yours faithfully, (Tél:234-81462 71957) CONTACT/Mr.Japheth Benjamin Foreign Payment Due Process Unit.
Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire
On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman wrote: > > If the numbers can be trusted it is actually slower to put the sync in > lock, at least on one of the machines: > > Time > lwsync_sync 84,932,987,977 > sync_lwsync 93,185,930,333 Very funky. > I guess arguably it's not a very macro benchmark, but we have a > context_switch benchmark in the tree[1] which we often use to tune > things, and it degrades badly. It just spins up two threads and has them > ping-pong using yield. I hacked that up to run on x86, and it only is about 5% locking overhead in my profiles. It's about 18% __switch_to, and a lot of system call entry/exit, but not a lot of locking. I'm actually surprised it is even that much locking, since it seems to be single-cpu, so there should be no contention and the lock (which seems to be rq = this_rq(); rq_lock(rq, ); in do_sched_yield()) should stay local to the cpu. And for you the locking is apparently even _more_ noticeable. But yes, a 10% regression on that context switch thing is huge. You shouldn't do ping-pong stuff, but people kind of do. Linus
Re: [PATCH v5 07/12] PM / devfreq: export devfreq_class
Hi Chanwoo, On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 07월 07일 03:09, Matthias Kaehlcke wrote: > > Hi, > > > > On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote: > > > >> I didn't see any framework which exporting the class instance. > >> It is very dangerous. Unknown device drivers is able to reset > >> the 'devfreq_class' instance. I can't agree this approach. > > > > While I agree that it is potential dangerous it is actually a common > > practice to export the class: > > > > I tried to find the real usage of exported class instance > and I add the comment for each class instance. Almost exported class > instance are used in the their own director or some exported class > like rio_mport_class/switchtec_class are created from specific device driver > instead of subsystem. > > Only following two cases are used on outside of subsystem directory. > devtmpfs.c and alarmtimer.c are core feature of linux kernel. > > drivers/base/devtmpfs.c uses 'block_class'. > kernel/time/alarmtimer.c uses 'rtc_class'. > > I cannot yet agree this approach due to only block_class and rtc_class. I thought your main concern was that the class is exported, which is what several other subsystems do. That the class isn't used outside of the subsystem directory most likely means that there is no need for it, rather than that it shouldn't be done at all (depending on the type of use of course). In any case not exporting the class object provides a limited protection against potential abuse of the class at best. To use the class API all that is needed is a 'struct device' of a devfreq device, which has a pointer to the class object (dev->class). Theoretically I could register a fake devfreq device to obtain access to the class object, though that doesn't seem a very neat approach ;-) > You added the following comment why devfreq_class instance is necessary. > Actullay, I don't know the best solution right now. But, all device drivers > can be added or removed if driver uses the module type. It is not a problem > for only devfreq instance. Certainly it's not a problem limited to devfreq devices. In many other cases bus notifiers can be used, but since devfreq devices areen't tied to a specific bus this is not an option here. If you really don't want to export the class we could add wrappers for (un)registering a class interface: int devfreq_class_interface_register(struct class_interface *) void devfreq_class_interface_unregister(struct class_interface *) The wrappers would have to assign ci->class since the throttler can't see the class object. Or add notifiers for device addition/removal, though the throttler relies on the behavior of the class_interface which also notifies about devices added before registration. This might not be what other potential users of the notifiers expect. Thanks Matthias > /* > + * devfreq devices can be added and removed at runtime, hence they > + * must also be handled dynamically. The class_interface notifies us > + * whenever a device is added or removed. When the interface is > + * registered ci->add_dev() is called for all existing devfreq > + * devices. > */ > > > > grep "extern struct class " include/linux/ -R > > include/linux/rio.h:extern struct class rio_mport_class; > rio_mport_class is created on drivers/rapidio/rio-drivers.c. > It means that just device driver create the 'rio_mport_class' class > instead of any linux kernel subsystem. > > > include/linux/tty.h:extern struct class *tty_class; > tty_class is not used on outside of drivers/tty > > > include/linux/fb.h:extern struct class *fb_class; > fb_class is not used on outside of drivers/video/fbdev > > > include/linux/ide.h:extern struct class *ide_port_class; > ide_port_class is not used on outside of drivers/ide. > > > include/linux/device.h:extern struct class * __must_check > > __class_create(struct module *owner, > > > include/linux/devfreq.h:extern struct class *devfreq_class; > not yet > > > include/linux/switchtec.h:extern struct class *switchtec_class; > switchtec_class is created on drivers/pci/switch/switchtec.c > and then switchtec_class is only used on > drivers/ntb/hw/mscc/ntb_hw_switchtec.c. > It is not subsystem. Just switchtec.c device driver makes the their own class. > > > include/linux/input.h:extern struct class input_class; > input_class is not used on outside of drivers/input. > > > include/linux/power_supply.h:extern struct class *power_supply_class; > power_supply_class is not used on outside of drivers/power/supply. > > > > include/linux/genhd.h:extern struct class block_class; > drivers/base/devtmpfs.c uses 'block_class'. > > > include/linux/rtc.h:extern struct class *rtc_class; > kernel/time/alarmtimer.c uses 'rtc_class'. > > > > > struct class_interface and class_interface_register() would be > > pointless without exported classes. > > > > My
Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
On 11.06.2018 14:33, David Hildenbrand wrote: > On 11.06.2018 13:56, Michal Hocko wrote: >> On Mon 11-06-18 13:53:49, David Hildenbrand wrote: >>> On 24.05.2018 23:07, David Hildenbrand wrote: On 24.05.2018 16:22, Michal Hocko wrote: > I will go over the rest of the email later I just wanted to make this > point clear because I suspect we are talking past each other. It sounds like we are now talking about how to solve the problem. I like that :) >>> >>> Hi Michal, >>> >>> did you have time to think about the details of your proposed idea? >> >> Not really. Sorry about that. It's been busy time. I am planning to >> revisit after merge window closes. >> > > Sure no worries, I still have a bunch of other things to work on. But it > would be nice to clarify soon in which direction I have to head to get > this implemented and upstream (e.g. what I proposed, what you proposed > or maybe something different). > I would really like to make progress here. I pointed out basic problems/questions with the proposed alternative. I think I answered all your questions. But you also said that you are not going to accept the current approach. So some decision has to be made. Although it's very demotivating and frustrating (I hope not all work in the MM area will be like this), if there is no guidance on how to proceed, I'll have to switch to adding/removing/onlining/offlining whole segments. This is not what I want, but maybe this has a higher chance of getting reviews/acks. Understanding that you are busy, please if you make suggestions, follow up on responses. -- Thanks, David / dhildenb
Re: [PATCH 4.4 00/43] 4.4.141-stable review
On Mon, Jul 16, 2018 at 09:36:05AM +0200, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.4.141 release. > There are 43 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed Jul 18 07:34:50 UTC 2018. > Anything received after that time might be too late. > Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 149 pass: 149 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.17 00/67] 4.17.7-stable review
On Mon, Jul 16, 2018 at 09:33:38AM -0700, Guenter Roeck wrote: > On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.17.7 release. > > There are 67 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Wed Jul 18 07:34:11 UTC 2018. > > Anything received after that time might be too late. > > > > Build results: > total: 134 pass: 134 fail: 0 > Qemu test results: > total: 171 pass: 159 fail: 12 > Failed tests: > i386:Broadwell:q35:defconfig:smp:rootfs > i386:Skylake-Client:q35:defconfig:smp:rootfs > i386:SandyBridge:q35:defconfig:smp:rootfs > i386:Haswell:pc:defconfig:smp:rootfs > i386:Nehalem:q35:defconfig:smp:rootfs > i386:phenom:pc:defconfig:smp:rootfs > i386:Opteron_G5:q35:defconfig:smp:initrd > i386:Westmere:q35:defconfig:smp:initrd > i386:core2duo:q35:defconfig:nosmp:rootfs > i386:Conroe:pc:defconfig:nosmp:rootfs > i386:Opteron_G1:pc:defconfig:nosmp:initrd > i386:n270:q35:defconfig:nosmp:rootfs > > All 32-bit i386 boot tests crash. > > [0.00] BUG: unable to handle kernel NULL pointer dereference at > > [0.00] *pde = > [0.00] Oops: 0002 [#1] SMP > [0.00] Modules linked in: > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1 > [0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24 > [0.00] EIP: zero_resv_unavail+0x9b/0x103 > [0.00] EFLAGS: 0022 CPU: 0 > [0.00] EAX: EBX: ECX: 0008 EDX: > [0.00] ESI: c6bc5e90 EDI: EBP: c6bc5eac ESP: c6bc5e80 > [0.00] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [0.00] CR0: 80050033 CR2: CR3: 06d67000 CR4: 00040690 > [0.00] Call Trace: > [0.00] free_area_init_nodes+0x407/0x44c > [0.00] zone_sizes_init+0x45/0x5c > [0.00] paging_init+0xac/0xaf > [0.00] native_pagetable_init+0xc6/0xce > [0.00] setup_arch+0x9c6/0xb18 > [0.00] start_kernel+0x4f/0x3d2 > [0.00] ? idt_setup_early_handler+0x2b/0x3e > [0.00] i386_start_kernel+0x95/0x99 > [0.00] startup_32_smp+0x164/0x168 Yeah, Linus's tree also crashes from this same reason :( A patch was just posted to hopefully fix this... greg k-h
[PATCH V2 2/2] firmware: ti_sci: Provide host-id as an optional dt parameter
Texas Instrument's System Control Interface (TISCI) permits the ability for Operating Systems to running in virtual machines to be able to independently communicate with the firmware without the need going through an hypervisor. The "host-id" in effect is the hardware representation of the host (example: VMs locked to a core) as identified to the System Controller. Provide support as an optional parameter implementation and use the compatible data as default if one is not provided by device tree. Signed-off-by: Nishanth Menon --- Changes since V1: * None V1: https://patchwork.kernel.org/patch/10475309/ RFC: https://patchwork.kernel.org/patch/10447715/ drivers/firmware/ti_sci.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/firmware/ti_sci.c b/drivers/firmware/ti_sci.c index 7fa744793bc5..69ed1464175c 100644 --- a/drivers/firmware/ti_sci.c +++ b/drivers/firmware/ti_sci.c @@ -66,14 +66,14 @@ struct ti_sci_xfers_info { /** * struct ti_sci_desc - Description of SoC integration - * @host_id: Host identifier representing the compute entity + * @default_host_id: Host identifier representing the compute entity * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds) * @max_msgs: Maximum number of messages that can be pending * simultaneously in the system * @max_msg_size: Maximum size of data per message that can be handled. */ struct ti_sci_desc { - u8 host_id; + u8 default_host_id; int max_rx_timeout_ms; int max_msgs; int max_msg_size; @@ -94,6 +94,7 @@ struct ti_sci_desc { * @chan_rx: Receive mailbox channel * @minfo: Message info * @node: list head + * @host_id: Host ID * @users: Number of users of this instance */ struct ti_sci_info { @@ -110,6 +111,7 @@ struct ti_sci_info { struct mbox_chan *chan_rx; struct ti_sci_xfers_info minfo; struct list_head node; + u8 host_id; /* protected by ti_sci_list_mutex */ int users; @@ -370,7 +372,7 @@ static struct ti_sci_xfer *ti_sci_get_one_xfer(struct ti_sci_info *info, hdr->seq = xfer_id; hdr->type = msg_type; - hdr->host = info->desc->host_id; + hdr->host = info->host_id; hdr->flags = msg_flags; return xfer; @@ -1793,7 +1795,7 @@ static int tisci_reboot_handler(struct notifier_block *nb, unsigned long mode, /* Description for K2G */ static const struct ti_sci_desc ti_sci_pmmc_k2g_desc = { - .host_id = 2, + .default_host_id = 2, /* Conservative duration */ .max_rx_timeout_ms = 1000, /* Limited by MBOX_TX_QUEUE_LEN. K2G can handle upto 128 messages! */ @@ -1819,6 +1821,7 @@ static int ti_sci_probe(struct platform_device *pdev) int ret = -EINVAL; int i; int reboot = 0; + u32 h_id; of_id = of_match_device(ti_sci_of_match, dev); if (!of_id) { @@ -1833,6 +1836,19 @@ static int ti_sci_probe(struct platform_device *pdev) info->dev = dev; info->desc = desc; + ret = of_property_read_u32(dev->of_node, "ti,host-id", _id); + /* if the property is not present in DT, use a default from desc */ + if (ret < 0) { + info->host_id = info->desc->default_host_id; + } else { + if (!h_id) { + dev_warn(dev, "Host ID 0 is reserved for firmware\n"); + info->host_id = info->desc->default_host_id; + } else { + info->host_id = h_id; + } + } + reboot = of_property_read_bool(dev->of_node, "ti,system-reboot-controller"); INIT_LIST_HEAD(>node); -- 2.15.1
[PATCH V2 0/2] firmware: ti_sci: Add host-id as an optional parameter
Please find attached series to enable host-id as an optional dt property. This is a minor update to V1 -> Mostly to pick up Greet's feedback and Rob's Ack. V1: https://patchwork.ozlabs.org/cover/931822/ The series is based on v4.18-rc1 and is available here: https://github.com/nmenon/linux-2.6-playground/commits/upstream/v4.18-rc1/k3-1-am6-tisci Consolidated all patches (including all series) are available here: https://github.com/nmenon/linux-2.6-playground/commits/upstream/v4.18-rc1/k3-am6-integ Full Boot log (integrated of all series for AM654) is available here: https://pastebin.ubuntu.com/p/bBFmnzYtCd/ Nishanth Menon (2): Documentation: dt: keystone: ti-sci: Add optional host-id parameter firmware: ti_sci: Provide host-id as an optional dt parameter .../devicetree/bindings/arm/keystone/ti,sci.txt| 4 drivers/firmware/ti_sci.c | 24 ++ 2 files changed, 24 insertions(+), 4 deletions(-) -- 2.15.1
Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On Mon, Jul 16, 2018 at 9:30 AM, John Stultz wrote: > On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha wrote: >> On 7/13/2018 10:50 PM, John Stultz wrote: >>> On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha On 7/11/2018 1:43 AM, John Stultz wrote: > I worry this upside-down logic is too subtle to be easily reasoned > about, and will just lead to future mistakes. > > Can we instead call this "suspend_timing_needed" and only set it to > true when we don't inject any sleep time on resume? I did not get your point "only set it to true when we don't inject any sleep time on resume? " How do we know this ? This question itself depends on the "sleeptime_injected" if it is true means no need to inject else need to inject. Also, we need to make this variable back and forth true, false; suspends path ensures it to make it false. >>> >>> So yea, I'm not saying logically the code is really any different, >>> this is more of a naming nit. So instead of having a variable that is >>> always on that we occasionally turn off, lets invert the naming and >>> have it be a flag that we occasionally turn on. >> >> >> I understand your concern about the name of the variable will be misleading. >> But the changing Boolean state would not solve the actual issue. >> >> If i understand you correctly you meant below code >> >> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c >> index 32ae9ae..becc5bd 100644 >> --- a/kernel/time/timekeeping.c >> +++ b/kernel/time/timekeeping.c >> @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts) >> * If a suspend fails before reaching timekeeping_resume() then the flag >> * stays true and prevents erroneous sleeptime injection. >> */ >> -static bool sleeptime_injected = true; >> +static bool suspend_timing_needed; >> >> /* Flag for if there is a persistent clock on this platform */ >> static bool persistent_clock_exists; >> @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 >> *delta) >> raw_spin_lock_irqsave(_lock, flags); >> write_seqcount_begin(_core.seq); >> >> - sleeptime_injected = true; >> + suspend_timing_needed = false; >> >> timekeeping_forward_now(tk); >> >> @@ -1714,10 +1714,10 @@ void timekeeping_resume(void) >> tk->tkr_mono.mask); >> nsec = mul_u64_u32_shr(cyc_delta, clock->mult, >> clock->shift); >> ts_delta = ns_to_timespec64(nsec); >> - sleeptime_injected = true; >> + suspend_timing_needed = true; >> } else if (timespec64_compare(_new, _suspend_time) > >> 0) { >> ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time); >> - sleeptime_injected = true; >> + suspend_timing_needed = true; >> } > > No no... This part is wrong. We only set suspend_timing_needed if we > *didn't* calculate the suspend time in timekeeping_resume. > > You have to invert all the boolean logic for it to be equivalent. > ... >> > > > So, I think with the logic bug above it will work out properly, but > let me know if I'm still missing something. Sorry, I meant "with the logic bug above fixed it will work out". thanks -john
[PATCH V2 1/2] Documentation: dt: keystone: ti-sci: Add optional host-id parameter
Texas Instrument's System Control Interface (TISCI) permits the ability for OSs running in virtual machines to be able to independently communicate with the firmware without the need going through an hypervisor. The "host-id" in effect is the hardware representation of the host (example: VMs locked to a core) as identified to the System Controller. Hypervisors can either fill in appropriate host-ids in dt used for each VM instance OR may use prebuilt blobs where the host-ids are pre-populated, as appropriate for the OS running in the VMs. This is introduced as an optional parameter to maintain consistency with legacy device tree blobs. Reviewed-by: Rob Herring Signed-off-by: Nishanth Menon --- Changes since V1: * Dropped the SCSI comment in commit log * Picked up Rob's Reviewed by. V1: https://patchwork.kernel.org/patch/10475311/ RFC: https://patchwork.kernel.org/patch/10447753/ Documentation/devicetree/bindings/arm/keystone/ti,sci.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt index 31f5f9a104cc..b56a02c10ae6 100644 --- a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt +++ b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt @@ -45,11 +45,15 @@ Optional Properties: debug_messages - Map the Debug message region - reg: register space corresponding to the debug_messages - ti,system-reboot-controller: If system reboot can be triggered by SoC reboot +- ti,host-id: Integer value corresponding to the host ID assigned by Firmware + for identification of host processing entities such as virtual + machines Example (K2G): - pmmc: pmmc { compatible = "ti,k2g-sci"; + ti,host-id = <2>; mbox-names = "rx", "tx"; mboxes= < _proxy_pmmc_rx>, < _proxy_pmmc_tx>; -- 2.15.1
[PATCH v6 0/5] sparse_init rewrite
Changelog: v6 - v5 - Removed one more obsolete function, as Oscar noticed - Replaced BUG_ON with WARN_ON, ALIGN with PTR_ALIGN. - Added review-by, and test-by. v5 - v4 - Fixed the issue that was reported on ppc64 when CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed - Consolidated the new buffer allocation between vmemmap and non-vmemmap variants of sparse layout. - Removed all review-by comments, because I had to do significant amount of changes compared to previous version and need another round of review. - I also would appreciate if those who reported problems with PPC64 could test this change. v4 - v3 - Addressed comments from Dave Hansen v3 - v1 - Fixed two issues found by Baoquan He v1 - v2 - Addressed comments from Oscar Salvador In sparse_init() we allocate two large buffers to temporary hold usemap and memmap for the whole machine. However, we can avoid doing that if we changed sparse_init() to operated on per-node bases instead of doing it on the whole machine beforehand. As shown by Baoquan http://lkml.kernel.org/r/20180628062857.29658-1-...@redhat.com The buffers are large enough to cause machine stop to boot on small memory systems. Another benefit of these changes is that they also obsolete CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER. Pavel Tatashin (5): mm/sparse: abstract sparse buffer allocations mm/sparse: use the new sparse buffer functions in non-vmemmap mm/sparse: move buffer init/fini to the common place mm/sparse: add new sparse_init_nid() and sparse_init() mm/sparse: delete old sparse_init and enable new one include/linux/mm.h | 7 +- mm/Kconfig | 4 - mm/sparse-vmemmap.c | 59 + mm/sparse.c | 314 ++-- 4 files changed, 102 insertions(+), 282 deletions(-) -- 2.18.0
[PATCH memory-model 05/14] tools/memory-model: Remove ACCESS_ONCE() from recipes
From: Mark Rutland Since commit: b899a850431e2dd0 ("compiler.h: Remove ACCESS_ONCE()") ... there has been no definition of ACCESS_ONCE() in the kernel tree, and it has been necessary to use READ_ONCE() or WRITE_ONCE() instead. Let's update the exmaples in recipes.txt likewise for consistency, using READ_ONCE() for reads. Signed-off-by: Mark Rutland Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: Akira Yokosawa Acked-by: Andrea Parri Signed-off-by: Paul E. McKenney --- tools/memory-model/Documentation/recipes.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/memory-model/Documentation/recipes.txt b/tools/memory-model/Documentation/recipes.txt index ee4309a87fc4..1fea8ef2b184 100644 --- a/tools/memory-model/Documentation/recipes.txt +++ b/tools/memory-model/Documentation/recipes.txt @@ -322,9 +322,9 @@ the following write-side code fragment: And the xlog_valid_lsn() function in fs/xfs/xfs_log_priv.h contains the corresponding read-side code fragment: - cur_cycle = ACCESS_ONCE(log->l_curr_cycle); + cur_cycle = READ_ONCE(log->l_curr_cycle); smp_rmb(); - cur_block = ACCESS_ONCE(log->l_curr_block); + cur_block = READ_ONCE(log->l_curr_block); Alternatively, consider the following comment in function perf_output_put_handle() in kernel/events/ring_buffer.c: -- 2.17.1
[PATCH memory-model 03/14] MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
From: Palmer Dabbelt Dan runs the RISC-V memory model working group. I've been forwarding him LKMM emails that end up in my inbox, but I'm far from an expert in this stuff. He requested to be added as a reviewer, which seems sane to me as it'll take a human out of the loop. CC: Daniel Lustig Acked-by: Daniel Lustig Acked-by: Andrea Parri Signed-off-by: Palmer Dabbelt Signed-off-by: Paul E. McKenney --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 9d5eeff51b5f..ac8ed55dbe9b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8316,6 +8316,7 @@ M:Jade Alglave M: Luc Maranget M: "Paul E. McKenney" R: Akira Yokosawa +R: Daniel Lustig L: linux-kernel@vger.kernel.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git -- 2.17.1
[PATCH memory-model 07/14] tools/memory-model: Make scripts executable
This commit makes the scripts executable to avoid the need for everyone to do so manually in their archive. Signed-off-by: Paul E. McKenney Acked-by: Akira Yokosawa --- tools/memory-model/scripts/checkalllitmus.sh | 2 +- tools/memory-model/scripts/checklitmus.sh| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) mode change 100644 => 100755 tools/memory-model/scripts/checkalllitmus.sh mode change 100644 => 100755 tools/memory-model/scripts/checklitmus.sh diff --git a/tools/memory-model/scripts/checkalllitmus.sh b/tools/memory-model/scripts/checkalllitmus.sh old mode 100644 new mode 100755 index af0aa15ab84e..ca528f9a24d4 --- a/tools/memory-model/scripts/checkalllitmus.sh +++ b/tools/memory-model/scripts/checkalllitmus.sh @@ -9,7 +9,7 @@ # appended. # # Usage: -# sh checkalllitmus.sh [ directory ] +# checkalllitmus.sh [ directory ] # # The LINUX_HERD_OPTIONS environment variable may be used to specify # arguments to herd, whose default is defined by the checklitmus.sh script. diff --git a/tools/memory-model/scripts/checklitmus.sh b/tools/memory-model/scripts/checklitmus.sh old mode 100644 new mode 100755 index e2e477472844..bf12a75c0719 --- a/tools/memory-model/scripts/checklitmus.sh +++ b/tools/memory-model/scripts/checklitmus.sh @@ -8,7 +8,7 @@ # with ".out" appended. # # Usage: -# sh checklitmus.sh file.litmus +# checklitmus.sh file.litmus # # The LINUX_HERD_OPTIONS environment variable may be used to specify # arguments to herd, which default to "-conf linux-kernel.cfg". Thus, -- 2.17.1
[PATCH memory-model 06/14] tools/memory-model: Remove ACCESS_ONCE() from model
From: Mark Rutland Since commit: b899a850431e2dd0 ("compiler.h: Remove ACCESS_ONCE()") ... there has been no definition of ACCESS_ONCE() in the kernel tree, and it has been necessary to use READ_ONCE() or WRITE_ONCE() instead. Correspondingly, let's remove ACCESS_ONCE() from the kernel memory model. Signed-off-by: Mark Rutland Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: Akira Yokosawa Acked-by: Andrea Parri Signed-off-by: Paul E. McKenney --- tools/memory-model/linux-kernel.bell | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell index 64f5740e0e75..b84fb2f67109 100644 --- a/tools/memory-model/linux-kernel.bell +++ b/tools/memory-model/linux-kernel.bell @@ -13,7 +13,7 @@ "Linux-kernel memory consistency model" -enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) || +enum Accesses = 'once (*READ_ONCE,WRITE_ONCE*) || 'release (*smp_store_release*) || 'acquire (*smp_load_acquire*) || 'noreturn (* R of non-return RMW *) -- 2.17.1
Re: [PATCH v6] pidns: introduce syscall translate_pid
On 06/01/2018 12:18 PM, Konstantin Khlebnikov wrote: Each process have different pids, one for each pid namespace it belongs. When interaction happens within single pid-ns translation isn't required. More complicated scenarios needs special handling. For example: - reading pid-files or logs written inside container with pid namespace - writing logs with internal pids outside container for pushing them into - attaching with ptrace to tasks from different pid namespace Generally speaking, any cross pid-ns API with pids needs translation. Currently there are several interfaces that could be used here: Pid namespaces are identified by device and inode of /proc/[pid]/ns/pid. Pids for nested pid namespaces are shown in file /proc/[pid]/status. In some cases pid translation could be easily done using this information. Backward translation requires scanning all tasks and becomes really complicated for deeper namespace nesting. Unix socket automatically translates pid attached to SCM_CREDENTIALS. This requires CAP_SYS_ADMIN for sending arbitrary pids and entering into pid namespace, this expose process and could be insecure. This patch adds new syscall for converting pids between pid namespaces: pid_t translate_pid(pid_t pid, int source, int target); Pid-namespaces are referred file descriptors opened to proc files /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. Negative argument points to current pid namespace. Syscall returns pid in target pid-ns or zero if task have no pid there. Error codes: EBADF- file descriptor is closed EINVAL - file descriptor isn't pid namespace ESRCH- task not found in @source namespace Translation could breach pid-ns isolation and return pids from outer pid namespaces iff process already has file descriptor for these namespaces. Examples: translate_pid(pid, ns, -1) - get pid in our pid namespace translate_pid(pid, -1, ns) - get pid in other pid namespace translate_pid(1, ns, -1)- get pid of init task for namespace translate_pid(pid, -1, ns) > 0 - is pid is reachable from ns? translate_pid(1, ns1, ns2) > 0 - is ns1 inside ns2? translate_pid(1, ns1, ns2) == 0 - is ns1 outside ns2? translate_pid(1, ns1, ns2) == 1 - is ns1 equal ns2? Signed-off-by: Konstantin Khlebnikov Reanimated-by: Nagarathnam Muthusamy --- v1: https://lkml.org/lkml/2015/9/15/411 v2: https://lkml.org/lkml/2015/9/24/278 * use namespace-fd as second/third argument * add -pid for getting parent pid * move code into kernel/sys.c next to getppid * drop ifdef CONFIG_PID_NS * add generic syscall v3: https://lkml.org/lkml/2015/9/28/3 * use proc_ns_fdget() * update description * rebase to next-20150925 * fix conflict with mlock2 v4: https://lkml.org/lkml/2017/10/13/177 * rename from getvpid() into translate_pid() * remove syscall if CONFIG_PID_NS=n * drop -pid for parent task * drop fget-fdget optimizations * add helper get_pid_ns_by_fd() * wire only into x86 v5: https://lkml.org/lkml/2018/4/4/677 * rewrite commit message * resolve pidns by task pid or by pidns fd * add arguments source_type and target_type v6: * revert back minimized v4 design * rebase to next-20180601 * fix COND_SYSCALL stub * use next syscall number, old used for io_pgetevents --- sample tool --- #define _GNU_SOURCE #include #include #include #include #include #include #include #ifndef SYS_translate_pid #ifdef __x86_64__ #define SYS_translate_pid 334 #elif defined __i386__ #define SYS_translate_pid 386 #endif #endif pid_t translate_pid(pid_t pid, int source, int target) { return syscall(SYS_translate_pid, pid, source, target); } int main(int argc, char **argv) { int pid, source, target; char buf[64]; if (argc != 4) errx(1, "usage: %s ", argv[0]); pid = atoi(argv[1]); source = atoi(argv[2]); target = atoi(argv[3]); if (source > 0) { snprintf(buf, sizeof(buf), "/proc/%d/ns/pid", source); source = open(buf, O_RDONLY); if (source < 0) err(2, "open source %s", buf); } if (target > 0) { snprintf(buf, sizeof(buf), "/proc/%d/ns/pid", target); target = open(buf, O_RDONLY); if (target < 0) err(2, "open target %s", buf); } pid = translate_pid(pid, source, target); if (pid < 0) err(2, "translate_pid"); printf("%d\n", pid); return 0; } --- --- arch/x86/entry/syscalls/syscall_32.tbl |1 arch/x86/entry/syscalls/syscall_64.tbl |1 include/linux/syscalls.h |1 kernel/pid_namespace.c | 66 kernel/sys_ni.c|3 + 5 files changed, 72 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index
[PATCH memory-model 12/14] doc: Update wake_up() & co. memory-barrier guarantees
From: Andrea Parri Both the implementation and the users' expectation [1] for the various wakeup primitives have evolved over time, but the documentation has not kept up with these changes: brings it into 2018. [1] http://lkml.kernel.org/r/20180424091510.gb4...@hirez.programming.kicks-ass.net Suggested-by: Peter Zijlstra Signed-off-by: Andrea Parri [ aparri: Apply feedback from Alan Stern. ] Cc: Alan Stern Cc: Will Deacon Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: Akira Yokosawa Cc: Daniel Lustig Cc: Jonathan Corbet Cc: Ingo Molnar Acked-by: Peter Zijlstra (Intel) Signed-off-by: Paul E. McKenney --- Documentation/memory-barriers.txt | 43 +++ include/linux/sched.h | 4 +-- kernel/sched/completion.c | 8 +++--- kernel/sched/core.c | 30 + kernel/sched/wait.c | 8 +++--- 5 files changed, 49 insertions(+), 44 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index a02d6bbfc9d0..0d8d7ef131e9 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -2179,32 +2179,41 @@ or: event_indicated = 1; wake_up_process(event_daemon); -A write memory barrier is implied by wake_up() and co. if and only if they -wake something up. The barrier occurs before the task state is cleared, and so -sits between the STORE to indicate the event and the STORE to set TASK_RUNNING: +A general memory barrier is executed by wake_up() if it wakes something up. +If it doesn't wake anything up then a memory barrier may or may not be +executed; you must not rely on it. The barrier occurs before the task state +is accessed, in particular, it sits between the STORE to indicate the event +and the STORE to set TASK_RUNNING: - CPU 1 CPU 2 + CPU 1 (Sleeper) CPU 2 (Waker) === === set_current_state();STORE event_indicated smp_store_mb(); wake_up(); - STORE current->state -STORE current->state - LOAD event_indicated + STORE current->state ... + + LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL) + STORE task->state -To repeat, this write memory barrier is present if and only if something -is actually awakened. To see this, consider the following sequence of -events, where X and Y are both initially zero: +where "task" is the thread being woken up and it equals CPU 1's "current". + +To repeat, a general memory barrier is guaranteed to be executed by wake_up() +if something is actually awakened, but otherwise there is no such guarantee. +To see this, consider the following sequence of events, where X and Y are both +initially zero: CPU 1 CPU 2 === === - X = 1; STORE event_indicated + X = 1; Y = 1; smp_mb(); wake_up(); - Y = 1; wait_event(wq, Y == 1); - wake_up();load from Y sees 1, no memory barrier - load from X might see 0 + LOAD Y LOAD X + +If a wakeup does occur, one (at least) of the two loads must see 1. If, on +the other hand, a wakeup does not occur, both loads might see 0. -In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed -to see 1. +wake_up_process() always executes a general memory barrier. The barrier again +occurs before the task state is accessed. In particular, if the wake_up() in +the previous snippet were replaced by a call to wake_up_process() then one of +the two loads would be guaranteed to see 1. The available waker functions include: @@ -2224,6 +2233,8 @@ The available waker functions include: wake_up_poll(); wake_up_process(); +In terms of memory ordering, these functions all provide the same guarantees of +a wake_up() (or stronger). [!] Note that the memory barriers implied by the sleeper and the waker do _not_ order multiple stores before the wake-up with respect to loads of those stored diff --git a/include/linux/sched.h b/include/linux/sched.h index 87bf02d93a27..ddfdeb632f74 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -167,8 +167,8 @@ struct task_group; * need_sleep = false; * wake_up_state(p, TASK_UNINTERRUPTIBLE); * - * Where wake_up_state() (and all other wakeup primitives) imply enough - * barriers to order the store of the variable against wakeup. + * where wake_up_state() executes a full memory barrier before accessing the +
[PATCH memory-model 14/14] tools/memory-model: Rename litmus tests to comply to norm7
From: Andrea Parri norm7 produces the 'normalized' name of a litmus test, when the test can be generated from a single cycle that passes through each process exactly once. The commit renames such tests in order to comply to the naming scheme implemented by this tool. Signed-off-by: Andrea Parri Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: "Paul E. McKenney" Cc: Akira Yokosawa Signed-off-by: Paul E. McKenney Acked-by: Alan Stern --- tools/memory-model/Documentation/recipes.txt | 8 tools/memory-model/README | 20 +-- ... => IRIW+fencembonceonces+OnceOnce.litmus} | 2 +- ...=> LB+fencembonceonce+ctrlonceonce.litmus} | 2 +- ...+fencewmbonceonce+fencermbonceonce.litmus} | 2 +- ...onces.litmus => R+fencembonceonces.litmus} | 2 +- tools/memory-model/litmus-tests/README| 16 +++ ...> S+fencewmbonceonce+poacquireonce.litmus} | 2 +- ...nces.litmus => SB+fencembonceonces.litmus} | 2 +- ...ooncerelease+fencermbonceonce+Once.litmus} | 2 +- ...e+poacquirerelease+fencembonceonce.litmus} | 2 +- 11 files changed, 30 insertions(+), 30 deletions(-) rename tools/memory-model/litmus-tests/{IRIW+mbonceonces+OnceOnce.litmus => IRIW+fencembonceonces+OnceOnce.litmus} (95%) rename tools/memory-model/litmus-tests/{LB+ctrlonceonce+mbonceonce.litmus => LB+fencembonceonce+ctrlonceonce.litmus} (95%) rename tools/memory-model/litmus-tests/{MP+wmbonceonce+rmbonceonce.litmus => MP+fencewmbonceonce+fencermbonceonce.litmus} (91%) rename tools/memory-model/litmus-tests/{R+mbonceonces.litmus => R+fencembonceonces.litmus} (95%) rename tools/memory-model/litmus-tests/{S+wmbonceonce+poacquireonce.litmus => S+fencewmbonceonce+poacquireonce.litmus} (90%) rename tools/memory-model/litmus-tests/{SB+mbonceonces.litmus => SB+fencembonceonces.litmus} (95%) rename tools/memory-model/litmus-tests/{WRC+pooncerelease+rmbonceonce+Once.litmus => WRC+pooncerelease+fencermbonceonce+Once.litmus} (93%) rename tools/memory-model/litmus-tests/{Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus => Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus} (94%) diff --git a/tools/memory-model/Documentation/recipes.txt b/tools/memory-model/Documentation/recipes.txt index 1fea8ef2b184..af72700cc20a 100644 --- a/tools/memory-model/Documentation/recipes.txt +++ b/tools/memory-model/Documentation/recipes.txt @@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses ordered by locking will be seen as ordered by CPUs not holding that lock. Consider this example: - /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */ + /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */ void CPU0(void) { spin_lock(); @@ -292,7 +292,7 @@ and to use smp_load_acquire() instead of smp_rmb(). However, the older smp_wmb() and smp_rmb() APIs are still heavily used, so it is important to understand their use cases. The general approach is shown below: - /* See MP+wmbonceonce+rmbonceonce.litmus. */ + /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */ void CPU0(void) { WRITE_ONCE(x, 1); @@ -360,7 +360,7 @@ can be seen in the LB+poonceonces.litmus litmus test. One way of avoiding the counter-intuitive outcome is through the use of a control dependency paired with a full memory barrier: - /* See LB+ctrlonceonce+mbonceonce.litmus. */ + /* See LB+fencembonceonce+ctrlonceonce.litmus. */ void CPU0(void) { r0 = READ_ONCE(x); @@ -476,7 +476,7 @@ that one CPU first stores to one variable and then loads from a second, while another CPU stores to the second variable and then loads from the first. Preserving order requires nothing less than full barriers: - /* See SB+mbonceonces.litmus. */ + /* See SB+fencembonceonces.litmus. */ void CPU0(void) { WRITE_ONCE(x, 1); diff --git a/tools/memory-model/README b/tools/memory-model/README index 734f7feaa5dc..ee987ce20aae 100644 --- a/tools/memory-model/README +++ b/tools/memory-model/README @@ -35,13 +35,13 @@ BASIC USAGE: HERD7 The memory model is used, in conjunction with "herd7", to exhaustively explore the state space of small litmus tests. -For example, to run SB+mbonceonces.litmus against the memory model: +For example, to run SB+fencembonceonces.litmus against the memory model: - $ herd7 -conf linux-kernel.cfg litmus-tests/SB+mbonceonces.litmus + $ herd7 -conf linux-kernel.cfg litmus-tests/SB+fencembonceonces.litmus Here is the corresponding output: - Test SB+mbonceonces Allowed + Test SB+fencembonceonces Allowed States 3 0:r0=0; 1:r0=1; 0:r0=1; 1:r0=0; @@ -50,8 +50,8 @@ Here is the corresponding output: Witnesses Positive: 0 Negative: 3 Condition exists (0:r0=0 /\ 1:r0=0) - Observation
[PATCH memory-model 09/14] tools/memory-model: Add informal LKMM documentation to MAINTAINERS
The Linux-kernel memory model has been informal, with a number of text files documenting it. It would be good to make sure that these informal descriptions are kept up to date and/or pruned appropriately. This commit therefore brings more of those text files into the LKMM MAINTAINERS file entry. Signed-off-by: Paul E. McKenney Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: Akira Yokosawa Cc: Daniel Lustig Cc: "David S. Miller" Acked-by: Andrea Parri --- MAINTAINERS | 5 + 1 file changed, 5 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index ac8ed55dbe9b..b60d6daad132 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8318,9 +8318,14 @@ M: "Paul E. McKenney" R: Akira Yokosawa R: Daniel Lustig L: linux-kernel@vger.kernel.org +L: linux-a...@vger.kernel.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git F: tools/memory-model/ +F: Documentation/atomic_bitops.txt +F: Documentation/atomic_t.txt +F: Documentation/core-api/atomic_ops.rst +F: Documentation/core-api/refcount-vs-atomic.rst F: Documentation/memory-barriers.txt LINUX SECURITY MODULE (LSM) FRAMEWORK -- 2.17.1
[PATCH memory-model 08/14] docs: atomic_ops: Describe atomic_set as a write operation
From: Jonathan Neuschäfer The atomic_set() and ATOMIC_INIT() operations are writes, so this commit changes their description from "reads" to "writes". Signed-off-by: Jonathan Neuschäfer Signed-off-by: Paul E. McKenney Reviewed-by: Andrea Parri --- Documentation/core-api/atomic_ops.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/core-api/atomic_ops.rst b/Documentation/core-api/atomic_ops.rst index 2e7165f86f55..724583453e1f 100644 --- a/Documentation/core-api/atomic_ops.rst +++ b/Documentation/core-api/atomic_ops.rst @@ -29,7 +29,7 @@ updated by one CPU, local_t is probably more appropriate. Please see local_t. The first operations to implement for atomic_t's are the initializers and -plain reads. :: +plain writes. :: #define ATOMIC_INIT(i) { (i) } #define atomic_set(v, i)((v)->counter = (i)) -- 2.17.1
[PATCH memory-model 11/14] locking: Clarify requirements for smp_mb__after_spinlock()
From: Andrea Parri There are 11 interpretations of the requirements described in the header comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and one currently encoded in the Cat file. Stick to the latter (until a more satisfactory solution is available). This also reworks some snippets related to the barrier to illustrate the requirements and to link them to the idioms which are relied upon at its call sites. Suggested-by: Boqun Feng Signed-off-by: Andrea Parri Acked-by: Peter Zijlstra Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Will Deacon Cc: "Paul E. McKenney" Signed-off-by: Paul E. McKenney --- include/linux/spinlock.h | 53 +++- kernel/sched/core.c | 41 --- 2 files changed, 57 insertions(+), 37 deletions(-) diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 1e8a46435838..d70a06ff2bdd 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -114,29 +114,48 @@ do { \ #endif /*arch_spin_is_contended*/ /* - * This barrier must provide two things: + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier + * between program-order earlier lock acquisitions and program-order later + * memory accesses. * - * - it must guarantee a STORE before the spin_lock() is ordered against a - * LOAD after it, see the comments at its two usage sites. + * This guarantees that the following two properties hold: * - * - it must ensure the critical section is RCsc. + * 1) Given the snippet: * - * The latter is important for cases where we observe values written by other - * CPUs in spin-loops, without barriers, while being subject to scheduling. + * { X = 0; Y = 0; } * - * CPU0CPU1CPU2 + * CPU0 CPU1 * - * for (;;) { - * if (READ_ONCE(X)) - * break; - * } - * X=1 - * - * - * r = X; + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1); + * spin_lock(S); smp_mb(); + * smp_mb__after_spinlock(); r1 = READ_ONCE(X); + * r0 = READ_ONCE(Y); + * spin_unlock(S); * - * without transitivity it could be that CPU1 observes X!=0 breaks the loop, - * we get migrated and CPU2 sees X==0. + * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0) + * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments + * preceding the call to smp_mb__after_spinlock() in __schedule() and in + * try_to_wake_up(). + * + * 2) Given the snippet: + * + * { X = 0; Y = 0; } + * + * CPU0 CPU1CPU2 + * + * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y); + * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb(); + * spin_unlock(S);r0 = READ_ONCE(X); r2 = READ_ONCE(X); + * WRITE_ONCE(Y, 1); + * spin_unlock(S); + * + * it is forbidden that CPU0's critical section executes before CPU1's + * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1) + * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments + * preceding the calls to smp_rmb() in try_to_wake_up() for similar + * snippets but "projected" onto two CPUs. + * + * Property (2) upgrades the lock to an RCsc lock. * * Since most load-store architectures implement ACQUIRE with an smp_mb() after * the LL/SC loop, they need no further barriers. Similarly all our TSO diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 78d8facba456..7db0662360f1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) * be possible to, falsely, observe p->on_rq == 0 and get stuck * in smp_cond_load_acquire() below. * -* sched_ttwu_pending() try_to_wake_up() -* [S] p->on_rq = 1; [L] P->state -* UNLOCK rq->lock -. -* \ -* +--- RMB -* schedule() / -* LOCK rq->lock-' -* UNLOCK rq->lock +* sched_ttwu_pending() try_to_wake_up() +* STORE p->on_rq = 1 LOAD p->state +* UNLOCK rq->lock +* +* __schedule() (switch to task 'p') +* LOCK rq->locksmp_rmb(); +* smp_mb__after_spinlock(); +* UNLOCK rq->lock * * [task p] -* [S] p->state = UNINTERRUPTIBLE [L]
[PATCH memory-model 01/14] tools/memory-model: Add litmus test for full multicopy atomicity
This commit adds a litmus test suggested by Alan Stern that is forbidden on fully multicopy atomic systems, but allowed on other-multicopy and on non-multicopy atomic systems. For reference, s390 is fully multicopy atomic, x86 and ARMv8 are other-multicopy atomic, and ARMv7 and powerpc are non-multicopy atomic. Suggested-by: Alan Stern Signed-off-by: Paul E. McKenney Acked-by: Alan Stern Acked-by: Andrea Parri --- tools/memory-model/litmus-tests/README| 9 ++ .../SB+rfionceonce-poonceonces.litmus | 32 +++ 2 files changed, 41 insertions(+) create mode 100644 tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README index 17eb9a8c222d..00140aaf58b7 100644 --- a/tools/memory-model/litmus-tests/README +++ b/tools/memory-model/litmus-tests/README @@ -111,6 +111,15 @@ SB+mbonceonces.litmus SB+poonceonces.litmus As above, but without the smp_mb() invocations. +SB+rfionceonce-poonceonces.litmus + This litmus test demonstrates that LKMM is not fully multicopy + atomic. (Neither is it other multicopy atomic.) This litmus test + also demonstrates the "locations" debugging aid, which designates + additional registers and locations to be printed out in the dump + of final states in the herd7 output. Without the "locations" + statement, only those registers and locations mentioned in the + "exists" clause will be printed. + S+poonceonces.litmus As below, but without the smp_wmb() and acquire load. diff --git a/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus new file mode 100644 index ..04a16603660b --- /dev/null +++ b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus @@ -0,0 +1,32 @@ +C SB+rfionceonce-poonceonces + +(* + * Result: Sometimes + * + * This litmus test demonstrates that LKMM is not fully multicopy atomic. + *) + +{} + +P0(int *x, int *y) +{ + int r1; + int r2; + + WRITE_ONCE(*x, 1); + r1 = READ_ONCE(*x); + r2 = READ_ONCE(*y); +} + +P1(int *x, int *y) +{ + int r3; + int r4; + + WRITE_ONCE(*y, 1); + r3 = READ_ONCE(*y); + r4 = READ_ONCE(*x); +} + +locations [0:r1; 1:r3; x; y] (* Debug aid: Print things not in "exists". *) +exists (0:r2=0 /\ 1:r4=0) -- 2.17.1
[PATCH memory-model 04/14] locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
From: SeongJae Park Translate this commit to Korean: 5846581e3563 ("locking/memory-barriers.txt: Fix broken DMA vs. MMIO ordering example") Signed-off-by: SeongJae Park Signed-off-by: Paul E. McKenney [ paulmck: Updated based on feedback from Byungchul Park. ] Acked-by: Byungchul Park --- .../translations/ko_KR/memory-barriers.txt| 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt index 921739d00f69..7f01fb1c1084 100644 --- a/Documentation/translations/ko_KR/memory-barriers.txt +++ b/Documentation/translations/ko_KR/memory-barriers.txt @@ -1891,22 +1891,22 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효 /* 소유권을 수정 */ desc->status = DEVICE_OWN; - /* MMIO 를 통해 디바이스에 공지를 하기 전에 메모리를 동기화 */ - wmb(); - /* 업데이트된 디스크립터의 디바이스에 공지 */ writel(DESC_NOTIFY, doorbell); } dma_rmb() 는 디스크립터로부터 데이터를 읽어오기 전에 디바이스가 소유권을 - 내놓았음을 보장하게 하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시 - 가졌음을 보기 전에 디스크립터에 데이터가 쓰였음을 보장합니다. wmb() 는 - 캐시 일관성이 없는 (cache incoherent) MMIO 영역에 쓰기를 시도하기 전에 - 캐시 일관성이 있는 메모리 (cache coherent memory) 쓰기가 완료되었음을 - 보장해주기 위해 필요합니다. - - consistent memory 에 대한 자세한 내용을 위해선 Documentation/DMA-API.txt - 문서를 참고하세요. + 내려놓았을 것을 보장하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시 + 가졌음을 보기 전에 디스크립터에 데이터가 쓰였을 것을 보장합니다. 참고로, + writel() 을 사용하면 캐시 일관성이 있는 메모리 (cache coherent memory) + 쓰기가 MMIO 영역에의 쓰기 전에 완료되었을 것을 보장하므로 writel() 앞에 + wmb() 를 실행할 필요가 없음을 알아두시기 바랍니다. writel() 보다 비용이 + 저렴한 writel_relaxed() 는 이런 보장을 제공하지 않으므로 여기선 사용되지 + 않아야 합니다. + + writel_relaxed() 와 같은 완화된 I/O 접근자들에 대한 자세한 내용을 위해서는 + "커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을 + 위해선 Documentation/DMA-API.txt 문서를 참고하세요. MMIO 쓰기 배리어 -- 2.17.1
[PATCH memory-model 13/14] memory-model/Documentation: Fix typo, smb->smp
From: Yauheni Kaliuta The tools/memory-model/Documentation/explanation.txt file says "For each other CPU C', smb_wmb() forces all po-earlier stores" This commit therefore replaces the "smb_wmb()" with "smp_wmb()". Signed-off-by: Yauheni Kaliuta Signed-off-by: Paul E. McKenney Acked-by: Alan Stern --- tools/memory-model/Documentation/explanation.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt index 1b09f3175a1f..0cbd1ef8f86d 100644 --- a/tools/memory-model/Documentation/explanation.txt +++ b/tools/memory-model/Documentation/explanation.txt @@ -804,7 +804,7 @@ type of fence: Second, some types of fence affect the way the memory subsystem propagates stores. When a fence instruction is executed on CPU C: - For each other CPU C', smb_wmb() forces all po-earlier stores + For each other CPU C', smp_wmb() forces all po-earlier stores on C to propagate to C' before any po-later stores do. For each other CPU C', any store which propagates to C before -- 2.17.1
[PATCH memory-model 10/14] sched: Use smp_mb() in wake_woken_function()
From: Andrea Parri wake_woken_function() synchronizes with wait_woken() as follows: [wait_woken] [wake_woken_function] entry->flags &= ~wq_flag_woken;condition = true; smp_mb(); smp_wmb(); if (condition) wq_entry->flags |= wq_flag_woken; break; This commit replaces the above smp_wmb() with an smp_mb() in order to guarantee that either wait_woken() sees the wait condition being true or the store to wq_entry->flags in woken_wake_function() follows the store in wait_woken() in the coherence order (so that the former can eventually be observed by wait_woken()). The commit also fixes a comment associated to set_current_state() in wait_woken(): the comment pairs the barrier in set_current_state() to the above smp_wmb(), while the actual pairing involves the barrier in set_current_state() and the barrier executed by the try_to_wake_up() in wake_woken_function(). Signed-off-by: Andrea Parri Cc: Ingo Molnar Acked-by: Peter Zijlstra (Intel) Signed-off-by: Paul E. McKenney --- kernel/sched/wait.c | 47 - 1 file changed, 21 insertions(+), 26 deletions(-) diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 928be527477e..a7a2aaa3026a 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -392,35 +392,36 @@ static inline bool is_kthread_should_stop(void) * if (condition) * break; * - * p->state = mode;condition = true; - * smp_mb(); // A smp_wmb(); // C - * if (!wq_entry->flags & WQ_FLAG_WOKEN) wq_entry->flags |= WQ_FLAG_WOKEN; - * schedule() try_to_wake_up(); - * p->state = TASK_RUNNING;~~ - * wq_entry->flags &= ~WQ_FLAG_WOKEN; condition = true; - * smp_mb() // B smp_wmb(); // C - * wq_entry->flags |= WQ_FLAG_WOKEN; - * } - * remove_wait_queue(_head, ); + * // in wait_woken() // in woken_wake_function() * + * p->state = mode;wq_entry->flags |= WQ_FLAG_WOKEN; + * smp_mb(); // A try_to_wake_up(): + * if (!(wq_entry->flags & WQ_FLAG_WOKEN)) + * schedule() if (p->state & mode) + * p->state = TASK_RUNNING; p->state = TASK_RUNNING; + * wq_entry->flags &= ~WQ_FLAG_WOKEN; ~~ + * smp_mb(); // B condition = true; + * } smp_mb(); // C + * remove_wait_queue(_head, ); wq_entry->flags |= WQ_FLAG_WOKEN; */ long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout) { - set_current_state(mode); /* A */ /* -* The above implies an smp_mb(), which matches with the smp_wmb() from -* woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must -* also observe all state before the wakeup. +* The below executes an smp_mb(), which matches with the full barrier +* executed by the try_to_wake_up() in woken_wake_function() such that +* either we see the store to wq_entry->flags in woken_wake_function() +* or woken_wake_function() sees our store to current->state. */ + set_current_state(mode); /* A */ if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop()) timeout = schedule_timeout(timeout); __set_current_state(TASK_RUNNING); /* -* The below implies an smp_mb(), it too pairs with the smp_wmb() from -* woken_wake_function() such that we must either observe the wait -* condition being true _OR_ WQ_FLAG_WOKEN such that we will not miss -* an event. +* The below executes an smp_mb(), which matches with the smp_mb() (C) +* in woken_wake_function() such that either we see the wait condition +* being true or the store to wq_entry->flags in woken_wake_function() +* follows ours in the coherence order. */ smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN); /* B */ @@ -430,14 +431,8 @@ EXPORT_SYMBOL(wait_woken); int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key) { - /* -* Although this function is called under waitqueue lock, LOCK -* doesn't imply write barrier and the users expects write -* barrier semantics on wakeup functions. The following -* smp_wmb() is equivalent to smp_wmb() in try_to_wake_up() -* and is paired with smp_store_mb() in wait_woken(). -*/ - smp_wmb(); /* C */ + /* Pairs with the smp_store_mb() in wait_woken(). */ + smp_mb(); /* C */ wq_entry->flags |= WQ_FLAG_WOKEN;
Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On 7/16/2018 10:44 PM, John Stultz wrote: On Mon, Jul 16, 2018 at 9:30 AM, John Stultz wrote: On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha wrote: On 7/13/2018 10:50 PM, John Stultz wrote: On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha On 7/11/2018 1:43 AM, John Stultz wrote: I worry this upside-down logic is too subtle to be easily reasoned about, and will just lead to future mistakes. Can we instead call this "suspend_timing_needed" and only set it to true when we don't inject any sleep time on resume? I did not get your point "only set it to true when we don't inject any sleep time on resume? " How do we know this ? This question itself depends on the "sleeptime_injected" if it is true means no need to inject else need to inject. Also, we need to make this variable back and forth true, false; suspends path ensures it to make it false. So yea, I'm not saying logically the code is really any different, this is more of a naming nit. So instead of having a variable that is always on that we occasionally turn off, lets invert the naming and have it be a flag that we occasionally turn on. I understand your concern about the name of the variable will be misleading. But the changing Boolean state would not solve the actual issue. If i understand you correctly you meant below code diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 32ae9ae..becc5bd 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts) * If a suspend fails before reaching timekeeping_resume() then the flag * stays true and prevents erroneous sleeptime injection. */ -static bool sleeptime_injected = true; +static bool suspend_timing_needed; /* Flag for if there is a persistent clock on this platform */ static bool persistent_clock_exists; @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 *delta) raw_spin_lock_irqsave(_lock, flags); write_seqcount_begin(_core.seq); - sleeptime_injected = true; + suspend_timing_needed = false; timekeeping_forward_now(tk); @@ -1714,10 +1714,10 @@ void timekeeping_resume(void) tk->tkr_mono.mask); nsec = mul_u64_u32_shr(cyc_delta, clock->mult, clock->shift); ts_delta = ns_to_timespec64(nsec); - sleeptime_injected = true; + suspend_timing_needed = true; } else if (timespec64_compare(_new, _suspend_time) > 0) { ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time); - sleeptime_injected = true; + suspend_timing_needed = true; } No no... This part is wrong. We only set suspend_timing_needed if we *didn't* calculate the suspend time in timekeeping_resume. You have to invert all the boolean logic for it to be equivalent. ... So, I think with the logic bug above it will work out properly, but let me know if I'm still missing something. Please give it thought to a case where very first suspend fails with your logic. If i am not able to get your thought, please write a patch. -Mukesh Sorry, I meant "with the logic bug above fixed it will work out". thanks -john
Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On Mon, Jul 16, 2018 at 11:30 AM, Mukesh Ojha wrote: > > > On 7/16/2018 10:44 PM, John Stultz wrote: >> >> On Mon, Jul 16, 2018 at 9:30 AM, John Stultz >> wrote: >>> >>> On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha >>> wrote: On 7/13/2018 10:50 PM, John Stultz wrote: > > On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha >> >> On 7/11/2018 1:43 AM, John Stultz wrote: >>> >>> I worry this upside-down logic is too subtle to be easily reasoned >>> about, and will just lead to future mistakes. >>> >>> Can we instead call this "suspend_timing_needed" and only set it to >>> true when we don't inject any sleep time on resume? >> >> >> I did not get your point "only set it to true when we don't inject any >> sleep >> time on resume? " >> How do we know this ? >> This question itself depends on the "sleeptime_injected" if it is true >> means >> no need to inject else need to inject. >> >> Also, we need to make this variable back and forth true, false; >> suspends >> path ensures it to make it false. > > So yea, I'm not saying logically the code is really any different, > this is more of a naming nit. So instead of having a variable that is > always on that we occasionally turn off, lets invert the naming and > have it be a flag that we occasionally turn on. I understand your concern about the name of the variable will be misleading. But the changing Boolean state would not solve the actual issue. If i understand you correctly you meant below code diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 32ae9ae..becc5bd 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts) * If a suspend fails before reaching timekeeping_resume() then the flag * stays true and prevents erroneous sleeptime injection. */ -static bool sleeptime_injected = true; +static bool suspend_timing_needed; /* Flag for if there is a persistent clock on this platform */ static bool persistent_clock_exists; @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64 *delta) raw_spin_lock_irqsave(_lock, flags); write_seqcount_begin(_core.seq); - sleeptime_injected = true; + suspend_timing_needed = false; timekeeping_forward_now(tk); @@ -1714,10 +1714,10 @@ void timekeeping_resume(void) tk->tkr_mono.mask); nsec = mul_u64_u32_shr(cyc_delta, clock->mult, clock->shift); ts_delta = ns_to_timespec64(nsec); - sleeptime_injected = true; + suspend_timing_needed = true; } else if (timespec64_compare(_new, _suspend_time) > 0) { ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time); - sleeptime_injected = true; + suspend_timing_needed = true; } >>> >>> No no... This part is wrong. We only set suspend_timing_needed if we >>> *didn't* calculate the suspend time in timekeeping_resume. >>> >>> You have to invert all the boolean logic for it to be equivalent. >>> >> ... >>> >>> >>> So, I think with the logic bug above it will work out properly, but >>> let me know if I'm still missing something. > > > Please give it thought to a case where very first suspend fails with your > logic. I believe I did. If the first suspend fails, we never reach timekeeping_resume, so we never set "suspend_time_needed = true", so then timekeeping_rtc_skipresume can then return true, and we don't inject the time in the RTC code. > If i am not able to get your thought, please write a patch. I probably will, but I'd like to encourage you to follow through on this one. You reported the issue, and submitted a few patches, so I think it would be good for you to also get the patch credit here. I don't believe its a complex request I've made, and I think you can figure it out. So, please, take one more real stab at this and I'll rework it if it seems necessary. thanks -john
[PATCH 4/7] x86,tlb: make lazy TLB mode lazier
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set are not removed from the mm_cpumask(mm), since that would prevent the TLB flush IPIs at page table free time from being sent to all the CPUs that need them. This patch reduces total CPU use in the system by about 1-2% for a memcache workload on two socket systems, and by about 1% for a heavily multi-process netperf between two systems. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 68 +++ 1 file changed, 59 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 4b73fe835c95..26542cc17043 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include @@ -185,6 +186,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, { struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm); u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy); unsigned cpu = smp_processor_id(); u64 next_tlb_gen; bool need_flush; @@ -242,17 +244,40 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, next->context.ctx_id); /* -* We don't currently support having a real mm loaded without -* our cpu set in mm_cpumask(). We have all the bookkeeping -* in place to figure out whether we would need to flush -* if our cpu were cleared in mm_cpumask(), but we don't -* currently use it. +* Even in lazy TLB mode, the CPU should stay set in the +* mm_cpumask. The TLB shootdown code can figure out from +* from cpu_tlbstate.is_lazy whether or not to send an IPI. */ if (WARN_ON_ONCE(real_prev != _mm && !cpumask_test_cpu(cpu, mm_cpumask(next cpumask_set_cpu(cpu, mm_cpumask(next)); - return; + /* +* If the CPU is not in lazy TLB mode, we are just switching +* from one thread in a process to another thread in the same +* process. No TLB flush required. +*/ + if (!was_lazy) + return; + + /* +* Read the tlb_gen to check whether a flush is needed. +* If the TLB is up to date, just use it. +* The barrier synchronizes with the tlb_gen increment in +* the TLB shootdown code. +*/ + smp_mb(); + next_tlb_gen = atomic64_read(>context.tlb_gen); + if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) == + next_tlb_gen) + return; + + /* +* TLB contents went out of date while we were in lazy +* mode. Fall through to the TLB switching code below. +*/ + new_asid = prev_asid; + need_flush = true; } else { u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id); @@ -454,6 +479,9 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f, * paging-structure cache to avoid speculatively reading * garbage into our TLB. Since switching to init_mm is barely * slower than a minimal flush, just switch to init_mm. +* +* This should be rare, with native_flush_tlb_others skipping +* IPIs to lazy TLB mode CPUs. */ switch_mm_irqs_off(NULL, _mm, NULL); return; @@ -560,6 +588,9 @@ static void flush_tlb_func_remote(void *info) void native_flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info) { + cpumask_var_t lazymask; + unsigned int cpu; + count_vm_tlb_event(NR_TLB_REMOTE_FLUSH); if (info->end == TLB_FLUSH_ALL) trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL); @@ -583,8 +614,6 @@ void native_flush_tlb_others(const struct
[PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs
CPUs in !is_lazy have either received TLB flush IPIs earlier on during the munmap (when the user memory was unmapped), or have context switched and reloaded during that stage of the munmap. Page table free TLB flushes only need to be sent to CPUs in lazy TLB mode, which TLB contents might not yet be up to date yet. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/mm/tlb.c | 43 +++ 1 file changed, 39 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 26542cc17043..e4156e37aa71 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -712,15 +712,50 @@ void tlb_flush_remove_tables_local(void *arg) } } +static void mm_fill_lazy_tlb_cpu_mask(struct mm_struct *mm, + struct cpumask *lazy_cpus) +{ + int cpu; + + for_each_cpu(cpu, mm_cpumask(mm)) { + if (!per_cpu(cpu_tlbstate.is_lazy, cpu)) + cpumask_set_cpu(cpu, lazy_cpus); + } +} + void tlb_flush_remove_tables(struct mm_struct *mm) { int cpu = get_cpu(); + cpumask_var_t lazy_cpus; + + if (cpumask_any_but(mm_cpumask(mm), cpu) >= nr_cpu_ids) { + put_cpu(); + return; + } + + if (!zalloc_cpumask_var(_cpus, GFP_ATOMIC)) { + /* +* If the cpumask allocation fails, do a brute force flush +* on all the CPUs that have this mm loaded. +*/ + smp_call_function_many(mm_cpumask(mm), + tlb_flush_remove_tables_local, (void *)mm, 1); + put_cpu(); + return; + } + /* -* XXX: this really only needs to be called for CPUs in lazy TLB mode. +* CPUs with !is_lazy either received a TLB flush IPI while the user +* pages in this address range were unmapped, or have context switched +* and reloaded %CR3 since then. +* +* Shootdown IPIs at page table freeing time only need to be sent to +* CPUs that may have out of date TLB contents. */ - if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) - smp_call_function_many(mm_cpumask(mm), tlb_flush_remove_tables_local, (void *)mm, 1); - + mm_fill_lazy_tlb_cpu_mask(mm, lazy_cpus); + smp_call_function_many(lazy_cpus, + tlb_flush_remove_tables_local, (void *)mm, 1); + free_cpumask_var(lazy_cpus); put_cpu(); } -- 2.14.4
[PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm
Song noticed switch_mm_irqs_off taking a lot of CPU time in recent kernels,using 1.8% of a 48 CPU system during a netperf to localhost run. Digging into the profile, we noticed that cpumask_clear_cpu and cpumask_set_cpu together take about half of the CPU time taken by switch_mm_irqs_off. However, the CPUs running netperf end up switching back and forth between netperf and the idle task, which does not require changes to the mm_cpumask. Furthermore, the init_mm cpumask ends up being the most heavily contended one in the system. Simply skipping changes to mm_cpumask(_mm) reduces overhead. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Reported-and-tested-by: Song Liu --- arch/x86/mm/tlb.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 493559cae2d5..f086195f644c 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -310,15 +310,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, sync_current_stack_to_mm(next); } - /* Stop remote flushes for the previous mm */ - VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, mm_cpumask(real_prev)) && - real_prev != _mm); - cpumask_clear_cpu(cpu, mm_cpumask(real_prev)); + /* +* Stop remote flushes for the previous mm. +* Skip kernel threads; we never send init_mm TLB flushing IPIs, +* but the bitmap manipulation can cause cache line contention. +*/ + if (real_prev != _mm) { + VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, + mm_cpumask(real_prev))); + cpumask_clear_cpu(cpu, mm_cpumask(real_prev)); + } /* * Start remote flushes and then read tlb_gen. */ - cpumask_set_cpu(cpu, mm_cpumask(next)); + if (next != _mm) + cpumask_set_cpu(cpu, mm_cpumask(next)); next_tlb_gen = atomic64_read(>context.tlb_gen); choose_new_asid(next, next_tlb_gen, _asid, _flush); -- 2.14.4
[PATCH 6/7] x86,mm: always use lazy TLB mode
Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except at page table freeing time, and idle CPUs will no longer get shootdown IPIs for things like mprotect and madvise, we can always use lazy TLB mode. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Tested-by: Song Liu --- arch/x86/include/asm/tlbflush.h | 16 arch/x86/mm/tlb.c | 15 +-- 2 files changed, 1 insertion(+), 30 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 3aa3204b5dc0..511bf5fae8b8 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -148,22 +148,6 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid) #define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr) #endif -static inline bool tlb_defer_switch_to_init_mm(void) -{ - /* -* If we have PCID, then switching to init_mm is reasonably -* fast. If we don't have PCID, then switching to init_mm is -* quite slow, so we try to defer it in the hopes that we can -* avoid it entirely. The latter approach runs the risk of -* receiving otherwise unnecessary IPIs. -* -* This choice is just a heuristic. The tlb code can handle this -* function returning true or false regardless of whether we have -* PCID. -*/ - return !static_cpu_has(X86_FEATURE_PCID); -} - struct tlb_context { u64 ctx_id; u64 tlb_gen; diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index e4156e37aa71..493559cae2d5 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -379,20 +379,7 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) if (this_cpu_read(cpu_tlbstate.loaded_mm) == _mm) return; - if (tlb_defer_switch_to_init_mm()) { - /* -* There's a significant optimization that may be possible -* here. We have accurate enough TLB flush tracking that we -* don't need to maintain coherence of TLB per se when we're -* lazy. We do, however, need to maintain coherence of -* paging-structure caches. We could, in principle, leave our -* old mm loaded and only switch to init_mm when -* tlb_remove_page() happens. -*/ - this_cpu_write(cpu_tlbstate.is_lazy, true); - } else { - switch_mm(NULL, _mm, NULL); - } + this_cpu_write(cpu_tlbstate.is_lazy, true); } /* -- 2.14.4
[PATCH 3/7] x86,mm: restructure switch_mm_irqs_off
Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. Signed-off-by: Rik van Riel Acked-by: Dave Hansen Suggested-by: Andy Lutomirski --- arch/x86/mm/tlb.c | 60 +++ 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 9a893673c56b..4b73fe835c95 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -187,6 +187,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); unsigned cpu = smp_processor_id(); u64 next_tlb_gen; + bool need_flush; + u16 new_asid; /* * NB: The scheduler will call us with prev == next when switching @@ -252,8 +254,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, return; } else { - u16 new_asid; - bool need_flush; u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id); /* @@ -297,41 +297,41 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, next_tlb_gen = atomic64_read(>context.tlb_gen); choose_new_asid(next, next_tlb_gen, _asid, _flush); + } - if (need_flush) { - this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); - this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); - load_new_mm_cr3(next->pgd, new_asid, true); - - /* -* NB: This gets called via leave_mm() in the idle path -* where RCU functions differently. Tracing normally -* uses RCU, so we need to use the _rcuidle variant. -* -* (There is no good reason for this. The idle code should -* be rearranged to call this before rcu_idle_enter().) -*/ - trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); - } else { - /* The new ASID is already up to date. */ - load_new_mm_cr3(next->pgd, new_asid, false); - - /* See above wrt _rcuidle. */ - trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); - } + if (need_flush) { + this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); + this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); + load_new_mm_cr3(next->pgd, new_asid, true); /* -* Record last user mm's context id, so we can avoid -* flushing branch buffer with IBPB if we switch back -* to the same user. +* NB: This gets called via leave_mm() in the idle path +* where RCU functions differently. Tracing normally +* uses RCU, so we need to use the _rcuidle variant. +* +* (There is no good reason for this. The idle code should +* be rearranged to call this before rcu_idle_enter().) */ - if (next != _mm) - this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); + trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); + } else { + /* The new ASID is already up to date. */ + load_new_mm_cr3(next->pgd, new_asid, false); - this_cpu_write(cpu_tlbstate.loaded_mm, next); - this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid); + /* See above wrt _rcuidle. */ + trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); } + /* +* Record last user mm's context id, so we can avoid +* flushing branch buffer with IBPB if we switch back +* to the same user. +*/ + if (next != _mm) + this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); + + this_cpu_write(cpu_tlbstate.loaded_mm, next); + this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid); + load_mm_cr4(next); switch_ldt(real_prev, next); } -- 2.14.4
Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks
Linus Torvalds writes: > On Mon, Jul 16, 2018 at 8:08 AM Eric W. Biederman > wrote: >> >> The change for global init is it will now die if init is a member of the >> session or init is using this tty as it's controlling tty. >> >> Semantically killing init with SAK is completely appropriate. > > No. > > Semtnaitcally killing init is completely wrong. Because it will kill > the whole system. > > And I don't mean that in "now init won't spawn new things". I mean > that in "now we don't have a child reaper any more, and the system > will be dead because we'll panic on exit". > > So it's not about the controlling tty, it's about fundamental kernel > internal consistency guarantees. > > See > > write_unlock_irq(_lock); > if (unlikely(pid_ns == _pid_ns)) { > panic("Attempted to kill init! exitcode=0x%08x\n", > father->signal->group_exit_code ?: father->exit_code); > } > > in kernel/exit.c. I should have said it doesn't matter because init does not open ttys and become a member of session groups. Or at least it never has in my experience. The only way I know to get that behavior is to boot with init=/bin/bash. With the force_sig already in do_SAK today my change is not a regression. As force_sig in a completely different way forces the signal to init. Looking deeper, all of the silliness with SEND_SIG_FORCED and force_sig_info is to guarantee delivery of synchronous exceptions even to init. So I think we want the patch below to clean that up. Then we don't have to worry about the wrong things sending signals to init by accident, and SEND_SIG_FORCED becomes just SEND_SIG_PRIV that skips the unnecesary allocation of a siginfo struct. Thoughts? Eric From: "Eric W. Biederman" Date: Mon, 16 Jul 2018 13:29:04 -0500 Subject: [PATCH] signal: Cleanup delivery of exceptions to init - Stop clearing SIGNAL_UNKILLABLE. It makes interaction by the process with other signals problematic, and exceptions are not necessarily fatal. - Don't allow SIGKILL and SIGSTOP to the global init. It never helps and it it can only make things worse. - Explicitly allow exceptions to any kind of init. They are exceptions and synchronous and need to be handled somehow. Init can setup a handler or deal with the default action. This is not a change it is just code movement from force_sig_info into send_signal and get_signal. - Treat all signals from the kernel as if they are from an ancestor pid namespace. - Take out the overrides of SIGNAL_UNKILLABLE from force_sig_info The changes to send_signal and get_signal make them unnecessary. - Take out the SEND_SIG_FORCED overrides from prepare_signal. The changes to send_signal makes it redundant. - Rename force back to from_ancestor_ns as that makes the logic with respect to namespaces clearer and logically if the kernel is sending you a signal it is from your ancestor namespace. Signed-off-by: "Eric W. Biederman" --- kernel/signal.c | 41 - 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 94296afacf44..298f5c690681 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -72,20 +72,21 @@ static int sig_handler_ignored(void __user *handler, int sig) (handler == SIG_DFL && sig_kernel_ignore(sig)); } -static int sig_task_ignored(struct task_struct *t, int sig, bool force) +static int sig_task_ignored(struct task_struct *t, int sig, bool from_ancestor_ns) { void __user *handler; handler = sig_handler(t, sig); if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) && - handler == SIG_DFL && !(force && sig_kernel_only(sig))) + handler == SIG_DFL && + (is_global_init(t) || !(from_ancestor_ns && sig_kernel_only(sig return 1; return sig_handler_ignored(handler, sig); } -static int sig_ignored(struct task_struct *t, int sig, bool force) +static int sig_ignored(struct task_struct *t, int sig, bool from_ancestor_ns) { /* * Blocked signals are never ignored, since the @@ -103,7 +104,7 @@ static int sig_ignored(struct task_struct *t, int sig, bool force) if (t->ptrace && sig != SIGKILL) return 0; - return sig_task_ignored(t, sig, force); + return sig_task_ignored(t, sig, from_ancestor_ns); } /* @@ -809,7 +810,7 @@ static void ptrace_trap_notify(struct task_struct *t) * Returns true if the signal should be actually delivered, otherwise * it should be dropped. */ -static bool prepare_signal(int sig, struct task_struct *p, bool force) +static bool prepare_signal(int sig, struct task_struct *p, bool from_ancestor_ns) { struct signal_struct *signal = p->signal; struct task_struct *t; @@ -871,7 +872,7 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force) } } - return
[RFC] kvm: Adding skelaton for Memory ROE
This is my first patch, an attempt to implement Memory ROE discussed by me earlier as a way to prevent Rootkits. I have already explained in details in this thread: https://www.mail-archive.com/kernelnewbies@kernelnewbies.org/msg18826.html So I think there is no need for saying the exact same thing again. The problem is that the code isn't working and I can't figure out why I tried implementing the protection to follow similar behavior to that of KVM_MEM_READONLY but to be on page (SPTE) level The current problem I am facing is that when handling the hypercall vcpu->mode turns to be OUTSIDE_GUEST_MODE but KVM_REQ_TLB_FLUSH doesn't seem to be handled correctly. KVM documentation promised that when VCPU is not in GUEST_MODE VCPU are handled asap and kvm_vcpu_kick(vcpu); will even force that, but it doesn't seem to be the case for me. This is the kind of logging I am getting: [3556.312299] kvm_mmu_slot_apply_flags: visited [3556.312301] kvm_mmu_slot_apply_write_access: Flush = false [3557.034243] gfn_is_readonly: test_bit = 0 [3557.034251] gfn_is_readonly: test_bit = 0 [3557.034254] gfn_is_readonly: test_bit = 0 [3557.034463] Hypercall received, page address 0x0 [3557.034466] gfn_is_readonly: test_bit = 0 [3557.034469] kvm_mroe: flush state = Done [3557.034472] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [3557.034475] Setting page number 0 in slot number 0 [3557.034480] slot_rmap_apply_protection: The 0th page is readonly, Flush = True [3557.034483] kvm_mmu_slot_apply_write_access: Flush = true [3557.034486] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [3557.034488] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE [3557.034490] kvm_mroe: flush state = Waiting For some reason kvm_vcpu_kick() didn't force the KVM_REQ_TLB_FLUSH to kick into the virtual cpu (I am talking about the last 2 lines). I am aware that there is still alot missing (like dealing with malicious guest remappings) and the code quality sucks, but any ideas about what I could be doing wrong (or ideas in general) would be apprciated. I am already planning to do everything cleanly once it works. Thansk. Signed-off-by: Ahmed Abd El Mawgood --- arch/x86/include/asm/kvm_host.h | 7 ++- arch/x86/kvm/Kconfig| 7 +++ arch/x86/kvm/mmu.c | 127 +++- arch/x86/kvm/x86.c | 83 -- include/linux/kvm_host.h| 17 ++ include/uapi/linux/kvm_para.h | 4 +- virt/kvm/kvm_main.c | 36 +--- 7 files changed, 226 insertions(+), 55 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c13cd28d9d1b..c66e9245f750 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -235,7 +235,10 @@ struct kvm_mmu_memory_cache { int nobjs; void *objects[KVM_NR_MEM_OBJS]; }; - +struct kvm_write_access_data { + int i; + struct kvm_memory_slot *memslot; +}; /* * the pages used as guest page table on soft mmu are tracked by * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used @@ -1130,7 +1133,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, u64 acc_track_mask, u64 me_mask); void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); -void kvm_mmu_slot_remove_write_access(struct kvm *kvm, +void kvm_mmu_slot_apply_write_access(struct kvm *kvm, struct kvm_memory_slot *memslot); void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, const struct kvm_memory_slot *memslot); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 92fd433c50b9..8ae822a8dc7a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -96,6 +96,13 @@ config KVM_MMU_AUDIT This option adds a R/W kVM module parameter 'mmu_audit', which allows auditing of KVM MMU events at runtime. +config KVM_MROE + bool "Hypercall Memory Read-Only Enforcement" + depends on KVM && X86 + help + This option add KVM_HC_HMROE hypercall to kvm which as hardening + mechanism to protect memory pages from being edited. + # OK, it's a little counter-intuitive to do this, but it puts it neatly under # the virtualization menu. source drivers/vhost/Kconfig diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index d594690d8b95..946545b8b8cb 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -70,7 +70,7 @@ enum { #undef MMU_DEBUG #ifdef MMU_DEBUG -static bool dbg = 0; +static bool dbg = 1; module_param(dbg, bool, 0644); #define pgprintk(x...) do { if (dbg) printk(x); } while (0) @@ -1402,7 +1402,6 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep) static bool spte_write_protect(u64 *sptep, bool pt_protect) { u64 spte = *sptep; - if (!is_writable_pte(spte) && !(pt_protect && spte_can_locklessly_be_made_writable(spte))) return false; @@ -1418,15 +1417,23
Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks
On Mon, Jul 16, 2018 at 12:17 PM Eric W. Biederman wrote: > > I should have said it doesn't matter because init does not open ttys and > become a member of session groups. Or at least it never has in my > experience. The only way I know to get that behavior is to boot with > init=/bin/bash. That's hopefully true, yes. Presumably init does open the console, but hopefull doesn't do setsid. (We *do* do "setsid()" for the linuxrc running, but that's not done by the init thread itself). > With the force_sig already in do_SAK today my change is not a > regression. As force_sig in a completely different way forces the > signal to init. Ok. A couple of notes in the commit description on this might be good. > So I think we want the patch below to clean that up. Then we don't have > to worry about the wrong things sending signals to init by accident, and > SEND_SIG_FORCED becomes just SEND_SIG_PRIV that skips the unnecesary > allocation of a siginfo struct. > > Thoughts? I think the end result is fine, but then I look at that patch of yours and it does many other things and that makes me nervous. Can you separate out the different things it does into separate patches to make it easier to read? Linus
Re: [PATCH] mmc: tegra: Force correct divider calculation on DDR50/52
On 16/07/18 15:34, Aapo Vienamo wrote: > Tegra SDHCI controllers require the SDHCI clock divider to be configured > to divide the clock by two in DDR50/52 modes. Incorrectly configured > clock divider results in corrupted data. > > Prevent the possibility of incorrectly calculating the divider value due > to clock rate rounding or low parent clock frequency by not assigning > host->max_clk to clk_get_rate() on tegra_sdhci_set_clock(). > > See the comments for further details. > > Fixes: a8e326a ("mmc: tegra: implement module external clock change") > Signed-off-by: Aapo Vienamo > --- > drivers/mmc/host/sdhci-tegra.c | 17 - > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c > index ddf00166..908b23e 100644 > --- a/drivers/mmc/host/sdhci-tegra.c > +++ b/drivers/mmc/host/sdhci-tegra.c > @@ -210,9 +210,24 @@ static void tegra_sdhci_set_clock(struct sdhci_host > *host, unsigned int clock) > if (!clock) > return sdhci_set_clock(host, clock); > > + /* > + * In DDR50/52 modes the Tegra SDHCI controllers require the SDHCI > + * divider to be configured to divided the host clock by two. The SDHC > + * clock divider is calculated as part of sdhci_set_clock() by > + * sdhci_calc_clk(). The divider is calculated from host->max_clk and > + * the requested clock rate. > + * > + * By setting the host->max_clk to clock * 2 the divider calculation > + * will always result in the correct value for DDR50/52 modes, > + * regardless of clock rate rounding, which may happen if the value > + * from clk_get_rate() is used. > + */ > host_clk = tegra_host->ddr_signaling ? clock * 2 : clock; > clk_set_rate(pltfm_host->clk, host_clk); > - host->max_clk = clk_get_rate(pltfm_host->clk); > + if (tegra_host->ddr_signaling) > + host->max_clk = host_clk; > + else > + host->max_clk = clk_get_rate(pltfm_host->clk); > > sdhci_set_clock(host, clock); > I see what you are saying, but should we be concerned that we are not getting the rate we are requesting in the first place? Maybe it would help if you could provide a specific example showing a case where we request rate X and get Y, and then this leads to a bad rate Z later. Cheers Jon -- nvpublic
Re: [PATCH] IPoIB: use kvzalloc to allocate an array of bucket pointers
On Mon, 9 Jul 2018 16:51:03 +0300 Jan Dakinevich wrote: > This table by default takes 32KiB which is 3rd memory order. > Meanwhile, this memory is not aimed for DMA operation and could be > safely allocated by vmalloc. > > Signed-off-by: Jan Dakinevich > --- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c > b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 26cde95..cb752df > 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -1526,7 +1526,7 @@ static int ipoib_neigh_hash_init(struct > ipoib_dev_priv *priv) return -ENOMEM; > set_bit(IPOIB_STOP_NEIGH_GC, >flags); > size = roundup_pow_of_two(arp_tbl.gc_thresh3); > - buckets = kcalloc(size, sizeof(*buckets), GFP_KERNEL); > + buckets = kvcalloc(size, sizeof(*buckets), GFP_KERNEL); > if (!buckets) { > kfree(htbl); > return -ENOMEM; > @@ -1554,7 +1554,7 @@ static void neigh_hash_free_rcu(struct rcu_head > *head) struct ipoib_neigh __rcu **buckets = htbl->buckets; > struct ipoib_neigh_table *ntbl = htbl->ntbl; > > - kfree(buckets); > + kvfree(buckets); > kfree(htbl); > complete(>deleted); > } ping -- Best regards Jan Dakinevich
Re: [PATCH 4.17 00/67] 4.17.7-stable review
On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.17.7 release. > There are 67 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed Jul 18 07:34:11 UTC 2018. > Anything received after that time might be too late. > Build results: total: 134 pass: 134 fail: 0 Qemu test results: total: 171 pass: 159 fail: 12 Failed tests: i386:Broadwell:q35:defconfig:smp:rootfs i386:Skylake-Client:q35:defconfig:smp:rootfs i386:SandyBridge:q35:defconfig:smp:rootfs i386:Haswell:pc:defconfig:smp:rootfs i386:Nehalem:q35:defconfig:smp:rootfs i386:phenom:pc:defconfig:smp:rootfs i386:Opteron_G5:q35:defconfig:smp:initrd i386:Westmere:q35:defconfig:smp:initrd i386:core2duo:q35:defconfig:nosmp:rootfs i386:Conroe:pc:defconfig:nosmp:rootfs i386:Opteron_G1:pc:defconfig:nosmp:initrd i386:n270:q35:defconfig:nosmp:rootfs All 32-bit i386 boot tests crash. [0.00] BUG: unable to handle kernel NULL pointer dereference at [0.00] *pde = [0.00] Oops: 0002 [#1] SMP [0.00] Modules linked in: [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1 [0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24 [0.00] EIP: zero_resv_unavail+0x9b/0x103 [0.00] EFLAGS: 0022 CPU: 0 [0.00] EAX: EBX: ECX: 0008 EDX: [0.00] ESI: c6bc5e90 EDI: EBP: c6bc5eac ESP: c6bc5e80 [0.00] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [0.00] CR0: 80050033 CR2: CR3: 06d67000 CR4: 00040690 [0.00] Call Trace: [0.00] free_area_init_nodes+0x407/0x44c [0.00] zone_sizes_init+0x45/0x5c [0.00] paging_init+0xac/0xaf [0.00] native_pagetable_init+0xc6/0xce [0.00] setup_arch+0x9c6/0xb18 [0.00] start_kernel+0x4f/0x3d2 [0.00] ? idt_setup_early_handler+0x2b/0x3e [0.00] i386_start_kernel+0x95/0x99 [0.00] startup_32_smp+0x164/0x168 Guenter
Re: [PATCH 1/2] mm: Fix vma_is_anonymous() false-positives
On Mon 16-07-18 17:47:39, Kirill A. Shutemov wrote: > On Mon, Jul 16, 2018 at 04:22:45PM +0200, Michal Hocko wrote: > > On Mon 16-07-18 17:04:41, Kirill A. Shutemov wrote: > > > On Mon, Jul 16, 2018 at 01:30:28PM +, Michal Hocko wrote: > > > > On Tue 10-07-18 13:48:58, Andrew Morton wrote: > > > > > On Tue, 10 Jul 2018 16:48:20 +0300 "Kirill A. Shutemov" > > > > > wrote: > > > > > > > > > > > vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous > > > > > > VMA. This is unreliable as ->mmap may not set ->vm_ops. > > > > > > > > > > > > False-positive vma_is_anonymous() may lead to crashes: > > > > > > > > > > > > ... > > > > > > > > > > > > This can be fixed by assigning anonymous VMAs own vm_ops and not > > > > > > relying > > > > > > on it being NULL. > > > > > > > > > > > > If ->mmap() failed to set ->vm_ops, mmap_region() will set it to > > > > > > dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs. > > > > > > > > > > Is there a smaller, simpler fix which we can use for backporting > > > > > purposes and save the larger rework for development kernels? > > > > > > > > Why cannot we simply keep anon vma with null vm_ops and set dummy_vm_ops > > > > for all users who do not initialize it in their mmap callbacks? > > > > Basically have a sanity check in call_mmap? > > > > > > As I said, there's a corner case of MAP_PRIVATE of /dev/zero. > > > > This is really creative. I really didn't think about that. I am > > wondering whether this really has to be handled as a private anonymous > > mapping implicitly. Why does vma_is_anonymous has to succeed for these > > mappings? Why cannot we simply handle it as any other file backed > > PRIVATE mapping? > > Because it's established way to create anonymous mappings in Linux. > And we cannot break the semantics. How exactly would semantic break? You would still get zero pages on read faults and anonymous pages on CoW. So basically the same thing as for any other file backed MAP_PRIVATE mapping. -- Michal Hocko SUSE Labs
Re: [tip:sched/core] sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust()
* Peter Zijlstra wrote: > On Sun, Jul 15, 2018 at 04:36:17PM -0700, tip-bot for Xunlei Pang wrote: > > Commit-ID: 8d4c00dc38a8aa30dae8402955e55e7b34e74bc8 > > Gitweb: > > https://git.kernel.org/tip/8d4c00dc38a8aa30dae8402955e55e7b34e74bc8 > > Author: Xunlei Pang > > AuthorDate: Mon, 9 Jul 2018 22:58:43 +0800 > > Committer: Ingo Molnar > > CommitDate: Mon, 16 Jul 2018 00:28:31 +0200 > > > > sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust() > > > > If users access "/proc/pid/stat", the utime and stime ratio in the > > current SAMPLE period are excepted, but currently cputime_adjust() > > always calculates with the ratio of the WHOLE lifetime of the process. > > > > This results in inaccurate utime and stime in "/proc/pid/stat". For > > example, a process runs for a while with "50% usr, 0% sys", then > > followed by "100% sys". For later while, the following is excepted: > > > > 0.0 usr, 100.0 sys > > > > but we get: > > > > 10.0 usr, 90.0 sys > > > > This patch uses the accurate ratio in cputime_adjust() to address the > > issue. A new 'task_cputime' type field is added in prev_cputime to record > > previous 'task_cputime' so that we can get the elapsed times as the accurate > > ratio. > > Ingo, please make this one go away. Sure, I've removed it from sched/core. Thanks, Ingo
[GIT PULL rcu/next] RCU commits for 4.19
Hello, Ingo! This pull request contains the following changes: 1. An optimization and a fix for RCU expedited grace periods, with the fix being from Boqun Feng. http://lkml.kernel.org/r/20180625224308.ga10...@linux.vnet.ibm.com 2. Miscellaneous fixes, including a lockdep-annotation fix from Boqun Feng. http://lkml.kernel.org/r/20180625230410.ga13...@linux.vnet.ibm.com 3. SRCU updates. http://lkml.kernel.org/r/20180625224734.ga10...@linux.vnet.ibm.com 4. Updates to rcutorture and associated scripting. http://lkml.kernel.org/r/20180625225735.ga11...@linux.vnet.ibm.com 5. Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt, and RCU-sched flavors, replacing the old ->gpnum and ->completed pair of fields. This change allows lockless code to obtain the complete grace-period state with a single READ_ONCE(), which is needed to maintain tolerable lock contention during the upcoming consolidation of the three RCU flavors. Note that grace-period sequence numbers are already used by rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus already heavily used and well-tested. Joel Fernandes contributed a number of excellent fixes and improvements. http://lkml.kernel.org/r/20180626000841.ga15...@linux.vnet.ibm.com 6. Clean up some grace-period-reporting loose ends, including improving the handling of quiescent states from offline CPUs and fixing some false-positive WARN_ON_ONCE() invocations. (Strictly speaking, the WARN_ON_ONCE() invocations were quite correct, but their invariants were (harmlessly) violated by the earlier sloppy handling of quiescent states from offline CPUs.) In addition, improve grace-period forward-progress guarantees so as to allow removal of fail-safe checks that required otherwise needless lock acquisitions. Finally, add more diagnostics to help debug the upcoming consolidation of the RCU-bh, RCU-preempt, and RCU-sched flavors. http://lkml.kernel.org/r/20180626002052.ga24...@linux.vnet.ibm.com 7. Additional miscellaneous fixes, including those contributed by Byungchul Park, Mauro Carvalho Chehab, Joe Perches, Joel Fernandes, Steven Rostedt, Andrea Parri, and Neil Brown. http://lkml.kernel.org/r/20180626003448.ga26...@linux.vnet.ibm.com 8. Additional torture-test changes, including several contributed by Arnd Bergmann and Joel Fernandes. http://lkml.kernel.org/r/20180626005205.ga28...@linux.vnet.ibm.com The layout of this pull request is unusual due to the large footprint of the commits in #5 and #6. Therefore #1-#4 are branches on top of v4.18-rc1, #5 and #6 are stacked on top of each other based off of the merge point for #1-#4, and #7 and #8 are branches on top of #6. Although I merged this series with v4.18-rc4 without conflicts and tested the results, Stephen Rothwell has reported a couple of conflicts with this series in the course of his -next integration-testing efforts and has provided the following resolutions: http://lkml.kernel.org/r/20180514134636.61831...@canb.auug.org.au http://lkml.kernel.org/r/20180622122717.5f475...@canb.auug.org.au Both of these resolutions look good to me. This pull request increases the size of RCU by not quite 500 lines. Of this, about 200 is due to adding test code to rcutorture in order to adequately test the consolidated RCU-bh, RCU-preempt, and RCU-sched flavors. The purpose of adding this test code early is to test it against RCU-sched, which happens to already support compound RCU read-side critical sections via rcu_read_lock_sched(), preempt_disable(), local_bh_disable(), and local_irq_save(). The remainder are primarily due to the addition of diagnostics and the fixing of quiescent-state and grace-period reporting. This size increase is temporary: The overall effect of this set of cleanups, the consolidation of the three RCU flavors, and additional cleanups will be to -decrease- RCU's line count by a couple of hundred lines. All of these changes have been subjected to 0day Test Robot and -next testing, and are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo for you to fetch changes up to 18952651dae8efcc6d565c97f8fe5629b399cb3e: Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEAD (2018-07-12 15:42:41 -0700) Andrea Parri (1): doc: Update synchronize_rcu() definition in whatisRCU.txt Arnd Bergmann (1): rcutorture: Use monotonic timestamp for stall detection Boqun Feng (2): rcu: Use the proper lockdep annotation in dump_blkd_tasks() rcu: Make expedited GPs handle CPU 0 being offline Byungchul Park (2): rcu: Improve
Re: [PATCH v2 3/3] dt-bindings: mfd: max8998: Add charger subnode binding
On Sat, Jul 14, 2018 at 02:26:53PM +0200, Paweł Chmiel wrote: > This patch adds devicetree bindings documentation for > battery charging controller as the subnode of MAX8998 PMIC. > > Signed-off-by: Paweł Chmiel > --- > Changes from v1: > - Removed unneeded Fixes tag > - Correct description of all charger values > - Added missing property unit > --- > Documentation/devicetree/bindings/mfd/max8998.txt | 24 > +++ > 1 file changed, 24 insertions(+) > > diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt > b/Documentation/devicetree/bindings/mfd/max8998.txt > index 23a3650ff2a2..196e50097a36 100644 > --- a/Documentation/devicetree/bindings/mfd/max8998.txt > +++ b/Documentation/devicetree/bindings/mfd/max8998.txt > @@ -50,6 +50,23 @@ Additional properties required if > max8998,pmic-buck2-dvs-gpio is defined: > - max8998,pmic-buck2-dvs-voltage: An array of 2 voltage values in microvolts >for buck2 regulator that can be selected using dvs gpio. > > +Charger: Configuration for battery charging controller should be added > +inside a child node named 'charger'. > + Required properties: > + - max8998,charge-eoc-percent: Setup End of Charge Level. > +If value equals 0, leave it unchanged. > +Otherwise it should be value from 10 to 45 by 5 step. > + > + - max8998,charge-restart-level-microvolt: Setup Charge Restart Level. > +If value equals 0, leave it unchanged. > +If value equals -1, it will be disabled. -1 is a bit strange. Why not 'not present' is disabled? > +Otherwise it should be one of following values: 100, 150, 200. > + > + - max8998,charge-timeout-hours: Setup Charge Full Timeout. > +If value equals 0, leave it unchanged. > +If value equals -1, it will be disabled. > +Otherwise it should be one of following values: 5, 6, 7. > + > Regulators: All the regulators of MAX8998 to be instantiated shall be > listed in a child node named 'regulators'. Each regulator is represented > by a child node of the 'regulators' node. > @@ -99,6 +116,13 @@ Example: > max8998,pmic-buck2-dvs-gpio = < 0 3 0 0>; /* SET3 */ > max8998,pmic-buck2-dvs-voltage = <135>, <130>; > > + /* Charger configuration */ > + charger { > + max8998,charge-eoc-percent = <0>; > + max8998,charge-restart-level-microvolt = <(-1)>; > + max8998,charge-timeout-hours = <7>; > + }; > + > /* Regulators to instantiate */ > regulators { > ldo2_reg: LDO2 { > -- > 2.7.4 >
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 11:02:19AM -0700, Guenter Roeck wrote: > On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote: > > On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote: > > > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote: > > > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote: > > > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > > > > > > This is the start of the stable review cycle for the 4.9.113 > > > > > > release. > > > > > > There are 32 patches in this series, all will be posted as a > > > > > > response > > > > > > to this one. If anyone has any issues with these being applied, > > > > > > please > > > > > > let me know. > > > > > > > > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > > > > > > Anything received after that time might be too late. > > > > > > > > > > > > > > > > Build results: > > > > > total: 148 pass: 133 fail: 15 > > > > > Failed builds: > > > > > mips:defconfig > > > > > mips:allnoconfig > > > > > mips:defconfig > > > > > mips:allmodconfig > > > > > mips:allnoconfig > > > > > mips:bcm47xx_defconfig > > > > > mips:bcm63xx_defconfig > > > > > mips:nlm_xlp_defconfig > > > > > mips:ath79_defconfig > > > > > mips:ar7_defconfig > > > > > mips:e55_defconfig > > > > > mips:cavium_octeon_defconfig > > > > > mips:malta_defconfig > > > > > mips:rt305x_defconfig > > > > > mips:defconfig > > > > > Qemu test results: > > > > > total: 166 pass: 154 fail: 12 > > > > > Failed tests: > > > > > mips:malta_defconfig:nosmp > > > > > mips:malta_defconfig:smp > > > > > mips64:malta_defconfig:nosmp > > > > > mips64:malta_defconfig:smp > > > > > mipsel:24Kf:malta_defconfig:nosmp:initrd > > > > > mipsel:24Kf:malta_defconfig:smp:initrd > > > > > mipsel:24Kf:malta_defconfig:smp:rootfs > > > > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs > > > > > mipsel64:malta_defconfig:nosmp:rootfs > > > > > mipsel64:malta_defconfig:smp:initrd > > > > > mipsel64:malta_defconfig:smp:rootfs > > > > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs > > > > > > > > > > Error is always the same. > > > > > > > > > > arch/mips/kernel/process.c:637:8: > > > > > error: 'call_single_data_t' undeclared here (not in a function) > > > > > arch/mips/kernel/process.c: In function 'raise_backtrace': > > > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: > > > > > error: 'csd' undeclared (first use in this function) > > > > > > > > mips should now be fixed with the updated tree I have pushed out. > > > > > > > > > > The above is with v4.9.112-33-g7fb1f5e, which is the latest version > > > available from the git repository (as of right now). My builders > > > pulled it at 4:07am this morning, and there was no subsequent update. > > > > Odd. Ok, I've bumped the -rc number to -rc2 now, and pushed it all out > > again. Let me know if you don't see it show up within an hour or so. > > > Version is now v4.9.112-33-gb44db2b, so something changed, but I still get > the same build error. > > Comparing the old and the new version, the only difference is the updated rc. > > diff --git a/Makefile b/Makefile > index 57f315d00a94..986470ef6f6e 100644 > --- a/Makefile > +++ b/Makefile > @@ -1,7 +1,7 @@ > VERSION = 4 > PATCHLEVEL = 9 > SUBLEVEL = 113 > -EXTRAVERSION = -rc1 > +EXTRAVERSION = -rc2 > NAME = Roaring Lionus > > The error itself is that single_data_t was replaced in the backport with > call_single_data. It should have been "struct call_single_data". {sigh} This is why I asked for a fixup patch. Let me just go rip that thing out now... greg k-h
Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails
On Tue, 17 Jul 2018, Mukesh Ojha wrote: > On 7/16/2018 10:44 PM, John Stultz wrote: > > > So, I think with the logic bug above it will work out properly, but > > > let me know if I'm still missing something. > > Please give it thought to a case where very first suspend fails with your > logic. > If i am not able to get your thought, please write a patch. John wants you to invert the logic. i.e. true -> false false -> true if (var) -> if (!var) if (!var) -> if (var) It's not that hard, right? Thanks, tglx
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > This is the start of the stable review cycle for the 4.9.113 release. > There are 32 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > Anything received after that time might be too late. > Build results: total: 148 pass: 133 fail: 15 Failed builds: mips:defconfig mips:allnoconfig mips:defconfig mips:allmodconfig mips:allnoconfig mips:bcm47xx_defconfig mips:bcm63xx_defconfig mips:nlm_xlp_defconfig mips:ath79_defconfig mips:ar7_defconfig mips:e55_defconfig mips:cavium_octeon_defconfig mips:malta_defconfig mips:rt305x_defconfig mips:defconfig Qemu test results: total: 166 pass: 154 fail: 12 Failed tests: mips:malta_defconfig:nosmp mips:malta_defconfig:smp mips64:malta_defconfig:nosmp mips64:malta_defconfig:smp mipsel:24Kf:malta_defconfig:nosmp:initrd mipsel:24Kf:malta_defconfig:smp:initrd mipsel:24Kf:malta_defconfig:smp:rootfs mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs mipsel64:malta_defconfig:nosmp:rootfs mipsel64:malta_defconfig:smp:initrd mipsel64:malta_defconfig:smp:rootfs mipsel64:fuloong2e_defconfig:fulong2e:rootfs Error is always the same. arch/mips/kernel/process.c:637:8: error: 'call_single_data_t' undeclared here (not in a function) arch/mips/kernel/process.c: In function 'raise_backtrace': /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: error: 'csd' undeclared (first use in this function) Details are available at http://kerneltests.org/builders. Guenter
compliation error with aio_abi.h
Hi, Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1601529 with 4.18-rc4: Steps to Reproduce: 1.echo 'int f;' | gcc -include linux/aio_abi.h -xc -c - -o /dev/null Actual results: /usr/include/asm/signal.h:127:2: error: unknown type name ‘size_t’ size_t ss_size; ^~ In file included from :32: /usr/include/linux/aio_abi.h:115:2: error: unknown type name ‘size_t’ size_t sigsetsize; ^~ Expected results: no errors, as in Fedora 28 kernel-headers-4.17.4-200.fc28.x86_64) This looks like the structure that was introduced with 7a074e96dee6 ("aio: implement io_pgetevents") . is #include the correct header? This breaks compilation of nginx https://bugzilla.redhat.com/show_bug.cgi?id=1597674 Thanks, Laura
Re: [PATCH 4.17 00/67] 4.17.7-stable review
On Mon, Jul 16, 2018 at 09:33:38AM -0700, Guenter Roeck wrote: > On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.17.7 release. > > There are 67 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Wed Jul 18 07:34:11 UTC 2018. > > Anything received after that time might be too late. > > > > Build results: > total: 134 pass: 134 fail: 0 > Qemu test results: > total: 171 pass: 159 fail: 12 > Failed tests: > i386:Broadwell:q35:defconfig:smp:rootfs > i386:Skylake-Client:q35:defconfig:smp:rootfs > i386:SandyBridge:q35:defconfig:smp:rootfs > i386:Haswell:pc:defconfig:smp:rootfs > i386:Nehalem:q35:defconfig:smp:rootfs > i386:phenom:pc:defconfig:smp:rootfs > i386:Opteron_G5:q35:defconfig:smp:initrd > i386:Westmere:q35:defconfig:smp:initrd > i386:core2duo:q35:defconfig:nosmp:rootfs > i386:Conroe:pc:defconfig:nosmp:rootfs > i386:Opteron_G1:pc:defconfig:nosmp:initrd > i386:n270:q35:defconfig:nosmp:rootfs > > All 32-bit i386 boot tests crash. > > [0.00] BUG: unable to handle kernel NULL pointer dereference at > > [0.00] *pde = > [0.00] Oops: 0002 [#1] SMP > [0.00] Modules linked in: > [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1 > [0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24 > [0.00] EIP: zero_resv_unavail+0x9b/0x103 > [0.00] EFLAGS: 0022 CPU: 0 > [0.00] EAX: EBX: ECX: 0008 EDX: > [0.00] ESI: c6bc5e90 EDI: EBP: c6bc5eac ESP: c6bc5e80 > [0.00] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [0.00] CR0: 80050033 CR2: CR3: 06d67000 CR4: 00040690 > [0.00] Call Trace: > [0.00] free_area_init_nodes+0x407/0x44c > [0.00] zone_sizes_init+0x45/0x5c > [0.00] paging_init+0xac/0xaf > [0.00] native_pagetable_init+0xc6/0xce > [0.00] setup_arch+0x9c6/0xb18 > [0.00] start_kernel+0x4f/0x3d2 > [0.00] ? idt_setup_early_handler+0x2b/0x3e > [0.00] i386_start_kernel+0x95/0x99 > [0.00] startup_32_smp+0x164/0x168 > > Guenter I think this is caused by 3e3eda67092b ("mm: zero available pages before memmap init"). A fix for it was posted to the mailing list this morning: https://lore.kernel.org/lkml/20180716151630.770-1-pasha.tatas...@oracle.com/ Cheers, Nathan
Re: [PATCH] x86/power/64: Remove VLA usage
On Sun, Jul 15, 2018 at 08:56:57PM -0700, Kees Cook wrote: > In the quest to remove all stack VLA usage from the kernel[1], this > removes the discouraged use of AHASH_REQUEST_ON_STACK by switching to > shash directly and allocating the descriptor in heap memory (which should > be fine: the tfm has already been allocated there too). > > [1] > https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com > > Signed-off-by: Kees Cook > --- > arch/x86/power/hibernate_64.c | 35 +++ > 1 file changed, 19 insertions(+), 16 deletions(-) > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 67ccf64c8bd8..0ed01bb935a6 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -233,28 +233,31 @@ struct restore_data_record { > */ > static int get_e820_md5(struct e820_table *table, void *buf) > { > - struct scatterlist sg; > - struct crypto_ahash *tfm; > + struct crypto_shash *tfm; > + struct shash_desc *desc; > int size; > int ret = 0; > > - tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC); > + tfm = crypto_alloc_shash("md5", 0, 0); > if (IS_ERR(tfm)) > return -ENOMEM; > > - { > - AHASH_REQUEST_ON_STACK(req, tfm); > - size = offsetof(struct e820_table, entries) + sizeof(struct > e820_entry) * table->nr_entries; > - ahash_request_set_tfm(req, tfm); > - sg_init_one(, (u8 *)table, size); > - ahash_request_set_callback(req, 0, NULL, NULL); > - ahash_request_set_crypt(req, , buf, size); > - > - if (crypto_ahash_digest(req)) > - ret = -EINVAL; > - ahash_request_zero(req); > - } > - crypto_free_ahash(tfm); > + desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm), > +GFP_KERNEL); > + if (!desc) > + return -ENOMEM; Need crypto_free_shash(tfm) if the kmalloc() here fails. > + > + desc->tfm = tfm; > + desc->flags = 0; > + > + size = offsetof(struct e820_table, entries) + > + sizeof(struct e820_entry) * table->nr_entries; > + > + if (crypto_shash_digest(desc, (u8 *)table, size, buf)) > + ret = -EINVAL; > + > + kzfree(desc); > + crypto_free_shash(tfm); > > return ret; > } > -- > 2.17.1 > > > -- > Kees Cook > Pixel Security
Re: [PATCH v5 05/12] PM / devfreq: Add support for policy notifiers
On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote: > Hi Matthias, > > On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote: > > Hi Chanwoo, > > > > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote: > > > >> Firstly, > >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function. > >> > >> devfreq already used the OPP interface as default. It means that > >> the outside of 'drivers/devfreq' can disable/enable the frequency > >> such as drivers/thermal/devfreq_cooling.c. Also, when some device > >> drivers disable/enable the specific frequency, the devfreq core > >> consider them. > >> > >> So, devfreq doesn't need to devfreq_verify_within_limits() because > >> already support some interface to change the minimum/maximum frequency > >> of devfreq device. > >> > >> In case of cpufreq subsystem, cpufreq only provides > >> 'cpufreq_verify_with_limits()' > >> to change the minimum/maximum frequency of cpu. some device driver cannot > >> change the minimum/maximum frequency through OPP interface. > >> > >> But, in case of devfreq subsystem, as I explained already, devfreq support > >> the OPP interface as default way. devfreq subsystem doesn't need to add > >> other way to change the minimum/maximum frequency. > > > > Using the OPP interface exclusively works as long as a > > enabling/disabling of OPPs is limited to a single driver > > (drivers/thermal/devfreq_cooling.c). When multiple drivers are > > involved you need a way to resolve conflicts, that's the purpose of > > devfreq_verify_within_limits(). Please let me know if there are > > existing mechanisms for conflict resolution that I overlooked. > > > > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use > > devfreq_verify_within_limits() instead of the OPP interface if > > desired, however this seems beyond the scope of this series. > > Actually, if we uses this approach, it doesn't support the multiple drivers > too. > If non throttler drivers uses devfreq_verify_within_limits(), the conflict > happen. As long as drivers limit the max freq there is no conflict, the lowest max freq wins. I expect this to be the usual case, apparently it worked for cpufreq for 10+ years. However it is correct that there would be a conflict if a driver requests a min freq that is higher than the max freq requested by another. In this case devfreq_verify_within_limits() resolves the conflict by raising p->max to the min freq. Not sure if this is something that would ever occur in practice though. If we are really concerned about this case it would also be an option to limit the adjustment to the max frequency. > To resolve the conflict for multiple device driver, maybe OPP interface > have to support 'usage_count' such as clk_enable/disable(). This would require supporting negative usage count values, since a OPP should not be enabled if e.g. thermal enables it but the throttler disabled it or viceversa. Theoretically there could also be conflicts, like one driver disabling the higher OPPs and another the lower ones, with the outcome of all OPPs being disabled, which would be a more drastic conflict resolution than that of devfreq_verify_within_limits(). Viresh, what do you think about an OPP usage count? Thanks Matthias
Re: [PATCH v2 1/3] dt-bindings: pfuze100: add optional disable switch-regulators binding
On Fri, Jul 13, 2018 at 02:50:01PM +0200, Marco Felsch wrote: > This binding is used to keep the backward compatibility with the current > dtb's [1]. The binding informs the driver that the unused switch regulators > can be disabled. > If it is not specified, the driver doesn't disable the switch regulators. > > [1] https://patchwork.kernel.org/patch/10490381/ > > Signed-off-by: Marco Felsch > > --- > Changes in V2: > - add more information about the binding > - rename binding and add vendor prefix > > .../devicetree/bindings/regulator/pfuze100.txt| 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt > b/Documentation/devicetree/bindings/regulator/pfuze100.txt > index 672c939045ff..2c46b8d368db 100644 > --- a/Documentation/devicetree/bindings/regulator/pfuze100.txt > +++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt > @@ -4,6 +4,17 @@ Required properties: > - compatible: "fsl,pfuze100", "fsl,pfuze200", "fsl,pfuze3000", > "fsl,pfuze3001" > - reg: I2C slave address > > +Optional properties: > +- fsl,pfuze-support-disable: Boolean, if present disable all unused switch > + regulators to save power consumption. Attention, till 4.18 these regulators You shouldn't have kernel version info in bindings. > + were always on without specifying "regulator-always-on". So be sure to mark > + import regulators as "regulator-always-on" (e.g. DDR ref, DDR supply). If s/import/important/ > + not present, the switched regualtors are always on and can't be disabled. > + This binding is a workaround to keep backward compatibility with old dtb's > + which rely on the fact that the switched regulators are always on and don't > + mark them explicit as "regulator-always-on". On new dtbs this property > should > + always be present. > + > Required child node: > - regulators: This is the list of child nodes that specify the regulator >initialization data for defined regulators. Please refer to below doc > -- > 2.18.0 >
Re: cpu_no_speculation omissions?
On Mon, 2018-07-16 at 10:28 -0700, Dave Hansen wrote: > On 07/16/2018 09:56 AM, Thomas Gleixner wrote: > > On Mon, 16 Jul 2018, Rich Felker wrote: > > > At least the Centerton (late-generation Bonnell uarch) Atom > > > family is > > > omitted from the cpu_no_speculation table added by commit > > > fec9434a12f3 > > > to arch/x86/kernel/cpu/common.c. Is this intentional? Would a > > > patch > > > adding it and possibly other omissions be welcome? > > > > Probably. Dave? > > IIRC, Alan Cox was compiling a list on what is affected vs. not. He > would know way better than I. The pre Silvermont atom cores are in order. When I did the original list I didn't bother with all the 32bit cores as we didn't have any 32bit mitigations then. Alan
Re: [PATCH 4.9 00/32] 4.9.113-stable review
On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote: > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote: > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote: > > > This is the start of the stable review cycle for the 4.9.113 release. > > > There are 32 patches in this series, all will be posted as a response > > > to this one. If anyone has any issues with these being applied, please > > > let me know. > > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018. > > > Anything received after that time might be too late. > > > > > > > Build results: > > total: 148 pass: 133 fail: 15 > > Failed builds: > > mips:defconfig > > mips:allnoconfig > > mips:defconfig > > mips:allmodconfig > > mips:allnoconfig > > mips:bcm47xx_defconfig > > mips:bcm63xx_defconfig > > mips:nlm_xlp_defconfig > > mips:ath79_defconfig > > mips:ar7_defconfig > > mips:e55_defconfig > > mips:cavium_octeon_defconfig > > mips:malta_defconfig > > mips:rt305x_defconfig > > mips:defconfig > > Qemu test results: > > total: 166 pass: 154 fail: 12 > > Failed tests: > > mips:malta_defconfig:nosmp > > mips:malta_defconfig:smp > > mips64:malta_defconfig:nosmp > > mips64:malta_defconfig:smp > > mipsel:24Kf:malta_defconfig:nosmp:initrd > > mipsel:24Kf:malta_defconfig:smp:initrd > > mipsel:24Kf:malta_defconfig:smp:rootfs > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs > > mipsel64:malta_defconfig:nosmp:rootfs > > mipsel64:malta_defconfig:smp:initrd > > mipsel64:malta_defconfig:smp:rootfs > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs > > > > Error is always the same. > > > > arch/mips/kernel/process.c:637:8: > > error: 'call_single_data_t' undeclared here (not in a function) > > arch/mips/kernel/process.c: In function 'raise_backtrace': > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22: > > error: 'csd' undeclared (first use in this function) > > mips should now be fixed with the updated tree I have pushed out. > The above is with v4.9.112-33-g7fb1f5e, which is the latest version available from the git repository (as of right now). My builders pulled it at 4:07am this morning, and there was no subsequent update. Guenter
Re: [PATCH] x86/ioapic: Do not unmask io_apic when interrupt is in progress
On Fri, 2 Mar 2018, Sebastian Andrzej Siewior wrote: > From: Ingo Molnar > Date: Fri, 3 Jul 2009 08:29:27 -0500 > > With threaded interrupts we might see an interrupt in progress on > migration. Do not unmask it when this is the case. It took me quite a while to wrap my head around this one. > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -1690,7 +1690,8 @@ static bool io_apic_level_ack_pending(st > static inline bool ioapic_irqd_mask(struct irq_data *data) > { > /* If we are moving the irq we need to mask it */ > - if (unlikely(irqd_is_setaffinity_pending(data))) { > + if (unlikely(irqd_is_setaffinity_pending(data) && > + !irqd_irq_inprogress(data))) { I really don't see how that is preventing anything. The point is that chip->eoi_irq() is always called _after_ the INPROGRESS flag has been cleared. And if that's not the case then the interrupt would be never moved. But, that patch is almost 10 years old and I didn't find the exact context. So it might have been valid back then, but I doubt it. Though there is an issue with threaded interrupts which are marked ONESHOT and using the fasteoi handler. if (IS_ONESHOT()) mask_irq(); cond_unmask_eoi_irq() chip->irq_eoi(); So if setaffinity is pending then the interrupt will be moved and then unmasked, which is wrong as it should be kept masked up to the point where the threaded handler finished. It's not a real problem, the interrupt will just be able to fire before the threaded handler has finished, though the irq masked state will be wrong for a bit. The patch below should cure the issue. It also renames the horribly misnomed functions so it becomes clear what they are supposed to do. Thanks, tglx 8<--- diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 3982f79d2377..23667451e44e 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1721,19 +1721,20 @@ static bool io_apic_level_ack_pending(struct mp_chip_data *data) return false; } -static inline bool ioapic_irqd_mask(struct irq_data *data) +static inline bool ioapic_prepare_move(struct irq_data *data) { /* If we are moving the irq we need to mask it */ if (unlikely(irqd_is_setaffinity_pending(data))) { - mask_ioapic_irq(data); + if (!irqd_irq_masked(data)) + mask_ioapic_irq(data); return true; } return false; } -static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked) +static inline void ioapic_finish_move(struct irq_data *data, bool moveit) { - if (unlikely(masked)) { + if (unlikely(moveit)) { /* Only migrate the irq if the ack has been received. * * On rare occasions the broadcast level triggered ack gets @@ -1762,15 +1763,17 @@ static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked) */ if (!io_apic_level_ack_pending(data->chip_data)) irq_move_masked_irq(data); - unmask_ioapic_irq(data); + /* If the irq is masked in the core, leave it */ + if (!irqd_irq_masked(data)) + unmask_ioapic_irq(data); } } #else -static inline bool ioapic_irqd_mask(struct irq_data *data) +static inline bool ioapic_move_prepare(struct irq_data *data) { return false; } -static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked) +static inline void ioapic_move_irq(struct irq_data *data, bool masked) { } #endif @@ -1779,11 +1782,11 @@ static void ioapic_ack_level(struct irq_data *irq_data) { struct irq_cfg *cfg = irqd_cfg(irq_data); unsigned long v; - bool masked; + bool moveit; int i; irq_complete_move(cfg); - masked = ioapic_irqd_mask(irq_data); + moveit = ioapic_prepare_move(irq_data); /* * It appears there is an erratum which affects at least version 0x11 @@ -1838,7 +1841,7 @@ static void ioapic_ack_level(struct irq_data *irq_data) eoi_ioapic_pin(cfg->vector, irq_data->chip_data); } - ioapic_irqd_unmask(irq_data, masked); + ioapic_finish_move(irq_data, moveit); } static void ioapic_ir_ack_level(struct irq_data *irq_data)
Re: cpu_no_speculation omissions?
On Mon, 16 Jul 2018, Rich Felker wrote: > At least the Centerton (late-generation Bonnell uarch) Atom family is > omitted from the cpu_no_speculation table added by commit fec9434a12f3 > to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch > adding it and possibly other omissions be welcome? Probably. Dave? Thanks, tglx