date:20180716

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread Mukesh Ojha




On 7/13/2018 10:50 PM, John Stultz wrote:

On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha  wrote:

Hi John,

Thanks for your response
Please find my comments inline.


On 7/11/2018 1:43 AM, John Stultz wrote:

On Fri, Jul 6, 2018 at 6:17 AM, Mukesh Ojha  wrote:

Currently, there exists a corner case assuming when there is
only one clocksource e.g RTC, and system failed to go to
suspend mode. While resume rtc_resume() injects the sleeptime
as timekeeping_rtc_skipresume() returned 'false' (default value
of sleeptime_injected) due to which we can see mismatch in
timestamps.

This issue can also come in a system where more than one
clocksource are present and very first suspend fails.

Fix this by handling `sleeptime_injected` flag properly.

Success case:

  {sleeptime_injected=false}
rtc_suspend() => timekeeping_suspend() => timekeeping_resume() =>

(sleeptime injected)
   rtc_resume()

Failure case:

   {failure in sleep path} {sleeptime_injected=false}
rtc_suspend() =>  rtc_resume()

sleeptime injected again which was not required as the suspend failed)

Originally-by: Thomas Gleixner 
Signed-off-by: Mukesh Ojha 
---
Changes in v3:
   * Updated commit subject and description.
   * Updated the patch as per the fix given by Thomas Gleixner.

Changes in v2:
   * Updated the commit text.
   * Removed extra variable and used the earlier static
 variable 'sleeptime_injected'.

   kernel/time/timekeeping.c | 21 ++---
   1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4786df9..32ae9ae 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1510,8 +1510,20 @@ void __weak read_boot_clock64(struct timespec64
*ts)
  ts->tv_nsec = 0;
   }

-/* Flag for if timekeeping_resume() has injected sleeptime */
-static bool sleeptime_injected;
+/*
+ * Flag reflecting whether timekeeping_resume() has injected sleeptime.
+ *
+ * The flag starts of true and is only cleared when a suspend reaches
+ * timekeeping_suspend(), timekeeping_resume() sets it when the
timekeeper
+ * clocksource is not stopping across suspend and has been used to
update
+ * sleep time. If the timekeeper clocksource has stopped then the flag
+ * stays false and is used by the RTC resume code to decide whether
sleep
+ * time must be injected and if so the flag gets set then.
+ *
+ * If a suspend fails before reaching timekeeping_resume() then the flag
+ * stays true and prevents erroneous sleeptime injection.
+ */
+static bool sleeptime_injected = true;

I worry this upside-down logic is too subtle to be easily reasoned
about, and will just lead to future mistakes.

Can we instead call this "suspend_timing_needed" and only set it to
true when we don't inject any sleep time on resume?


I did not get your point "only set it to true when we don't inject any sleep
time on resume? "
How do we know  this ?
This question itself depends on the "sleeptime_injected" if it is true means
no need to inject else need to inject.

Also, we need to make this variable back and forth true, false; suspends
path ensures it to make it false.

So yea, I'm not saying logically the code is really any different,
this is more of a naming nit. So instead of having a variable that is
always on that we occasionally turn off, lets invert the naming and
have it be a flag that we occasionally turn on.


I understand your concern about the name of the variable will be misleading.
But the changing Boolean state would not solve the actual issue.

If i understand you correctly you meant below code

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32ae9ae..becc5bd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts)
  * If a suspend fails before reaching timekeeping_resume() then the flag
  * stays true and prevents erroneous sleeptime injection.
  */
-static bool sleeptime_injected = true;
+static bool suspend_timing_needed;

 /* Flag for if there is a persistent clock on this platform */
 static bool persistent_clock_exists;
@@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct 
timespec64 *delta)

    raw_spin_lock_irqsave(_lock, flags);
    write_seqcount_begin(_core.seq);

-   sleeptime_injected = true;
+   suspend_timing_needed = false;

    timekeeping_forward_now(tk);

@@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
  tk->tkr_mono.mask);
    nsec = mul_u64_u32_shr(cyc_delta, clock->mult, 
clock->shift);

    ts_delta = ns_to_timespec64(nsec);
-   sleeptime_injected = true;
+   suspend_timing_needed = true;
    } else if (timespec64_compare(_new, 
_suspend_time) > 0) {
    ts_delta = timespec64_sub(ts_new,

Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks

2018-07-16 Thread Linus Torvalds

On Mon, Jul 16, 2018 at 8:08 AM Eric W. Biederman  wrote:
>
> The change for global init is it will now die if init is a member of the
> session or init is using this tty as it's controlling tty.
>
> Semantically killing init with SAK is completely appropriate.

No.

Semtnaitcally killing init is completely wrong. Because it will kill
the whole system.

And I don't mean that in "now init won't spawn new things". I mean
that in "now we don't have a child reaper any more, and the system
will be dead because we'll panic on exit".

So it's not about the controlling tty, it's about fundamental kernel
internal consistency guarantees.

See

write_unlock_irq(_lock);
if (unlikely(pid_ns == _pid_ns)) {
panic("Attempted to kill init! exitcode=0x%08x\n",
father->signal->group_exit_code ?: father->exit_code);
}

in kernel/exit.c.

   Linus

Re: [PATCH v5 1/2] leds: core: Introduce generic pattern interface

2018-07-16 Thread David Lechner


On 07/16/2018 06:10 AM, Baolin Wang wrote:

From: Bjorn Andersson 

Some LED controllers have support for autonomously controlling
brightness over time, according to some preprogrammed pattern or
function.

This adds a new optional operator that LED class drivers can implement
if they support such functionality as well as a new device attribute to
configure the pattern for a given LED.



Hmm... I was hoping that this would be a generic pattern trigger that
could be used by any LED (e.g. gpio LEDs), not just LEDs with hardware
support. I guess I cannot really test this after all.

I think it looks good for your use case though.

Re: [v9] clk: qcom: clk-rpmh: Add QCOM RPMh clock driver

2018-07-16 Thread Stephen Boyd

Quoting Taniya Das (2018-05-08 22:56:07)
> Add the RPMh clock driver to control the RPMh managed clock resources on
> some of the Qualcomm Technologies, Inc. SoCs.
> 
> Signed-off-by: Taniya Das 
> ---

Applied to clk-next

Re: [PATCH v2 1/2] dt-binding: pinctrl: Add NPCM7xx pinctrl and GPIO documentation

2018-07-16 Thread Rob Herring

On Fri, Jul 13, 2018 at 12:42:02AM +0300, Tomer Maimon wrote:
> Added device tree binding documentation for Nuvoton BMC
> NPCM750/730/715/705 pinmux and GPIO controller.
> 
> Signed-off-by: Tomer Maimon 
> ---
>  .../bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt   | 216 
> +
>  1 file changed, 216 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt 
> b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt
> new file mode 100644
> index ..83f4bbac94bb
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pinctrl/nuvoton,npcm7xx-pinctrl.txt
> @@ -0,0 +1,216 @@
> +Nuvoton NPCM7XX Pin Controllers
> +
> +The Nuvoton BMC NPCM7XX Pin Controller multi-function routed through
> +the multiplexing block, Each pin supports GPIO functionality (GPIOx)
> +and multiple functions that directly connect the pin to different
> +hardware blocks.
> +
> +Required properties:
> +- #address-cells : should be 1.
> +- #size-cells : should be 1.
> +- compatible  : "nuvoton,npcm750-pinctrl" for Poleg NPCM7XX.
> +- ranges  : defines mapping ranges between pin controller node (parent)
> + to GPIO bank node (children).
> +
> +=== GPIO Bank Subnode ===
> +
> +The NPCM7XX has 8 GPIO Banks each GPIO bank supports 32 GPIO.
> +
> +Required GPIO Bank subnode-properties:
> +- reg: specifies physical base address and size of 
> the GPIO
> + bank registers.
> +- gpio-controller: Marks the device node as a GPIO controller.
> +- #gpio-cells: Must be <2>. The first cell is the gpio pin 
> number
> + and the second cell is used for optional 
> parameters.
> +- interrupts : contain the GPIO bank interrupt with flags for 
> falling edge.
> +- gpio-ranges: defines the range of pins managed by the GPIO 
> bank controller.
> +
> +For example, GPIO bank subnodes like the following:
> + gpio0: gpio@f001 {
> + gpio-controller;
> + #gpio-cells = <2>;
> + reg = <0x0 0x80>;
> + interrupts = ;
> + gpio-ranges = < 0 0 32>;
> + };
> +
> +=== Pin Mux Subnode ===
> +
> +- pin: A string containing the name of the pin
> + An array of strings, each string containing the name of a pin.
> + These pin are used for selecting pin configuration.
> +
> +The following are the list of pins available:
> + "GPIO0/IOX1DI", "GPIO1/IOX1LD", "GPIO2/IOX1CK", "GPIO3/IOX1D0",
> + "GPIO4/IOX2DI/SMB1DSDA", "GPIO5/IOX2LD/SMB1DSCL", 
> "GPIO6/IOX2CK/SMB2DSDA",
> + "GPIO7/IOX2D0/SMB2DSCL", "GPIO8/LKGPO1", "GPIO9/LKGPO2", 
> "GPIO10/IOXHLD",
> + "GPIO11/IOXHCK", "GPIO12/GSPICK/SMB5BSCL", "GPIO13/GSPIDO/SMB5BSDA",
> + "GPIO14/GSPIDI/SMB5CSCL", "GPIO15/GSPICS/SMB5CSDA", "GPIO16/LKGPO0",
> + "GPIO17/PSPI2DI/SMB4DEN","GPIO18/PSPI2D0/SMB4BSDA", 
> "GPIO19/PSPI2CK/SMB4BSCL",
> + "GPIO20/SMB4CSDA/SMB15SDA", "GPIO21/SMB4CSCL/SMB15SCL", 
> "GPIO22/SMB4DSDA/SMB14SDA",
> + "GPIO23/SMB4DSCL/SMB14SCL", "GPIO24/IOXHDO", "GPIO25/IOXHDI", 
> "GPIO26/SMB5SDA",
> + "GPIO27/SMB5SCL", "GPIO28/SMB4SDA", "GPIO29/SMB4SCL", "GPIO30/SMB3SDA",
> + "GPIO31/SMB3SCL", "GPIO32/nSPI0CS1","SPI0D2", "SPI0D3", 
> "GPIO37/SMB3CSDA",
> + "GPIO38/SMB3CSCL", "GPIO39/SMB3BSDA", "GPIO40/SMB3BSCL", 
> "GPIO41/BSPRXD",
> + "GPO42/BSPTXD/STRAP11", "GPIO43/RXD1/JTMS2/BU1RXD", 
> "GPIO44/nCTS1/JTDI2/BU1CTS",
> + "GPIO45/nDCD1/JTDO2", "GPIO46/nDSR1/JTCK2", "GPIO47/nRI1/JCP_RDY2",
> + "GPIO48/TXD2/BSPTXD", "GPIO49/RXD2/BSPRXD", "GPIO50/nCTS2", 
> "GPO51/nRTS2/STRAP2",
> + "GPIO52/nDCD2", "GPO53/nDTR2_BOUT2/STRAP1", "GPIO54/nDSR2", 
> "GPIO55/nRI2",
> + "GPIO56/R1RXERR", "GPIO57/R1MDC", "GPIO58/R1MDIO", "GPIO59/SMB3DSDA",
> + "GPIO60/SMB3DSCL", "GPO61/nDTR1_BOUT1/STRAP6", "GPO62/nRTST1/STRAP5",
> + "GPO63/TXD1/STRAP4", "GPIO64/FANIN0", "GPIO65/FANIN1", "GPIO66/FANIN2",
> + "GPIO67/FANIN3", "GPIO68/FANIN4", "GPIO69/FANIN5", "GPIO70/FANIN6", 
> "GPIO71/FANIN7",
> + "GPIO72/FANIN8", "GPIO73/FANIN9", "GPIO74/FANIN10", "GPIO75/FANIN11",
> + "GPIO76/FANIN12", "GPIO77/FANIN13","GPIO78/FANIN14", "GPIO79/FANIN15",
> + "GPIO80/PWM0", "GPIO81/PWM1", "GPIO82/PWM2", "GPIO83/PWM3", 
> "GPIO84/R2TXD0",
> + "GPIO85/R2TXD1", "GPIO86/R2TXEN", "GPIO87/R2RXD0", "GPIO88/R2RXD1", 
> "GPIO89/R2CRSDV",
> + "GPIO90/R2RXERR", "GPIO91/R2MDC", "GPIO92/R2MDIO", 
> "GPIO93/GA20/SMB5DSCL",
> + "GPIO94/nKBRST/SMB5DSDA", "GPIO95/nLRESET/nESPIRST", "GPIO96/RG1TXD0",
> + "GPIO97/RG1TXD1", "GPIO98/RG1TXD2", "GPIO99/RG1TXD3","GPIO100/RG1TXC",
> + "GPIO101/RG1TXCTL", "GPIO102/RG1RXD0", "GPIO103/RG1RXD1", 
> "GPIO104/RG1RXD2",
> + "GPIO105/RG1RXD3", "GPIO106/RG1RXC", "GPIO107/RG1RXCTL", 
> "GPIO108/RG1MDC",
> + "GPIO109/RG1MDIO",

[PATCH memory-model 0/14] Updates to the formal memory model

2018-07-16 Thread Paul E. McKenney

Hello!

This series contains updates to the Linux kernel's formal memory model
in tools/memory-model, along with corresponding changes in documentation
and Linux-kernel code.  These patches are ready for inclusion into -tip.

1.  Add a litmus test for full multi-copy atomicity.

2.  Fix the name of the ISA2+pooncelock+pooncelock+pombonce litmus test.

3.  Add Daniel Lustig as an LKMM reviewer, with focus on RISC-V,
courtesy of Palmer Dabbelt.

4.  Update Korean translation to fix broken DMA vs. MMIO ordering
example, courtesy of SeongJae Park with edits suggested by
Byungchul Park.

5-6.Remove the now-obsolete ACCESS_ONCE() from recipes and the model
itself, courtesy of Mark Rutland.

7.  Make regression-test scripts executable.

8.  Describe atomic_set as a write operation, courtesy of Jonathan
Neuschäfer.

9.  Add informal LKMM documentation to MAINTAINERS.

10. Use smp_mb() in wake_woken_function(), courtesy of Andrea Parri.

11. Clarify requirements for smp_mb__after_spinlock(), courtesy of
Andrea Parri.

12. Update wake_up() and friends' memory-barrier guarantees,
courtesy of Andrea Parri.

13. Fix "smb_wmb()" typo (should be "smp_wmb()"), courtesy of
Yauheni Kaliuta.

14. Rename litmus tests to comply to norm7, courtesy of Andrea Parri.

Thanx, Paul



 Documentation/core-api/atomic_ops.rst  
|2 
 Documentation/memory-barriers.txt  
|   43 +++---
 Documentation/translations/ko_KR/memory-barriers.txt   
|   22 +--
 MAINTAINERS
|6 
 include/linux/sched.h  
|4 
 include/linux/spinlock.h   
|   53 +--
 kernel/sched/completion.c  
|8 -
 kernel/sched/core.c
|   71 --
 kernel/sched/wait.c
|   55 +++
 tools/memory-model/Documentation/explanation.txt   
|2 
 tools/memory-model/Documentation/recipes.txt   
|   12 -
 tools/memory-model/README  
|   20 +-
 tools/memory-model/linux-kernel.bell   
|2 
 tools/memory-model/litmus-tests/IRIW+fencembonceonces+OnceOnce.litmus  
|2 
 tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus 
|2 
 tools/memory-model/litmus-tests/LB+fencembonceonce+ctrlonceonce.litmus 
|2 
 tools/memory-model/litmus-tests/MP+fencewmbonceonce+fencermbonceonce.litmus
|2 
 tools/memory-model/litmus-tests/R+fencembonceonces.litmus  
|2 
 tools/memory-model/litmus-tests/README 
|   25 ++-
 tools/memory-model/litmus-tests/S+fencewmbonceonce+poacquireonce.litmus
|2 
 tools/memory-model/litmus-tests/SB+fencembonceonces.litmus 
|2 
 tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus  
|   32 
 tools/memory-model/litmus-tests/WRC+pooncerelease+fencermbonceonce+Once.litmus 
|2 
 
tools/memory-model/litmus-tests/Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
 |2 
 tools/memory-model/scripts/checkalllitmus.sh   
|2 
 tools/memory-model/scripts/checklitmus.sh  
|2 
 26 files changed, 223 insertions(+), 156 deletions(-)

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote:
> > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote:
> > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> > > > > This is the start of the stable review cycle for the 4.9.113 release.
> > > > > There are 32 patches in this series, all will be posted as a response
> > > > > to this one.  If anyone has any issues with these being applied, 
> > > > > please
> > > > > let me know.
> > > > > 
> > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> > > > > Anything received after that time might be too late.
> > > > >
> > > > 
> > > > Build results:
> > > > total: 148 pass: 133 fail: 15
> > > > Failed builds: 
> > > > mips:defconfig 
> > > > mips:allnoconfig 
> > > > mips:defconfig 
> > > > mips:allmodconfig 
> > > > mips:allnoconfig 
> > > > mips:bcm47xx_defconfig 
> > > > mips:bcm63xx_defconfig 
> > > > mips:nlm_xlp_defconfig 
> > > > mips:ath79_defconfig 
> > > > mips:ar7_defconfig 
> > > > mips:e55_defconfig 
> > > > mips:cavium_octeon_defconfig 
> > > > mips:malta_defconfig 
> > > > mips:rt305x_defconfig 
> > > > mips:defconfig 
> > > > Qemu test results:
> > > > total: 166 pass: 154 fail: 12
> > > > Failed tests: 
> > > > mips:malta_defconfig:nosmp 
> > > > mips:malta_defconfig:smp 
> > > > mips64:malta_defconfig:nosmp 
> > > > mips64:malta_defconfig:smp 
> > > > mipsel:24Kf:malta_defconfig:nosmp:initrd 
> > > > mipsel:24Kf:malta_defconfig:smp:initrd 
> > > > mipsel:24Kf:malta_defconfig:smp:rootfs 
> > > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
> > > > mipsel64:malta_defconfig:nosmp:rootfs 
> > > > mipsel64:malta_defconfig:smp:initrd 
> > > > mipsel64:malta_defconfig:smp:rootfs 
> > > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs
> > > > 
> > > > Error is always the same.
> > > > 
> > > > arch/mips/kernel/process.c:637:8:
> > > > error: 'call_single_data_t' undeclared here (not in a function)
> > > > arch/mips/kernel/process.c: In function 'raise_backtrace':
> > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
> > > > error: 'csd' undeclared (first use in this function)
> > > 
> > > mips should now be fixed with the updated tree I have pushed out.
> > > 
> > 
> > The above is with v4.9.112-33-g7fb1f5e, which is the latest version
> > available from the git repository (as of right now). My builders
> > pulled it at 4:07am this morning, and there was no subsequent update.
> 
> Odd.  Ok, I've bumped the -rc number to -rc2 now, and pushed it all out
> again.  Let me know if you don't see it show up within an hour or so.
> 
Version is now v4.9.112-33-gb44db2b, so something changed, but I still get
the same build error.

Comparing the old and the new version, the only difference is the updated rc.

diff --git a/Makefile b/Makefile
index 57f315d00a94..986470ef6f6e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 4
 PATCHLEVEL = 9
 SUBLEVEL = 113
-EXTRAVERSION = -rc1
+EXTRAVERSION = -rc2
 NAME = Roaring Lionus

The error itself is that single_data_t was replaced in the backport with
call_single_data. It should have been "struct call_single_data".

Guenter

Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Eric W. Biederman

Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  
> wrote:
>>
>> In practice since glibc does not make thread id's available I don't
>> expect anyone relies on this behavior.  Since no one relies on it we
>> can change it without creating a regression.
>
> Maybe.
>
> However, possibly not.
>
> The thing is, glibc wasn't the original or only use of our threads. In
> fact, there are people out there that use clone() directly, without
> using it for posix threading. And Oleg was right to notice this,
> because the traditional model was literally to just use "kill()" on
> the pid returned from clone().

I completely agree that Oleg was right to notice this, and I was
definitely not right to overlook.  In my description and otherwise.

I also think the semantic change needs to happen in it's own separate
patch so things can be tracked down.

I really don't think anyone uses this but it is not smart to hold the
rest of the changes hostage to my belief.  So I am thinking about how
to rework this.

> So the semantics of Linux kill() really is to kill the thread, not the
> group leader. glibc's implementation of pthreads is not the only model
> out there.

There are two questions.
a) Can we use the pid of a thread to find the thread group?
b) Will the signal be queued in the thread group?

> Now, it is possible that at none of the legacy uses use CLONE_THREAD
> and thus aren't affected (because tgid will always be pid). So maybe
> nobody notices.

That is what I expect.  I don't know think legacy is a good description.
Calling other uses of CLONE_THREAD non-glibc seems better.  The old
LinuxThreads did not use CLONE_THREAD because it did not exist.
>
> But we really have three different 'kill' system calls:
>
>  - the original 'kill' system call (#37 on x86-32).
>
>This looks up the thread ID, but signals the *group*.
>
>  - tkill (#238)
>
>This looks up the thread, and signals the specific thread.
>
>  - tgkill (#270)
>
>This looks up the tgid, and signals the group.

No.  tgkill is a less racy version of tkill and verifies that the
thread it signals is in the proper thread group.

> Modern glibc will not even use the original 'kill()' at all, I think.
> But it's the legacy behavior.

No.  Modern glibc definitely still uses kill.  As kill is the only one
exporting the posix kill API.  

> So I do think Oleg is right. We should be careful. You'll not notice
> breakage on a modern distro, but you might easily break old code.

Yes.  We definitely need to be careful.   At the same time since this
isn't something the old LinuxThreads had to cope with we can probably
clean it up.  But as that is not my focus it should probably be pushed out.

Eric

[PATCH memory-model 02/14] tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name

2018-07-16 Thread Paul E. McKenney

The names on the first line of the litmus tests are arbitrary,
but the convention is that they be the filename without the trailing
".litmus".  This commit therefore removes the stray trailing ".litmus"
from ISA2+pooncelock+pooncelock+pombonce.litmus's name.

Reported-by: Andrea Parri 
Signed-off-by: Paul E. McKenney 
Acked-by: Alan Stern 
---
 .../litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus 
b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
index 7a39a0aaa976..0f749e419b34 100644
--- a/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
+++ b/tools/memory-model/litmus-tests/ISA2+pooncelock+pooncelock+pombonce.litmus
@@ -1,4 +1,4 @@
-C ISA2+pooncelock+pooncelock+pombonce.litmus
+C ISA2+pooncelock+pooncelock+pombonce
 
 (*
  * Result: Sometimes
-- 
2.17.1

Re: [PATCH v2 1/2] documentation: pinctrl: Add compatibles for Amlogic Meson G12A pin controllers

2018-07-16 Thread Rob Herring

On Sat, Jul 14, 2018 at 11:27:53PM +, Yixun Lan wrote:
> Add new compatible name for Amlogic's Meson-G12A pin controllers,
> add a dt-binding header file which document the detail pin names.
> 
> Acked-by: Martin Blumenstingl 
> Signed-off-by: Xingyu Chen 
> Signed-off-by: Yixun Lan 
> ---
>  .../bindings/pinctrl/meson,pinctrl.txt|   2 +
>  include/dt-bindings/gpio/meson-g12a-gpio.h| 114 ++
>  2 files changed, 116 insertions(+)
>  create mode 100644 include/dt-bindings/gpio/meson-g12a-gpio.h

Reviewed-by: Rob Herring

Re: [PATCH] checkpatch: Require commit text and warn on long commit text lines

2018-07-16 Thread pheragu


On 2018-07-13 17:08, Joe Perches wrote:

On Fri, 2018-07-13 at 16:28 -0700, pher...@codeaurora.org wrote:

On 2018-07-13 14:46, Joe Perches wrote:
> On Fri, 2018-07-13 at 14:40 -0700, Prakruthi Deepak Heragu wrote:
> > Commit text is almost always necessary to explain why a change is
> > needed.
>
> This bit seems sensible, but perhaps it should just count the
> number of lines after the end of email headers and before any
> Signed-off-by:/Signature line
>

While committing the changes, one can just write the subject and not
write
the commit text at all. So, if we just count the lines between email
headers
and signed-off, we still do count lines which form the subject, but 
the
commit text is still absent. Also, subject can be longer than one 
line.

So,
just counting lines doesn't really guarantee the presence of commit
text.


Not true.
Look at $in_header_lines and $in_commit_log.


> > Also, warn on commit text lines longer than 75 characters. The commit
> > text
> > are indented and may wrap on a terminal if they are longer than 75
> > characters.
>
> This is already exists via
>
> # Check for line lengths > 75 in commit log, warn once
>if ($in_commit_log && !$commit_log_long_line &&
>length($line) > 75 &&
>

True, but this patch points out every line of the commit text that is
exceeding the limit.


Which is bad because things like dump_stack() are added in
commit logs and those are already allowed to be > 75 chars.

Anyway, something like this probably works.  Please test.
---
 scripts/checkpatch.pl | 13 +
 1 file changed, 13 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index b5c875d7132b..8b5f3dae31c9 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2240,6 +2240,7 @@ sub process {
my $in_header_lines = $file ? 0 : 1;
my $in_commit_log = 0;  #Scanning lines before patch
my $has_commit_log = 0; #Encountered lines before patch
+   my $commit_log_lines = 0;   #Number of commit log lines
my $commit_log_possible_stack_dump = 0;
my $commit_log_long_line = 0;
my $commit_log_has_diff = 0;
@@ -2497,6 +2498,18 @@ sub process {

$cnt_lines++ if ($realcnt != 0);

+# Verify the existence of a commit log if appropriate
+# 2 is used because a $signature is counted in $commit_log_lines
+   if ($in_commit_log) {
+   if ($line !~ /^\s*$/) {
+   $commit_log_lines++;#could be a $signature
+   }
+   } else if ($has_commit_log && $commit_log_lines < 2) {
+   WARN("COMMIT_MESSAGE",
+"Missing commit description - Add an appropriate 
one\n");
+   $commit_log_lines = 2;  #warn only once
+   }
+
 # Check if the commit log has what seems like a diff which can confuse 
patch

if ($in_commit_log && !$commit_log_has_diff &&
(($line =~ m@^\s+diff\b.*a/[\w/]+@ &&
I checked all the cases that I mentioned before. The change you 
suggested works

for every case. Would you take care of merging this fix?

Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds

On Mon, Jul 16, 2018 at 11:02 AM Eric W. Biederman
 wrote:
>
> There are two questions.
> a) Can we use the pid of a thread to find the thread group?

Yes. Just find the thread, and then use p->tgid.

However, that's not what the code used to do. It used to just find the
thread, and then do "do_send_sig_info()" on it.

And it's actually *slightly* different than "find the thread group
based on the thread". At least the permission checks are different.
The permission checks are done on the thread.

> b) Will the signal be queued in the thread group?

Yes.

pending = group ? >signal->shared_pending : >pending;

and "group" is true.

> > Now, it is possible that at none of the legacy uses use CLONE_THREAD
> > and thus aren't affected (because tgid will always be pid). So maybe
> > nobody notices.
>
> That is what I expect.  I don't know think legacy is a good description.
> Calling other uses of CLONE_THREAD non-glibc seems better.  The old
> LinuxThreads did not use CLONE_THREAD because it did not exist.

Again, don't get hung up about different libc implementations.

People have literally used clone() directly. And some of them use CLONE_THREAD.

Just google it. I guarantee you'll find examples of it, because I
found examples.

So stop the whole "libc" argument. That's not the point, and as long
as you make that argument, your argument is simply not valid.

People use clone() directly. Really. Really really.

Linus

[RFC V2] kvm: Adding skelaton for Memory ROE

2018-07-16 Thread Ahmed Abd El Mawgood

This is an attempt to implement Memory ROE discussed by me earlier as a
way to prevent Rootkits. I have already explained in details in this
thread:
https://www.mail-archive.com/kernelnewbies@kernelnewbies.org/msg18826.html
So I think there is no need for saying the exact same thing again.
The problem is that the code isn't working and I can't figure out why

I tried implementing the protection to follow similar behavior to that
of KVM_MEM_READONLY but to be on page (SPTE) level
The current problem I am facing is that when handling the hypercall
vcpu->mode turns to be OUTSIDE_GUEST_MODE but KVM_REQ_TLB_FLUSH doesn't
seem to be handled correctly. KVM documentation promised that when VCPU is
not in GUEST_MODE VCPU are handled asap and kvm_vcpu_kick(vcpu); will
even force that, but it doesn't seem to be the case for me. This is the
kind of logging I am getting:

[9073.753306] I came here kvm_mmu_slot_apply_flags
[9073.753311] kvm_mmu_slot_apply_write_access: Flush = false
[9073.992536] gfn_is_readonly: test_bit = 0
[9073.992543] gfn_is_readonly: test_bit = 0
[9073.992545] gfn_is_readonly: test_bit = 0
[9073.992703] Hypercall received, page address 0x0
[9073.992705] gfn_is_readonly: test_bit = 0
[9073.992708] kvm_mroe: flush state = Done
[9073.992709] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[9073.992711] Setting page number 0 in slot number 0
[9073.992715] slot_rmap_apply_protection: The 0th page is readonly, Flush = True
[9073.992717] kvm_mmu_slot_apply_write_access: Flush = true
[9073.992719] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[9073.992720] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[9073.992721] kvm_mroe: flush state = Waiting

For some reason kvm_vcpu_kick() didn't force the KVM_REQ_TLB_FLUSH to
kick into the virtual cpu (I am talking about the last 2 lines).

I am aware that there is still alot missing (like dealing with malicious
guest remappings) and the code quality sucks, but any ideas about what I
could be doing wrong (or ideas in general) would be apprciated. I am
already planning to do everything cleanly once it works.

Thansk.

Edits for V2:
Unfortunately I did few mistakes that lead do kernel not compiling on
valnilla .config file because of silly mistakes not doing #ifdef..#endif
in some places so this lead to using symbols only available when CONFIG_KVM_MROE
is set in .conf even if it wasn't anyway it is fixed. and I should not
that CONFIG_KVM_MROE should be used when testing my code and trying to
figure out what went wrong
Signed-off-by: Ahmed Abd El Mawgood 
---
 arch/x86/include/asm/kvm_host.h |   7 +-
 arch/x86/kvm/Kconfig|   7 ++
 arch/x86/kvm/mmu.c  | 158 ++--
 arch/x86/kvm/x86.c  |  83 -
 include/linux/kvm_host.h|  17 +
 include/uapi/linux/kvm_para.h   |   4 +-
 virt/kvm/kvm_main.c |  36 +++--
 7 files changed, 257 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c13cd28d9d1b..c66e9245f750 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -235,7 +235,10 @@ struct kvm_mmu_memory_cache {
int nobjs;
void *objects[KVM_NR_MEM_OBJS];
 };
-
+struct kvm_write_access_data {
+   int i;
+   struct kvm_memory_slot *memslot;
+};
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
@@ -1130,7 +1133,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 
accessed_mask,
u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+void kvm_mmu_slot_apply_write_access(struct kvm *kvm,
  struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
   const struct kvm_memory_slot *memslot);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 92fd433c50b9..8ae822a8dc7a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -96,6 +96,13 @@ config KVM_MMU_AUDIT
 This option adds a R/W kVM module parameter 'mmu_audit', which allows
 auditing of KVM MMU events at runtime.
 
+config KVM_MROE
+   bool "Hypercall Memory Read-Only Enforcement"
+   depends on KVM && X86
+   help
+   This option add KVM_HC_HMROE hypercall to kvm which as hardening
+   mechanism to protect memory pages from being edited.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690d8b95..e06e923f90aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -70,7 +70,7 @@ enum {
 #undef MMU_DEBUG
 
 #ifdef MMU_DEBUG
-static bool dbg = 0;
+static bool dbg = 1;
 module_param(dbg, bool, 0644);
 
 #define

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread John Stultz

On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha  wrote:
> On 7/13/2018 10:50 PM, John Stultz wrote:
>> On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha 
>>> On 7/11/2018 1:43 AM, John Stultz wrote:
 I worry this upside-down logic is too subtle to be easily reasoned
 about, and will just lead to future mistakes.

 Can we instead call this "suspend_timing_needed" and only set it to
 true when we don't inject any sleep time on resume?
>>>
>>>
>>> I did not get your point "only set it to true when we don't inject any
>>> sleep
>>> time on resume? "
>>> How do we know  this ?
>>> This question itself depends on the "sleeptime_injected" if it is true
>>> means
>>> no need to inject else need to inject.
>>>
>>> Also, we need to make this variable back and forth true, false; suspends
>>> path ensures it to make it false.
>>
>> So yea, I'm not saying logically the code is really any different,
>> this is more of a naming nit. So instead of having a variable that is
>> always on that we occasionally turn off, lets invert the naming and
>> have it be a flag that we occasionally turn on.
>
>
> I understand your concern about the name of the variable will be misleading.
> But the changing Boolean state would not solve the actual issue.
>
> If i understand you correctly you meant below code
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 32ae9ae..becc5bd 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts)
>   * If a suspend fails before reaching timekeeping_resume() then the flag
>   * stays true and prevents erroneous sleeptime injection.
>   */
> -static bool sleeptime_injected = true;
> +static bool suspend_timing_needed;
>
>  /* Flag for if there is a persistent clock on this platform */
>  static bool persistent_clock_exists;
> @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64
> *delta)
> raw_spin_lock_irqsave(_lock, flags);
> write_seqcount_begin(_core.seq);
>
> -   sleeptime_injected = true;
> +   suspend_timing_needed = false;
>
> timekeeping_forward_now(tk);
>
> @@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
>   tk->tkr_mono.mask);
> nsec = mul_u64_u32_shr(cyc_delta, clock->mult,
> clock->shift);
> ts_delta = ns_to_timespec64(nsec);
> -   sleeptime_injected = true;
> +   suspend_timing_needed = true;
> } else if (timespec64_compare(_new, _suspend_time) >
> 0) {
> ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
> -   sleeptime_injected = true;
> +   suspend_timing_needed = true;
> }

No no... This part is wrong. We only set suspend_timing_needed if we
*didn't* calculate the suspend time in timekeeping_resume.

You have to invert all the boolean logic for it to be equivalent.

> if (sleeptime_injected)
> @@ -1756,7 +1756,7 @@ int timekeeping_suspend(void)
> if (timekeeping_suspend_time.tv_sec ||
> timekeeping_suspend_time.tv_nsec)
> persistent_clock_exists = true;
>
> -   sleeptime_injected = false;
> +   suspend_timing_needed = false;
>
> raw_spin_lock_irqsave(_lock, flags);
>
>
> This has a problem..
>
>
>>
>> Just the name sleeptime_injected is read a statement, which if we say
>> is defaults to true, becomes confusing to think about when the
>> timekeeping_suspend/resume code hasn't yet run (which is the case
>> where your error cropped up) - and no sleeptime has actually been
>> injected.
>
>
> Yes, when very first suspend fails and timekeeping_suspend/resume did not
> run ; That is the exact issue.
> So, exact solution is no need to inject any sleeptime here.
>
>  If we set the default value to false then we will see timekeeping_resume
> will inject sleeptime by below code which was not intended.
>
> static int rtc_resume(struct device *dev)
> {
> struct rtc_device   *rtc = to_rtc_device(dev);
> struct rtc_time tm;
> struct timespec64   new_system, new_rtc;
> struct timespec64   sleep_time;
> int err;
>
> if (timekeeping_rtc_skipresume())  // it will return the value false
> as sleep failed and timekeeping_resume() did not get called.
> return 0;
>
>   


So, I think with the logic bug above it will work out properly, but
let me know if I'm still missing something.

thanks
-john

Re: 4.18-rc* regression: x86-32 troubles (with timers?)

2018-07-16 Thread Meelis Roos

> Everything below here is is 'bad', which can be an indication that you
> misclassified one of
> the commits above as 'good' when it should have been 'bad'. The most likely
> explanations are that you either typed the 'git bisect good' by accident, or
> that the failure is not 100% reliable, and it sometimes works fine even on a
> broken kernel.
> 
> 0bc5fe857274133ca0 follows directly after 3a443bd6dd7c, "net/9p: correct the
> variable name in v9fs_get_trans_by_name() comment", which is marked "good",
> and can't really be good if 0bc5fe85727413 is bad and you are not using the
> 'qed' driver.
> 
> I'd retest 3a443bd6dd7c again to see if that should have been 'bad', and
> if it was, test v4.17-rc4, which is what the net-next tree was based on.

Yes, the same prebuilt 3a443bd6dd7c appeared to be bad when retesting 
it. Building v4.17-rc4 now.

-- 
Meelis Roos (mr...@linux.ee)

Re: [PATCH] mm: don't do zero_resv_unavail if memmap is not allocated

2018-07-16 Thread Pavel Tatashin




On 07/16/2018 12:09 PM, Michal Hocko wrote:
> On Mon 16-07-18 11:16:30, Pavel Tatashin wrote:
>> Moving zero_resv_unavail before memmap_init_zone(), caused a regression on
>> x86-32.
>>
>> The cause is that we access struct pages before they are allocated when
>> CONFIG_FLAT_NODE_MEM_MAP is used.
>>
>> free_area_init_nodes()
>>   zero_resv_unavail()
>> mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced
>>   free_area_init_node()
>> if CONFIG_FLAT_NODE_MEM_MAP
>>   alloc_node_mem_map()
>> memblock_virt_alloc_node_nopanic() <- struct page alloced here
>>
>> On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory
>> that it returns, so we do not need to do zero_resv_unavail() here.
> 
> This all is subtle as hell and almost impossible to build a sane code on
> top. Your patch sounds good as a stop gap fix but we really need
> something resembling an actual design rather than ad-hoc hacks piled on
> top of each other.z

I totally agree, I started working on figuring out how to simply and improve 
memmap_init_zone(). But part of the mess is that we simply have too many memmap 
layouts. We must start removing some of them. But, that also requires an 
analysis of how to unify them.

> 
>> Fixes: e181ae0c5db9 ("mm: zero unavailable pages before memmap init")
>> Signed-off-by: Pavel Tatashin 
> 
> Acked-by: Michal Hocko 
> 

Thank you.

Pavel

Re: [RFC][PATCH 07/11] signal: Deliver group signals via PIDTYPE_TGID not PIDTYPE_PID

2018-07-16 Thread Linus Torvalds

On Mon, Jul 16, 2018 at 7:50 AM Eric W. Biederman  wrote:
>
> In practice since glibc does not make thread id's available I don't
> expect anyone relies on this behavior.  Since no one relies on it we
> can change it without creating a regression.

Maybe.

However, possibly not.

The thing is, glibc wasn't the original or only use of our threads. In
fact, there are people out there that use clone() directly, without
using it for posix threading. And Oleg was right to notice this,
because the traditional model was literally to just use "kill()" on
the pid returned from clone().

So the semantics of Linux kill() really is to kill the thread, not the
group leader. glibc's implementation of pthreads is not the only model
out there.

Now, it is possible that at none of the legacy uses use CLONE_THREAD
and thus aren't affected (because tgid will always be pid). So maybe
nobody notices.

But we really have three different 'kill' system calls:

 - the original 'kill' system call (#37 on x86-32).

   This looks up the thread ID, but signals the *group*.

 - tkill (#238)

   This looks up the thread, and signals the specific thread.

 - tgkill (#270)

   This looks up the tgid, and signals the group.

Modern glibc will not even use the original 'kill()' at all, I think.
But it's the legacy behavior.

So I do think Oleg is right. We should be careful. You'll not notice
breakage on a modern distro, but you might easily break old code.

 Linus

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Greg Kroah-Hartman

On Mon, Jul 16, 2018 at 08:31:20PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jul 16, 2018 at 11:02:19AM -0700, Guenter Roeck wrote:
> > On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote:
> > > > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote:
> > > > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote:
> > > > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> > > > > > > This is the start of the stable review cycle for the 4.9.113 
> > > > > > > release.
> > > > > > > There are 32 patches in this series, all will be posted as a 
> > > > > > > response
> > > > > > > to this one.  If anyone has any issues with these being applied, 
> > > > > > > please
> > > > > > > let me know.
> > > > > > > 
> > > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> > > > > > > Anything received after that time might be too late.
> > > > > > >
> > > > > > 
> > > > > > Build results:
> > > > > > total: 148 pass: 133 fail: 15
> > > > > > Failed builds: 
> > > > > > mips:defconfig 
> > > > > > mips:allnoconfig 
> > > > > > mips:defconfig 
> > > > > > mips:allmodconfig 
> > > > > > mips:allnoconfig 
> > > > > > mips:bcm47xx_defconfig 
> > > > > > mips:bcm63xx_defconfig 
> > > > > > mips:nlm_xlp_defconfig 
> > > > > > mips:ath79_defconfig 
> > > > > > mips:ar7_defconfig 
> > > > > > mips:e55_defconfig 
> > > > > > mips:cavium_octeon_defconfig 
> > > > > > mips:malta_defconfig 
> > > > > > mips:rt305x_defconfig 
> > > > > > mips:defconfig 
> > > > > > Qemu test results:
> > > > > > total: 166 pass: 154 fail: 12
> > > > > > Failed tests: 
> > > > > > mips:malta_defconfig:nosmp 
> > > > > > mips:malta_defconfig:smp 
> > > > > > mips64:malta_defconfig:nosmp 
> > > > > > mips64:malta_defconfig:smp 
> > > > > > mipsel:24Kf:malta_defconfig:nosmp:initrd 
> > > > > > mipsel:24Kf:malta_defconfig:smp:initrd 
> > > > > > mipsel:24Kf:malta_defconfig:smp:rootfs 
> > > > > > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
> > > > > > mipsel64:malta_defconfig:nosmp:rootfs 
> > > > > > mipsel64:malta_defconfig:smp:initrd 
> > > > > > mipsel64:malta_defconfig:smp:rootfs 
> > > > > > mipsel64:fuloong2e_defconfig:fulong2e:rootfs
> > > > > > 
> > > > > > Error is always the same.
> > > > > > 
> > > > > > arch/mips/kernel/process.c:637:8:
> > > > > > error: 'call_single_data_t' undeclared here (not in a function)
> > > > > > arch/mips/kernel/process.c: In function 'raise_backtrace':
> > > > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
> > > > > > error: 'csd' undeclared (first use in this function)
> > > > > 
> > > > > mips should now be fixed with the updated tree I have pushed out.
> > > > > 
> > > > 
> > > > The above is with v4.9.112-33-g7fb1f5e, which is the latest version
> > > > available from the git repository (as of right now). My builders
> > > > pulled it at 4:07am this morning, and there was no subsequent update.
> > > 
> > > Odd.  Ok, I've bumped the -rc number to -rc2 now, and pushed it all out
> > > again.  Let me know if you don't see it show up within an hour or so.
> > > 
> > Version is now v4.9.112-33-gb44db2b, so something changed, but I still get
> > the same build error.
> > 
> > Comparing the old and the new version, the only difference is the updated 
> > rc.
> > 
> > diff --git a/Makefile b/Makefile
> > index 57f315d00a94..986470ef6f6e 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1,7 +1,7 @@
> >  VERSION = 4
> >  PATCHLEVEL = 9
> >  SUBLEVEL = 113
> > -EXTRAVERSION = -rc1
> > +EXTRAVERSION = -rc2
> >  NAME = Roaring Lionus
> > 
> > The error itself is that single_data_t was replaced in the backport with
> > call_single_data. It should have been "struct call_single_data".
> 
> {sigh}  This is why I asked for a fixup patch.  Let me just go rip that
> thing out now...

Ok, -rc3 is out now to hopefully resolve this.  Sorry for the noise.

greg k-h

Re: [PATCH v2 3/3] dt-bindings: mfd: max8998: Add charger subnode binding

2018-07-16 Thread Paweł Chmiel

On Monday, July 16, 2018 12:00:03 PM CEST Rob Herring wrote:
> On Sat, Jul 14, 2018 at 02:26:53PM +0200, Paweł Chmiel wrote:
> > This patch adds devicetree bindings documentation for
> > battery charging controller as the subnode of MAX8998 PMIC.
> > 
> > Signed-off-by: Paweł Chmiel 
> > ---
> > Changes from v1:
> >   - Removed unneeded Fixes tag
> >   - Correct description of all charger values
> >   - Added missing property unit
> > ---
> >  Documentation/devicetree/bindings/mfd/max8998.txt | 24 
> > +++
> >  1 file changed, 24 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt 
> > b/Documentation/devicetree/bindings/mfd/max8998.txt
> > index 23a3650ff2a2..196e50097a36 100644
> > --- a/Documentation/devicetree/bindings/mfd/max8998.txt
> > +++ b/Documentation/devicetree/bindings/mfd/max8998.txt
> > @@ -50,6 +50,23 @@ Additional properties required if 
> > max8998,pmic-buck2-dvs-gpio is defined:
> >  - max8998,pmic-buck2-dvs-voltage: An array of 2 voltage values in 
> > microvolts
> >for buck2 regulator that can be selected using dvs gpio.
> >  
> > +Charger: Configuration for battery charging controller should be added
> > +inside a child node named 'charger'.
> > +  Required properties:
> > +  - max8998,charge-eoc-percent: Setup End of Charge Level.
> > +If value equals 0, leave it unchanged.
> > +Otherwise it should be value from 10 to 45 by 5 step.
> > +
> > +  - max8998,charge-restart-level-microvolt: Setup Charge Restart Level.
> > +If value equals 0, leave it unchanged.
> > +If value equals -1, it will be disabled.
> 
> -1 is a bit strange. Why not 'not present' is disabled?
I wanted to make it work the same way, as it was in case of using it from 
platform_data. 
But this is good idea, to change it a bit.
So we would have:
- 0 -> leave unchanged
- property not present in devicetree -> disabled (and set to -1 - driver is 
expecting it)
- value from 10-45 or (5,6,7 in case of second property)
I'll update both documentation and driver parsing code (for both properties - 
restart level and full timeout) and send v3 version.

Thanks for suggestion.
> 
> > +Otherwise it should be one of following values: 100, 150, 200.
> > +
> > +  - max8998,charge-timeout-hours: Setup Charge Full Timeout.
> > +If value equals 0, leave it unchanged.
> > +If value equals -1, it will be disabled.
> > +Otherwise it should be one of following values: 5, 6, 7.
> > +
> >  Regulators: All the regulators of MAX8998 to be instantiated shall be
> >  listed in a child node named 'regulators'. Each regulator is represented
> >  by a child node of the 'regulators' node.
> > @@ -99,6 +116,13 @@ Example:
> > max8998,pmic-buck2-dvs-gpio = < 0 3 0 0>; /* SET3 */
> > max8998,pmic-buck2-dvs-voltage = <135>, <130>;
> >  
> > +   /* Charger configuration */
> > +   charger {
> > +   max8998,charge-eoc-percent = <0>;
> > +   max8998,charge-restart-level-microvolt = <(-1)>;
> > +   max8998,charge-timeout-hours = <7>;
> > +   };
> > +
> > /* Regulators to instantiate */
> > regulators {
> > ldo2_reg: LDO2 {
>

Re: cpu_no_speculation omissions?

2018-07-16 Thread Thomas Gleixner

On Mon, 16 Jul 2018, Alan Cox wrote:
> On Mon, 2018-07-16 at 10:28 -0700, Dave Hansen wrote:
> > On 07/16/2018 09:56 AM, Thomas Gleixner wrote:
> > > On Mon, 16 Jul 2018, Rich Felker wrote:
> > > > At least the Centerton (late-generation Bonnell uarch) Atom
> > > > family is
> > > > omitted from the cpu_no_speculation table added by commit
> > > > fec9434a12f3
> > > > to arch/x86/kernel/cpu/common.c. Is this intentional? Would a
> > > > patch
> > > > adding it and possibly other omissions be welcome?
> > > 
> > > Probably. Dave?
> > 
> > IIRC, Alan Cox was compiling a list on what is affected vs. not.  He
> > would know way better than I.
> 
> The pre Silvermont atom cores are in order. When I did the original
> list I didn't bother with all the 32bit cores as we didn't have any
> 32bit mitigations then.

At least we should give the users that warm and fuzzy feeling that they are
not affected.

Thanks,

tglx

Re: [PATCH] mmc: tegra: Add and use tegra_sdhci_get_max_clock()

2018-07-16 Thread Jon Hunter



On 13/07/18 16:39, Aapo Vienamo wrote:

...

>>> that it returns the current clock rate of the host instead of the
>>> maximum one, which can lead to unnecessarily small clock rates.
>>>
>>> This differs from the previous implementation of
>>> tegra_sdhci_get_max_clock() in that it doesn't divide the result by two.  
>>
>> Why?
> 
> As far as I can tell the original tegra_sdhci_get_max_clock() was
> implemented this way in order to force sdhci_calc_clk() to always set
> the SDHCI clock divider to two on sdhci_set_clock(). The requirement to
> configure the SDHCI divider to two is specific to DDR50/52 modes on
> Tegra.
> 
> The .get_max_clock() callback retuning half of the actual maximum will
> result in HS200 and HS400 modes not being able to run at full speed.
> Another mechanism to enforce the divider requirement has to be figured
> out in order to enable DDR50/52 modes on Tegra SoCs.

OK, thanks. Ah I see Stefan's patch for doubling the clock. OK so that's fine.

Cheers
Jon

-- 
nvpublic

cpu_no_speculation omissions?

2018-07-16 Thread Rich Felker

At least the Centerton (late-generation Bonnell uarch) Atom family is
omitted from the cpu_no_speculation table added by commit fec9434a12f3
to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch
adding it and possibly other omissions be welcome?

Rich

Re: [PATCH 4.14 00/54] 4.14.56-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 09:34:57AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.56 release.
> There are 54 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jul 18 07:34:24 UTC 2018.
> Anything received after that time might be too late.
> 

Build results:
total: 148 pass: 148 fail: 0
Qemu test results:
total: 173 pass: 173 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

Re: [PATCH] [RESEND] perf/x86/intel/uncore: Fix the index of PCU.3 Broadwell CPUs

2018-07-16 Thread Liang, Kan





On 7/16/2018 11:07 AM, Masayoshi Mizuma wrote:



On 07/16/2018 10:29 AM, Liang, Kan wrote:



On 7/15/2018 6:34 PM, Ingo Molnar wrote:


* Masayoshi Mizuma  wrote:


From: Masayoshi Mizuma 

commit 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for
Broadwell CPUs") introduced PCU.3 for Broadwell CPU. Unfortunately,
the driver_data of PCU.3 conflicts to QPI Port 2 filter.

  { /* QPI Port 2 filter  */
  PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46),
  .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2),

  { /* PCU.3 (for Capability registers) */
  PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fc0),
  .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV,
     HSWEP_PCI_PCU_3),
   // HSWEP_PCI_PCU_3 == 2



--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1030,6 +1030,7 @@ enum {
   SNBEP_PCI_QPI_PORT0_FILTER,
   SNBEP_PCI_QPI_PORT1_FILTER,
   HSWEP_PCI_PCU_3,
+    BDX_PCI_PCU_3,
   };


So we use a magic '2' enumerator in the 'QPI Port 2 filter', and that overlaps
with HSWEP_PCI_PCU_3, right?

Shouldn't we clean up all the enumerators and not use magic numbers, and this 
fix
the conflict?



Yes, it should fix the conflict. I will clean up the code.


Thanks a lot!
I would appreciate if you could add CC to me when you post the patch.



Here is the patch.

Masa, could you please give it a try?

Thanks,
Kan

From 688378a4003ec33156958a52dc822105c18075af Mon Sep 17 00:00:00 2001
From: Kan Liang 
Date: Mon, 16 Jul 2018 04:57:51 -0400
Subject: [PATCH] perf/x86/intel/uncore: Fix hardcode index of Broadwell 
extra PCI DEV


Masa reports that a warning message is shown while CPU hot-removing on
Broadwell server.

  WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988
  uncore_pci_remove+0x10b/0x150
  Call Trace:
   pci_device_remove+0x42/0xd0
   device_release_driver_internal+0x148/0x220
   pci_stop_bus_device+0x76/0xa0
   pci_stop_root_bus+0x44/0x60
   acpi_pci_root_remove+0x1f/0x80
   acpi_bus_trim+0x57/0x90
   acpi_bus_trim+0x2e/0x90
   acpi_device_hotplug+0x2bc/0x4b0
   acpi_hotplug_work_fn+0x1a/0x30
   process_one_work+0x174/0x3a0
   worker_thread+0x4c/0x3d0
   kthread+0xf8/0x130

This bug was introduced in:

  commit 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for
Broadwell CPUs")

The index of "QPI Port 2 filter" was hardcode to 2. The index of
"PCU.3" used enumerator "HSWEP_PCI_PCU_3", which equals to 2 as well.

To fix the conflict, the hardcode index needs to be cleaned up.
Introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2
filter" on Broadwell, and increase the UNCORE_EXTRA_PCI_DEV_MAX.
Clean up hardcode index.

Reported-by: Masayoshi Mizuma 
Suggested-by: Ingo Molnar 
Signed-off-by: Kan Liang 
Fixes: 15a3e845b01c ("perf/x86/intel/uncore: Fix SBOX support for
Broadwell CPUs")
---
 arch/x86/events/intel/uncore.h   |  2 +-
 arch/x86/events/intel/uncore_snbep.c | 10 +++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index c9e1e0b..e17ab88 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -28,7 +28,7 @@
 #define UNCORE_PCI_DEV_TYPE(data)  ((data >> 8) & 0xff)
 #define UNCORE_PCI_DEV_IDX(data)   (data & 0xff)
 #define UNCORE_EXTRA_PCI_DEV   0xff
-#define UNCORE_EXTRA_PCI_DEV_MAX   3
+#define UNCORE_EXTRA_PCI_DEV_MAX   4

 #define UNCORE_EVENT_CONSTRAINT(c, n) EVENT_CONSTRAINT(c, n, 0xff)

diff --git a/arch/x86/events/intel/uncore_snbep.c 
b/arch/x86/events/intel/uncore_snbep.c

index 87dc026..51d7c11 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1029,6 +1029,7 @@ void snbep_uncore_cpu_init(void)
 enum {
SNBEP_PCI_QPI_PORT0_FILTER,
SNBEP_PCI_QPI_PORT1_FILTER,
+   BDX_PCI_QPI_PORT2_FILTER,
HSWEP_PCI_PCU_3,
 };

@@ -3286,15 +3287,18 @@ static const struct pci_device_id 
bdx_uncore_pci_ids[] = {

},
{ /* QPI Port 0 filter  */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f86),
-   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 0),
+   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV,
+  SNBEP_PCI_QPI_PORT0_FILTER),
},
{ /* QPI Port 1 filter  */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f96),
-   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 1),
+   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV,
+  SNBEP_PCI_QPI_PORT1_FILTER),
},
{ /* QPI Port 2 filter  */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f46),
-   .driver_data = UNCORE_PCI_DEV_DATA(UNCORE_EXTRA_PCI_DEV, 2),
+   .driver_data =

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Greg Kroah-Hartman

On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote:
> On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.9.113 release.
> > There are 32 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> > Anything received after that time might be too late.
> >
> 
> Build results:
>   total: 148 pass: 133 fail: 15
> Failed builds: 
>   mips:defconfig 
>   mips:allnoconfig 
>   mips:defconfig 
>   mips:allmodconfig 
>   mips:allnoconfig 
>   mips:bcm47xx_defconfig 
>   mips:bcm63xx_defconfig 
>   mips:nlm_xlp_defconfig 
>   mips:ath79_defconfig 
>   mips:ar7_defconfig 
>   mips:e55_defconfig 
>   mips:cavium_octeon_defconfig 
>   mips:malta_defconfig 
>   mips:rt305x_defconfig 
>   mips:defconfig 
> Qemu test results:
>   total: 166 pass: 154 fail: 12
> Failed tests: 
>   mips:malta_defconfig:nosmp 
>   mips:malta_defconfig:smp 
>   mips64:malta_defconfig:nosmp 
>   mips64:malta_defconfig:smp 
>   mipsel:24Kf:malta_defconfig:nosmp:initrd 
>   mipsel:24Kf:malta_defconfig:smp:initrd 
>   mipsel:24Kf:malta_defconfig:smp:rootfs 
>   mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
>   mipsel64:malta_defconfig:nosmp:rootfs 
>   mipsel64:malta_defconfig:smp:initrd 
>   mipsel64:malta_defconfig:smp:rootfs 
>   mipsel64:fuloong2e_defconfig:fulong2e:rootfs
> 
> Error is always the same.
> 
> arch/mips/kernel/process.c:637:8:
>   error: 'call_single_data_t' undeclared here (not in a function)
> arch/mips/kernel/process.c: In function 'raise_backtrace':
> /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
>   error: 'csd' undeclared (first use in this function)

mips should now be fixed with the updated tree I have pushed out.

thanks,

greg k-h

Re: [PATCH] arm64: cpuinfo: Include cleartext implementer and part strings

2018-07-16 Thread Mark Rutland

Hi Olof,

On Mon, Jul 16, 2018 at 07:34:05AM -0700, Olof Johansson wrote:
> On Mon, Jul 16, 2018 at 2:17 AM, Marc Zyngier  wrote:
> > On 15/07/18 04:53, Olof Johansson wrote:
> >> There's some use in printing out what the implementer and part numbers
> >> decode to for cases where they're known.
> >>
> >> I filled in the table based on public information; mostly from ARM TRMs
> >> and other tools (and some of the SSBD tables in the kernel, etc).
> >>
> >> Apple IDs came from
> >> https://github.com/apple/darwin-xnu/blob/master/osfmk/arm/cpuid.h

I appreciate that the format of /proc/cpuinfo isn't as nice as one might
like as a human reader.

For better or worse, the vast majority of readers of /proc/cpuinfo are
applications (be they scripts or binaries), and we must treat the format
and contents of /proc/cpuinfo as ABI. We've been burned by /proc/cpuinfo
format changes in the past, so we want to minimize the potential for ABI
issues.

This change is only useful to human readers of /proc/cpuinfo, mandates
perpetual maintenance work, and comes with a very strong risk of subtle
ABI breakage (both for this patch and future patches it implies will be
necessary).

So sorry, but I must NAK this patch, as with prior patches changing the
format of /proc/cpuinfo.

> > The same thing pops up every so often. And the answer is the same each time:
> 
> Ever thought about why it pops up?

In some cases because the person in question isn't sure what hardware
they're on. In most cases because people are used to x86, and haven't
considered what exactly they're trying to achieve.

We've repeatedly been asked for strings *by application developers*,
despite the MIDR being sufficient and already available. Generally,
developers are happy to use MIDRs once they realise this.

Having the string leads people to try to parse it, even though it cannot
be reliable.

> > - it breaks an existing userspace ABI by changing the format of cpuinfo
> 
> Most tools likely rely on a per-line format, and in this case they're
> likely interpreting the contents after the ':' as an integer. Adding
> something to the line might or might not break them, agreed.
> 
> How about introducing a "CPU model" line similar to x86's cleartext one?
> 
> > - it is unmaintainable in the long run
> 
> False. Chances are you already need details more fine-grained than
> this for errata and quirks, we already do for SSBD.

For SSBD, we don't look at the MIDR, and rely solely on the FW to
identify the presence of a workaround. Going forwards, we'll do likewise
as this is the only way to be future-proof and generic.

> And if it lacks an entry it's not the end of the world.

I don't think we can be that relaxed about this; it's quite easy to see
that people *will* write software that relies on the string (or
properties thereof) in ways that we cannot manage.

People will do silly things like try to read the string into a
fixed-size buffer. That works when the string is "Unknown", but falls
apart on the next kernel release when it's "awesome super CPU
2000xx-super-mega-long-name-edition". We don't discover this problem for
months, whereupon other applications are reliant upon the new string,
and now we're stuck with a horrible ABI bug that we can't fix.

> > - userspace already has this information by simply running "lscpu"
> >
> > What has changed?
> 
> What has changed is that ARM(64) is moving from a
> custom-kernel-custom-userspace world to one where you might be
> upreving kernels and userspace separately, and you might be using an
> older userspace with a newer machine and a newer kernel.

Sure, and this is exactly why we're trying to ensure that the behaviour
is *guaranteed* to be consistent across kernels -- so whichever kernel
is used, userspace should behave the same.

If userspace *needs* to know about a CPU, it can identify the CPU based
on the MIDR (and REVIDR), and this will work regardless of kernel
version. GCC does this today.

If userspace *wants* to know about a CPU (e.g. to expose a
human-friendly name), then a userspace database is fine. In the majority
of cases it's *vastly* easier to update a userspace application than it
is to update the kernel, and it's always possible to build the latest
binary yourself.

There are a number of cases where using strings is going to be
problematic in practice, e.g.

* Every out-of-tree SoC port is going to invent values that may not
  match the values mainline settles on. Then people will complain that
  their software only works on one or the other, because it makes some
  decision base on the model name.

* Every distro is going to differ in the set of IDs they backport, if
  any.  If your software relies on the MIDR, it will work everywhere. If
  it tries to parse the string, it will behave erratically across
  distros.

* We may get the matching logic for wrong, and be overly permissive when
  matching MIDR -> string. e.g. we ignore the low 4 bits for some range
  of MIDRs, but the vendor only uses

Re: cpu_no_speculation omissions?

2018-07-16 Thread Dave Hansen

On 07/16/2018 09:56 AM, Thomas Gleixner wrote:
> On Mon, 16 Jul 2018, Rich Felker wrote:
>> At least the Centerton (late-generation Bonnell uarch) Atom family is
>> omitted from the cpu_no_speculation table added by commit fec9434a12f3
>> to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch
>> adding it and possibly other omissions be welcome?
> 
> Probably. Dave?

IIRC, Alan Cox was compiling a list on what is affected vs. not.  He
would know way better than I.

Rich, did you have a specific set of family/model combinations that you
thought were missed?

[PATCH v6 4/5] mm/sparse: add new sparse_init_nid() and sparse_init()

2018-07-16 Thread Pavel Tatashin

sparse_init() requires to temporary allocate two large buffers: usemap_map
and map_map.  Baoquan He has identified that these buffers are so large
that Linux is not bootable on small memory machines, such as a kdump boot.
The buffers are especially large when CONFIG_X86_5LEVEL is set, as they
are scaled to the maximum physical memory size.

Baoquan provided a fix, which reduces these sizes of these buffers, but it
is much better to get rid of them entirely.

Add a new way to initialize sparse memory: sparse_init_nid(), which only
operates within one memory node, and thus allocates memory either in large
contiguous block or allocates section by section.  This eliminates the
need for use of temporary buffers.

For simplified bisecting and review temporarly call sparse_init()
new_sparse_init(), the new interface is going to be enabled as well as old
code removed in the next patch.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Oscar Salvador 
---
 mm/sparse.c | 85 +
 1 file changed, 85 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 20ca292d8f11..248d5d7bbf55 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -200,6 +200,11 @@ static inline int next_present_section_nr(int section_nr)
  (section_nr <= __highest_present_section_nr));\
 section_nr = next_present_section_nr(section_nr))
 
+static inline unsigned long first_present_section_nr(void)
+{
+   return next_present_section_nr(-1);
+}
+
 /*
  * Record how many memory sections are marked as present
  * during system bootup.
@@ -668,6 +673,86 @@ void __init sparse_init(void)
memblock_free_early(__pa(usemap_map), size);
 }
 
+/*
+ * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end)
+ * And number of present sections in this node is map_count.
+ */
+static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
+  unsigned long pnum_end,
+  unsigned long map_count)
+{
+   unsigned long pnum, usemap_longs, *usemap;
+   struct page *map;
+
+   usemap_longs = BITS_TO_LONGS(SECTION_BLOCKFLAGS_BITS);
+   usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nid),
+ usemap_size() *
+ map_count);
+   if (!usemap) {
+   pr_err("%s: node[%d] usemap allocation failed", __func__, nid);
+   goto failed;
+   }
+   sparse_buffer_init(map_count * section_map_size(), nid);
+   for_each_present_section_nr(pnum_begin, pnum) {
+   if (pnum >= pnum_end)
+   break;
+
+   map = sparse_mem_map_populate(pnum, nid, NULL);
+   if (!map) {
+   pr_err("%s: node[%d] memory map backing failed. Some 
memory will not be available.",
+  __func__, nid);
+   pnum_begin = pnum;
+   goto failed;
+   }
+   check_usemap_section_nr(nid, usemap);
+   sparse_init_one_section(__nr_to_section(pnum), pnum, map, 
usemap);
+   usemap += usemap_longs;
+   }
+   sparse_buffer_fini();
+   return;
+failed:
+   /* We failed to allocate, mark all the following pnums as not present */
+   for_each_present_section_nr(pnum_begin, pnum) {
+   struct mem_section *ms;
+
+   if (pnum >= pnum_end)
+   break;
+   ms = __nr_to_section(pnum);
+   ms->section_mem_map = 0;
+   }
+}
+
+/*
+ * Allocate the accumulated non-linear sections, allocate a mem_map
+ * for each and record the physical to section mapping.
+ */
+void __init new_sparse_init(void)
+{
+   unsigned long pnum_begin = first_present_section_nr();
+   int nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
+   unsigned long pnum_end, map_count = 1;
+
+   /* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
+   set_pageblock_order();
+
+   for_each_present_section_nr(pnum_begin + 1, pnum_end) {
+   int nid = sparse_early_nid(__nr_to_section(pnum_end));
+
+   if (nid == nid_begin) {
+   map_count++;
+   continue;
+   }
+   /* Init node with sections in range [pnum_begin, pnum_end) */
+   sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+   nid_begin = nid;
+   pnum_begin = pnum_end;
+   map_count = 1;
+   }
+   /* cover the last node */
+   sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
+   vmemmap_populate_print_last();
+}
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 
 /* Mark all memory sections within the pfn range as online */
-- 
2.18.0

[PATCH v6 5/5] mm/sparse: delete old sparse_init and enable new one

2018-07-16 Thread Pavel Tatashin

Rename new_sparse_init() to sparse_init() which enables it.  Delete old
sparse_init() and all the code that became obsolete with.

Signed-off-by: Pavel Tatashin 
Tested-by: Michael Ellerman  (powerpc)
---
 include/linux/mm.h  |   6 --
 mm/Kconfig  |   4 -
 mm/sparse-vmemmap.c |  21 
 mm/sparse.c | 237 +---
 4 files changed, 1 insertion(+), 267 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 99d8c50adef6..726e71475144 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2649,12 +2649,6 @@ extern int randomize_va_space;
 const char * arch_vma_name(struct vm_area_struct *vma);
 void print_vma_addr(char *prefix, unsigned long rip);
 
-void sparse_mem_maps_populate_node(struct page **map_map,
-  unsigned long pnum_begin,
-  unsigned long pnum_end,
-  unsigned long map_count,
-  int nodeid);
-
 void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap);
diff --git a/mm/Kconfig b/mm/Kconfig
index 28fcf54946ea..b78e7cd4e9fe 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -115,10 +115,6 @@ config SPARSEMEM_EXTREME
 config SPARSEMEM_VMEMMAP_ENABLE
bool
 
-config SPARSEMEM_ALLOC_MEM_MAP_TOGETHER
-   def_bool y
-   depends on SPARSEMEM && X86_64
-
 config SPARSEMEM_VMEMMAP
bool "Sparse Memory virtual memmap"
depends on SPARSEMEM && SPARSEMEM_VMEMMAP_ENABLE
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index cd15f3d252c3..8301293331a2 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -261,24 +261,3 @@ struct page * __meminit sparse_mem_map_populate(unsigned 
long pnum, int nid,
 
return map;
 }
-
-void __init sparse_mem_maps_populate_node(struct page **map_map,
- unsigned long pnum_begin,
- unsigned long pnum_end,
- unsigned long map_count, int nodeid)
-{
-   unsigned long pnum;
-   int nr_consumed_maps = 0;
-
-   for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-   if (!present_section_nr(pnum))
-   continue;
-
-   map_map[nr_consumed_maps] =
-   sparse_mem_map_populate(pnum, nodeid, NULL);
-   if (map_map[nr_consumed_maps++])
-   continue;
-   pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
-  __func__);
-   }
-}
diff --git a/mm/sparse.c b/mm/sparse.c
index 248d5d7bbf55..10b07eea9a6e 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -205,12 +205,6 @@ static inline unsigned long first_present_section_nr(void)
return next_present_section_nr(-1);
 }
 
-/*
- * Record how many memory sections are marked as present
- * during system bootup.
- */
-static int __initdata nr_present_sections;
-
 /* Record a memory area against a node. */
 void __init memory_present(int nid, unsigned long start, unsigned long end)
 {
@@ -240,7 +234,6 @@ void __init memory_present(int nid, unsigned long start, 
unsigned long end)
ms->section_mem_map = sparse_encode_early_nid(nid) |
SECTION_IS_ONLINE;
section_mark_present(ms);
-   nr_present_sections++;
}
}
 }
@@ -377,37 +370,8 @@ static void __init check_usemap_section_nr(int nid, 
unsigned long *usemap)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static void __init sparse_early_usemaps_alloc_node(void *data,
-unsigned long pnum_begin,
-unsigned long pnum_end,
-unsigned long usemap_count, int nodeid)
-{
-   void *usemap;
-   unsigned long pnum;
-   unsigned long **usemap_map = (unsigned long **)data;
-   int size = usemap_size();
-   int nr_consumed_maps = 0;
-
-   usemap = sparse_early_usemaps_alloc_pgdat_section(NODE_DATA(nodeid),
- size * usemap_count);
-   if (!usemap) {
-   pr_warn("%s: allocation failed\n", __func__);
-   return;
-   }
-
-   for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-   if (!present_section_nr(pnum))
-   continue;
-   usemap_map[nr_consumed_maps] = usemap;
-   usemap += size;
-   check_usemap_section_nr(nodeid, usemap_map[nr_consumed_maps]);
-   nr_consumed_maps++;
-   }
-}
-
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static unsigned long __init section_map_size(void)
-
 {
return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE);
 }
@@

[PATCH v6 2/5] mm/sparse: use the new sparse buffer functions in non-vmemmap

2018-07-16 Thread Pavel Tatashin

non-vmemmap sparse also allocated large contiguous chunk of memory, and if
fails falls back to smaller allocations.  Use the same functions to
allocate buffer as the vmemmap-sparse

Signed-off-by: Pavel Tatashin 
---
 mm/sparse.c | 41 ++---
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 9a0a5f598469..db4867b62fff 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -408,13 +408,20 @@ unsigned long __init section_map_size(void)
 }
 
 #else
+unsigned long __init section_map_size(void)
+{
+   return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+}
+
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap)
 {
-   struct page *map;
-   unsigned long size;
+   unsigned long size = section_map_size();
+   struct page *map = sparse_buffer_alloc(size);
+
+   if (map)
+   return map;
 
-   size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
map = memblock_virt_alloc_try_nid(size,
  PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
  BOOTMEM_ALLOC_ACCESSIBLE, nid);
@@ -425,42 +432,22 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
  unsigned long pnum_end,
  unsigned long map_count, int nodeid)
 {
-   void *map;
unsigned long pnum;
-   unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
-   int nr_consumed_maps;
-
-   size = PAGE_ALIGN(size);
-   map = memblock_virt_alloc_try_nid_raw(size * map_count,
- PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
- BOOTMEM_ALLOC_ACCESSIBLE, nodeid);
-   if (map) {
-   nr_consumed_maps = 0;
-   for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-   if (!present_section_nr(pnum))
-   continue;
-   map_map[nr_consumed_maps] = map;
-   map += size;
-   nr_consumed_maps++;
-   }
-   return;
-   }
+   unsigned long size = section_map_size();
+   int nr_consumed_maps = 0;
 
-   /* fallback */
-   nr_consumed_maps = 0;
+   sparse_buffer_init(size * map_count, nodeid);
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
-   struct mem_section *ms;
-
if (!present_section_nr(pnum))
continue;
map_map[nr_consumed_maps] =
sparse_mem_map_populate(pnum, nodeid, NULL);
if (map_map[nr_consumed_maps++])
continue;
-   ms = __nr_to_section(pnum);
pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
   __func__);
}
+   sparse_buffer_fini();
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
-- 
2.18.0

[PATCH v6 3/5] mm/sparse: move buffer init/fini to the common place

2018-07-16 Thread Pavel Tatashin

Now, that both variants of sparse memory use the same buffers to populate
memory map, we can move sparse_buffer_init()/sparse_buffer_fini() to the
common place.

Signed-off-by: Pavel Tatashin 
---
 include/linux/mm.h  |  3 ---
 mm/sparse-vmemmap.c |  2 --
 mm/sparse.c | 14 +++---
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a83d3e0e66d4..99d8c50adef6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2655,9 +2655,6 @@ void sparse_mem_maps_populate_node(struct page **map_map,
   unsigned long map_count,
   int nodeid);
 
-unsigned long __init section_map_size(void);
-void sparse_buffer_init(unsigned long size, int nid);
-void sparse_buffer_fini(void);
 void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index b05c7663c640..cd15f3d252c3 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -270,7 +270,6 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
unsigned long pnum;
int nr_consumed_maps = 0;
 
-   sparse_buffer_init(section_map_size() * map_count, nodeid);
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
continue;
@@ -282,5 +281,4 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
   __func__);
}
-   sparse_buffer_fini();
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index db4867b62fff..20ca292d8f11 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -401,14 +401,14 @@ static void __init sparse_early_usemaps_alloc_node(void 
*data,
 }
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-unsigned long __init section_map_size(void)
+static unsigned long __init section_map_size(void)
 
 {
return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE);
 }
 
 #else
-unsigned long __init section_map_size(void)
+static unsigned long __init section_map_size(void)
 {
return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
 }
@@ -433,10 +433,8 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
  unsigned long map_count, int nodeid)
 {
unsigned long pnum;
-   unsigned long size = section_map_size();
int nr_consumed_maps = 0;
 
-   sparse_buffer_init(size * map_count, nodeid);
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
continue;
@@ -447,14 +445,13 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
   __func__);
}
-   sparse_buffer_fini();
 }
 #endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void *sparsemap_buf __meminitdata;
 static void *sparsemap_buf_end __meminitdata;
 
-void __init sparse_buffer_init(unsigned long size, int nid)
+static void __init sparse_buffer_init(unsigned long size, int nid)
 {
WARN_ON(sparsemap_buf); /* forgot to call sparse_buffer_fini()? */
sparsemap_buf =
@@ -464,7 +461,7 @@ void __init sparse_buffer_init(unsigned long size, int nid)
sparsemap_buf_end = sparsemap_buf + size;
 }
 
-void __init sparse_buffer_fini(void)
+static void __init sparse_buffer_fini(void)
 {
unsigned long size = sparsemap_buf_end - sparsemap_buf;
 
@@ -494,8 +491,11 @@ static void __init sparse_early_mem_maps_alloc_node(void 
*data,
 unsigned long map_count, int nodeid)
 {
struct page **map_map = (struct page **)data;
+
+   sparse_buffer_init(section_map_size() * map_count, nodeid);
sparse_mem_maps_populate_node(map_map, pnum_begin, pnum_end,
 map_count, nodeid);
+   sparse_buffer_fini();
 }
 #else
 static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
-- 
2.18.0

[PATCH v6 1/5] mm/sparse: abstract sparse buffer allocations

2018-07-16 Thread Pavel Tatashin

When struct pages are allocated for sparse-vmemmap VA layout, we first try
to allocate one large buffer, and than if that fails allocate struct pages
for each section as we go.

The code that allocates buffer is uses global variables and is spread
across several call sites.

Cleanup the code by introducing three functions to handle the global
buffer:

sparse_buffer_init()initialize the buffer
sparse_buffer_fini()free the remaining part of the buffer
sparse_buffer_alloc()   alloc from the buffer, and if buffer is empty
return NULL

Define these functions in sparse.c instead of sparse-vmemmap.c because
later we will use them for non-vmemmap sparse allocations as well.

Signed-off-by: Pavel Tatashin 
---
 include/linux/mm.h  |  4 
 mm/sparse-vmemmap.c | 40 ++--
 mm/sparse.c | 45 -
 3 files changed, 54 insertions(+), 35 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 577e578eb640..a83d3e0e66d4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2655,6 +2655,10 @@ void sparse_mem_maps_populate_node(struct page **map_map,
   unsigned long map_count,
   int nodeid);
 
+unsigned long __init section_map_size(void);
+void sparse_buffer_init(unsigned long size, int nid);
+void sparse_buffer_fini(void);
+void *sparse_buffer_alloc(unsigned long size);
 struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
struct vmem_altmap *altmap);
 pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 95e2c7638a5c..b05c7663c640 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -43,12 +43,9 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
unsigned long goal)
 {
return memblock_virt_alloc_try_nid_raw(size, align, goal,
-   BOOTMEM_ALLOC_ACCESSIBLE, node);
+  BOOTMEM_ALLOC_ACCESSIBLE, node);
 }
 
-static void *vmemmap_buf;
-static void *vmemmap_buf_end;
-
 void * __meminit vmemmap_alloc_block(unsigned long size, int node)
 {
/* If the main allocator is up use that, fallback to bootmem. */
@@ -76,18 +73,10 @@ void * __meminit vmemmap_alloc_block(unsigned long size, 
int node)
 /* need to make sure size is all the same during early stage */
 void * __meminit vmemmap_alloc_block_buf(unsigned long size, int node)
 {
-   void *ptr;
-
-   if (!vmemmap_buf)
-   return vmemmap_alloc_block(size, node);
-
-   /* take the from buf */
-   ptr = (void *)ALIGN((unsigned long)vmemmap_buf, size);
-   if (ptr + size > vmemmap_buf_end)
-   return vmemmap_alloc_block(size, node);
-
-   vmemmap_buf = ptr + size;
+   void *ptr = sparse_buffer_alloc(size);
 
+   if (!ptr)
+   ptr = vmemmap_alloc_block(size, node);
return ptr;
 }
 
@@ -279,19 +268,9 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
  unsigned long map_count, int nodeid)
 {
unsigned long pnum;
-   unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
-   void *vmemmap_buf_start;
int nr_consumed_maps = 0;
 
-   size = ALIGN(size, PMD_SIZE);
-   vmemmap_buf_start = __earlyonly_bootmem_alloc(nodeid, size * map_count,
-PMD_SIZE, __pa(MAX_DMA_ADDRESS));
-
-   if (vmemmap_buf_start) {
-   vmemmap_buf = vmemmap_buf_start;
-   vmemmap_buf_end = vmemmap_buf_start + size * map_count;
-   }
-
+   sparse_buffer_init(section_map_size() * map_count, nodeid);
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
continue;
@@ -303,12 +282,5 @@ void __init sparse_mem_maps_populate_node(struct page 
**map_map,
pr_err("%s: sparsemem memory map backing failed some memory 
will not be available\n",
   __func__);
}
-
-   if (vmemmap_buf_start) {
-   /* need to free left buf */
-   memblock_free_early(__pa(vmemmap_buf),
-   vmemmap_buf_end - vmemmap_buf);
-   vmemmap_buf = NULL;
-   vmemmap_buf_end = NULL;
-   }
+   sparse_buffer_fini();
 }
diff --git a/mm/sparse.c b/mm/sparse.c
index 2ea8b3dbd0df..9a0a5f598469 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -400,7 +400,14 @@ static void __init sparse_early_usemaps_alloc_node(void 
*data,
}
 }
 
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+unsigned long __init section_map_size(void)
+
+{
+   return ALIGN(sizeof(struct page) * PAGES_PER_SECTION, PMD_SIZE);
+}
+
+#else
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid,

Re: [PATCH] IPoIB: use kvzalloc to allocate an array of bucket pointers

2018-07-16 Thread Håkon Bugge




> On 16 Jul 2018, at 18:19, Jan Dakinevich  wrote:
> 
> On Mon,  9 Jul 2018 16:51:03 +0300
> Jan Dakinevich  wrote:
> 
>> This table by default takes 32KiB which is 3rd memory order.

Only if PAGE_SIZE is 4KiB...

>> Meanwhile, this memory is not aimed for DMA operation and could be
>> safely allocated by vmalloc.
>> 
>> Signed-off-by: Jan Dakinevich 

lgtm,

Reviewed-by: Håkon Bugge 


Thxs, Håkon

>> ---
>> drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 26cde95..cb752df
>> 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> @@ -1526,7 +1526,7 @@ static int ipoib_neigh_hash_init(struct
>> ipoib_dev_priv *priv) return -ENOMEM;
>>  set_bit(IPOIB_STOP_NEIGH_GC, >flags);
>>  size = roundup_pow_of_two(arp_tbl.gc_thresh3);
>> -buckets = kcalloc(size, sizeof(*buckets), GFP_KERNEL);
>> +buckets = kvcalloc(size, sizeof(*buckets), GFP_KERNEL);
>>  if (!buckets) {
>>  kfree(htbl);
>>  return -ENOMEM;
>> @@ -1554,7 +1554,7 @@ static void neigh_hash_free_rcu(struct rcu_head
>> *head) struct ipoib_neigh __rcu **buckets = htbl->buckets;
>>  struct ipoib_neigh_table *ntbl = htbl->ntbl;
>> 
>> -kfree(buckets);
>> +kvfree(buckets);
>>  kfree(htbl);
>>  complete(>deleted);
>> }
> 
> ping
> 
> -- 
> Best regards
> Jan Dakinevich
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4.9 03/32] MIPS: Use async IPIs forarch_trigger_cpumask_backtrace()

2018-07-16 Thread Greg Kroah-Hartman

On Mon, Jul 16, 2018 at 12:46:21PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jul 16, 2018 at 05:46:12PM +0800, 陈华才 wrote:
> > Just change "call_single_data_t" to "struct call_single_data" and 
> > everything is OK.
> 
> Ok, I've done that now, thanks.

And I messed it up.  So I've just dropped it entirely.  Can someone send
me a properly backported patch please?

thanks,

greg k-h

Re: [PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm

2018-07-16 Thread Rik van Riel

On Mon, 2018-07-16 at 03:04 +0200, Ingo Molnar wrote:
> * Rik van Riel  wrote:
> 
> > On Mon, 2018-07-16 at 01:04 +0200, Ingo Molnar wrote:
> > > * Rik van Riel  wrote:
> > > 
> > > > +   /*
> > > > +* Stop remote flushes for the previous mm.
> > > > +* Skip the idle task; we never send init_mm
> > > > TLB
> > > > flushing IPIs,
> > > > +* but the bitmap manipulation can cause cache
> > > > line contention.
> > > > +*/
> > > > +   if (real_prev != _mm) {
> > > > +   VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
> > > > +   mm_cpumask(rea
> > > > l_pr
> > > > ev)));
> > > > +   cpumask_clear_cpu(cpu,
> > > > mm_cpumask(real_prev));
> > > 
> > > BTW., could this optimization be (safely) extended to all (or
> > > most)
> > > !task->mm 
> > > kernel threads?
> > > 
> > > In particular softirq and threaded irq handlers could benefit
> > > greatly
> > > I suspect in 
> > > certain networking intense workloads that happen to active them.
> > 
> > Yes, it could.
> > 
> > Are there kernel threads that use something other than
> > init_mm today?
> 
> Yeah, I think that's the typical case - so at minimum the comment
> should be fixed:
> 
>  > > > +   * Skip the idle task; we never send init_mm TLB
> flushing IPIs,
> 
> and it should say 'skip kernel threads', right?

I will send a v6 that improves this comment, and has
the S-o-b thing you suggested for patch 1/7.

I think that addresses all the comments people had on
this patch series.

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

[PATCH v2 09/10] staging:trl8192u: Rename TClasProc > t_clas_proc - Style

2018-07-16 Thread John Whitmore

Rename the struct TS_COMMON_INFO member variable TClasProc to
t_clas_proc. This change clears the checkpatch issue with CamelCase variable
names. There should be no impact on runtime execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 3bf48a04a68e..a183198afb31 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -22,7 +22,7 @@ struct ts_common_info {
u8  addr[6];
TSPEC_BODY  t_spec;
QOS_TCLAS   t_class[TCLAS_NUM];
-   u8  TClasProc;
+   u8  t_clas_proc;
u8  TClasNum;
 };
 
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index b5fede650b81..8b2bb0a69b01 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -106,7 +106,7 @@ static void ResetTsCommonInfo(struct ts_common_info 
*pTsCommonInfo)
eth_zero_addr(pTsCommonInfo->addr);
memset(>t_spec, 0, sizeof(TSPEC_BODY));
memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
-   pTsCommonInfo->TClasProc = 0;
+   pTsCommonInfo->t_clas_proc = 0;
pTsCommonInfo->TClasNum = 0;
 }
 
@@ -281,7 +281,7 @@ static void MakeTSEntry(struct ts_common_info 
*pTsCommonInfo, u8 *Addr,
for(count = 0; count < TCLAS_Num; count++)
memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, 
sizeof(QOS_TCLAS));
 
-   pTsCommonInfo->TClasProc = TCLAS_Proc;
+   pTsCommonInfo->t_clas_proc = TCLAS_Proc;
pTsCommonInfo->TClasNum = TCLAS_Num;
 }
 
-- 
2.18.0

[PATCH v2 06/10] staging:rtl8192u: Rename Addr > addr - Style

2018-07-16 Thread John Whitmore

Rename the TX_COMMON_INFO structure's member Addr to addr. This change
clears the checkpatch issue with CamelCase naming. This is a coding style
change only and should not impact runtime execution.

Signed-off-by: John Whitmore 
---
 .../staging/rtl8192u/ieee80211/rtl819x_BAProc.c| 10 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h|  2 +-
 .../staging/rtl8192u/ieee80211/rtl819x_TSProc.c| 14 +++---
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
index 950f827a1740..297c498aef6a 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
@@ -634,7 +634,7 @@ TsInitAddBA(
 
ActivateBAEntry(ieee, pBA, BA_SETUP_TIMEOUT);
 
-   ieee80211_send_ADDBAReq(ieee, pTS->TsCommonInfo.Addr, pBA);
+   ieee80211_send_ADDBAReq(ieee, pTS->TsCommonInfo.addr, pBA);
 }
 
 void
@@ -646,7 +646,7 @@ TsInitDelBA(struct ieee80211_device *ieee, struct 
ts_common_info *pTsCommonInfo,
if (TxTsDeleteBA(ieee, pTxTs))
ieee80211_send_DELBA(
ieee,
-   pTsCommonInfo->Addr,
+   pTsCommonInfo->addr,

(pTxTs->TxAdmittedBARecord.bValid)?(>TxAdmittedBARecord):(>TxPendingBARecord),
TxRxSelect,
DELBA_REASON_END_BA);
@@ -655,7 +655,7 @@ TsInitDelBA(struct ieee80211_device *ieee, struct 
ts_common_info *pTsCommonInfo,
if (RxTsDeleteBA(ieee, pRxTs))
ieee80211_send_DELBA(
ieee,
-   pTsCommonInfo->Addr,
+   pTsCommonInfo->addr,
>RxAdmittedBARecord,
TxRxSelect,
DELBA_REASON_END_BA);
@@ -683,7 +683,7 @@ void TxBaInactTimeout(struct timer_list *t)
TxTsDeleteBA(ieee, pTxTs);
ieee80211_send_DELBA(
ieee,
-   pTxTs->TsCommonInfo.Addr,
+   pTxTs->TsCommonInfo.addr,
>TxAdmittedBARecord,
TX_DIR,
DELBA_REASON_TIMEOUT);
@@ -697,7 +697,7 @@ void RxBaInactTimeout(struct timer_list *t)
RxTsDeleteBA(ieee, pRxTs);
ieee80211_send_DELBA(
ieee,
-   pRxTs->TsCommonInfo.Addr,
+   pRxTs->TsCommonInfo.addr,
>RxAdmittedBARecord,
RX_DIR,
DELBA_REASON_TIMEOUT);
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 1613c3c5d447..62c6fc513540 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -19,7 +19,7 @@ struct ts_common_info {
struct list_headlist;
struct timer_list   setup_timer;
struct timer_list   inact_timer;
-   u8  Addr[6];
+   u8  addr[6];
TSPEC_BODY  TSpec;
QOS_TCLAS   TClass[TCLAS_NUM];
u8  TClasProc;
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index 175d91218af8..72a5fc0bbb47 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -103,7 +103,7 @@ static void TsAddBaProcess(struct timer_list *t)
 
 static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo)
 {
-   eth_zero_addr(pTsCommonInfo->Addr);
+   eth_zero_addr(pTsCommonInfo->addr);
memset(>TSpec, 0, sizeof(TSPEC_BODY));
memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
pTsCommonInfo->TClasProc = 0;
@@ -247,7 +247,7 @@ static struct ts_common_info *SearchAdmitTRStream(struct 
ieee80211_device *ieee,
continue;
list_for_each_entry(pRet, psearch_list, list){
//  IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, 
dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, 
pRet->TSpec.f.TSInfo.field.ucDirection);
-   if (memcmp(pRet->Addr, Addr, 6) == 0)
+   if (memcmp(pRet->addr, Addr, 6) == 0)
if (pRet->TSpec.f.TSInfo.field.ucTSID == TID)

if(pRet->TSpec.f.TSInfo.field.ucDirection == dir) {
//  printk("Bingo! got it\n");
@@ -273,7 +273,7 @@ static void MakeTSEntry(struct ts_common_info 
*pTsCommonInfo, u8 *Addr,
if(pTsCommonInfo == NULL)
return;
 
-

[PATCH v2 10/10] staging:rtl8192u: Rename TClasNum > t_clas_num - Style

2018-07-16 Thread John Whitmore

Rename the struct TS_COMMON_INFO member variable TClasNum to t_clas_num. This
change clears the checkpatch issue with CamelCase naming. There should be no
impact on runtime execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index a183198afb31..3da1ef6ef105 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -23,7 +23,7 @@ struct ts_common_info {
TSPEC_BODY  t_spec;
QOS_TCLAS   t_class[TCLAS_NUM];
u8  t_clas_proc;
-   u8  TClasNum;
+   u8  t_clas_num;
 };
 
 typedef struct _TX_TS_RECORD {
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index 8b2bb0a69b01..4b2da7f31166 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -107,7 +107,7 @@ static void ResetTsCommonInfo(struct ts_common_info 
*pTsCommonInfo)
memset(>t_spec, 0, sizeof(TSPEC_BODY));
memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
pTsCommonInfo->t_clas_proc = 0;
-   pTsCommonInfo->TClasNum = 0;
+   pTsCommonInfo->t_clas_num = 0;
 }
 
 static void ResetTxTsEntry(PTX_TS_RECORD pTS)
@@ -282,7 +282,7 @@ static void MakeTSEntry(struct ts_common_info 
*pTsCommonInfo, u8 *Addr,
memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, 
sizeof(QOS_TCLAS));
 
pTsCommonInfo->t_clas_proc = TCLAS_Proc;
-   pTsCommonInfo->TClasNum = TCLAS_Num;
+   pTsCommonInfo->t_clas_num = TCLAS_Num;
 }
 
 
-- 
2.18.0

[PATCH v2 00/10] staging:rtl8192u: Coding style changes.

2018-07-16 Thread John Whitmore

Simple coding style changes to avoid CamelCase.

John Whitmore (10):
  staging:rtl8192u: remove typedef of enumeration TR_SELECT - Style
  staging:rtl8192u: remove typedef of struct TS_COMMON_INFO - Style
  staging:rtl8192u: Rename List > list - Coding style
  staging:rtl8192u: rename SetupTimer > setup_timer - Style
  staging:rtl8192u: Rename InactTimer > inact_timer - Style
  staging:rtl8192u: Rename Addr > addr - Style
  staging:rtl8192u: Rename TSpec > t_spec - Style
  staging:rtl8192u: Rename TClass > t_class - Style
  staging:trl8192u: Rename TClasProc > t_clas_proc - Style
  staging:rtl8192u: Rename TClasNum > t_clas_num - Style

 .../staging/rtl8192u/ieee80211/ieee80211.h|   6 +-
 .../staging/rtl8192u/ieee80211/ieee80211_rx.c |   4 +-
 .../staging/rtl8192u/ieee80211/ieee80211_tx.c |   4 +-
 .../rtl8192u/ieee80211/rtl819x_BAProc.c   |  30 ++--
 .../staging/rtl8192u/ieee80211/rtl819x_TS.h   |  28 ++--
 .../rtl8192u/ieee80211/rtl819x_TSProc.c   | 142 +-
 6 files changed, 107 insertions(+), 107 deletions(-)

-- 
2.18.0

[PATCH v2 07/10] staging:rtl8192u: Rename TSpec > t_spec - Style

2018-07-16 Thread John Whitmore

Rename the TS_COMMON_INFO structure's member TSpec to t_spec. This change
clears the checkpatch issue with CamelCase naming of variables. There should
be no impact on runtime execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 8 
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
index 297c498aef6a..2c7d3ab1b5f7 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
@@ -626,7 +626,7 @@ TsInitAddBA(
pBA->DialogToken++; // 
DialogToken: Only keep the latest dialog token
pBA->BaParamSet.field.AMSDU_Support = 0;// Do not support 
A-MSDU with A-MPDU now!!
pBA->BaParamSet.field.BAPolicy = Policy;// Policy: Delayed or 
Immediate
-   pBA->BaParamSet.field.TID = 
pTS->TsCommonInfo.TSpec.f.TSInfo.field.ucTSID;  // TID
+   pBA->BaParamSet.field.TID = 
pTS->TsCommonInfo.t_spec.f.TSInfo.field.ucTSID; // TID
// BufferSize: This need to be set according to A-MPDU vector
pBA->BaParamSet.field.BufferSize = 32;  // BufferSize: This 
need to be set according to A-MPDU vector
pBA->BaTimeoutValue = 0;// 
Timeout value: Set 0 to disable Timer
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 62c6fc513540..4c1b2e27cf94 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -20,7 +20,7 @@ struct ts_common_info {
struct timer_list   setup_timer;
struct timer_list   inact_timer;
u8  addr[6];
-   TSPEC_BODY  TSpec;
+   TSPEC_BODY  t_spec;
QOS_TCLAS   TClass[TCLAS_NUM];
u8  TClasProc;
u8  TClasNum;
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index 72a5fc0bbb47..f9f4196f43fa 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -104,7 +104,7 @@ static void TsAddBaProcess(struct timer_list *t)
 static void ResetTsCommonInfo(struct ts_common_info *pTsCommonInfo)
 {
eth_zero_addr(pTsCommonInfo->addr);
-   memset(>TSpec, 0, sizeof(TSPEC_BODY));
+   memset(>t_spec, 0, sizeof(TSPEC_BODY));
memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
pTsCommonInfo->TClasProc = 0;
pTsCommonInfo->TClasNum = 0;
@@ -248,8 +248,8 @@ static struct ts_common_info *SearchAdmitTRStream(struct 
ieee80211_device *ieee,
list_for_each_entry(pRet, psearch_list, list){
//  IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, 
dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, 
pRet->TSpec.f.TSInfo.field.ucDirection);
if (memcmp(pRet->addr, Addr, 6) == 0)
-   if (pRet->TSpec.f.TSInfo.field.ucTSID == TID)
-   
if(pRet->TSpec.f.TSInfo.field.ucDirection == dir) {
+   if (pRet->t_spec.f.TSInfo.field.ucTSID == TID)
+   
if(pRet->t_spec.f.TSInfo.field.ucDirection == dir) {
//  printk("Bingo! got it\n");
break;
}
@@ -276,7 +276,7 @@ static void MakeTSEntry(struct ts_common_info 
*pTsCommonInfo, u8 *Addr,
memcpy(pTsCommonInfo->addr, Addr, 6);
 
if(pTSPEC != NULL)
-   memcpy((u8 *)(&(pTsCommonInfo->TSpec)), (u8 *)pTSPEC, 
sizeof(TSPEC_BODY));
+   memcpy((u8 *)(&(pTsCommonInfo->t_spec)), (u8 *)pTSPEC, 
sizeof(TSPEC_BODY));
 
for(count = 0; count < TCLAS_Num; count++)
memcpy((u8 *)(&(pTsCommonInfo->TClass[count])), (u8 *)pTCLAS, 
sizeof(QOS_TCLAS));
-- 
2.18.0

[PATCH v2 08/10] staging:rtl8192u: Rename TClass > t_class - Style

2018-07-16 Thread John Whitmore

Rename the struct TS_COMMON_INFO member variable from TClass to t_class. This
change clears the checkpatch issue with CamelCase Variable names. There should
be no impact on runtime execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 4c1b2e27cf94..3bf48a04a68e 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -21,7 +21,7 @@ struct ts_common_info {
struct timer_list   inact_timer;
u8  addr[6];
TSPEC_BODY  t_spec;
-   QOS_TCLAS   TClass[TCLAS_NUM];
+   QOS_TCLAS   t_class[TCLAS_NUM];
u8  TClasProc;
u8  TClasNum;
 };
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index f9f4196f43fa..b5fede650b81 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -105,7 +105,7 @@ static void ResetTsCommonInfo(struct ts_common_info 
*pTsCommonInfo)
 {
eth_zero_addr(pTsCommonInfo->addr);
memset(>t_spec, 0, sizeof(TSPEC_BODY));
-   memset(>TClass, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
+   memset(>t_class, 0, sizeof(QOS_TCLAS)*TCLAS_NUM);
pTsCommonInfo->TClasProc = 0;
pTsCommonInfo->TClasNum = 0;
 }
@@ -279,7 +279,7 @@ static void MakeTSEntry(struct ts_common_info 
*pTsCommonInfo, u8 *Addr,
memcpy((u8 *)(&(pTsCommonInfo->t_spec)), (u8 *)pTSPEC, 
sizeof(TSPEC_BODY));
 
for(count = 0; count < TCLAS_Num; count++)
-   memcpy((u8 *)(&(pTsCommonInfo->TClass[count])), (u8 *)pTCLAS, 
sizeof(QOS_TCLAS));
+   memcpy((u8 *)(&(pTsCommonInfo->t_class[count])), (u8 *)pTCLAS, 
sizeof(QOS_TCLAS));
 
pTsCommonInfo->TClasProc = TCLAS_Proc;
pTsCommonInfo->TClasNum = TCLAS_Num;
-- 
2.18.0

[PATCH v2 05/10] staging:rtl8192u: Rename InactTimer > inact_timer - Style

2018-07-16 Thread John Whitmore

Rename the struct TS_COMMON_INFO member InactTimer to inact_timer.
This change clears the checkpatch issue with CamelCase naming. The change
should not have any impact on runtime execution.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h |  2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 26384566d4f3..1613c3c5d447 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -18,7 +18,7 @@ enum tr_select {
 struct ts_common_info {
struct list_headlist;
struct timer_list   setup_timer;
-   struct timer_list   InactTimer;
+   struct timer_list   inact_timer;
u8  Addr[6];
TSPEC_BODY  TSpec;
QOS_TCLAS   TClass[TCLAS_NUM];
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index efe8392043c1..175d91218af8 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -148,7 +148,7 @@ void TSInitialize(struct ieee80211_device *ieee)
// DLS related timer will be add here in the future!!
timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut,
0);
-   timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout,
+   timer_setup(>TsCommonInfo.inact_timer, TsInactTimeout,
0);
timer_setup(>TsAddBaTimer, TsAddBaProcess, 0);
timer_setup(>TxPendingBARecord.Timer, BaSetupTimeOut,
@@ -169,7 +169,7 @@ void TSInitialize(struct ieee80211_device *ieee)
INIT_LIST_HEAD(>RxPendingPktList);
timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut,
0);
-   timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout,
+   timer_setup(>TsCommonInfo.inact_timer, TsInactTimeout,
0);
timer_setup(>RxAdmittedBARecord.Timer,
RxBaInactTimeout, 0);
@@ -194,10 +194,10 @@ static void AdmitTS(struct ieee80211_device *ieee,
struct ts_common_info *pTsCommonInfo, u32 InactTime)
 {
del_timer_sync(>setup_timer);
-   del_timer_sync(>InactTimer);
+   del_timer_sync(>inact_timer);
 
if(InactTime!=0)
-   mod_timer(>InactTimer,
+   mod_timer(>inact_timer,
  jiffies + msecs_to_jiffies(InactTime));
 }
 
@@ -413,7 +413,7 @@ static void RemoveTsEntry(struct ieee80211_device *ieee, 
struct ts_common_info *
//u32 flags = 0;
unsigned long flags = 0;
del_timer_sync(>setup_timer);
-   del_timer_sync(>InactTimer);
+   del_timer_sync(>inact_timer);
TsInitDelBA(ieee, pTs, TxRxSelect);
 
if(TxRxSelect == RX_DIR) {
-- 
2.18.0

[PATCH v2 04/10] staging:rtl8192u: rename SetupTimer > setup_timer - Style

2018-07-16 Thread John Whitmore

Rename the struct TS_COMMON_INFO member SetupTimer to setup_timer. This
clears the checkpatch issue with CamelCase variable names.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h | 2 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 0a0420aa91cf..26384566d4f3 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -17,7 +17,7 @@ enum tr_select {
 
 struct ts_common_info {
struct list_headlist;
-   struct timer_list   SetupTimer;
+   struct timer_list   setup_timer;
struct timer_list   InactTimer;
u8  Addr[6];
TSPEC_BODY  TSpec;
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index 24abdad52ed6..efe8392043c1 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -146,7 +146,7 @@ void TSInitialize(struct ieee80211_device *ieee)
pTxTS->num = count;
// The timers for the operation of Traffic Stream and Block Ack.
// DLS related timer will be add here in the future!!
-   timer_setup(>TsCommonInfo.SetupTimer, TsSetupTimeOut,
+   timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut,
0);
timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout,
0);
@@ -167,7 +167,7 @@ void TSInitialize(struct ieee80211_device *ieee)
for(count = 0; count < TOTAL_TS_NUM; count++) {
pRxTS->num = count;
INIT_LIST_HEAD(>RxPendingPktList);
-   timer_setup(>TsCommonInfo.SetupTimer, TsSetupTimeOut,
+   timer_setup(>TsCommonInfo.setup_timer, TsSetupTimeOut,
0);
timer_setup(>TsCommonInfo.InactTimer, TsInactTimeout,
0);
@@ -193,7 +193,7 @@ void TSInitialize(struct ieee80211_device *ieee)
 static void AdmitTS(struct ieee80211_device *ieee,
struct ts_common_info *pTsCommonInfo, u32 InactTime)
 {
-   del_timer_sync(>SetupTimer);
+   del_timer_sync(>setup_timer);
del_timer_sync(>InactTimer);
 
if(InactTime!=0)
@@ -412,7 +412,7 @@ static void RemoveTsEntry(struct ieee80211_device *ieee, 
struct ts_common_info *
 {
//u32 flags = 0;
unsigned long flags = 0;
-   del_timer_sync(>SetupTimer);
+   del_timer_sync(>setup_timer);
del_timer_sync(>InactTimer);
TsInitDelBA(ieee, pTs, TxRxSelect);
 
-- 
2.18.0

[PATCH v2 03/10] staging:rtl8192u: Rename List > list - Coding style

2018-07-16 Thread John Whitmore

In struct TS_COMMON_INFO rename the member List to list. This clears the
checkpatch issue concerning CamelCase naming of variables.

Signed-off-by: John Whitmore 
---
 .../staging/rtl8192u/ieee80211/rtl819x_TS.h   |  2 +-
 .../rtl8192u/ieee80211/rtl819x_TSProc.c   | 64 +--
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 5d2fb52a8072..0a0420aa91cf 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -16,7 +16,7 @@ enum tr_select {
 };
 
 struct ts_common_info {
-   struct list_headList;
+   struct list_headlist;
struct timer_list   SetupTimer;
struct timer_list   InactTimer;
u8  Addr[6];
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index fd3347dca025..24abdad52ed6 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
@@ -156,7 +156,7 @@ void TSInitialize(struct ieee80211_device *ieee)
timer_setup(>TxAdmittedBARecord.Timer,
TxBaInactTimeout, 0);
ResetTxTsEntry(pTxTS);
-   list_add_tail(>TsCommonInfo.List, 
>Tx_TS_Unused_List);
+   list_add_tail(>TsCommonInfo.list, 
>Tx_TS_Unused_List);
pTxTS++;
}
 
@@ -175,7 +175,7 @@ void TSInitialize(struct ieee80211_device *ieee)
RxBaInactTimeout, 0);
timer_setup(>RxPktPendingTimer, RxPktPendingTimeout, 0);
ResetRxTsEntry(pRxTS);
-   list_add_tail(>TsCommonInfo.List, 
>Rx_TS_Unused_List);
+   list_add_tail(>TsCommonInfo.list, 
>Rx_TS_Unused_List);
pRxTS++;
}
// Initialize unused Rx Reorder List.
@@ -245,7 +245,7 @@ static struct ts_common_info *SearchAdmitTRStream(struct 
ieee80211_device *ieee,
for(dir = 0; dir <= DIR_BI_DIR; dir++) {
if (!search_dir[dir])
continue;
-   list_for_each_entry(pRet, psearch_list, List){
+   list_for_each_entry(pRet, psearch_list, list){
//  IEEE80211_DEBUG(IEEE80211_DL_TS, "ADD:%pM, TID:%d, 
dir:%d\n", pRet->Addr, pRet->TSpec.f.TSInfo.field.ucTSID, 
pRet->TSpec.f.TSInfo.field.ucDirection);
if (memcmp(pRet->Addr, Addr, 6) == 0)
if (pRet->TSpec.f.TSInfo.field.ucTSID == TID)
@@ -254,11 +254,11 @@ static struct ts_common_info *SearchAdmitTRStream(struct 
ieee80211_device *ieee,
break;
}
}
-   if(>List  != psearch_list)
+   if(>list  != psearch_list)
break;
}
 
-   if(>List  != psearch_list)
+   if(>list  != psearch_list)
return pRet ;
else
return NULL;
@@ -371,8 +371,8 @@ bool GetTs(

((TxRxSelect==TX_DIR)?DIR_UP:DIR_DOWN);
IEEE80211_DEBUG(IEEE80211_DL_TS, "to add Ts\n");
if(!list_empty(pUnusedList)) {
-   (*ppTS) = list_entry(pUnusedList->next, struct 
ts_common_info, List);
-   list_del_init(&(*ppTS)->List);
+   (*ppTS) = list_entry(pUnusedList->next, struct 
ts_common_info, list);
+   list_del_init(&(*ppTS)->list);
if(TxRxSelect==TX_DIR) {
PTX_TS_RECORD tmp = container_of(*ppTS, 
TX_TS_RECORD, TsCommonInfo);
ResetTxTsEntry(tmp);
@@ -395,7 +395,7 @@ bool GetTs(
 
MakeTSEntry(*ppTS, Addr, , NULL, 0, 0);
AdmitTS(ieee, *ppTS, 0);
-   list_add_tail(&((*ppTS)->List), pAddmitList);
+   list_add_tail(&((*ppTS)->list), pAddmitList);
// if there is DirectLink, we need to do 
additional operation here!!
 
return true;
@@ -457,36 +457,36 @@ void RemovePeerTS(struct ieee80211_device *ieee, u8 *Addr)
struct ts_common_info   *pTS, *pTmpTS;
 
printk("===>RemovePeerTS,%pM\n", Addr);
-   list_for_each_entry_safe(pTS, pTmpTS, >Tx_TS_Pending_List, List) {
+   list_for_each_entry_safe(pTS, pTmpTS, >Tx_TS_Pending_List, list) {
if (memcmp(pTS->Addr, Addr, 6) == 0) {
RemoveTsEntry(ieee, pTS, TX_DIR);
-   list_del_init(>List);
-

[PATCH v2 02/10] staging:rtl8192u: remove typedef of struct TS_COMMON_INFO - Style

2018-07-16 Thread John Whitmore

To clear a checkpatch issue removed the typedef of the structure
TS_COMMON_INFO.

This change removes the previous declaration, which defined two types, both
TS_COMMON_INFO and a pointer type PTS_COMMON_INFO:

typedef struct _TS_COMMON_INFO {
...
} TS_COMMON_INFO, *PTS_COMMON_INFO;

The pointer type has been completely removed from the code, as:
"(so-called Hungarian notation) is brain damaged" according to the coding
standard.

Signed-off-by: John Whitmore 
---
 .../staging/rtl8192u/ieee80211/ieee80211.h|  4 ++--
 .../staging/rtl8192u/ieee80211/ieee80211_rx.c |  4 ++--
 .../staging/rtl8192u/ieee80211/ieee80211_tx.c |  4 ++--
 .../rtl8192u/ieee80211/rtl819x_BAProc.c   | 10 
 .../staging/rtl8192u/ieee80211/rtl819x_TS.h   |  8 +++
 .../rtl8192u/ieee80211/rtl819x_TSProc.c   | 24 +--
 6 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h 
b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
index e65c9967b627..bb4bb68bb3dd 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
@@ -2391,7 +2391,7 @@ int ieee80211_rx_DELBA(struct ieee80211_device *ieee, 
struct sk_buff *skb);
 void TsInitAddBA(struct ieee80211_device *ieee, PTX_TS_RECORD pTS,
 u8 Policy, u8 bOverwritePending);
 void TsInitDelBA(struct ieee80211_device *ieee,
-PTS_COMMON_INFO pTsCommonInfo, enum tr_select TxRxSelect);
+struct ts_common_info *pTsCommonInfo, enum tr_select 
TxRxSelect);
 void BaSetupTimeOut(struct timer_list *t);
 void TxBaInactTimeout(struct timer_list *t);
 void RxBaInactTimeout(struct timer_list *t);
@@ -2399,7 +2399,7 @@ void ResetBaEntry(PBA_RECORD pBA);
 //function in TS.c
 bool GetTs(
struct ieee80211_device *ieee,
-   PTS_COMMON_INFO *ppTS,
+   struct ts_common_info   **ppTS,
u8  *Addr,
u8  TID,
enum tr_select  TxRxSelect,  //Rx:1, Tx:0
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c 
b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
index 172165f4461a..a84de8c52a2b 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
@@ -1021,7 +1021,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct 
sk_buff *skb,
//IEEE80211_DEBUG(IEEE80211_DL_REORDER,"%s(): QOS 
ENABLE AND RECEIVE QOS DATA , we will get Ts, tid:%d\n",__func__, tid);
if(GetTs(
ieee,
-   (PTS_COMMON_INFO *) ,
+   (struct ts_common_info **) ,
hdr->addr2,
Frame_QoSTID((u8 *)(skb->data)),
RX_DIR,
@@ -1266,7 +1266,7 @@ int ieee80211_rx(struct ieee80211_device *ieee, struct 
sk_buff *skb,
{
TID = Frame_QoSTID(skb->data);
SeqNum = WLAN_GET_SEQ_SEQ(sc);
-   GetTs(ieee,(PTS_COMMON_INFO *) ,hdr->addr2,TID,RX_DIR,true);
+   GetTs(ieee,(struct ts_common_info **) 
,hdr->addr2,TID,RX_DIR,true);
if (TID !=0 && TID !=3)
{
ieee->bis_any_nonbepkts = true;
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c 
b/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c
index 9a1a84548bc6..010b53b45825 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_tx.c
@@ -330,7 +330,7 @@ static void ieee80211_tx_query_agg_cap(struct 
ieee80211_device *ieee,
}
if(pHTInfo->bCurrentAMPDUEnable)
{
-   if (!GetTs(ieee, (PTS_COMMON_INFO *)(), hdr->addr1, 
skb->priority, TX_DIR, true))
+   if (!GetTs(ieee, (struct ts_common_info **)(), 
hdr->addr1, skb->priority, TX_DIR, true))
{
printk("===>can't get TS\n");
return;
@@ -585,7 +585,7 @@ static void ieee80211_query_seqnum(struct ieee80211_device 
*ieee,
if (IsQoSDataFrame(skb->data)) //we deal qos data only
{
PTX_TS_RECORD pTS = NULL;
-   if (!GetTs(ieee, (PTS_COMMON_INFO *)(), dst, skb->priority, 
TX_DIR, true))
+   if (!GetTs(ieee, (struct ts_common_info **)(), dst, 
skb->priority, TX_DIR, true))
{
return;
}
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
index 04779da803e4..950f827a1740 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
@@ -360,7 +360,7 @@ int ieee80211_rx_ADDBAReq(struct ieee80211_device *ieee, 
struct sk_buff *skb)
// If there is no

[PATCH v2 01/10] staging:rtl8192u: remove typedef of enumeration TR_SELECT - Style

2018-07-16 Thread John Whitmore

To clear a checkpatch issue removed the typedef of the enumeration TR_SELECT
this should not impact runtime code as it's only a coding style change.

Signed-off-by: John Whitmore 
---
 drivers/staging/rtl8192u/ieee80211/ieee80211.h  |  4 ++--
 drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c | 10 +-
 drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h |  4 ++--
 drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c |  6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h 
b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
index 3b7968681f77..e65c9967b627 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
@@ -2391,7 +2391,7 @@ int ieee80211_rx_DELBA(struct ieee80211_device *ieee, 
struct sk_buff *skb);
 void TsInitAddBA(struct ieee80211_device *ieee, PTX_TS_RECORD pTS,
 u8 Policy, u8 bOverwritePending);
 void TsInitDelBA(struct ieee80211_device *ieee,
-PTS_COMMON_INFO pTsCommonInfo, TR_SELECT TxRxSelect);
+PTS_COMMON_INFO pTsCommonInfo, enum tr_select TxRxSelect);
 void BaSetupTimeOut(struct timer_list *t);
 void TxBaInactTimeout(struct timer_list *t);
 void RxBaInactTimeout(struct timer_list *t);
@@ -2402,7 +2402,7 @@ bool GetTs(
PTS_COMMON_INFO *ppTS,
u8  *Addr,
u8  TID,
-   TR_SELECT   TxRxSelect,  //Rx:1, Tx:0
+   enum tr_select  TxRxSelect,  //Rx:1, Tx:0
boolbAddNewTs
);
 void TSInitialize(struct ieee80211_device *ieee);
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
index 2dc4d0e93948..04779da803e4 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_BAProc.c
@@ -174,7 +174,7 @@ static struct sk_buff *ieee80211_ADDBA(struct 
ieee80211_device *ieee, u8 *Dst, P
  *function:  construct DELBA frame
  *   input:  u8*   dst //DELBA frame's destination
  *  PBA_RECORD pBA //BA_RECORD entry which stores the 
necessary information for BA
- *  TR_SELECT  TxRxSelect  //TX RX direction
+ *  enum tr_select TxRxSelect  //TX RX direction
  *  u16ReasonCode  //status code.
  *  output:  none
  *  return:  sk_buff*  skb //return constructed skb to xmit
@@ -183,7 +183,7 @@ static struct sk_buff *ieee80211_DELBA(
struct ieee80211_device  *ieee,
u8   *dst,
PBA_RECORD   pBA,
-   TR_SELECTTxRxSelect,
+   enum tr_select   TxRxSelect,
u16  ReasonCode
)
 {
@@ -290,14 +290,14 @@ static void ieee80211_send_ADDBARsp(struct 
ieee80211_device *ieee, u8 *dst,
  *function: send ADDBARSP frame out
  *   input:  u8*   dst //DELBA frame's destination
  *  PBA_RECORD pBA //BA_RECORD entry which stores the 
necessary information for BA
- *  TR_SELECT  TxRxSelect //TX or RX
+ *  enum tr_select TxRxSelect //TX or RX
  *  u16ReasonCode //DEL ReasonCode
  *  output:  none
  *  notice: If any possible, please hide pBA in ieee. And temporarily use 
Manage Queue as softmac_mgmt_xmit() usually does
  
/
 
 static void ieee80211_send_DELBA(struct ieee80211_device *ieee, u8 *dst,
-PBA_RECORD pBA, TR_SELECT TxRxSelect,
+PBA_RECORD pBA, enum tr_select TxRxSelect,
 u16 ReasonCode)
 {
struct sk_buff *skb;
@@ -638,7 +638,7 @@ TsInitAddBA(
 }
 
 void
-TsInitDelBA(struct ieee80211_device *ieee, PTS_COMMON_INFO pTsCommonInfo, 
TR_SELECT TxRxSelect)
+TsInitDelBA(struct ieee80211_device *ieee, PTS_COMMON_INFO pTsCommonInfo, enum 
tr_select TxRxSelect)
 {
if (TxRxSelect == TX_DIR) {
PTX_TS_RECORD   pTxTs = (PTX_TS_RECORD)pTsCommonInfo;
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
index 3a0ff08c687a..c6a5ac1e647c 100644
--- a/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
+++ b/drivers/staging/rtl8192u/ieee80211/rtl819x_TS.h
@@ -10,10 +10,10 @@
 #define TCLAS_NUM  4
 
 /*  This define the Tx/Rx directions */
-typedef enum _TR_SELECT {
+enum tr_select {
TX_DIR = 0,
RX_DIR = 1,
-} TR_SELECT, *PTR_SELECT;
+};
 
 typedef struct _TS_COMMON_INFO {
struct list_headList;
diff --git a/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c 
b/drivers/staging/rtl8192u/ieee80211/rtl819x_TSProc.c
index

Due Process Unit-July 16 2018

2018-07-16 Thread Fund Benefits




Our Ref:FGN-GOV/IMF/2018

Attention:Beneficiary
Instruction was given by the Office of Presidency and United Nations(UN) and 
also (IMF)to transfer Sum of $10m through ATM Debit Card to you which you can 
use it in near cash point,shopping mall or banking.
You can withdrawal money from your ATM Debit Card from any ATM MACHINE location 
or center of your choice/nearest to you, in any part of the world.You should 
reconfirm your address and full name and also telephone number for delivery of 
your original (A.T.M) Debit Card to you within 48hrs.

Note: this compensation includes, Victims of fraudulent activities and payment 
of inheritance,Contracts and also lottery winning, which was due to you etc

Yours faithfully,
(Tél:234-81462 71957)
CONTACT/Mr.Japheth Benjamin
Foreign Payment Due Process Unit.

Re: [PATCH v2] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

2018-07-16 Thread Linus Torvalds

On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman  wrote:
>
> If the numbers can be trusted it is actually slower to put the sync in
> lock, at least on one of the machines:
>
>   Time
> lwsync_sync   84,932,987,977
> sync_lwsync   93,185,930,333

Very funky.

> I guess arguably it's not a very macro benchmark, but we have a
> context_switch benchmark in the tree[1] which we often use to tune
> things, and it degrades badly. It just spins up two threads and has them
> ping-pong using yield.

I hacked that up to run on x86, and it only is about 5% locking
overhead in my profiles. It's about 18% __switch_to, and a lot of
system call entry/exit, but not a lot of locking.

I'm actually surprised it is even that much locking, since it seems to
be single-cpu, so there should be no contention and the lock (which
seems to be

rq = this_rq();
rq_lock(rq, );

in do_sched_yield()) should stay local to the cpu.

And for you the locking is apparently even _more_ noticeable.

But yes, a 10% regression on that context switch thing is huge. You
shouldn't do ping-pong stuff, but people kind of do.

  Linus

Re: [PATCH v5 07/12] PM / devfreq: export devfreq_class

2018-07-16 Thread Matthias Kaehlcke

Hi Chanwoo,

On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
> 
> On 2018년 07월 07일 03:09, Matthias Kaehlcke wrote:
> > Hi,
> > 
> > On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
> > 
> >> I didn't see any framework which exporting the class instance.
> >> It is very dangerous. Unknown device drivers is able to reset
> >> the 'devfreq_class' instance. I can't agree this approach.
> > 
> > While I agree that it is potential dangerous it is actually a common
> > practice to export the class:
> > 
> 
> I tried to find the real usage of exported class instance
> and I add the comment for each class instance. Almost exported class
> instance are used in the their own director or some exported class
> like rio_mport_class/switchtec_class are created from specific device driver
> instead of subsystem.
> 
> Only following two cases are used on outside of subsystem directory.
> devtmpfs.c and alarmtimer.c are core feature of linux kernel.
> 
>   drivers/base/devtmpfs.c uses 'block_class'.
>   kernel/time/alarmtimer.c uses 'rtc_class'.
> 
> I cannot yet agree this approach due to only block_class and rtc_class.

I thought your main concern was that the class is exported, which is
what several other subsystems do. That the class isn't used outside of
the subsystem directory most likely means that there is no need for
it, rather than that it shouldn't be done at all (depending on the
type of use of course).

In any case not exporting the class object provides a limited
protection against potential abuse of the class at best. To use the
class API all that is needed is a 'struct device' of a devfreq device,
which has a pointer to the class object (dev->class).

Theoretically I could register a fake devfreq device to obtain access
to the class object, though that doesn't seem a very neat approach ;-)

> You added the following comment why devfreq_class instance is necessary.
> Actullay, I don't know the best solution right now. But, all device drivers
> can be added or removed if driver uses the module type. It is not a problem
> for only devfreq instance.

Certainly it's not a problem limited to devfreq devices. In many other
cases bus notifiers can be used, but since devfreq devices areen't
tied to a specific bus this is not an option here.

If you really don't want to export the class we could add wrappers
for (un)registering a class interface:

int devfreq_class_interface_register(struct class_interface *)
void devfreq_class_interface_unregister(struct class_interface *)

The wrappers would have to assign ci->class since the throttler
can't see the class object.

Or add notifiers for device addition/removal, though the throttler
relies on the behavior of the class_interface which also notifies
about devices added before registration. This might not be what other
potential users of the notifiers expect.

Thanks

Matthias

> /*
> +  * devfreq devices can be added and removed at runtime, hence they
> +  * must also be handled dynamically. The class_interface notifies us
> +  * whenever a device is added or removed. When the interface is
> +  * registered ci->add_dev() is called for all existing devfreq
> +  * devices.
> */
> 
> 
> > grep "extern struct class " include/linux/ -R
> > include/linux/rio.h:extern struct class rio_mport_class;
> rio_mport_class is created on drivers/rapidio/rio-drivers.c.
> It means that just device driver create the 'rio_mport_class' class
> instead of any linux kernel subsystem.
> 
> > include/linux/tty.h:extern struct class *tty_class;
> tty_class is not used on outside of drivers/tty
> 
> > include/linux/fb.h:extern struct class *fb_class;
> fb_class is not used on outside of drivers/video/fbdev
> 
> > include/linux/ide.h:extern struct class *ide_port_class;
> ide_port_class is not used on outside of drivers/ide.
> 
> > include/linux/device.h:extern struct class * __must_check 
> > __class_create(struct module *owner,
> 
> > include/linux/devfreq.h:extern struct class *devfreq_class;
> not yet
> 
> > include/linux/switchtec.h:extern struct class *switchtec_class;
> switchtec_class is created on drivers/pci/switch/switchtec.c
> and then switchtec_class is only used on 
> drivers/ntb/hw/mscc/ntb_hw_switchtec.c.
> It is not subsystem. Just switchtec.c device driver makes the their own class.
> 
> > include/linux/input.h:extern struct class input_class;
>   input_class is not used on outside of drivers/input.
> 
> > include/linux/power_supply.h:extern struct class *power_supply_class;
>   power_supply_class is not used on outside of drivers/power/supply.
> 
> 
> > include/linux/genhd.h:extern struct class block_class;
>   drivers/base/devtmpfs.c uses 'block_class'.
> 
> > include/linux/rtc.h:extern struct class *rtc_class;
>   kernel/time/alarmtimer.c uses 'rtc_class'.
> 
> > 
> > struct class_interface and class_interface_register() would be
> > pointless without exported classes.
> > 
> > My

Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver

2018-07-16 Thread David Hildenbrand

On 11.06.2018 14:33, David Hildenbrand wrote:
> On 11.06.2018 13:56, Michal Hocko wrote:
>> On Mon 11-06-18 13:53:49, David Hildenbrand wrote:
>>> On 24.05.2018 23:07, David Hildenbrand wrote:
 On 24.05.2018 16:22, Michal Hocko wrote:
> I will go over the rest of the email later I just wanted to make this
> point clear because I suspect we are talking past each other.

 It sounds like we are now talking about how to solve the problem. I like
 that :)

>>>
>>> Hi Michal,
>>>
>>> did you have time to think about the details of your proposed idea?
>>
>> Not really. Sorry about that. It's been busy time. I am planning to
>> revisit after merge window closes.
>>
> 
> Sure no worries, I still have a bunch of other things to work on. But it
> would be nice to clarify soon in which direction I have to head to get
> this implemented and upstream (e.g. what I proposed, what you proposed
> or maybe something different).
> 
I would really like to make progress here.

I pointed out basic problems/questions with the proposed alternative. I
think I answered all your questions. But you also said that you are not
going to accept the current approach. So some decision has to be made.

Although it's very demotivating and frustrating (I hope not all work in
the MM area will be like this), if there is no guidance on how to
proceed, I'll have to switch to adding/removing/onlining/offlining whole
segments. This is not what I want, but maybe this has a higher chance of
getting reviews/acks.

Understanding that you are busy, please if you make suggestions, follow
up on responses.

-- 

Thanks,

David / dhildenb

Re: [PATCH 4.4 00/43] 4.4.141-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 09:36:05AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.141 release.
> There are 43 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jul 18 07:34:50 UTC 2018.
> Anything received after that time might be too late.
> 

Build results:
total: 148 pass: 148 fail: 0
Qemu test results:
total: 149 pass: 149 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

Re: [PATCH 4.17 00/67] 4.17.7-stable review

2018-07-16 Thread Greg Kroah-Hartman

On Mon, Jul 16, 2018 at 09:33:38AM -0700, Guenter Roeck wrote:
> On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.7 release.
> > There are 67 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jul 18 07:34:11 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 134 pass: 134 fail: 0
> Qemu test results:
>   total: 171 pass: 159 fail: 12
> Failed tests: 
>   i386:Broadwell:q35:defconfig:smp:rootfs 
>   i386:Skylake-Client:q35:defconfig:smp:rootfs 
>   i386:SandyBridge:q35:defconfig:smp:rootfs
>   i386:Haswell:pc:defconfig:smp:rootfs 
>   i386:Nehalem:q35:defconfig:smp:rootfs 
>   i386:phenom:pc:defconfig:smp:rootfs 
>   i386:Opteron_G5:q35:defconfig:smp:initrd 
>   i386:Westmere:q35:defconfig:smp:initrd 
>   i386:core2duo:q35:defconfig:nosmp:rootfs 
>   i386:Conroe:pc:defconfig:nosmp:rootfs 
>   i386:Opteron_G1:pc:defconfig:nosmp:initrd
>   i386:n270:q35:defconfig:nosmp:rootfs
> 
> All 32-bit i386 boot tests crash.
> 
> [0.00] BUG: unable to handle kernel NULL pointer dereference at 
> 
> [0.00] *pde =  
> [0.00] Oops: 0002 [#1] SMP
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1
> [0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24
> [0.00] EIP: zero_resv_unavail+0x9b/0x103
> [0.00] EFLAGS: 0022 CPU: 0
> [0.00] EAX:  EBX:  ECX: 0008 EDX: 
> [0.00] ESI: c6bc5e90 EDI:  EBP: c6bc5eac ESP: c6bc5e80
> [0.00]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [0.00] CR0: 80050033 CR2:  CR3: 06d67000 CR4: 00040690
> [0.00] Call Trace:
> [0.00]  free_area_init_nodes+0x407/0x44c
> [0.00]  zone_sizes_init+0x45/0x5c
> [0.00]  paging_init+0xac/0xaf
> [0.00]  native_pagetable_init+0xc6/0xce
> [0.00]  setup_arch+0x9c6/0xb18
> [0.00]  start_kernel+0x4f/0x3d2
> [0.00]  ? idt_setup_early_handler+0x2b/0x3e
> [0.00]  i386_start_kernel+0x95/0x99
> [0.00]  startup_32_smp+0x164/0x168

Yeah, Linus's tree also crashes from this same reason :(

A patch was just posted to hopefully fix this...

greg k-h

[PATCH V2 2/2] firmware: ti_sci: Provide host-id as an optional dt parameter

2018-07-16 Thread Nishanth Menon

Texas Instrument's System Control Interface (TISCI) permits the
ability for Operating Systems to running in virtual machines to be
able to independently communicate with the firmware without the need
going through an hypervisor.

The "host-id" in effect is the hardware representation of the
host (example: VMs locked to a core) as identified to the System
Controller.

Provide support as an optional parameter implementation and use the
compatible data as default if one is not provided by device tree.

Signed-off-by: Nishanth Menon 
---
Changes since V1:
* None

V1: https://patchwork.kernel.org/patch/10475309/
RFC: https://patchwork.kernel.org/patch/10447715/

 drivers/firmware/ti_sci.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/ti_sci.c b/drivers/firmware/ti_sci.c
index 7fa744793bc5..69ed1464175c 100644
--- a/drivers/firmware/ti_sci.c
+++ b/drivers/firmware/ti_sci.c
@@ -66,14 +66,14 @@ struct ti_sci_xfers_info {
 
 /**
  * struct ti_sci_desc - Description of SoC integration
- * @host_id:   Host identifier representing the compute entity
+ * @default_host_id:   Host identifier representing the compute entity
  * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds)
  * @max_msgs: Maximum number of messages that can be pending
  *   simultaneously in the system
  * @max_msg_size: Maximum size of data per message that can be handled.
  */
 struct ti_sci_desc {
-   u8 host_id;
+   u8 default_host_id;
int max_rx_timeout_ms;
int max_msgs;
int max_msg_size;
@@ -94,6 +94,7 @@ struct ti_sci_desc {
  * @chan_rx:   Receive mailbox channel
  * @minfo: Message info
  * @node:  list head
+ * @host_id:   Host ID
  * @users: Number of users of this instance
  */
 struct ti_sci_info {
@@ -110,6 +111,7 @@ struct ti_sci_info {
struct mbox_chan *chan_rx;
struct ti_sci_xfers_info minfo;
struct list_head node;
+   u8 host_id;
/* protected by ti_sci_list_mutex */
int users;
 
@@ -370,7 +372,7 @@ static struct ti_sci_xfer *ti_sci_get_one_xfer(struct 
ti_sci_info *info,
 
hdr->seq = xfer_id;
hdr->type = msg_type;
-   hdr->host = info->desc->host_id;
+   hdr->host = info->host_id;
hdr->flags = msg_flags;
 
return xfer;
@@ -1793,7 +1795,7 @@ static int tisci_reboot_handler(struct notifier_block 
*nb, unsigned long mode,
 
 /* Description for K2G */
 static const struct ti_sci_desc ti_sci_pmmc_k2g_desc = {
-   .host_id = 2,
+   .default_host_id = 2,
/* Conservative duration */
.max_rx_timeout_ms = 1000,
/* Limited by MBOX_TX_QUEUE_LEN. K2G can handle upto 128 messages! */
@@ -1819,6 +1821,7 @@ static int ti_sci_probe(struct platform_device *pdev)
int ret = -EINVAL;
int i;
int reboot = 0;
+   u32 h_id;
 
of_id = of_match_device(ti_sci_of_match, dev);
if (!of_id) {
@@ -1833,6 +1836,19 @@ static int ti_sci_probe(struct platform_device *pdev)
 
info->dev = dev;
info->desc = desc;
+   ret = of_property_read_u32(dev->of_node, "ti,host-id", _id);
+   /* if the property is not present in DT, use a default from desc */
+   if (ret < 0) {
+   info->host_id = info->desc->default_host_id;
+   } else {
+   if (!h_id) {
+   dev_warn(dev, "Host ID 0 is reserved for firmware\n");
+   info->host_id = info->desc->default_host_id;
+   } else {
+   info->host_id = h_id;
+   }
+   }
+
reboot = of_property_read_bool(dev->of_node,
   "ti,system-reboot-controller");
INIT_LIST_HEAD(>node);
-- 
2.15.1

[PATCH V2 0/2] firmware: ti_sci: Add host-id as an optional parameter

2018-07-16 Thread Nishanth Menon

Please find attached series to enable host-id as an optional dt property.

This is a minor update to V1 -> Mostly to pick up Greet's feedback and Rob's 
Ack.

V1: https://patchwork.ozlabs.org/cover/931822/

The series is based on v4.18-rc1 and is available here:
https://github.com/nmenon/linux-2.6-playground/commits/upstream/v4.18-rc1/k3-1-am6-tisci

Consolidated all patches (including all series) are available here:
https://github.com/nmenon/linux-2.6-playground/commits/upstream/v4.18-rc1/k3-am6-integ

Full Boot log (integrated of all series for AM654) is available here:
   https://pastebin.ubuntu.com/p/bBFmnzYtCd/

Nishanth Menon (2):
  Documentation: dt: keystone: ti-sci: Add optional host-id parameter
  firmware: ti_sci: Provide host-id as an optional dt parameter

 .../devicetree/bindings/arm/keystone/ti,sci.txt|  4 
 drivers/firmware/ti_sci.c  | 24 ++
 2 files changed, 24 insertions(+), 4 deletions(-)

-- 
2.15.1

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread John Stultz

On Mon, Jul 16, 2018 at 9:30 AM, John Stultz  wrote:
> On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha  wrote:
>> On 7/13/2018 10:50 PM, John Stultz wrote:
>>> On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha 
 On 7/11/2018 1:43 AM, John Stultz wrote:
> I worry this upside-down logic is too subtle to be easily reasoned
> about, and will just lead to future mistakes.
>
> Can we instead call this "suspend_timing_needed" and only set it to
> true when we don't inject any sleep time on resume?


 I did not get your point "only set it to true when we don't inject any
 sleep
 time on resume? "
 How do we know  this ?
 This question itself depends on the "sleeptime_injected" if it is true
 means
 no need to inject else need to inject.

 Also, we need to make this variable back and forth true, false; suspends
 path ensures it to make it false.
>>>
>>> So yea, I'm not saying logically the code is really any different,
>>> this is more of a naming nit. So instead of having a variable that is
>>> always on that we occasionally turn off, lets invert the naming and
>>> have it be a flag that we occasionally turn on.
>>
>>
>> I understand your concern about the name of the variable will be misleading.
>> But the changing Boolean state would not solve the actual issue.
>>
>> If i understand you correctly you meant below code
>>
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 32ae9ae..becc5bd 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts)
>>   * If a suspend fails before reaching timekeeping_resume() then the flag
>>   * stays true and prevents erroneous sleeptime injection.
>>   */
>> -static bool sleeptime_injected = true;
>> +static bool suspend_timing_needed;
>>
>>  /* Flag for if there is a persistent clock on this platform */
>>  static bool persistent_clock_exists;
>> @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64
>> *delta)
>> raw_spin_lock_irqsave(_lock, flags);
>> write_seqcount_begin(_core.seq);
>>
>> -   sleeptime_injected = true;
>> +   suspend_timing_needed = false;
>>
>> timekeeping_forward_now(tk);
>>
>> @@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
>>   tk->tkr_mono.mask);
>> nsec = mul_u64_u32_shr(cyc_delta, clock->mult,
>> clock->shift);
>> ts_delta = ns_to_timespec64(nsec);
>> -   sleeptime_injected = true;
>> +   suspend_timing_needed = true;
>> } else if (timespec64_compare(_new, _suspend_time) >
>> 0) {
>> ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
>> -   sleeptime_injected = true;
>> +   suspend_timing_needed = true;
>> }
>
> No no... This part is wrong. We only set suspend_timing_needed if we
> *didn't* calculate the suspend time in timekeeping_resume.
>
> You have to invert all the boolean logic for it to be equivalent.
>
...
>>   
>
>
> So, I think with the logic bug above it will work out properly, but
> let me know if I'm still missing something.

Sorry, I meant "with the logic bug above fixed it will work out".

thanks
-john

[PATCH V2 1/2] Documentation: dt: keystone: ti-sci: Add optional host-id parameter

2018-07-16 Thread Nishanth Menon

Texas Instrument's System Control Interface (TISCI) permits
the ability for OSs running in virtual machines to be able to
independently communicate with the firmware without the need going
through an hypervisor.

The "host-id" in effect is the hardware representation of the
host (example: VMs locked to a core) as identified to the System
Controller. Hypervisors can either fill in appropriate host-ids in dt
used for each VM instance OR may use prebuilt blobs where the host-ids
are pre-populated, as appropriate for the OS running in the VMs.

This is introduced as an optional parameter to maintain consistency
with legacy device tree blobs.

Reviewed-by: Rob Herring 
Signed-off-by: Nishanth Menon 
---

Changes since V1:
* Dropped the SCSI comment in commit log
* Picked up Rob's Reviewed by.

V1: https://patchwork.kernel.org/patch/10475311/
RFC: https://patchwork.kernel.org/patch/10447753/

 Documentation/devicetree/bindings/arm/keystone/ti,sci.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt 
b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
index 31f5f9a104cc..b56a02c10ae6 100644
--- a/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
+++ b/Documentation/devicetree/bindings/arm/keystone/ti,sci.txt
@@ -45,11 +45,15 @@ Optional Properties:
debug_messages - Map the Debug message region
 - reg:  register space corresponding to the debug_messages
 - ti,system-reboot-controller: If system reboot can be triggered by SoC reboot
+- ti,host-id: Integer value corresponding to the host ID assigned by Firmware
+   for identification of host processing entities such as virtual
+   machines
 
 Example (K2G):
 -
pmmc: pmmc {
compatible = "ti,k2g-sci";
+   ti,host-id = <2>;
mbox-names = "rx", "tx";
mboxes= < _proxy_pmmc_rx>,
< _proxy_pmmc_tx>;
-- 
2.15.1

[PATCH v6 0/5] sparse_init rewrite

2018-07-16 Thread Pavel Tatashin

Changelog:
v6 - v5
- Removed one more obsolete function, as Oscar noticed
- Replaced BUG_ON with WARN_ON, ALIGN with PTR_ALIGN.
- Added review-by, and test-by.

v5 - v4
- Fixed the issue that was reported on ppc64 when
  CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER is removed
- Consolidated the new buffer allocation between vmemmap
  and non-vmemmap variants of sparse layout.
- Removed all review-by comments, because I had to do
  significant amount of changes compared to previous version
  and need another round of review.
- I also would appreciate if those who reported problems with
  PPC64 could test this change.
v4 - v3
- Addressed comments from Dave Hansen
v3 - v1
- Fixed two issues found by Baoquan He
v1 - v2
- Addressed comments from Oscar Salvador

In sparse_init() we allocate two large buffers to temporary hold usemap and
memmap for the whole machine. However, we can avoid doing that if we
changed sparse_init() to operated on per-node bases instead of doing it on
the whole machine beforehand.

As shown by Baoquan
http://lkml.kernel.org/r/20180628062857.29658-1-...@redhat.com

The buffers are large enough to cause machine stop to boot on small memory
systems.

Another benefit of these changes is that they also obsolete
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER.

Pavel Tatashin (5):
  mm/sparse: abstract sparse buffer allocations
  mm/sparse: use the new sparse buffer functions in non-vmemmap
  mm/sparse: move buffer init/fini to the common place
  mm/sparse: add new sparse_init_nid() and sparse_init()
  mm/sparse: delete old sparse_init and enable new one

 include/linux/mm.h  |   7 +-
 mm/Kconfig  |   4 -
 mm/sparse-vmemmap.c |  59 +
 mm/sparse.c | 314 ++--
 4 files changed, 102 insertions(+), 282 deletions(-)

-- 
2.18.0

[PATCH memory-model 05/14] tools/memory-model: Remove ACCESS_ONCE() from recipes

2018-07-16 Thread Paul E. McKenney

From: Mark Rutland 

Since commit:

  b899a850431e2dd0 ("compiler.h: Remove ACCESS_ONCE()")

... there has been no definition of ACCESS_ONCE() in the kernel tree,
and it has been necessary to use READ_ONCE() or WRITE_ONCE() instead.

Let's update the exmaples in recipes.txt likewise for consistency, using
READ_ONCE() for reads.

Signed-off-by: Mark Rutland 
Cc: Alan Stern 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Nicholas Piggin 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Luc Maranget 
Cc: Akira Yokosawa 
Acked-by: Andrea Parri 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/Documentation/recipes.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/Documentation/recipes.txt 
b/tools/memory-model/Documentation/recipes.txt
index ee4309a87fc4..1fea8ef2b184 100644
--- a/tools/memory-model/Documentation/recipes.txt
+++ b/tools/memory-model/Documentation/recipes.txt
@@ -322,9 +322,9 @@ the following write-side code fragment:
 And the xlog_valid_lsn() function in fs/xfs/xfs_log_priv.h contains
 the corresponding read-side code fragment:
 
-   cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
+   cur_cycle = READ_ONCE(log->l_curr_cycle);
smp_rmb();
-   cur_block = ACCESS_ONCE(log->l_curr_block);
+   cur_block = READ_ONCE(log->l_curr_block);
 
 Alternatively, consider the following comment in function
 perf_output_put_handle() in kernel/events/ring_buffer.c:
-- 
2.17.1

[PATCH memory-model 03/14] MAINTAINERS: Add Daniel Lustig as an LKMM reviewer

2018-07-16 Thread Paul E. McKenney

From: Palmer Dabbelt 

Dan runs the RISC-V memory model working group.  I've been forwarding
him LKMM emails that end up in my inbox, but I'm far from an expert in
this stuff.  He requested to be added as a reviewer, which seems sane to
me as it'll take a human out of the loop.

CC: Daniel Lustig 
Acked-by: Daniel Lustig 
Acked-by: Andrea Parri 
Signed-off-by: Palmer Dabbelt 
Signed-off-by: Paul E. McKenney 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9d5eeff51b5f..ac8ed55dbe9b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8316,6 +8316,7 @@ M:Jade Alglave 
 M: Luc Maranget 
 M: "Paul E. McKenney" 
 R: Akira Yokosawa 
+R: Daniel Lustig 
 L: linux-kernel@vger.kernel.org
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
-- 
2.17.1

[PATCH memory-model 07/14] tools/memory-model: Make scripts executable

2018-07-16 Thread Paul E. McKenney

This commit makes the scripts executable to avoid the need for everyone
to do so manually in their archive.

Signed-off-by: Paul E. McKenney 
Acked-by: Akira Yokosawa 
---
 tools/memory-model/scripts/checkalllitmus.sh | 2 +-
 tools/memory-model/scripts/checklitmus.sh| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 mode change 100644 => 100755 tools/memory-model/scripts/checkalllitmus.sh
 mode change 100644 => 100755 tools/memory-model/scripts/checklitmus.sh

diff --git a/tools/memory-model/scripts/checkalllitmus.sh 
b/tools/memory-model/scripts/checkalllitmus.sh
old mode 100644
new mode 100755
index af0aa15ab84e..ca528f9a24d4
--- a/tools/memory-model/scripts/checkalllitmus.sh
+++ b/tools/memory-model/scripts/checkalllitmus.sh
@@ -9,7 +9,7 @@
 # appended.
 #
 # Usage:
-#  sh checkalllitmus.sh [ directory ]
+#  checkalllitmus.sh [ directory ]
 #
 # The LINUX_HERD_OPTIONS environment variable may be used to specify
 # arguments to herd, whose default is defined by the checklitmus.sh script.
diff --git a/tools/memory-model/scripts/checklitmus.sh 
b/tools/memory-model/scripts/checklitmus.sh
old mode 100644
new mode 100755
index e2e477472844..bf12a75c0719
--- a/tools/memory-model/scripts/checklitmus.sh
+++ b/tools/memory-model/scripts/checklitmus.sh
@@ -8,7 +8,7 @@
 # with ".out" appended.
 #
 # Usage:
-#  sh checklitmus.sh file.litmus
+#  checklitmus.sh file.litmus
 #
 # The LINUX_HERD_OPTIONS environment variable may be used to specify
 # arguments to herd, which default to "-conf linux-kernel.cfg".  Thus,
-- 
2.17.1

[PATCH memory-model 06/14] tools/memory-model: Remove ACCESS_ONCE() from model

2018-07-16 Thread Paul E. McKenney

From: Mark Rutland 

Since commit:

  b899a850431e2dd0 ("compiler.h: Remove ACCESS_ONCE()")

... there has been no definition of ACCESS_ONCE() in the kernel tree,
and it has been necessary to use READ_ONCE() or WRITE_ONCE() instead.

Correspondingly, let's remove ACCESS_ONCE() from the kernel memory
model.

Signed-off-by: Mark Rutland 
Cc: Alan Stern 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Nicholas Piggin 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Luc Maranget 
Cc: Akira Yokosawa 
Acked-by: Andrea Parri 
Signed-off-by: Paul E. McKenney 
---
 tools/memory-model/linux-kernel.bell | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/memory-model/linux-kernel.bell 
b/tools/memory-model/linux-kernel.bell
index 64f5740e0e75..b84fb2f67109 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -13,7 +13,7 @@
 
 "Linux-kernel memory consistency model"
 
-enum Accesses = 'once (*READ_ONCE,WRITE_ONCE,ACCESS_ONCE*) ||
+enum Accesses = 'once (*READ_ONCE,WRITE_ONCE*) ||
'release (*smp_store_release*) ||
'acquire (*smp_load_acquire*) ||
'noreturn (* R of non-return RMW *)
-- 
2.17.1

Re: [PATCH v6] pidns: introduce syscall translate_pid

2018-07-16 Thread Nagarathnam Muthusamy





On 06/01/2018 12:18 PM, Konstantin Khlebnikov wrote:

Each process have different pids, one for each pid namespace it belongs.
When interaction happens within single pid-ns translation isn't required.
More complicated scenarios needs special handling.

For example:
- reading pid-files or logs written inside container with pid namespace
- writing logs with internal pids outside container for pushing them into
- attaching with ptrace to tasks from different pid namespace

Generally speaking, any cross pid-ns API with pids needs translation.

Currently there are several interfaces that could be used here:

Pid namespaces are identified by device and inode of /proc/[pid]/ns/pid.

Pids for nested pid namespaces are shown in file /proc/[pid]/status.
In some cases pid translation could be easily done using this information.
Backward translation requires scanning all tasks and becomes really
complicated for deeper namespace nesting.

Unix socket automatically translates pid attached to SCM_CREDENTIALS.
This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
into pid namespace, this expose process and could be insecure.

This patch adds new syscall for converting pids between pid namespaces:

pid_t translate_pid(pid_t pid, int source, int target);

Pid-namespaces are referred file descriptors opened to proc files
/proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children.
Negative argument points to current pid namespace.

Syscall returns pid in target pid-ns or zero if task have no pid there.

Error codes:
EBADF- file descriptor is closed
EINVAL   - file descriptor isn't pid namespace
ESRCH- task not found in @source namespace

Translation could breach pid-ns isolation and return pids from outer pid
namespaces iff process already has file descriptor for these namespaces.

Examples:
translate_pid(pid, ns, -1)  - get pid in our pid namespace
translate_pid(pid, -1, ns)  - get pid in other pid namespace
translate_pid(1, ns, -1)- get pid of init task for namespace
translate_pid(pid, -1, ns) > 0  - is pid is reachable from ns?
translate_pid(1, ns1, ns2) > 0  - is ns1 inside ns2?
translate_pid(1, ns1, ns2) == 0 - is ns1 outside ns2?
translate_pid(1, ns1, ns2) == 1 - is ns1 equal ns2?

Signed-off-by: Konstantin Khlebnikov 
Reanimated-by: Nagarathnam Muthusamy 

---

v1: https://lkml.org/lkml/2015/9/15/411
v2: https://lkml.org/lkml/2015/9/24/278
  * use namespace-fd as second/third argument
  * add -pid for getting parent pid
  * move code into kernel/sys.c next to getppid
  * drop ifdef CONFIG_PID_NS
  * add generic syscall
v3: https://lkml.org/lkml/2015/9/28/3
  * use proc_ns_fdget()
  * update description
  * rebase to next-20150925
  * fix conflict with mlock2
v4: https://lkml.org/lkml/2017/10/13/177
  * rename from getvpid() into translate_pid()
  * remove syscall if CONFIG_PID_NS=n
  * drop -pid for parent task
  * drop fget-fdget optimizations
  * add helper get_pid_ns_by_fd()
  * wire only into x86
v5: https://lkml.org/lkml/2018/4/4/677
  * rewrite commit message
  * resolve pidns by task pid or by pidns fd
  * add arguments source_type and target_type
v6:
  * revert back minimized v4 design
  * rebase to next-20180601
  * fix COND_SYSCALL stub
  * use next syscall number, old used for io_pgetevents

--- sample tool ---

#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#ifndef SYS_translate_pid
#ifdef __x86_64__
#define SYS_translate_pid 334
#elif defined __i386__
#define SYS_translate_pid 386
#endif
#endif

pid_t translate_pid(pid_t pid, int source, int target) {
return syscall(SYS_translate_pid, pid, source, target);
}

int main(int argc, char **argv) {
int pid, source, target;
char buf[64];

if (argc != 4)
errx(1, "usage: %s   ", argv[0]);

pid = atoi(argv[1]);
source = atoi(argv[2]);
target = atoi(argv[3]);

if (source > 0) {
snprintf(buf, sizeof(buf), "/proc/%d/ns/pid", source);
source = open(buf, O_RDONLY);
if (source < 0)
err(2, "open source %s", buf);
}

if (target > 0) {
snprintf(buf, sizeof(buf), "/proc/%d/ns/pid", target);
target = open(buf, O_RDONLY);
if (target < 0)
err(2, "open target %s", buf);
}

pid = translate_pid(pid, source, target);
if (pid < 0)
err(2, "translate_pid");

printf("%d\n", pid);
return 0;
}

---
---
  arch/x86/entry/syscalls/syscall_32.tbl |1
  arch/x86/entry/syscalls/syscall_64.tbl |1
  include/linux/syscalls.h   |1
  kernel/pid_namespace.c |   66 
  kernel/sys_ni.c|3 +
  5 files changed, 72 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl 
b/arch/x86/entry/syscalls/syscall_32.tbl
index

[PATCH memory-model 12/14] doc: Update wake_up() & co. memory-barrier guarantees

2018-07-16 Thread Paul E. McKenney

From: Andrea Parri 

Both the implementation and the users' expectation [1] for the various
wakeup primitives have evolved over time, but the documentation has not
kept up with these changes: brings it into 2018.

[1] 
http://lkml.kernel.org/r/20180424091510.gb4...@hirez.programming.kicks-ass.net

Suggested-by: Peter Zijlstra 
Signed-off-by: Andrea Parri 
[ aparri: Apply feedback from Alan Stern. ]
Cc: Alan Stern 
Cc: Will Deacon 
Cc: Boqun Feng 
Cc: Nicholas Piggin 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Luc Maranget 
Cc: Akira Yokosawa 
Cc: Daniel Lustig 
Cc: Jonathan Corbet 
Cc: Ingo Molnar 
Acked-by: Peter Zijlstra (Intel) 
Signed-off-by: Paul E. McKenney 
---
 Documentation/memory-barriers.txt | 43 +++
 include/linux/sched.h |  4 +--
 kernel/sched/completion.c |  8 +++---
 kernel/sched/core.c   | 30 +
 kernel/sched/wait.c   |  8 +++---
 5 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index a02d6bbfc9d0..0d8d7ef131e9 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2179,32 +2179,41 @@ or:
event_indicated = 1;
wake_up_process(event_daemon);
 
-A write memory barrier is implied by wake_up() and co.  if and only if they
-wake something up.  The barrier occurs before the task state is cleared, and so
-sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A general memory barrier is executed by wake_up() if it wakes something up.
+If it doesn't wake anything up then a memory barrier may or may not be
+executed; you must not rely on it.  The barrier occurs before the task state
+is accessed, in particular, it sits between the STORE to indicate the event
+and the STORE to set TASK_RUNNING:
 
-   CPU 1   CPU 2
+   CPU 1 (Sleeper) CPU 2 (Waker)
=== ===
set_current_state();STORE event_indicated
  smp_store_mb();   wake_up();
-   STORE current->state  
-STORE current->state
-   LOAD event_indicated
+   STORE current->state  ...
+
+   LOAD event_indicated  if ((LOAD task->state) & TASK_NORMAL)
+   STORE task->state
 
-To repeat, this write memory barrier is present if and only if something
-is actually awakened.  To see this, consider the following sequence of
-events, where X and Y are both initially zero:
+where "task" is the thread being woken up and it equals CPU 1's "current".
+
+To repeat, a general memory barrier is guaranteed to be executed by wake_up()
+if something is actually awakened, but otherwise there is no such guarantee.
+To see this, consider the following sequence of events, where X and Y are both
+initially zero:
 
CPU 1   CPU 2
=== ===
-   X = 1;  STORE event_indicated
+   X = 1;  Y = 1;
smp_mb();   wake_up();
-   Y = 1;  wait_event(wq, Y == 1);
-   wake_up();load from Y sees 1, no memory barrier
-   load from X might see 0
+   LOAD Y  LOAD X
+
+If a wakeup does occur, one (at least) of the two loads must see 1.  If, on
+the other hand, a wakeup does not occur, both loads might see 0.
 
-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
+wake_up_process() always executes a general memory barrier.  The barrier again
+occurs before the task state is accessed.  In particular, if the wake_up() in
+the previous snippet were replaced by a call to wake_up_process() then one of
+the two loads would be guaranteed to see 1.
 
 The available waker functions include:
 
@@ -2224,6 +2233,8 @@ The available waker functions include:
wake_up_poll();
wake_up_process();
 
+In terms of memory ordering, these functions all provide the same guarantees of
+a wake_up() (or stronger).
 
 [!] Note that the memory barriers implied by the sleeper and the waker do _not_
 order multiple stores before the wake-up with respect to loads of those stored
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 87bf02d93a27..ddfdeb632f74 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -167,8 +167,8 @@ struct task_group;
  *   need_sleep = false;
  *   wake_up_state(p, TASK_UNINTERRUPTIBLE);
  *
- * Where wake_up_state() (and all other wakeup primitives) imply enough
- * barriers to order the store of the variable against wakeup.
+ * where wake_up_state() executes a full memory barrier before accessing the
+

[PATCH memory-model 14/14] tools/memory-model: Rename litmus tests to comply to norm7

2018-07-16 Thread Paul E. McKenney

From: Andrea Parri 

norm7 produces the 'normalized' name of a litmus test,  when the test
can be generated from a single cycle that passes through each process
exactly once. The commit renames such tests in order to comply to the
naming scheme implemented by this tool.

Signed-off-by: Andrea Parri 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Nicholas Piggin 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Luc Maranget 
Cc: "Paul E. McKenney" 
Cc: Akira Yokosawa 
Signed-off-by: Paul E. McKenney 
Acked-by: Alan Stern 
---
 tools/memory-model/Documentation/recipes.txt  |  8 
 tools/memory-model/README | 20 +--
 ... => IRIW+fencembonceonces+OnceOnce.litmus} |  2 +-
 ...=> LB+fencembonceonce+ctrlonceonce.litmus} |  2 +-
 ...+fencewmbonceonce+fencermbonceonce.litmus} |  2 +-
 ...onces.litmus => R+fencembonceonces.litmus} |  2 +-
 tools/memory-model/litmus-tests/README| 16 +++
 ...> S+fencewmbonceonce+poacquireonce.litmus} |  2 +-
 ...nces.litmus => SB+fencembonceonces.litmus} |  2 +-
 ...ooncerelease+fencermbonceonce+Once.litmus} |  2 +-
 ...e+poacquirerelease+fencembonceonce.litmus} |  2 +-
 11 files changed, 30 insertions(+), 30 deletions(-)
 rename tools/memory-model/litmus-tests/{IRIW+mbonceonces+OnceOnce.litmus => 
IRIW+fencembonceonces+OnceOnce.litmus} (95%)
 rename tools/memory-model/litmus-tests/{LB+ctrlonceonce+mbonceonce.litmus => 
LB+fencembonceonce+ctrlonceonce.litmus} (95%)
 rename tools/memory-model/litmus-tests/{MP+wmbonceonce+rmbonceonce.litmus => 
MP+fencewmbonceonce+fencermbonceonce.litmus} (91%)
 rename tools/memory-model/litmus-tests/{R+mbonceonces.litmus => 
R+fencembonceonces.litmus} (95%)
 rename tools/memory-model/litmus-tests/{S+wmbonceonce+poacquireonce.litmus => 
S+fencewmbonceonce+poacquireonce.litmus} (90%)
 rename tools/memory-model/litmus-tests/{SB+mbonceonces.litmus => 
SB+fencembonceonces.litmus} (95%)
 rename 
tools/memory-model/litmus-tests/{WRC+pooncerelease+rmbonceonce+Once.litmus => 
WRC+pooncerelease+fencermbonceonce+Once.litmus} (93%)
 rename 
tools/memory-model/litmus-tests/{Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus
 => Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus} (94%)

diff --git a/tools/memory-model/Documentation/recipes.txt 
b/tools/memory-model/Documentation/recipes.txt
index 1fea8ef2b184..af72700cc20a 100644
--- a/tools/memory-model/Documentation/recipes.txt
+++ b/tools/memory-model/Documentation/recipes.txt
@@ -126,7 +126,7 @@ However, it is not necessarily the case that accesses 
ordered by
 locking will be seen as ordered by CPUs not holding that lock.
 Consider this example:
 
-   /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
+   /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
void CPU0(void)
{
spin_lock();
@@ -292,7 +292,7 @@ and to use smp_load_acquire() instead of smp_rmb().  
However, the older
 smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
 to understand their use cases.  The general approach is shown below:
 
-   /* See MP+wmbonceonce+rmbonceonce.litmus. */
+   /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
@@ -360,7 +360,7 @@ can be seen in the LB+poonceonces.litmus litmus test.
 One way of avoiding the counter-intuitive outcome is through the use of a
 control dependency paired with a full memory barrier:
 
-   /* See LB+ctrlonceonce+mbonceonce.litmus. */
+   /* See LB+fencembonceonce+ctrlonceonce.litmus. */
void CPU0(void)
{
r0 = READ_ONCE(x);
@@ -476,7 +476,7 @@ that one CPU first stores to one variable and then loads 
from a second,
 while another CPU stores to the second variable and then loads from the
 first.  Preserving order requires nothing less than full barriers:
 
-   /* See SB+mbonceonces.litmus. */
+   /* See SB+fencembonceonces.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
diff --git a/tools/memory-model/README b/tools/memory-model/README
index 734f7feaa5dc..ee987ce20aae 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -35,13 +35,13 @@ BASIC USAGE: HERD7
 The memory model is used, in conjunction with "herd7", to exhaustively
 explore the state space of small litmus tests.
 
-For example, to run SB+mbonceonces.litmus against the memory model:
+For example, to run SB+fencembonceonces.litmus against the memory model:
 
-  $ herd7 -conf linux-kernel.cfg litmus-tests/SB+mbonceonces.litmus
+  $ herd7 -conf linux-kernel.cfg litmus-tests/SB+fencembonceonces.litmus
 
 Here is the corresponding output:
 
-  Test SB+mbonceonces Allowed
+  Test SB+fencembonceonces Allowed
   States 3
   0:r0=0; 1:r0=1;
   0:r0=1; 1:r0=0;
@@ -50,8 +50,8 @@ Here is the corresponding output:
   Witnesses
   Positive: 0 Negative: 3
   Condition exists (0:r0=0 /\ 1:r0=0)
-  Observation

[PATCH memory-model 09/14] tools/memory-model: Add informal LKMM documentation to MAINTAINERS

2018-07-16 Thread Paul E. McKenney

The Linux-kernel memory model has been informal, with a number of
text files documenting it.  It would be good to make sure that these
informal descriptions are kept up to date and/or pruned appropriately.
This commit therefore brings more of those text files into the LKMM
MAINTAINERS file entry.

Signed-off-by: Paul E. McKenney 
Cc: Alan Stern 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Nicholas Piggin 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Luc Maranget 
Cc: Akira Yokosawa 
Cc: Daniel Lustig 
Cc: "David S. Miller" 
Acked-by: Andrea Parri 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ac8ed55dbe9b..b60d6daad132 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8318,9 +8318,14 @@ M:   "Paul E. McKenney" 
 R: Akira Yokosawa 
 R: Daniel Lustig 
 L: linux-kernel@vger.kernel.org
+L: linux-a...@vger.kernel.org
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
 F: tools/memory-model/
+F: Documentation/atomic_bitops.txt
+F: Documentation/atomic_t.txt
+F: Documentation/core-api/atomic_ops.rst
+F: Documentation/core-api/refcount-vs-atomic.rst
 F: Documentation/memory-barriers.txt
 
 LINUX SECURITY MODULE (LSM) FRAMEWORK
-- 
2.17.1

[PATCH memory-model 08/14] docs: atomic_ops: Describe atomic_set as a write operation

2018-07-16 Thread Paul E. McKenney

From: Jonathan Neuschäfer 

The atomic_set() and ATOMIC_INIT() operations are writes, so this
commit changes their description from "reads" to "writes".

Signed-off-by: Jonathan Neuschäfer 
Signed-off-by: Paul E. McKenney 
Reviewed-by: Andrea Parri 
---
 Documentation/core-api/atomic_ops.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/core-api/atomic_ops.rst 
b/Documentation/core-api/atomic_ops.rst
index 2e7165f86f55..724583453e1f 100644
--- a/Documentation/core-api/atomic_ops.rst
+++ b/Documentation/core-api/atomic_ops.rst
@@ -29,7 +29,7 @@ updated by one CPU, local_t is probably more appropriate. 
Please see
 local_t.
 
 The first operations to implement for atomic_t's are the initializers and
-plain reads. ::
+plain writes. ::
 
#define ATOMIC_INIT(i)  { (i) }
#define atomic_set(v, i)((v)->counter = (i))
-- 
2.17.1

[PATCH memory-model 11/14] locking: Clarify requirements for smp_mb__after_spinlock()

2018-07-16 Thread Paul E. McKenney

From: Andrea Parri 

There are 11 interpretations of the requirements described in the header
comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
one currently encoded in the Cat file. Stick to the latter (until a more
satisfactory solution is available).

This also reworks some snippets related to the barrier to illustrate the
requirements and to link them to the idioms which are relied upon at its
call sites.

Suggested-by: Boqun Feng 
Signed-off-by: Andrea Parri 
Acked-by: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Will Deacon 
Cc: "Paul E. McKenney" 
Signed-off-by: Paul E. McKenney 
---
 include/linux/spinlock.h | 53 +++-
 kernel/sched/core.c  | 41 ---
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1e8a46435838..d70a06ff2bdd 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -114,29 +114,48 @@ do {  
\
 #endif /*arch_spin_is_contended*/
 
 /*
- * This barrier must provide two things:
+ * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
+ * between program-order earlier lock acquisitions and program-order later
+ * memory accesses.
  *
- *   - it must guarantee a STORE before the spin_lock() is ordered against a
- * LOAD after it, see the comments at its two usage sites.
+ * This guarantees that the following two properties hold:
  *
- *   - it must ensure the critical section is RCsc.
+ *   1) Given the snippet:
  *
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
+ *   { X = 0;  Y = 0; }
  *
- * CPU0CPU1CPU2
+ *   CPU0  CPU1
  *
- * for (;;) {
- *   if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * 
- * 
- * r = X;
+ *   WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
+ *   spin_lock(S); smp_mb();
+ *   smp_mb__after_spinlock(); r1 = READ_ONCE(X);
+ *   r0 = READ_ONCE(Y);
+ *   spin_unlock(S);
  *
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ *  it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
+ *  and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
+ *  preceding the call to smp_mb__after_spinlock() in __schedule() and in
+ *  try_to_wake_up().
+ *
+ *   2) Given the snippet:
+ *
+ *  { X = 0;  Y = 0; }
+ *
+ *  CPU0   CPU1CPU2
+ *
+ *  spin_lock(S);  spin_lock(S);   r1 = READ_ONCE(Y);
+ *  WRITE_ONCE(X, 1);  smp_mb__after_spinlock();   smp_rmb();
+ *  spin_unlock(S);r0 = READ_ONCE(X);  r2 = READ_ONCE(X);
+ * WRITE_ONCE(Y, 1);
+ * spin_unlock(S);
+ *
+ *  it is forbidden that CPU0's critical section executes before CPU1's
+ *  critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
+ *  and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
+ *  preceding the calls to smp_rmb() in try_to_wake_up() for similar
+ *  snippets but "projected" onto two CPUs.
+ *
+ * Property (2) upgrades the lock to an RCsc lock.
  *
  * Since most load-store architectures implement ACQUIRE with an smp_mb() after
  * the LL/SC loop, they need no further barriers. Similarly all our TSO
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78d8facba456..7db0662360f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int 
state, int wake_flags)
 * be possible to, falsely, observe p->on_rq == 0 and get stuck
 * in smp_cond_load_acquire() below.
 *
-* sched_ttwu_pending() try_to_wake_up()
-*   [S] p->on_rq = 1;  [L] P->state
-*   UNLOCK rq->lock  -.
-*  \
-*   +---   RMB
-* schedule()   /
-*   LOCK rq->lock-'
-*   UNLOCK rq->lock
+* sched_ttwu_pending() try_to_wake_up()
+*   STORE p->on_rq = 1   LOAD p->state
+*   UNLOCK rq->lock
+*
+* __schedule() (switch to task 'p')
+*   LOCK rq->locksmp_rmb();
+*   smp_mb__after_spinlock();
+*   UNLOCK rq->lock
 *
 * [task p]
-*   [S] p->state = UNINTERRUPTIBLE [L]

[PATCH memory-model 01/14] tools/memory-model: Add litmus test for full multicopy atomicity

2018-07-16 Thread Paul E. McKenney

This commit adds a litmus test suggested by Alan Stern that is forbidden
on fully multicopy atomic systems, but allowed on other-multicopy and
on non-multicopy atomic systems.  For reference, s390 is fully multicopy
atomic, x86 and ARMv8 are other-multicopy atomic, and ARMv7 and powerpc
are non-multicopy atomic.

Suggested-by: Alan Stern 
Signed-off-by: Paul E. McKenney 
Acked-by: Alan Stern 
Acked-by: Andrea Parri 
---
 tools/memory-model/litmus-tests/README|  9 ++
 .../SB+rfionceonce-poonceonces.litmus | 32 +++
 2 files changed, 41 insertions(+)
 create mode 100644 
tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus

diff --git a/tools/memory-model/litmus-tests/README 
b/tools/memory-model/litmus-tests/README
index 17eb9a8c222d..00140aaf58b7 100644
--- a/tools/memory-model/litmus-tests/README
+++ b/tools/memory-model/litmus-tests/README
@@ -111,6 +111,15 @@ SB+mbonceonces.litmus
 SB+poonceonces.litmus
As above, but without the smp_mb() invocations.
 
+SB+rfionceonce-poonceonces.litmus
+   This litmus test demonstrates that LKMM is not fully multicopy
+   atomic.  (Neither is it other multicopy atomic.)  This litmus test
+   also demonstrates the "locations" debugging aid, which designates
+   additional registers and locations to be printed out in the dump
+   of final states in the herd7 output.  Without the "locations"
+   statement, only those registers and locations mentioned in the
+   "exists" clause will be printed.
+
 S+poonceonces.litmus
As below, but without the smp_wmb() and acquire load.
 
diff --git a/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus 
b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus
new file mode 100644
index ..04a16603660b
--- /dev/null
+++ b/tools/memory-model/litmus-tests/SB+rfionceonce-poonceonces.litmus
@@ -0,0 +1,32 @@
+C SB+rfionceonce-poonceonces
+
+(*
+ * Result: Sometimes
+ *
+ * This litmus test demonstrates that LKMM is not fully multicopy atomic.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+   int r1;
+   int r2;
+
+   WRITE_ONCE(*x, 1);
+   r1 = READ_ONCE(*x);
+   r2 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y)
+{
+   int r3;
+   int r4;
+
+   WRITE_ONCE(*y, 1);
+   r3 = READ_ONCE(*y);
+   r4 = READ_ONCE(*x);
+}
+
+locations [0:r1; 1:r3; x; y] (* Debug aid: Print things not in "exists". *)
+exists (0:r2=0 /\ 1:r4=0)
-- 
2.17.1

[PATCH memory-model 04/14] locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example

2018-07-16 Thread Paul E. McKenney

From: SeongJae Park 

Translate this commit to Korean:

  5846581e3563 ("locking/memory-barriers.txt: Fix broken DMA vs. MMIO ordering 
example")

Signed-off-by: SeongJae Park 
Signed-off-by: Paul E. McKenney 
[ paulmck: Updated based on feedback from Byungchul Park. ]
Acked-by: Byungchul Park 
---
 .../translations/ko_KR/memory-barriers.txt| 22 +--
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
b/Documentation/translations/ko_KR/memory-barriers.txt
index 921739d00f69..7f01fb1c1084 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -1891,22 +1891,22 @@ Mandatory 배리어들은 SMP 시스템에서도 UP 시스템에서도 SMP 효
/* 소유권을 수정 */
desc->status = DEVICE_OWN;
 
-   /* MMIO 를 통해 디바이스에 공지를 하기 전에 메모리를 동기화 */
-   wmb();
-
/* 업데이트된 디스크립터의 디바이스에 공지 */
writel(DESC_NOTIFY, doorbell);
}
 
  dma_rmb() 는 디스크립터로부터 데이터를 읽어오기 전에 디바이스가 소유권을
- 내놓았음을 보장하게 하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시
- 가졌음을 보기 전에 디스크립터에 데이터가 쓰였음을 보장합니다.  wmb() 는
- 캐시 일관성이 없는 (cache incoherent) MMIO 영역에 쓰기를 시도하기 전에
- 캐시 일관성이 있는 메모리 (cache coherent memory) 쓰기가 완료되었음을
- 보장해주기 위해 필요합니다.
-
- consistent memory 에 대한 자세한 내용을 위해선 Documentation/DMA-API.txt
- 문서를 참고하세요.
+ 내려놓았을 것을 보장하고, dma_wmb() 는 디바이스가 자신이 소유권을 다시
+ 가졌음을 보기 전에 디스크립터에 데이터가 쓰였을 것을 보장합니다.  참고로,
+ writel() 을 사용하면 캐시 일관성이 있는 메모리 (cache coherent memory)
+ 쓰기가 MMIO 영역에의 쓰기 전에 완료되었을 것을 보장하므로 writel() 앞에
+ wmb() 를 실행할 필요가 없음을 알아두시기 바랍니다.  writel() 보다 비용이
+ 저렴한 writel_relaxed() 는 이런 보장을 제공하지 않으므로 여기선 사용되지
+ 않아야 합니다.
+
+ writel_relaxed() 와 같은 완화된 I/O 접근자들에 대한 자세한 내용을 위해서는
+ "커널 I/O 배리어의 효과" 섹션을, consistent memory 에 대한 자세한 내용을
+ 위해선 Documentation/DMA-API.txt 문서를 참고하세요.
 
 
 MMIO 쓰기 배리어
-- 
2.17.1

[PATCH memory-model 13/14] memory-model/Documentation: Fix typo, smb->smp

2018-07-16 Thread Paul E. McKenney

From: Yauheni Kaliuta 

The tools/memory-model/Documentation/explanation.txt file says
"For each other CPU C', smb_wmb() forces all po-earlier stores"
This commit therefore replaces the "smb_wmb()" with "smp_wmb()".

Signed-off-by: Yauheni Kaliuta 
Signed-off-by: Paul E. McKenney 
Acked-by: Alan Stern 
---
 tools/memory-model/Documentation/explanation.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/memory-model/Documentation/explanation.txt 
b/tools/memory-model/Documentation/explanation.txt
index 1b09f3175a1f..0cbd1ef8f86d 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -804,7 +804,7 @@ type of fence:
 Second, some types of fence affect the way the memory subsystem
 propagates stores.  When a fence instruction is executed on CPU C:
 
-   For each other CPU C', smb_wmb() forces all po-earlier stores
+   For each other CPU C', smp_wmb() forces all po-earlier stores
on C to propagate to C' before any po-later stores do.
 
For each other CPU C', any store which propagates to C before
-- 
2.17.1

[PATCH memory-model 10/14] sched: Use smp_mb() in wake_woken_function()

2018-07-16 Thread Paul E. McKenney

From: Andrea Parri 

wake_woken_function() synchronizes with wait_woken() as follows:

  [wait_woken]   [wake_woken_function]

  entry->flags &= ~wq_flag_woken;condition = true;
  smp_mb();  smp_wmb();
  if (condition) wq_entry->flags |= wq_flag_woken;
 break;

This commit replaces the above smp_wmb() with an smp_mb() in order to
guarantee that either wait_woken() sees the wait condition being true
or the store to wq_entry->flags in woken_wake_function() follows the
store in wait_woken() in the coherence order (so that the former can
eventually be observed by wait_woken()).

The commit also fixes a comment associated to set_current_state() in
wait_woken(): the comment pairs the barrier in set_current_state() to
the above smp_wmb(), while the actual pairing involves the barrier in
set_current_state() and the barrier executed by the try_to_wake_up()
in wake_woken_function().

Signed-off-by: Andrea Parri 
Cc: Ingo Molnar 
Acked-by: Peter Zijlstra (Intel) 
Signed-off-by: Paul E. McKenney 
---
 kernel/sched/wait.c | 47 -
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 928be527477e..a7a2aaa3026a 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -392,35 +392,36 @@ static inline bool is_kthread_should_stop(void)
  * if (condition)
  * break;
  *
- * p->state = mode;condition = true;
- * smp_mb(); // A  smp_wmb(); // C
- * if (!wq_entry->flags & WQ_FLAG_WOKEN)   wq_entry->flags |= 
WQ_FLAG_WOKEN;
- * schedule()  try_to_wake_up();
- * p->state = TASK_RUNNING;~~
- * wq_entry->flags &= ~WQ_FLAG_WOKEN;  condition = true;
- * smp_mb() // B   smp_wmb(); // C
- * wq_entry->flags |= 
WQ_FLAG_WOKEN;
- * }
- * remove_wait_queue(_head, );
+ * // in wait_woken()  // in woken_wake_function()
  *
+ * p->state = mode;wq_entry->flags |= 
WQ_FLAG_WOKEN;
+ * smp_mb(); // A  try_to_wake_up():
+ * if (!(wq_entry->flags & WQ_FLAG_WOKEN))
+ * schedule() if (p->state & mode)
+ * p->state = TASK_RUNNING;  p->state = 
TASK_RUNNING;
+ * wq_entry->flags &= ~WQ_FLAG_WOKEN;  ~~
+ * smp_mb(); // B  condition = true;
+ * }   smp_mb(); // C
+ * remove_wait_queue(_head, ); wq_entry->flags |= 
WQ_FLAG_WOKEN;
  */
 long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout)
 {
-   set_current_state(mode); /* A */
/*
-* The above implies an smp_mb(), which matches with the smp_wmb() from
-* woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
-* also observe all state before the wakeup.
+* The below executes an smp_mb(), which matches with the full barrier
+* executed by the try_to_wake_up() in woken_wake_function() such that
+* either we see the store to wq_entry->flags in woken_wake_function()
+* or woken_wake_function() sees our store to current->state.
 */
+   set_current_state(mode); /* A */
if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
timeout = schedule_timeout(timeout);
__set_current_state(TASK_RUNNING);
 
/*
-* The below implies an smp_mb(), it too pairs with the smp_wmb() from
-* woken_wake_function() such that we must either observe the wait
-* condition being true _OR_ WQ_FLAG_WOKEN such that we will not miss
-* an event.
+* The below executes an smp_mb(), which matches with the smp_mb() (C)
+* in woken_wake_function() such that either we see the wait condition
+* being true or the store to wq_entry->flags in woken_wake_function()
+* follows ours in the coherence order.
 */
smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN); /* B */
 
@@ -430,14 +431,8 @@ EXPORT_SYMBOL(wait_woken);
 
 int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int 
sync, void *key)
 {
-   /*
-* Although this function is called under waitqueue lock, LOCK
-* doesn't imply write barrier and the users expects write
-* barrier semantics on wakeup functions.  The following
-* smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
-* and is paired with smp_store_mb() in wait_woken().
-*/
-   smp_wmb(); /* C */
+   /* Pairs with the smp_store_mb() in wait_woken(). */
+   smp_mb(); /* C */
wq_entry->flags |= WQ_FLAG_WOKEN;

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread Mukesh Ojha





On 7/16/2018 10:44 PM, John Stultz wrote:

On Mon, Jul 16, 2018 at 9:30 AM, John Stultz  wrote:

On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha  wrote:

On 7/13/2018 10:50 PM, John Stultz wrote:

On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha 

On 7/11/2018 1:43 AM, John Stultz wrote:

I worry this upside-down logic is too subtle to be easily reasoned
about, and will just lead to future mistakes.

Can we instead call this "suspend_timing_needed" and only set it to
true when we don't inject any sleep time on resume?


I did not get your point "only set it to true when we don't inject any
sleep
time on resume? "
How do we know  this ?
This question itself depends on the "sleeptime_injected" if it is true
means
no need to inject else need to inject.

Also, we need to make this variable back and forth true, false; suspends
path ensures it to make it false.

So yea, I'm not saying logically the code is really any different,
this is more of a naming nit. So instead of having a variable that is
always on that we occasionally turn off, lets invert the naming and
have it be a flag that we occasionally turn on.


I understand your concern about the name of the variable will be misleading.
But the changing Boolean state would not solve the actual issue.

If i understand you correctly you meant below code

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 32ae9ae..becc5bd 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64 *ts)
   * If a suspend fails before reaching timekeeping_resume() then the flag
   * stays true and prevents erroneous sleeptime injection.
   */
-static bool sleeptime_injected = true;
+static bool suspend_timing_needed;

  /* Flag for if there is a persistent clock on this platform */
  static bool persistent_clock_exists;
@@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct timespec64
*delta)
 raw_spin_lock_irqsave(_lock, flags);
 write_seqcount_begin(_core.seq);

-   sleeptime_injected = true;
+   suspend_timing_needed = false;

 timekeeping_forward_now(tk);

@@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
   tk->tkr_mono.mask);
 nsec = mul_u64_u32_shr(cyc_delta, clock->mult,
clock->shift);
 ts_delta = ns_to_timespec64(nsec);
-   sleeptime_injected = true;
+   suspend_timing_needed = true;
 } else if (timespec64_compare(_new, _suspend_time) >
0) {
 ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
-   sleeptime_injected = true;
+   suspend_timing_needed = true;
 }

No no... This part is wrong. We only set suspend_timing_needed if we
*didn't* calculate the suspend time in timekeeping_resume.

You have to invert all the boolean logic for it to be equivalent.


...

   


So, I think with the logic bug above it will work out properly, but
let me know if I'm still missing something.


Please give it thought to a case where very first suspend fails with 
your logic.

If i am not able to get your thought, please write a patch.

-Mukesh


Sorry, I meant "with the logic bug above fixed it will work out".

thanks
-john

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread John Stultz

On Mon, Jul 16, 2018 at 11:30 AM, Mukesh Ojha  wrote:
>
>
> On 7/16/2018 10:44 PM, John Stultz wrote:
>>
>> On Mon, Jul 16, 2018 at 9:30 AM, John Stultz 
>> wrote:
>>>
>>> On Mon, Jul 16, 2018 at 9:17 AM, Mukesh Ojha 
>>> wrote:

 On 7/13/2018 10:50 PM, John Stultz wrote:
>
> On Fri, Jul 13, 2018 at 12:13 AM, Mukesh Ojha 
>>
>> On 7/11/2018 1:43 AM, John Stultz wrote:
>>>
>>> I worry this upside-down logic is too subtle to be easily reasoned
>>> about, and will just lead to future mistakes.
>>>
>>> Can we instead call this "suspend_timing_needed" and only set it to
>>> true when we don't inject any sleep time on resume?
>>
>>
>> I did not get your point "only set it to true when we don't inject any
>> sleep
>> time on resume? "
>> How do we know  this ?
>> This question itself depends on the "sleeptime_injected" if it is true
>> means
>> no need to inject else need to inject.
>>
>> Also, we need to make this variable back and forth true, false;
>> suspends
>> path ensures it to make it false.
>
> So yea, I'm not saying logically the code is really any different,
> this is more of a naming nit. So instead of having a variable that is
> always on that we occasionally turn off, lets invert the naming and
> have it be a flag that we occasionally turn on.


 I understand your concern about the name of the variable will be
 misleading.
 But the changing Boolean state would not solve the actual issue.

 If i understand you correctly you meant below code

 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index 32ae9ae..becc5bd 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -1523,7 +1523,7 @@ void __weak read_boot_clock64(struct timespec64
 *ts)
* If a suspend fails before reaching timekeeping_resume() then the
 flag
* stays true and prevents erroneous sleeptime injection.
*/
 -static bool sleeptime_injected = true;
 +static bool suspend_timing_needed;

   /* Flag for if there is a persistent clock on this platform */
   static bool persistent_clock_exists;
 @@ -1658,7 +1658,7 @@ void timekeeping_inject_sleeptime64(struct
 timespec64
 *delta)
  raw_spin_lock_irqsave(_lock, flags);
  write_seqcount_begin(_core.seq);

 -   sleeptime_injected = true;
 +   suspend_timing_needed = false;

  timekeeping_forward_now(tk);

 @@ -1714,10 +1714,10 @@ void timekeeping_resume(void)
tk->tkr_mono.mask);
  nsec = mul_u64_u32_shr(cyc_delta, clock->mult,
 clock->shift);
  ts_delta = ns_to_timespec64(nsec);
 -   sleeptime_injected = true;
 +   suspend_timing_needed = true;
  } else if (timespec64_compare(_new,
 _suspend_time) >
 0) {
  ts_delta = timespec64_sub(ts_new,
 timekeeping_suspend_time);
 -   sleeptime_injected = true;
 +   suspend_timing_needed = true;
  }
>>>
>>> No no... This part is wrong. We only set suspend_timing_needed if we
>>> *didn't* calculate the suspend time in timekeeping_resume.
>>>
>>> You have to invert all the boolean logic for it to be equivalent.
>>>
>> ...


>>>
>>>
>>> So, I think with the logic bug above it will work out properly, but
>>> let me know if I'm still missing something.
>
>
> Please give it thought to a case where very first suspend fails with your
> logic.

I believe I did. If the first suspend fails, we never reach
timekeeping_resume, so we never set "suspend_time_needed = true", so
then timekeeping_rtc_skipresume can then return true, and we don't
inject the time in the RTC code.

> If i am not able to get your thought, please write a patch.

I probably will, but I'd like to encourage you to follow through on
this one. You reported the issue, and submitted a few patches, so I
think it would be good for you to also get the patch credit here. I
don't believe its a complex request I've made, and I think you can
figure it out.

So, please, take one more real stab at this and I'll rework it if it
seems necessary.

thanks
-john

[PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-07-16 Thread Rik van Riel

Lazy TLB mode can result in an idle CPU being woken up by a TLB flush,
when all it really needs to do is reload %CR3 at the next context switch,
assuming no page table pages got freed.

Memory ordering is used to prevent race conditions between switch_mm_irqs_off,
which checks whether .tlb_gen changed, and the TLB invalidation code, which
increments .tlb_gen whenever page table entries get invalidated.

The atomic increment in inc_mm_tlb_gen is its own barrier; the context
switch code adds an explicit barrier between reading tlbstate.is_lazy and
next->context.tlb_gen.

Unlike the 2016 version of this patch, CPUs with cpu_tlbstate.is_lazy set
are not removed from the mm_cpumask(mm), since that would prevent the TLB
flush IPIs at page table free time from being sent to all the CPUs
that need them.

This patch reduces total CPU use in the system by about 1-2% for a
memcache workload on two socket systems, and by about 1% for a heavily
multi-process netperf between two systems.

Signed-off-by: Rik van Riel 
Acked-by: Dave Hansen 
Tested-by: Song Liu 
---
 arch/x86/mm/tlb.c | 68 +++
 1 file changed, 59 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 4b73fe835c95..26542cc17043 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -185,6 +186,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 {
struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+   bool was_lazy = this_cpu_read(cpu_tlbstate.is_lazy);
unsigned cpu = smp_processor_id();
u64 next_tlb_gen;
bool need_flush;
@@ -242,17 +244,40 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
   next->context.ctx_id);
 
/*
-* We don't currently support having a real mm loaded without
-* our cpu set in mm_cpumask().  We have all the bookkeeping
-* in place to figure out whether we would need to flush
-* if our cpu were cleared in mm_cpumask(), but we don't
-* currently use it.
+* Even in lazy TLB mode, the CPU should stay set in the
+* mm_cpumask. The TLB shootdown code can figure out from
+* from cpu_tlbstate.is_lazy whether or not to send an IPI.
 */
if (WARN_ON_ONCE(real_prev != _mm &&
 !cpumask_test_cpu(cpu, mm_cpumask(next
cpumask_set_cpu(cpu, mm_cpumask(next));
 
-   return;
+   /*
+* If the CPU is not in lazy TLB mode, we are just switching
+* from one thread in a process to another thread in the same
+* process. No TLB flush required.
+*/
+   if (!was_lazy)
+   return;
+
+   /*
+* Read the tlb_gen to check whether a flush is needed.
+* If the TLB is up to date, just use it.
+* The barrier synchronizes with the tlb_gen increment in
+* the TLB shootdown code.
+*/
+   smp_mb();
+   next_tlb_gen = atomic64_read(>context.tlb_gen);
+   if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
+   next_tlb_gen)
+   return;
+
+   /*
+* TLB contents went out of date while we were in lazy
+* mode. Fall through to the TLB switching code below.
+*/
+   new_asid = prev_asid;
+   need_flush = true;
} else {
u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
 
@@ -454,6 +479,9 @@ static void flush_tlb_func_common(const struct 
flush_tlb_info *f,
 * paging-structure cache to avoid speculatively reading
 * garbage into our TLB.  Since switching to init_mm is barely
 * slower than a minimal flush, just switch to init_mm.
+*
+* This should be rare, with native_flush_tlb_others skipping
+* IPIs to lazy TLB mode CPUs.
 */
switch_mm_irqs_off(NULL, _mm, NULL);
return;
@@ -560,6 +588,9 @@ static void flush_tlb_func_remote(void *info)
 void native_flush_tlb_others(const struct cpumask *cpumask,
 const struct flush_tlb_info *info)
 {
+   cpumask_var_t lazymask;
+   unsigned int cpu;
+
count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
if (info->end == TLB_FLUSH_ALL)
trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
@@ -583,8 +614,6 @@ void native_flush_tlb_others(const struct

[PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs

2018-07-16 Thread Rik van Riel

CPUs in !is_lazy have either received TLB flush IPIs earlier on during
the munmap (when the user memory was unmapped), or have context switched
and reloaded during that stage of the munmap.

Page table free TLB flushes only need to be sent to CPUs in lazy TLB
mode, which TLB contents might not yet be up to date yet.

Signed-off-by: Rik van Riel 
Acked-by: Dave Hansen 
Tested-by: Song Liu 
---
 arch/x86/mm/tlb.c | 43 +++
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 26542cc17043..e4156e37aa71 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -712,15 +712,50 @@ void tlb_flush_remove_tables_local(void *arg)
}
 }
 
+static void mm_fill_lazy_tlb_cpu_mask(struct mm_struct *mm,
+ struct cpumask *lazy_cpus)
+{
+   int cpu;
+
+   for_each_cpu(cpu, mm_cpumask(mm)) {
+   if (!per_cpu(cpu_tlbstate.is_lazy, cpu))
+   cpumask_set_cpu(cpu, lazy_cpus);
+   }
+}
+
 void tlb_flush_remove_tables(struct mm_struct *mm)
 {
int cpu = get_cpu();
+   cpumask_var_t lazy_cpus;
+
+   if (cpumask_any_but(mm_cpumask(mm), cpu) >= nr_cpu_ids) {
+   put_cpu();
+   return;
+   }
+
+   if (!zalloc_cpumask_var(_cpus, GFP_ATOMIC)) {
+   /*
+* If the cpumask allocation fails, do a brute force flush
+* on all the CPUs that have this mm loaded.
+*/
+   smp_call_function_many(mm_cpumask(mm),
+   tlb_flush_remove_tables_local, (void *)mm, 1);
+   put_cpu();
+   return;
+   }
+
/*
-* XXX: this really only needs to be called for CPUs in lazy TLB mode.
+* CPUs with !is_lazy either received a TLB flush IPI while the user
+* pages in this address range were unmapped, or have context switched
+* and reloaded %CR3 since then.
+*
+* Shootdown IPIs at page table freeing time only need to be sent to
+* CPUs that may have out of date TLB contents.
 */
-   if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
-   smp_call_function_many(mm_cpumask(mm), 
tlb_flush_remove_tables_local, (void *)mm, 1);
-
+   mm_fill_lazy_tlb_cpu_mask(mm, lazy_cpus);
+   smp_call_function_many(lazy_cpus,
+   tlb_flush_remove_tables_local, (void *)mm, 1);
+   free_cpumask_var(lazy_cpus);
put_cpu();
 }
 
-- 
2.14.4

[PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm

2018-07-16 Thread Rik van Riel

Song noticed switch_mm_irqs_off taking a lot of CPU time in recent
kernels,using 1.8% of a 48 CPU system during a netperf to localhost run.
Digging into the profile, we noticed that cpumask_clear_cpu and
cpumask_set_cpu together take about half of the CPU time taken by
switch_mm_irqs_off.

However, the CPUs running netperf end up switching back and forth
between netperf and the idle task, which does not require changes
to the mm_cpumask. Furthermore, the init_mm cpumask ends up being
the most heavily contended one in the system.

Simply skipping changes to mm_cpumask(_mm) reduces overhead.

Signed-off-by: Rik van Riel 
Acked-by: Dave Hansen 
Reported-and-tested-by: Song Liu 
---
 arch/x86/mm/tlb.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 493559cae2d5..f086195f644c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -310,15 +310,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
sync_current_stack_to_mm(next);
}
 
-   /* Stop remote flushes for the previous mm */
-   VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, mm_cpumask(real_prev)) &&
-   real_prev != _mm);
-   cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+   /*
+* Stop remote flushes for the previous mm.
+* Skip kernel threads; we never send init_mm TLB flushing IPIs,
+* but the bitmap manipulation can cause cache line contention.
+*/
+   if (real_prev != _mm) {
+   VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
+   mm_cpumask(real_prev)));
+   cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+   }
 
/*
 * Start remote flushes and then read tlb_gen.
 */
-   cpumask_set_cpu(cpu, mm_cpumask(next));
+   if (next != _mm)
+   cpumask_set_cpu(cpu, mm_cpumask(next));
next_tlb_gen = atomic64_read(>context.tlb_gen);
 
choose_new_asid(next, next_tlb_gen, _asid, _flush);
-- 
2.14.4

[PATCH 6/7] x86,mm: always use lazy TLB mode

2018-07-16 Thread Rik van Riel

Now that CPUs in lazy TLB mode no longer receive TLB shootdown IPIs, except
at page table freeing time, and idle CPUs will no longer get shootdown IPIs
for things like mprotect and madvise, we can always use lazy TLB mode.

Signed-off-by: Rik van Riel 
Acked-by: Dave Hansen 
Tested-by: Song Liu 
---
 arch/x86/include/asm/tlbflush.h | 16 
 arch/x86/mm/tlb.c   | 15 +--
 2 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 3aa3204b5dc0..511bf5fae8b8 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -148,22 +148,6 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, 
u16 asid)
 #define __flush_tlb_one_user(addr) __native_flush_tlb_one_user(addr)
 #endif
 
-static inline bool tlb_defer_switch_to_init_mm(void)
-{
-   /*
-* If we have PCID, then switching to init_mm is reasonably
-* fast.  If we don't have PCID, then switching to init_mm is
-* quite slow, so we try to defer it in the hopes that we can
-* avoid it entirely.  The latter approach runs the risk of
-* receiving otherwise unnecessary IPIs.
-*
-* This choice is just a heuristic.  The tlb code can handle this
-* function returning true or false regardless of whether we have
-* PCID.
-*/
-   return !static_cpu_has(X86_FEATURE_PCID);
-}
-
 struct tlb_context {
u64 ctx_id;
u64 tlb_gen;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e4156e37aa71..493559cae2d5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -379,20 +379,7 @@ void enter_lazy_tlb(struct mm_struct *mm, struct 
task_struct *tsk)
if (this_cpu_read(cpu_tlbstate.loaded_mm) == _mm)
return;
 
-   if (tlb_defer_switch_to_init_mm()) {
-   /*
-* There's a significant optimization that may be possible
-* here.  We have accurate enough TLB flush tracking that we
-* don't need to maintain coherence of TLB per se when we're
-* lazy.  We do, however, need to maintain coherence of
-* paging-structure caches.  We could, in principle, leave our
-* old mm loaded and only switch to init_mm when
-* tlb_remove_page() happens.
-*/
-   this_cpu_write(cpu_tlbstate.is_lazy, true);
-   } else {
-   switch_mm(NULL, _mm, NULL);
-   }
+   this_cpu_write(cpu_tlbstate.is_lazy, true);
 }
 
 /*
-- 
2.14.4

[PATCH 3/7] x86,mm: restructure switch_mm_irqs_off

2018-07-16 Thread Rik van Riel

Move some code that will be needed for the lazy -> !lazy state
transition when a lazy TLB CPU has gotten out of date.

No functional changes, since the if (real_prev == next) branch
always returns.

Signed-off-by: Rik van Riel 
Acked-by: Dave Hansen 
Suggested-by: Andy Lutomirski 
---
 arch/x86/mm/tlb.c | 60 +++
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 9a893673c56b..4b73fe835c95 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -187,6 +187,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
u16 prev_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
unsigned cpu = smp_processor_id();
u64 next_tlb_gen;
+   bool need_flush;
+   u16 new_asid;
 
/*
 * NB: The scheduler will call us with prev == next when switching
@@ -252,8 +254,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
 
return;
} else {
-   u16 new_asid;
-   bool need_flush;
u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
 
/*
@@ -297,41 +297,41 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
mm_struct *next,
next_tlb_gen = atomic64_read(>context.tlb_gen);
 
choose_new_asid(next, next_tlb_gen, _asid, _flush);
+   }
 
-   if (need_flush) {
-   this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, 
next->context.ctx_id);
-   this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, 
next_tlb_gen);
-   load_new_mm_cr3(next->pgd, new_asid, true);
-
-   /*
-* NB: This gets called via leave_mm() in the idle path
-* where RCU functions differently.  Tracing normally
-* uses RCU, so we need to use the _rcuidle variant.
-*
-* (There is no good reason for this.  The idle code 
should
-*  be rearranged to call this before rcu_idle_enter().)
-*/
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 
TLB_FLUSH_ALL);
-   } else {
-   /* The new ASID is already up to date. */
-   load_new_mm_cr3(next->pgd, new_asid, false);
-
-   /* See above wrt _rcuidle. */
-   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
-   }
+   if (need_flush) {
+   this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, 
next->context.ctx_id);
+   this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, 
next_tlb_gen);
+   load_new_mm_cr3(next->pgd, new_asid, true);
 
/*
-* Record last user mm's context id, so we can avoid
-* flushing branch buffer with IBPB if we switch back
-* to the same user.
+* NB: This gets called via leave_mm() in the idle path
+* where RCU functions differently.  Tracing normally
+* uses RCU, so we need to use the _rcuidle variant.
+*
+* (There is no good reason for this.  The idle code should
+*  be rearranged to call this before rcu_idle_enter().)
 */
-   if (next != _mm)
-   this_cpu_write(cpu_tlbstate.last_ctx_id, 
next->context.ctx_id);
+   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 
TLB_FLUSH_ALL);
+   } else {
+   /* The new ASID is already up to date. */
+   load_new_mm_cr3(next->pgd, new_asid, false);
 
-   this_cpu_write(cpu_tlbstate.loaded_mm, next);
-   this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
+   /* See above wrt _rcuidle. */
+   trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
}
 
+   /*
+* Record last user mm's context id, so we can avoid
+* flushing branch buffer with IBPB if we switch back
+* to the same user.
+*/
+   if (next != _mm)
+   this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
+   this_cpu_write(cpu_tlbstate.loaded_mm, next);
+   this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
+
load_mm_cr4(next);
switch_ldt(real_prev, next);
 }
-- 
2.14.4

Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks

2018-07-16 Thread Eric W. Biederman

Linus Torvalds  writes:

> On Mon, Jul 16, 2018 at 8:08 AM Eric W. Biederman  
> wrote:
>>
>> The change for global init is it will now die if init is a member of the
>> session or init is using this tty as it's controlling tty.
>>
>> Semantically killing init with SAK is completely appropriate.
>
> No.
>
> Semtnaitcally killing init is completely wrong. Because it will kill
> the whole system.
>
> And I don't mean that in "now init won't spawn new things". I mean
> that in "now we don't have a child reaper any more, and the system
> will be dead because we'll panic on exit".
>
> So it's not about the controlling tty, it's about fundamental kernel
> internal consistency guarantees.
>
> See
>
> write_unlock_irq(_lock);
> if (unlikely(pid_ns == _pid_ns)) {
> panic("Attempted to kill init! exitcode=0x%08x\n",
> father->signal->group_exit_code ?: father->exit_code);
> }
>
> in kernel/exit.c.

I should have said it doesn't matter because init does not open ttys and
become a member of session groups.  Or at least it never has in my
experience.  The only way I know to get that behavior is to boot with
init=/bin/bash.

With the force_sig already in do_SAK today my change is not a
regression.  As force_sig in a completely different way forces the
signal to init.


Looking deeper, all of the silliness with SEND_SIG_FORCED and
force_sig_info is to guarantee delivery of synchronous exceptions even
to init.

So I think we want the patch below to clean that up.  Then we don't have
to worry about the wrong things sending signals to init by accident, and
SEND_SIG_FORCED becomes just SEND_SIG_PRIV that skips the unnecesary
allocation of a siginfo struct.

Thoughts?

Eric


From: "Eric W. Biederman" 
Date: Mon, 16 Jul 2018 13:29:04 -0500
Subject: [PATCH] signal: Cleanup delivery of exceptions to init

- Stop clearing SIGNAL_UNKILLABLE. It makes interaction by
  the process with other signals problematic, and exceptions
  are not necessarily fatal.

- Don't allow SIGKILL and SIGSTOP to the global init.
  It never helps and it it can only make things worse.

- Explicitly allow exceptions to any kind of init.  They are
  exceptions and synchronous and need to be handled somehow.
  Init can setup a handler or deal with the default action.

  This is not a change it is just code movement from
  force_sig_info into send_signal and get_signal.

- Treat all signals from the kernel as if they are from an ancestor
  pid namespace.

- Take out the overrides of SIGNAL_UNKILLABLE from force_sig_info
  The changes to send_signal and get_signal make them unnecessary.

- Take out the SEND_SIG_FORCED overrides from prepare_signal.
  The changes to send_signal makes it redundant.

- Rename force back to from_ancestor_ns as that makes the logic
  with respect to namespaces clearer and logically if the kernel
  is sending you a signal it is from your ancestor namespace.

Signed-off-by: "Eric W. Biederman" 
---
 kernel/signal.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 94296afacf44..298f5c690681 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -72,20 +72,21 @@ static int sig_handler_ignored(void __user *handler, int 
sig)
(handler == SIG_DFL && sig_kernel_ignore(sig));
 }
 
-static int sig_task_ignored(struct task_struct *t, int sig, bool force)
+static int sig_task_ignored(struct task_struct *t, int sig, bool 
from_ancestor_ns)
 {
void __user *handler;
 
handler = sig_handler(t, sig);
 
if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
-   handler == SIG_DFL && !(force && sig_kernel_only(sig)))
+   handler == SIG_DFL &&
+   (is_global_init(t) || !(from_ancestor_ns && sig_kernel_only(sig
return 1;
 
return sig_handler_ignored(handler, sig);
 }
 
-static int sig_ignored(struct task_struct *t, int sig, bool force)
+static int sig_ignored(struct task_struct *t, int sig, bool from_ancestor_ns)
 {
/*
 * Blocked signals are never ignored, since the
@@ -103,7 +104,7 @@ static int sig_ignored(struct task_struct *t, int sig, bool 
force)
if (t->ptrace && sig != SIGKILL)
return 0;
 
-   return sig_task_ignored(t, sig, force);
+   return sig_task_ignored(t, sig, from_ancestor_ns);
 }
 
 /*
@@ -809,7 +810,7 @@ static void ptrace_trap_notify(struct task_struct *t)
  * Returns true if the signal should be actually delivered, otherwise
  * it should be dropped.
  */
-static bool prepare_signal(int sig, struct task_struct *p, bool force)
+static bool prepare_signal(int sig, struct task_struct *p, bool 
from_ancestor_ns)
 {
struct signal_struct *signal = p->signal;
struct task_struct *t;
@@ -871,7 +872,7 @@ static bool prepare_signal(int sig, struct task_struct *p, 
bool force)
}
}
 
-   return

[RFC] kvm: Adding skelaton for Memory ROE

2018-07-16 Thread Ahmed Abd El Mawgood

This is my first patch, an attempt to implement Memory ROE discussed by me
earlier as a way to prevent Rootkits. I have already explained in details
in this thread:
https://www.mail-archive.com/kernelnewbies@kernelnewbies.org/msg18826.html
So I think there is no need for saying the exact same thing again.
The problem is that the code isn't working and I can't figure out why

I tried implementing the protection to follow similar behavior to that
of KVM_MEM_READONLY but to be on page (SPTE) level
The current problem I am facing is that when handling the hypercall
vcpu->mode turns to be OUTSIDE_GUEST_MODE but KVM_REQ_TLB_FLUSH doesn't
seem to be handled correctly. KVM documentation promised that when VCPU is
not in GUEST_MODE VCPU are handled asap and kvm_vcpu_kick(vcpu); will
even force that, but it doesn't seem to be the case for me. This is the
kind of logging I am getting:

[3556.312299] kvm_mmu_slot_apply_flags: visited
[3556.312301] kvm_mmu_slot_apply_write_access: Flush = false
[3557.034243] gfn_is_readonly: test_bit = 0
[3557.034251] gfn_is_readonly: test_bit = 0
[3557.034254] gfn_is_readonly: test_bit = 0
[3557.034463] Hypercall received, page address 0x0
[3557.034466] gfn_is_readonly: test_bit = 0
[3557.034469] kvm_mroe: flush state = Done
[3557.034472] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[3557.034475] Setting page number 0 in slot number 0
[3557.034480] slot_rmap_apply_protection: The 0th page is readonly, Flush = True
[3557.034483] kvm_mmu_slot_apply_write_access: Flush = true
[3557.034486] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[3557.034488] kvm_mroe: cpu mode = OUTSIDE_GUEST_MODE
[3557.034490] kvm_mroe: flush state = Waiting

For some reason kvm_vcpu_kick() didn't force the KVM_REQ_TLB_FLUSH to
kick into the virtual cpu (I am talking about the last 2 lines).

I am aware that there is still alot missing (like dealing with malicious
guest remappings) and the code quality sucks, but any ideas about what I
could be doing wrong (or ideas in general) would be apprciated. I am
already planning to do everything cleanly once it works.

Thansk.

Signed-off-by: Ahmed Abd El Mawgood 
---
 arch/x86/include/asm/kvm_host.h |   7 ++-
 arch/x86/kvm/Kconfig|   7 +++
 arch/x86/kvm/mmu.c  | 127 +++-
 arch/x86/kvm/x86.c  |  83 --
 include/linux/kvm_host.h|  17 ++
 include/uapi/linux/kvm_para.h   |   4 +-
 virt/kvm/kvm_main.c |  36 +---
 7 files changed, 226 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c13cd28d9d1b..c66e9245f750 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -235,7 +235,10 @@ struct kvm_mmu_memory_cache {
int nobjs;
void *objects[KVM_NR_MEM_OBJS];
 };
-
+struct kvm_write_access_data {
+   int i;
+   struct kvm_memory_slot *memslot;
+};
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
@@ -1130,7 +1133,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 
accessed_mask,
u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+void kvm_mmu_slot_apply_write_access(struct kvm *kvm,
  struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
   const struct kvm_memory_slot *memslot);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 92fd433c50b9..8ae822a8dc7a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -96,6 +96,13 @@ config KVM_MMU_AUDIT
 This option adds a R/W kVM module parameter 'mmu_audit', which allows
 auditing of KVM MMU events at runtime.
 
+config KVM_MROE
+   bool "Hypercall Memory Read-Only Enforcement"
+   depends on KVM && X86
+   help
+   This option add KVM_HC_HMROE hypercall to kvm which as hardening
+   mechanism to protect memory pages from being edited.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d594690d8b95..946545b8b8cb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -70,7 +70,7 @@ enum {
 #undef MMU_DEBUG
 
 #ifdef MMU_DEBUG
-static bool dbg = 0;
+static bool dbg = 1;
 module_param(dbg, bool, 0644);
 
 #define pgprintk(x...) do { if (dbg) printk(x); } while (0)
@@ -1402,7 +1402,6 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 
*sptep)
 static bool spte_write_protect(u64 *sptep, bool pt_protect)
 {
u64 spte = *sptep;
-
if (!is_writable_pte(spte) &&
  !(pt_protect && spte_can_locklessly_be_made_writable(spte)))
return false;
@@ -1418,15 +1417,23

Re: [RFC][PATCH 09/11] tty_io: Use do_send_sig_info in __do_SACK to forcibly kill tasks

2018-07-16 Thread Linus Torvalds

On Mon, Jul 16, 2018 at 12:17 PM Eric W. Biederman
 wrote:
>
> I should have said it doesn't matter because init does not open ttys and
> become a member of session groups.  Or at least it never has in my
> experience.  The only way I know to get that behavior is to boot with
> init=/bin/bash.

That's hopefully true, yes.

Presumably init does open the console, but hopefull doesn't do setsid.

(We *do* do "setsid()" for the linuxrc running, but that's not done by
the init thread itself).

> With the force_sig already in do_SAK today my change is not a
> regression.  As force_sig in a completely different way forces the
> signal to init.

Ok. A couple of notes in the commit description on this might be good.

> So I think we want the patch below to clean that up.  Then we don't have
> to worry about the wrong things sending signals to init by accident, and
> SEND_SIG_FORCED becomes just SEND_SIG_PRIV that skips the unnecesary
> allocation of a siginfo struct.
>
> Thoughts?

I think the end result is fine, but then I look at that patch of yours
and it does many other things and that makes me nervous.

Can you separate out the different things it does into separate
patches to make it easier to read?

   Linus

Re: [PATCH] mmc: tegra: Force correct divider calculation on DDR50/52

2018-07-16 Thread Jon Hunter



On 16/07/18 15:34, Aapo Vienamo wrote:
> Tegra SDHCI controllers require the SDHCI clock divider to be configured
> to divide the clock by two in DDR50/52 modes. Incorrectly configured
> clock divider results in corrupted data.
> 
> Prevent the possibility of incorrectly calculating the divider value due
> to clock rate rounding or low parent clock frequency by not assigning
> host->max_clk to clk_get_rate() on tegra_sdhci_set_clock().
> 
> See the comments for further details.
> 
> Fixes: a8e326a ("mmc: tegra: implement module external clock change")
> Signed-off-by: Aapo Vienamo 
> ---
>  drivers/mmc/host/sdhci-tegra.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> index ddf00166..908b23e 100644
> --- a/drivers/mmc/host/sdhci-tegra.c
> +++ b/drivers/mmc/host/sdhci-tegra.c
> @@ -210,9 +210,24 @@ static void tegra_sdhci_set_clock(struct sdhci_host 
> *host, unsigned int clock)
>   if (!clock)
>   return sdhci_set_clock(host, clock);
>  
> + /*
> +  * In DDR50/52 modes the Tegra SDHCI controllers require the SDHCI 
> +  * divider to be configured to divided the host clock by two. The SDHC
> +  * clock divider is calculated as part of sdhci_set_clock() by
> +  * sdhci_calc_clk(). The divider is calculated from host->max_clk and
> +  * the requested clock rate.
> +  *
> +  * By setting the host->max_clk to clock * 2 the divider calculation
> +  * will always result in the correct value for DDR50/52 modes,
> +  * regardless of clock rate rounding, which may happen if the value
> +  * from clk_get_rate() is used.
> +  */
>   host_clk = tegra_host->ddr_signaling ? clock * 2 : clock;
>   clk_set_rate(pltfm_host->clk, host_clk);
> - host->max_clk = clk_get_rate(pltfm_host->clk);
> + if (tegra_host->ddr_signaling)
> + host->max_clk = host_clk;
> + else
> + host->max_clk = clk_get_rate(pltfm_host->clk);
>  
>   sdhci_set_clock(host, clock);
>  

I see what you are saying, but should we be concerned that we are not
getting the rate we are requesting in the first place?

Maybe it would help if you could provide a specific example showing a
case where we request rate X and get Y, and then this leads to a bad
rate Z later.
 
Cheers
Jon

-- 
nvpublic

Re: [PATCH] IPoIB: use kvzalloc to allocate an array of bucket pointers

2018-07-16 Thread Jan Dakinevich

On Mon,  9 Jul 2018 16:51:03 +0300
Jan Dakinevich  wrote:

> This table by default takes 32KiB which is 3rd memory order.
> Meanwhile, this memory is not aimed for DMA operation and could be
> safely allocated by vmalloc.
> 
> Signed-off-by: Jan Dakinevich 
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 26cde95..cb752df
> 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1526,7 +1526,7 @@ static int ipoib_neigh_hash_init(struct
> ipoib_dev_priv *priv) return -ENOMEM;
>   set_bit(IPOIB_STOP_NEIGH_GC, >flags);
>   size = roundup_pow_of_two(arp_tbl.gc_thresh3);
> - buckets = kcalloc(size, sizeof(*buckets), GFP_KERNEL);
> + buckets = kvcalloc(size, sizeof(*buckets), GFP_KERNEL);
>   if (!buckets) {
>   kfree(htbl);
>   return -ENOMEM;
> @@ -1554,7 +1554,7 @@ static void neigh_hash_free_rcu(struct rcu_head
> *head) struct ipoib_neigh __rcu **buckets = htbl->buckets;
>   struct ipoib_neigh_table *ntbl = htbl->ntbl;
>  
> - kfree(buckets);
> + kvfree(buckets);
>   kfree(htbl);
>   complete(>deleted);
>  }

ping

-- 
Best regards
Jan Dakinevich

Re: [PATCH 4.17 00/67] 4.17.7-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.17.7 release.
> There are 67 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jul 18 07:34:11 UTC 2018.
> Anything received after that time might be too late.
> 

Build results:
total: 134 pass: 134 fail: 0
Qemu test results:
total: 171 pass: 159 fail: 12
Failed tests: 
i386:Broadwell:q35:defconfig:smp:rootfs 
i386:Skylake-Client:q35:defconfig:smp:rootfs 
i386:SandyBridge:q35:defconfig:smp:rootfs
i386:Haswell:pc:defconfig:smp:rootfs 
i386:Nehalem:q35:defconfig:smp:rootfs 
i386:phenom:pc:defconfig:smp:rootfs 
i386:Opteron_G5:q35:defconfig:smp:initrd 
i386:Westmere:q35:defconfig:smp:initrd 
i386:core2duo:q35:defconfig:nosmp:rootfs 
i386:Conroe:pc:defconfig:nosmp:rootfs 
i386:Opteron_G1:pc:defconfig:nosmp:initrd
i386:n270:q35:defconfig:nosmp:rootfs

All 32-bit i386 boot tests crash.

[0.00] BUG: unable to handle kernel NULL pointer dereference at 
[0.00] *pde =  
[0.00] Oops: 0002 [#1] SMP
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1
[0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24
[0.00] EIP: zero_resv_unavail+0x9b/0x103
[0.00] EFLAGS: 0022 CPU: 0
[0.00] EAX:  EBX:  ECX: 0008 EDX: 
[0.00] ESI: c6bc5e90 EDI:  EBP: c6bc5eac ESP: c6bc5e80
[0.00]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[0.00] CR0: 80050033 CR2:  CR3: 06d67000 CR4: 00040690
[0.00] Call Trace:
[0.00]  free_area_init_nodes+0x407/0x44c
[0.00]  zone_sizes_init+0x45/0x5c
[0.00]  paging_init+0xac/0xaf
[0.00]  native_pagetable_init+0xc6/0xce
[0.00]  setup_arch+0x9c6/0xb18
[0.00]  start_kernel+0x4f/0x3d2
[0.00]  ? idt_setup_early_handler+0x2b/0x3e
[0.00]  i386_start_kernel+0x95/0x99
[0.00]  startup_32_smp+0x164/0x168

Guenter

Re: [PATCH 1/2] mm: Fix vma_is_anonymous() false-positives

2018-07-16 Thread Michal Hocko

On Mon 16-07-18 17:47:39, Kirill A. Shutemov wrote:
> On Mon, Jul 16, 2018 at 04:22:45PM +0200, Michal Hocko wrote:
> > On Mon 16-07-18 17:04:41, Kirill A. Shutemov wrote:
> > > On Mon, Jul 16, 2018 at 01:30:28PM +, Michal Hocko wrote:
> > > > On Tue 10-07-18 13:48:58, Andrew Morton wrote:
> > > > > On Tue, 10 Jul 2018 16:48:20 +0300 "Kirill A. Shutemov" 
> > > > >  wrote:
> > > > > 
> > > > > > vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
> > > > > > VMA. This is unreliable as ->mmap may not set ->vm_ops.
> > > > > > 
> > > > > > False-positive vma_is_anonymous() may lead to crashes:
> > > > > > 
> > > > > > ...
> > > > > > 
> > > > > > This can be fixed by assigning anonymous VMAs own vm_ops and not 
> > > > > > relying
> > > > > > on it being NULL.
> > > > > > 
> > > > > > If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
> > > > > > dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs.
> > > > > 
> > > > > Is there a smaller, simpler fix which we can use for backporting
> > > > > purposes and save the larger rework for development kernels?
> > > > 
> > > > Why cannot we simply keep anon vma with null vm_ops and set dummy_vm_ops
> > > > for all users who do not initialize it in their mmap callbacks?
> > > > Basically have a sanity check in call_mmap?
> > > 
> > > As I said, there's a corner case of MAP_PRIVATE of /dev/zero.
> > 
> > This is really creative. I really didn't think about that. I am
> > wondering whether this really has to be handled as a private anonymous
> > mapping implicitly. Why does vma_is_anonymous has to succeed for these
> > mappings? Why cannot we simply handle it as any other file backed
> > PRIVATE mapping?
> 
> Because it's established way to create anonymous mappings in Linux.
> And we cannot break the semantics.

How exactly would semantic break? You would still get zero pages on read
faults and anonymous pages on CoW. So basically the same thing as for
any other file backed MAP_PRIVATE mapping.
-- 
Michal Hocko
SUSE Labs

Re: [tip:sched/core] sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust()

2018-07-16 Thread Ingo Molnar



* Peter Zijlstra  wrote:

> On Sun, Jul 15, 2018 at 04:36:17PM -0700, tip-bot for Xunlei Pang wrote:
> > Commit-ID:  8d4c00dc38a8aa30dae8402955e55e7b34e74bc8
> > Gitweb: 
> > https://git.kernel.org/tip/8d4c00dc38a8aa30dae8402955e55e7b34e74bc8
> > Author: Xunlei Pang 
> > AuthorDate: Mon, 9 Jul 2018 22:58:43 +0800
> > Committer:  Ingo Molnar 
> > CommitDate: Mon, 16 Jul 2018 00:28:31 +0200
> > 
> > sched/cputime: Ensure accurate utime and stime ratio in cputime_adjust()
> > 
> > If users access "/proc/pid/stat", the utime and stime ratio in the
> > current SAMPLE period are excepted, but currently cputime_adjust()
> > always calculates with the ratio of the WHOLE lifetime of the process.
> > 
> > This results in inaccurate utime and stime in "/proc/pid/stat". For
> > example,  a process runs for a while with "50% usr, 0% sys", then
> > followed by "100% sys". For later while, the following is excepted:
> > 
> >   0.0 usr,  100.0 sys
> > 
> > but we get:
> > 
> >   10.0 usr,  90.0 sys
> > 
> > This patch uses the accurate ratio in cputime_adjust() to address the
> > issue. A new 'task_cputime' type field is added in prev_cputime to record
> > previous 'task_cputime' so that we can get the elapsed times as the accurate
> > ratio.
> 
> Ingo, please make this one go away.

Sure, I've removed it from sched/core.

Thanks,

Ingo

[GIT PULL rcu/next] RCU commits for 4.19

2018-07-16 Thread Paul E. McKenney

Hello, Ingo!

This pull request contains the following changes:

1.  An optimization and a fix for RCU expedited grace periods, with
the fix being from Boqun Feng.

http://lkml.kernel.org/r/20180625224308.ga10...@linux.vnet.ibm.com

2.  Miscellaneous fixes, including a lockdep-annotation fix from
Boqun Feng.

http://lkml.kernel.org/r/20180625230410.ga13...@linux.vnet.ibm.com

3.  SRCU updates.

http://lkml.kernel.org/r/20180625224734.ga10...@linux.vnet.ibm.com

4.  Updates to rcutorture and associated scripting.

http://lkml.kernel.org/r/20180625225735.ga11...@linux.vnet.ibm.com

5.  Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt,
and RCU-sched flavors, replacing the old ->gpnum and ->completed
pair of fields.  This change allows lockless code to obtain the
complete grace-period state with a single READ_ONCE(), which is
needed to maintain tolerable lock contention during the upcoming
consolidation of the three RCU flavors.  Note that grace-period
sequence numbers are already used by rcu_barrier(), expedited
RCU grace periods, and SRCU, and are thus already heavily used
and well-tested.  Joel Fernandes contributed a number of excellent
fixes and improvements.

http://lkml.kernel.org/r/20180626000841.ga15...@linux.vnet.ibm.com

6.  Clean up some grace-period-reporting loose ends, including
improving the handling of quiescent states from offline CPUs
and fixing some false-positive WARN_ON_ONCE() invocations.
(Strictly speaking, the WARN_ON_ONCE() invocations were quite
correct, but their invariants were (harmlessly) violated by the
earlier sloppy handling of quiescent states from offline CPUs.)
In addition, improve grace-period forward-progress guarantees so
as to allow removal of fail-safe checks that required otherwise
needless lock acquisitions.  Finally, add more diagnostics to
help debug the upcoming consolidation of the RCU-bh, RCU-preempt,
and RCU-sched flavors.

http://lkml.kernel.org/r/20180626002052.ga24...@linux.vnet.ibm.com

7.  Additional miscellaneous fixes, including those contributed by
Byungchul Park, Mauro Carvalho Chehab, Joe Perches, Joel Fernandes,
Steven Rostedt, Andrea Parri, and Neil Brown.

http://lkml.kernel.org/r/20180626003448.ga26...@linux.vnet.ibm.com

8.  Additional torture-test changes, including several contributed by
Arnd Bergmann and Joel Fernandes.

http://lkml.kernel.org/r/20180626005205.ga28...@linux.vnet.ibm.com

The layout of this pull request is unusual due to the large footprint
of the commits in #5 and #6.  Therefore #1-#4 are branches on top of
v4.18-rc1, #5 and #6 are stacked on top of each other based off of the
merge point for #1-#4, and #7 and #8 are branches on top of #6.

Although I merged this series with v4.18-rc4 without conflicts and tested
the results, Stephen Rothwell has reported a couple of conflicts with
this series in the course of his -next integration-testing efforts and
has provided the following resolutions:

http://lkml.kernel.org/r/20180514134636.61831...@canb.auug.org.au
http://lkml.kernel.org/r/20180622122717.5f475...@canb.auug.org.au

Both of these resolutions look good to me.

This pull request increases the size of RCU by not quite 500 lines.
Of this, about 200 is due to adding test code to rcutorture in
order to adequately test the consolidated RCU-bh, RCU-preempt, and
RCU-sched flavors.  The purpose of adding this test code early is to
test it against RCU-sched, which happens to already support compound RCU
read-side critical sections via rcu_read_lock_sched(), preempt_disable(),
local_bh_disable(), and local_irq_save().  The remainder are primarily
due to the addition of diagnostics and the fixing of quiescent-state and
grace-period reporting.  This size increase is temporary: The overall
effect of this set of cleanups, the consolidation of the three RCU
flavors, and additional cleanups will be to -decrease- RCU's line count
by a couple of hundred lines.

All of these changes have been subjected to 0day Test Robot and -next
testing, and are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo

for you to fetch changes up to 18952651dae8efcc6d565c97f8fe5629b399cb3e:

  Merge branches 'fixes1.2018.07.12b' and 'torture1.2018.07.12b' into HEAD 
(2018-07-12 15:42:41 -0700)


Andrea Parri (1):
  doc: Update synchronize_rcu() definition in whatisRCU.txt

Arnd Bergmann (1):
  rcutorture: Use monotonic timestamp for stall detection

Boqun Feng (2):
  rcu: Use the proper lockdep annotation in dump_blkd_tasks()
  rcu: Make expedited GPs handle CPU 0 being offline

Byungchul Park (2):
  rcu: Improve

Re: [PATCH v2 3/3] dt-bindings: mfd: max8998: Add charger subnode binding

2018-07-16 Thread Rob Herring

On Sat, Jul 14, 2018 at 02:26:53PM +0200, Paweł Chmiel wrote:
> This patch adds devicetree bindings documentation for
> battery charging controller as the subnode of MAX8998 PMIC.
> 
> Signed-off-by: Paweł Chmiel 
> ---
> Changes from v1:
>   - Removed unneeded Fixes tag
>   - Correct description of all charger values
>   - Added missing property unit
> ---
>  Documentation/devicetree/bindings/mfd/max8998.txt | 24 
> +++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/mfd/max8998.txt 
> b/Documentation/devicetree/bindings/mfd/max8998.txt
> index 23a3650ff2a2..196e50097a36 100644
> --- a/Documentation/devicetree/bindings/mfd/max8998.txt
> +++ b/Documentation/devicetree/bindings/mfd/max8998.txt
> @@ -50,6 +50,23 @@ Additional properties required if 
> max8998,pmic-buck2-dvs-gpio is defined:
>  - max8998,pmic-buck2-dvs-voltage: An array of 2 voltage values in microvolts
>for buck2 regulator that can be selected using dvs gpio.
>  
> +Charger: Configuration for battery charging controller should be added
> +inside a child node named 'charger'.
> +  Required properties:
> +  - max8998,charge-eoc-percent: Setup End of Charge Level.
> +If value equals 0, leave it unchanged.
> +Otherwise it should be value from 10 to 45 by 5 step.
> +
> +  - max8998,charge-restart-level-microvolt: Setup Charge Restart Level.
> +If value equals 0, leave it unchanged.
> +If value equals -1, it will be disabled.

-1 is a bit strange. Why not 'not present' is disabled?

> +Otherwise it should be one of following values: 100, 150, 200.
> +
> +  - max8998,charge-timeout-hours: Setup Charge Full Timeout.
> +If value equals 0, leave it unchanged.
> +If value equals -1, it will be disabled.
> +Otherwise it should be one of following values: 5, 6, 7.
> +
>  Regulators: All the regulators of MAX8998 to be instantiated shall be
>  listed in a child node named 'regulators'. Each regulator is represented
>  by a child node of the 'regulators' node.
> @@ -99,6 +116,13 @@ Example:
>   max8998,pmic-buck2-dvs-gpio = < 0 3 0 0>; /* SET3 */
>   max8998,pmic-buck2-dvs-voltage = <135>, <130>;
>  
> + /* Charger configuration */
> + charger {
> + max8998,charge-eoc-percent = <0>;
> + max8998,charge-restart-level-microvolt = <(-1)>;
> + max8998,charge-timeout-hours = <7>;
> + };
> +
>   /* Regulators to instantiate */
>   regulators {
>   ldo2_reg: LDO2 {
> -- 
> 2.7.4
>

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Greg Kroah-Hartman

On Mon, Jul 16, 2018 at 11:02:19AM -0700, Guenter Roeck wrote:
> On Mon, Jul 16, 2018 at 07:43:32PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Jul 16, 2018 at 09:41:23AM -0700, Guenter Roeck wrote:
> > > On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote:
> > > > > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> > > > > > This is the start of the stable review cycle for the 4.9.113 
> > > > > > release.
> > > > > > There are 32 patches in this series, all will be posted as a 
> > > > > > response
> > > > > > to this one.  If anyone has any issues with these being applied, 
> > > > > > please
> > > > > > let me know.
> > > > > > 
> > > > > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> > > > > > Anything received after that time might be too late.
> > > > > >
> > > > > 
> > > > > Build results:
> > > > >   total: 148 pass: 133 fail: 15
> > > > > Failed builds: 
> > > > >   mips:defconfig 
> > > > >   mips:allnoconfig 
> > > > >   mips:defconfig 
> > > > >   mips:allmodconfig 
> > > > >   mips:allnoconfig 
> > > > >   mips:bcm47xx_defconfig 
> > > > >   mips:bcm63xx_defconfig 
> > > > >   mips:nlm_xlp_defconfig 
> > > > >   mips:ath79_defconfig 
> > > > >   mips:ar7_defconfig 
> > > > >   mips:e55_defconfig 
> > > > >   mips:cavium_octeon_defconfig 
> > > > >   mips:malta_defconfig 
> > > > >   mips:rt305x_defconfig 
> > > > >   mips:defconfig 
> > > > > Qemu test results:
> > > > >   total: 166 pass: 154 fail: 12
> > > > > Failed tests: 
> > > > >   mips:malta_defconfig:nosmp 
> > > > >   mips:malta_defconfig:smp 
> > > > >   mips64:malta_defconfig:nosmp 
> > > > >   mips64:malta_defconfig:smp 
> > > > >   mipsel:24Kf:malta_defconfig:nosmp:initrd 
> > > > >   mipsel:24Kf:malta_defconfig:smp:initrd 
> > > > >   mipsel:24Kf:malta_defconfig:smp:rootfs 
> > > > >   mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
> > > > >   mipsel64:malta_defconfig:nosmp:rootfs 
> > > > >   mipsel64:malta_defconfig:smp:initrd 
> > > > >   mipsel64:malta_defconfig:smp:rootfs 
> > > > >   mipsel64:fuloong2e_defconfig:fulong2e:rootfs
> > > > > 
> > > > > Error is always the same.
> > > > > 
> > > > > arch/mips/kernel/process.c:637:8:
> > > > >   error: 'call_single_data_t' undeclared here (not in a function)
> > > > > arch/mips/kernel/process.c: In function 'raise_backtrace':
> > > > > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
> > > > >   error: 'csd' undeclared (first use in this function)
> > > > 
> > > > mips should now be fixed with the updated tree I have pushed out.
> > > > 
> > > 
> > > The above is with v4.9.112-33-g7fb1f5e, which is the latest version
> > > available from the git repository (as of right now). My builders
> > > pulled it at 4:07am this morning, and there was no subsequent update.
> > 
> > Odd.  Ok, I've bumped the -rc number to -rc2 now, and pushed it all out
> > again.  Let me know if you don't see it show up within an hour or so.
> > 
> Version is now v4.9.112-33-gb44db2b, so something changed, but I still get
> the same build error.
> 
> Comparing the old and the new version, the only difference is the updated rc.
> 
> diff --git a/Makefile b/Makefile
> index 57f315d00a94..986470ef6f6e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,7 +1,7 @@
>  VERSION = 4
>  PATCHLEVEL = 9
>  SUBLEVEL = 113
> -EXTRAVERSION = -rc1
> +EXTRAVERSION = -rc2
>  NAME = Roaring Lionus
> 
> The error itself is that single_data_t was replaced in the backport with
> call_single_data. It should have been "struct call_single_data".

{sigh}  This is why I asked for a fixup patch.  Let me just go rip that
thing out now...

greg k-h

Re: [PATCH v3] time: Fix incorrect sleeptime injection when suspend fails

2018-07-16 Thread Thomas Gleixner

On Tue, 17 Jul 2018, Mukesh Ojha wrote:
> On 7/16/2018 10:44 PM, John Stultz wrote:
> > > So, I think with the logic bug above it will work out properly, but
> > > let me know if I'm still missing something.
> 
> Please give it thought to a case where very first suspend fails with your
> logic.
> If i am not able to get your thought, please write a patch.

John wants you to invert the logic. i.e.

 true -> false
 false -> true
 if (var) -> if (!var)
 if (!var) -> if (var)

It's not that hard, right?

Thanks,

tglx

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.113 release.
> There are 32 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> Anything received after that time might be too late.
>

Build results:
total: 148 pass: 133 fail: 15
Failed builds: 
mips:defconfig 
mips:allnoconfig 
mips:defconfig 
mips:allmodconfig 
mips:allnoconfig 
mips:bcm47xx_defconfig 
mips:bcm63xx_defconfig 
mips:nlm_xlp_defconfig 
mips:ath79_defconfig 
mips:ar7_defconfig 
mips:e55_defconfig 
mips:cavium_octeon_defconfig 
mips:malta_defconfig 
mips:rt305x_defconfig 
mips:defconfig 
Qemu test results:
total: 166 pass: 154 fail: 12
Failed tests: 
mips:malta_defconfig:nosmp 
mips:malta_defconfig:smp 
mips64:malta_defconfig:nosmp 
mips64:malta_defconfig:smp 
mipsel:24Kf:malta_defconfig:nosmp:initrd 
mipsel:24Kf:malta_defconfig:smp:initrd 
mipsel:24Kf:malta_defconfig:smp:rootfs 
mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
mipsel64:malta_defconfig:nosmp:rootfs 
mipsel64:malta_defconfig:smp:initrd 
mipsel64:malta_defconfig:smp:rootfs 
mipsel64:fuloong2e_defconfig:fulong2e:rootfs

Error is always the same.

arch/mips/kernel/process.c:637:8:
error: 'call_single_data_t' undeclared here (not in a function)
arch/mips/kernel/process.c: In function 'raise_backtrace':
/opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
error: 'csd' undeclared (first use in this function)

Details are available at http://kerneltests.org/builders.

Guenter

compliation error with aio_abi.h

2018-07-16 Thread Laura Abbott


Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1601529
with 4.18-rc4:

Steps to Reproduce:
1.echo 'int f;' | gcc -include linux/aio_abi.h -xc -c - -o /dev/null

Actual results:
/usr/include/asm/signal.h:127:2: error: unknown type name ‘size_t’
  size_t ss_size;
  ^~
In file included from :32:
/usr/include/linux/aio_abi.h:115:2: error: unknown type name ‘size_t’
  size_t  sigsetsize;
  ^~

Expected results:
no errors, as in Fedora 28 kernel-headers-4.17.4-200.fc28.x86_64)


This looks like the structure that was introduced with 7a074e96dee6
("aio: implement io_pgetevents") . is #include 
the correct header? This breaks compilation of nginx
https://bugzilla.redhat.com/show_bug.cgi?id=1597674

Thanks,
Laura

Re: [PATCH 4.17 00/67] 4.17.7-stable review

2018-07-16 Thread Nathan Chancellor

On Mon, Jul 16, 2018 at 09:33:38AM -0700, Guenter Roeck wrote:
> On Mon, Jul 16, 2018 at 09:34:29AM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.17.7 release.
> > There are 67 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Wed Jul 18 07:34:11 UTC 2018.
> > Anything received after that time might be too late.
> > 
> 
> Build results:
>   total: 134 pass: 134 fail: 0
> Qemu test results:
>   total: 171 pass: 159 fail: 12
> Failed tests: 
>   i386:Broadwell:q35:defconfig:smp:rootfs 
>   i386:Skylake-Client:q35:defconfig:smp:rootfs 
>   i386:SandyBridge:q35:defconfig:smp:rootfs
>   i386:Haswell:pc:defconfig:smp:rootfs 
>   i386:Nehalem:q35:defconfig:smp:rootfs 
>   i386:phenom:pc:defconfig:smp:rootfs 
>   i386:Opteron_G5:q35:defconfig:smp:initrd 
>   i386:Westmere:q35:defconfig:smp:initrd 
>   i386:core2duo:q35:defconfig:nosmp:rootfs 
>   i386:Conroe:pc:defconfig:nosmp:rootfs 
>   i386:Opteron_G1:pc:defconfig:nosmp:initrd
>   i386:n270:q35:defconfig:nosmp:rootfs
> 
> All 32-bit i386 boot tests crash.
> 
> [0.00] BUG: unable to handle kernel NULL pointer dereference at 
> 
> [0.00] *pde =  
> [0.00] Oops: 0002 [#1] SMP
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.7-rc1+ #1
> [0.00] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/24
> [0.00] EIP: zero_resv_unavail+0x9b/0x103
> [0.00] EFLAGS: 0022 CPU: 0
> [0.00] EAX:  EBX:  ECX: 0008 EDX: 
> [0.00] ESI: c6bc5e90 EDI:  EBP: c6bc5eac ESP: c6bc5e80
> [0.00]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [0.00] CR0: 80050033 CR2:  CR3: 06d67000 CR4: 00040690
> [0.00] Call Trace:
> [0.00]  free_area_init_nodes+0x407/0x44c
> [0.00]  zone_sizes_init+0x45/0x5c
> [0.00]  paging_init+0xac/0xaf
> [0.00]  native_pagetable_init+0xc6/0xce
> [0.00]  setup_arch+0x9c6/0xb18
> [0.00]  start_kernel+0x4f/0x3d2
> [0.00]  ? idt_setup_early_handler+0x2b/0x3e
> [0.00]  i386_start_kernel+0x95/0x99
> [0.00]  startup_32_smp+0x164/0x168
> 
> Guenter

I think this is caused by 3e3eda67092b ("mm: zero available pages before
memmap init"). A fix for it was posted to the mailing list this morning:
https://lore.kernel.org/lkml/20180716151630.770-1-pasha.tatas...@oracle.com/

Cheers,
Nathan

Re: [PATCH] x86/power/64: Remove VLA usage

2018-07-16 Thread Eric Biggers

On Sun, Jul 15, 2018 at 08:56:57PM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> removes the discouraged use of AHASH_REQUEST_ON_STACK by switching to
> shash directly and allocating the descriptor in heap memory (which should
> be fine: the tfm has already been allocated there too).
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 
> Signed-off-by: Kees Cook 
> ---
>  arch/x86/power/hibernate_64.c | 35 +++
>  1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 67ccf64c8bd8..0ed01bb935a6 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -233,28 +233,31 @@ struct restore_data_record {
>   */
>  static int get_e820_md5(struct e820_table *table, void *buf)
>  {
> - struct scatterlist sg;
> - struct crypto_ahash *tfm;
> + struct crypto_shash *tfm;
> + struct shash_desc *desc;
>   int size;
>   int ret = 0;
>  
> - tfm = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
> + tfm = crypto_alloc_shash("md5", 0, 0);
>   if (IS_ERR(tfm))
>   return -ENOMEM;
>  
> - {
> - AHASH_REQUEST_ON_STACK(req, tfm);
> - size = offsetof(struct e820_table, entries) + sizeof(struct 
> e820_entry) * table->nr_entries;
> - ahash_request_set_tfm(req, tfm);
> - sg_init_one(, (u8 *)table, size);
> - ahash_request_set_callback(req, 0, NULL, NULL);
> - ahash_request_set_crypt(req, , buf, size);
> -
> - if (crypto_ahash_digest(req))
> - ret = -EINVAL;
> - ahash_request_zero(req);
> - }
> - crypto_free_ahash(tfm);
> + desc = kmalloc(sizeof(struct shash_desc) + crypto_shash_descsize(tfm),
> +GFP_KERNEL);
> + if (!desc)
> + return -ENOMEM;

Need crypto_free_shash(tfm) if the kmalloc() here fails.

> +
> + desc->tfm = tfm;
> + desc->flags = 0;
> +
> + size = offsetof(struct e820_table, entries) +
> + sizeof(struct e820_entry) * table->nr_entries;
> +
> + if (crypto_shash_digest(desc, (u8 *)table, size, buf))
> + ret = -EINVAL;
> +
> + kzfree(desc);
> + crypto_free_shash(tfm);
>  
>   return ret;
>  }
> -- 
> 2.17.1
> 
> 
> -- 
> Kees Cook
> Pixel Security

Re: [PATCH v5 05/12] PM / devfreq: Add support for policy notifiers

2018-07-16 Thread Matthias Kaehlcke

On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
> 
> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> > 
> > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> > 
> >> Firstly,
> >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>
> >> devfreq already used the OPP interface as default. It means that
> >> the outside of 'drivers/devfreq' can disable/enable the frequency
> >> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >> drivers disable/enable the specific frequency, the devfreq core
> >> consider them.
> >>
> >> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >> already support some interface to change the minimum/maximum frequency
> >> of devfreq device. 
> >>
> >> In case of cpufreq subsystem, cpufreq only provides 
> >> 'cpufreq_verify_with_limits()'
> >> to change the minimum/maximum frequency of cpu. some device driver cannot
> >> change the minimum/maximum frequency through OPP interface.
> >>
> >> But, in case of devfreq subsystem, as I explained already, devfreq support
> >> the OPP interface as default way. devfreq subsystem doesn't need to add
> >> other way to change the minimum/maximum frequency.
> > 
> > Using the OPP interface exclusively works as long as a
> > enabling/disabling of OPPs is limited to a single driver
> > (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> > involved you need a way to resolve conflicts, that's the purpose of
> > devfreq_verify_within_limits(). Please let me know if there are
> > existing mechanisms for conflict resolution that I overlooked.
> > 
> > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> > devfreq_verify_within_limits() instead of the OPP interface if
> > desired, however this seems beyond the scope of this series.
> 
> Actually, if we uses this approach, it doesn't support the multiple drivers 
> too.
> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> happen.

As long as drivers limit the max freq there is no conflict, the lowest
max freq wins. I expect this to be the usual case, apparently it
worked for cpufreq for 10+ years.

However it is correct that there would be a conflict if a driver
requests a min freq that is higher than the max freq requested by
another. In this case devfreq_verify_within_limits() resolves the
conflict by raising p->max to the min freq. Not sure if this is
something that would ever occur in practice though.

If we are really concerned about this case it would also be an option
to limit the adjustment to the max frequency.

> To resolve the conflict for multiple device driver, maybe OPP interface
> have to support 'usage_count' such as clk_enable/disable().

This would require supporting negative usage count values, since a OPP
should not be enabled if e.g. thermal enables it but the throttler
disabled it or viceversa.

Theoretically there could also be conflicts, like one driver disabling
the higher OPPs and another the lower ones, with the outcome of all
OPPs being disabled, which would be a more drastic conflict resolution
than that of devfreq_verify_within_limits().

Viresh, what do you think about an OPP usage count?

Thanks

Matthias

Re: [PATCH v2 1/3] dt-bindings: pfuze100: add optional disable switch-regulators binding

2018-07-16 Thread Rob Herring

On Fri, Jul 13, 2018 at 02:50:01PM +0200, Marco Felsch wrote:
> This binding is used to keep the backward compatibility with the current
> dtb's [1]. The binding informs the driver that the unused switch regulators
> can be disabled.
> If it is not specified, the driver doesn't disable the switch regulators.
> 
> [1] https://patchwork.kernel.org/patch/10490381/
> 
> Signed-off-by: Marco Felsch 
> 
> ---
> Changes in V2:
>   - add more information about the binding
>   - rename binding and add vendor prefix
> 
>  .../devicetree/bindings/regulator/pfuze100.txt| 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/regulator/pfuze100.txt 
> b/Documentation/devicetree/bindings/regulator/pfuze100.txt
> index 672c939045ff..2c46b8d368db 100644
> --- a/Documentation/devicetree/bindings/regulator/pfuze100.txt
> +++ b/Documentation/devicetree/bindings/regulator/pfuze100.txt
> @@ -4,6 +4,17 @@ Required properties:
>  - compatible: "fsl,pfuze100", "fsl,pfuze200", "fsl,pfuze3000", 
> "fsl,pfuze3001"
>  - reg: I2C slave address
>  
> +Optional properties:
> +- fsl,pfuze-support-disable: Boolean, if present disable all unused switch
> +  regulators to save power consumption. Attention, till 4.18 these regulators

You shouldn't have kernel version info in bindings.

> +  were always on without specifying "regulator-always-on". So be sure to mark
> +  import regulators as "regulator-always-on" (e.g. DDR ref, DDR supply). If

s/import/important/

> +  not present, the switched regualtors are always on and can't be disabled.
> +  This binding is a workaround to keep backward compatibility with old dtb's
> +  which rely on the fact that the switched regulators are always on and don't
> +  mark them explicit as "regulator-always-on". On new dtbs this property 
> should
> +  always be present.
> +
>  Required child node:
>  - regulators: This is the list of child nodes that specify the regulator
>initialization data for defined regulators. Please refer to below doc
> -- 
> 2.18.0
>

Re: cpu_no_speculation omissions?

2018-07-16 Thread Alan Cox

On Mon, 2018-07-16 at 10:28 -0700, Dave Hansen wrote:
> On 07/16/2018 09:56 AM, Thomas Gleixner wrote:
> > On Mon, 16 Jul 2018, Rich Felker wrote:
> > > At least the Centerton (late-generation Bonnell uarch) Atom
> > > family is
> > > omitted from the cpu_no_speculation table added by commit
> > > fec9434a12f3
> > > to arch/x86/kernel/cpu/common.c. Is this intentional? Would a
> > > patch
> > > adding it and possibly other omissions be welcome?
> > 
> > Probably. Dave?
> 
> IIRC, Alan Cox was compiling a list on what is affected vs. not.  He
> would know way better than I.

The pre Silvermont atom cores are in order. When I did the original
list I didn't bother with all the 32bit cores as we didn't have any
32bit mitigations then.

Alan

Re: [PATCH 4.9 00/32] 4.9.113-stable review

2018-07-16 Thread Guenter Roeck

On Mon, Jul 16, 2018 at 06:31:36PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jul 16, 2018 at 09:25:38AM -0700, Guenter Roeck wrote:
> > On Mon, Jul 16, 2018 at 09:36:08AM +0200, Greg Kroah-Hartman wrote:
> > > This is the start of the stable review cycle for the 4.9.113 release.
> > > There are 32 patches in this series, all will be posted as a response
> > > to this one.  If anyone has any issues with these being applied, please
> > > let me know.
> > > 
> > > Responses should be made by Wed Jul 18 07:34:43 UTC 2018.
> > > Anything received after that time might be too late.
> > >
> > 
> > Build results:
> > total: 148 pass: 133 fail: 15
> > Failed builds: 
> > mips:defconfig 
> > mips:allnoconfig 
> > mips:defconfig 
> > mips:allmodconfig 
> > mips:allnoconfig 
> > mips:bcm47xx_defconfig 
> > mips:bcm63xx_defconfig 
> > mips:nlm_xlp_defconfig 
> > mips:ath79_defconfig 
> > mips:ar7_defconfig 
> > mips:e55_defconfig 
> > mips:cavium_octeon_defconfig 
> > mips:malta_defconfig 
> > mips:rt305x_defconfig 
> > mips:defconfig 
> > Qemu test results:
> > total: 166 pass: 154 fail: 12
> > Failed tests: 
> > mips:malta_defconfig:nosmp 
> > mips:malta_defconfig:smp 
> > mips64:malta_defconfig:nosmp 
> > mips64:malta_defconfig:smp 
> > mipsel:24Kf:malta_defconfig:nosmp:initrd 
> > mipsel:24Kf:malta_defconfig:smp:initrd 
> > mipsel:24Kf:malta_defconfig:smp:rootfs 
> > mipsel:mips32r6-generic:malta_32r6_defconfig:smp:rootfs 
> > mipsel64:malta_defconfig:nosmp:rootfs 
> > mipsel64:malta_defconfig:smp:initrd 
> > mipsel64:malta_defconfig:smp:rootfs 
> > mipsel64:fuloong2e_defconfig:fulong2e:rootfs
> > 
> > Error is always the same.
> > 
> > arch/mips/kernel/process.c:637:8:
> > error: 'call_single_data_t' undeclared here (not in a function)
> > arch/mips/kernel/process.c: In function 'raise_backtrace':
> > /opt/buildbot/slave/stable-queue-4.9/build/arch/mips/kernel/process.c:648:22:
> > error: 'csd' undeclared (first use in this function)
> 
> mips should now be fixed with the updated tree I have pushed out.
> 

The above is with v4.9.112-33-g7fb1f5e, which is the latest version
available from the git repository (as of right now). My builders
pulled it at 4:07am this morning, and there was no subsequent update.

Guenter

Re: [PATCH] x86/ioapic: Do not unmask io_apic when interrupt is in progress

2018-07-16 Thread Thomas Gleixner

On Fri, 2 Mar 2018, Sebastian Andrzej Siewior wrote:

> From: Ingo Molnar 
> Date: Fri, 3 Jul 2009 08:29:27 -0500
> 
> With threaded interrupts we might see an interrupt in progress on
> migration. Do not unmask it when this is the case.

It took me quite a while to wrap my head around this one.

> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1690,7 +1690,8 @@ static bool io_apic_level_ack_pending(st
>  static inline bool ioapic_irqd_mask(struct irq_data *data)
>  {
>   /* If we are moving the irq we need to mask it */
> - if (unlikely(irqd_is_setaffinity_pending(data))) {
> + if (unlikely(irqd_is_setaffinity_pending(data) &&
> +  !irqd_irq_inprogress(data))) {

I really don't see how that is preventing anything. The point is that
chip->eoi_irq() is always called _after_ the INPROGRESS flag has been
cleared. And if that's not the case then the interrupt would be never
moved.

But, that patch is almost 10 years old and I didn't find the exact
context. So it might have been valid back then, but I doubt it.

Though there is an issue with threaded interrupts which are marked ONESHOT
and using the fasteoi handler.

if (IS_ONESHOT())
mask_irq();




cond_unmask_eoi_irq()
chip->irq_eoi();

So if setaffinity is pending then the interrupt will be moved and then
unmasked, which is wrong as it should be kept masked up to the point where
the threaded handler finished. It's not a real problem, the interrupt will
just be able to fire before the threaded handler has finished, though the irq
masked state will be wrong for a bit.

The patch below should cure the issue. It also renames the horribly
misnomed functions so it becomes clear what they are supposed to do.

Thanks,

tglx

8<---
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 3982f79d2377..23667451e44e 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1721,19 +1721,20 @@ static bool io_apic_level_ack_pending(struct 
mp_chip_data *data)
return false;
 }
 
-static inline bool ioapic_irqd_mask(struct irq_data *data)
+static inline bool ioapic_prepare_move(struct irq_data *data)
 {
/* If we are moving the irq we need to mask it */
if (unlikely(irqd_is_setaffinity_pending(data))) {
-   mask_ioapic_irq(data);
+   if (!irqd_irq_masked(data))
+   mask_ioapic_irq(data);
return true;
}
return false;
 }
 
-static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
+static inline void ioapic_finish_move(struct irq_data *data, bool moveit)
 {
-   if (unlikely(masked)) {
+   if (unlikely(moveit)) {
/* Only migrate the irq if the ack has been received.
 *
 * On rare occasions the broadcast level triggered ack gets
@@ -1762,15 +1763,17 @@ static inline void ioapic_irqd_unmask(struct irq_data 
*data, bool masked)
 */
if (!io_apic_level_ack_pending(data->chip_data))
irq_move_masked_irq(data);
-   unmask_ioapic_irq(data);
+   /* If the irq is masked in the core, leave it */
+   if (!irqd_irq_masked(data))
+   unmask_ioapic_irq(data);
}
 }
 #else
-static inline bool ioapic_irqd_mask(struct irq_data *data)
+static inline bool ioapic_move_prepare(struct irq_data *data)
 {
return false;
 }
-static inline void ioapic_irqd_unmask(struct irq_data *data, bool masked)
+static inline void ioapic_move_irq(struct irq_data *data, bool masked)
 {
 }
 #endif
@@ -1779,11 +1782,11 @@ static void ioapic_ack_level(struct irq_data *irq_data)
 {
struct irq_cfg *cfg = irqd_cfg(irq_data);
unsigned long v;
-   bool masked;
+   bool moveit;
int i;
 
irq_complete_move(cfg);
-   masked = ioapic_irqd_mask(irq_data);
+   moveit = ioapic_prepare_move(irq_data);
 
/*
 * It appears there is an erratum which affects at least version 0x11
@@ -1838,7 +1841,7 @@ static void ioapic_ack_level(struct irq_data *irq_data)
eoi_ioapic_pin(cfg->vector, irq_data->chip_data);
}
 
-   ioapic_irqd_unmask(irq_data, masked);
+   ioapic_finish_move(irq_data, moveit);
 }
 
 static void ioapic_ir_ack_level(struct irq_data *irq_data)

Re: cpu_no_speculation omissions?

2018-07-16 Thread Thomas Gleixner

On Mon, 16 Jul 2018, Rich Felker wrote:

> At least the Centerton (late-generation Bonnell uarch) Atom family is
> omitted from the cpu_no_speculation table added by commit fec9434a12f3
> to arch/x86/kernel/cpu/common.c. Is this intentional? Would a patch
> adding it and possibly other omissions be welcome?

Probably. Dave?

Thanks,

tglx

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1772 matches

Mail list logo