date:20180702

Re: [RFC v3 PATCH 5/5] x86: check VM_DEAD flag in page fault

2018-07-02 Thread Michal Hocko

On Mon 02-07-18 10:24:27, Yang Shi wrote:
> 
> 
> On 7/2/18 6:37 AM, Michal Hocko wrote:
> > On Mon 02-07-18 15:33:11, Laurent Dufour wrote:
> > > 
> > > On 02/07/2018 14:45, Michal Hocko wrote:
> > > > On Mon 02-07-18 14:26:09, Laurent Dufour wrote:
> > > > > On 02/07/2018 14:15, Michal Hocko wrote:
> > [...]
> > > > > > We already do have a model for that. Have a look at MMF_UNSTABLE.
> > > > > MMF_UNSTABLE is a mm's flag, here this is a VMA's flag which is 
> > > > > checked.
> > > > Yeah, and we have the VMA ready for all places where we do check the
> > > > flag. check_stable_address_space can be made to get vma rather than mm.
> > > Yeah, this would have been more efficient to check that flag at the 
> > > beginning
> > > of the page fault handler rather than the end, but this way it will be 
> > > easier
> > > to handle the speculative page fault too ;)
> > The thing is that it doesn't really need to be called earlier. You are
> > not risking data corruption on file backed mappings.
> 
> OK, I just think it could save a few cycles to check the flag earlier.

This should be an extremely rare case. Just think about it. It should
only ever happen when an access races with munmap which itself is
questionable if not an outright bug.

> If nobody think it is necessary, we definitely could re-use
> check_stable_address_space(),

If we really need this whole VM_DEAD thing then it should be better
handled at the same place rather than some ad-hoc places.

> just return VM_FAULT_SIGSEGV for VM_DEAD vma,
> and check for both shared and non-shared.

Why would you even care about shared mappings?
-- 
Michal Hocko
SUSE Labs

[PATCH] /dev/mem: Mark expected switch fall-through

2018-07-02 Thread Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/char/mem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index ffeb60d..4b00d6a 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -765,6 +765,7 @@ static loff_t memory_lseek(struct file *file, loff_t 
offset, int orig)
switch (orig) {
case SEEK_CUR:
offset += file->f_pos;
+   /* fall through */
case SEEK_SET:
/* to avoid userland mistaking f_pos=-9 as -EBADF=-9 */
if ((unsigned long long)offset >= -MAX_ERRNO) {
-- 
2.7.4

Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message

2018-07-02 Thread Vineet Gupta

+CC Al

On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> We process signals in the end of syscall/exception handler.
> It the signal is fatal we print register's content using
> show_regs function. show_regs() also prints information about
> last exception happened.
>
> In case of multicore system we can catch the situation when we
> will print wrong information about exception. See the example:
> __
> CPU-0: started to handle page fault
> CPU-1: sent signal to process, which is executed on CPU-0
> CPU-0: ended page fault handle. Started to process signal before
>returnig to userspace. Process signal, which is send from
>CPU-0. As th signal is fatal we call show_regs().
>show_regs() will show information about last exception
>which is *page fault* (instead of "trap" which is used for
>signals and happened on CPU-0)
>
> So we will get message like this:
> /home/waitpid02
>   potentially unexpected fatal signal 8.
>   Path: /home/waitpid02
>   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
>   task: 9f11c200 task.stack: 9f3ae000
>
>   [ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec
>   [EFA   ]: 0x
>   [BLINK ]: 0x123ea
>   [ERET  ]: 0x123ec
> @off 0x123ec in [/home/waitpid02]
> VMA: 0x0001 to 0x00016000
>   [STAT32]: 0x80080882 : IE U
>   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x
>   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006
>   [-other-info-]
>
> This message is confusing because it show information about page fault
> ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> to signal.

Agreed this is misleading. @Al, is there a way to identify process termination
from signals because it did something wrong vs. say unhandled signal. For 
former,
we want to dump additional info in show_regs() such as PC / Fault addres etc and
not in other scenario.

> This situation was reproduced with waitpid02 LTP test.
> _
>
> So remove printing information about exceptions from show_regs()
> to avoid confusing messages. Print information about exceptions
> only in required places instead of show_regs()
>
> Now we don't print information about exceptions if signal is simply
> send by another userspace app. So in case of waitpid02 we will print
> next message:
> _
> ./waitpid02
>   potentially unexpected fatal signal 8.
>   [STAT32]: 0x80080082 : IE U
>   BTA: 0x2fc4  SP: 0x5ff8bd64  FP: 0x
>   LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006
>   [-other-info-]
> _

The prints I'm seeing now, for a segv from NULL pointer access is even more
confusing !
There's a mixup of prints

>8
Path: /segv
CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412

[ECR   ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000103ac
[EFA   ]: 0x
[BLINK ]: 0x20047bb0
[ERET  ]: 0x103ac
    @off 0x103ac in [/segv]
    VMA: 0x0001 to 0x00012000

potentially unexpected fatal signal 11.
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
r00: 0x0001    r01: 0x5fc95e94    r02: 0x   
r03: 0x0064    r04: 0x80808080    r05: 0x2f2f2f2f   
...
>8

and for the process killed by signal 8, we get below.

>8
[ARCLinux]# kill -8 71
[ARCLinux]# potentially unexpected fatal signal 8.
[STAT32]: 0x80080882 : IE U
BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x
r00: 0xfdfc    r01: 0x5fbcddf0    r02: 0x   
r03: 0x0008    r04: 0x80808080    r05: 0x2f2f2f2f   
r06: 0x7a2f5f4a    r07: 0x    r08: 0x0065   
...


[1]+  Floating point exception   ./sleep
>8

I'm not sure whats the improvement here vs. the status quo.

For signal based kill, we don't want to dump the extra registers and if any, we
might still want to print the PC where the process was last seen in user mode to
give user of idea what it was doing at the time.

-Vineet

Re: [RFC v3 PATCH 3/5] mm: refactor do_munmap() to extract the common part

2018-07-02 Thread Michal Hocko

On Mon 02-07-18 09:59:06, Yang Shi wrote:
> 
> 
> On 7/2/18 6:42 AM, Michal Hocko wrote:
> > On Sat 30-06-18 06:39:43, Yang Shi wrote:
> > > Introduces two new helper functions:
> > >* munmap_addr_sanity()
> > >* munmap_lookup_vma()
> > > 
> > > They will be used by do_munmap() and the new do_munmap with zapping
> > > large mapping early in the later patch.
> > > 
> > > There is no functional change, just code refactor.
> > There are whitespace changes which make the code much harder to review
> > than necessary.
> > > +static inline bool munmap_addr_sanity(unsigned long start, size_t len)
> > >   {
> > > - unsigned long end;
> > > - struct vm_area_struct *vma, *prev, *last;
> > > + if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE - 
> > > start)
> > > + return false;
> > > - if ((offset_in_page(start)) || start > TASK_SIZE || len > 
> > > TASK_SIZE-start)
> > > - return -EINVAL;
> > e.g. here.
> 
> Oh, yes. I did some coding style cleanup too.

If you want to do some coding style cleanups make them a separate patch.
The resulting diff would be much easier to review.

-- 
Michal Hocko
SUSE Labs

[PATCH v2 3/8] arm64/kernel: jump_label: switch to relative references

2018-07-02 Thread Ard Biesheuvel

On a randomly chosen distro kernel build for arm64, vmlinux.o shows the
following sections, containing jump label entries, and the associated
RELA relocation records, respectively:

  ...
  [38088] __jump_table  PROGBITS   00e19f30
   0002ea10    WA   0 0 8
  [38089] .rela__jump_table RELA   01fd8bb0
   0008be30  0018   I  38178   38088 8
  ...

In other words, we have 190 KB worth of 'struct jump_entry' instances,
and 573 KB worth of RELA entries to relocate each entry's code, target
and key members. This means the RELA section occupies 10% of the .init
segment, and the two sections combined represent 5% of vmlinux's entire
memory footprint.

So let's switch from 64-bit absolute references to 32-bit relative
references for the code and target field, and a 64-bit relative
reference for the 'key' field (which may reside in another module or the
core kernel, which may be more than 4 GB way on arm64 when running with
KASLR enable): this reduces the size of the __jump_table by 33%, and
gets rid of the RELA section entirely.

Acked-by: Will Deacon 
Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/Kconfig  |  1 +
 arch/arm64/include/asm/jump_label.h | 36 +---
 arch/arm64/kernel/jump_label.c  |  6 ++--
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1940c6405d04..d17aa9614e69 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -91,6 +91,7 @@ config ARM64
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
+   select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
diff --git a/arch/arm64/include/asm/jump_label.h 
b/arch/arm64/include/asm/jump_label.h
index 1b5e0e843c3a..bb6d15b34668 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -26,13 +26,15 @@
 
 #define JUMP_LABEL_NOP_SIZEAARCH64_INSN_SIZE
 
-static __always_inline bool arch_static_branch(struct static_key *key, bool 
branch)
+static __always_inline bool arch_static_branch(struct static_key *key,
+  bool branch)
 {
-   asm goto("1: nop\n\t"
-".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
-".popsection\n\t"
+   asm goto("1:nop \n\t"
+"  .pushsection__jump_table, \"aw\"\n\t"
+"  .align  3   \n\t"
+"  .long   1b - ., %l[l_yes] - .   \n\t"
+"  .quad   %c0 - . \n\t"
+"  .popsection \n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
return false;
@@ -40,13 +42,15 @@ static __always_inline bool arch_static_branch(struct 
static_key *key, bool bran
return true;
 }
 
-static __always_inline bool arch_static_branch_jump(struct static_key *key, 
bool branch)
+static __always_inline bool arch_static_branch_jump(struct static_key *key,
+   bool branch)
 {
-   asm goto("1: b %l[l_yes]\n\t"
-".pushsection __jump_table,  \"aw\"\n\t"
-".align 3\n\t"
-".quad 1b, %l[l_yes], %c0\n\t"
-".popsection\n\t"
+   asm goto("1:b   %l[l_yes]   \n\t"
+"  .pushsection__jump_table, \"aw\"\n\t"
+"  .align  3   \n\t"
+"  .long   1b - ., %l[l_yes] - .   \n\t"
+"  .quad   %c0 - . \n\t"
+"  .popsection \n\t"
 :  :  "i"(&((char *)key)[branch]) :  : l_yes);
 
return false;
@@ -54,13 +58,5 @@ static __always_inline bool arch_static_branch_jump(struct 
static_key *key, bool
return true;
 }
 
-typedef u64 jump_label_t;
-
-struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
-};
-
 #endif  /* __ASSEMBLY__ */
 #endif /* __ASM_JUMP_LABEL_H */
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index c2dd1ad3e648..903d17023d77 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -25,12 +25,12 @@
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   void *addr = (void *)entry->code;
+   void *addr = (void *)jump_entry_code(entry);
u32 insn;
 
if (type ==

Re: [PATCH v10 4/7] i2c: fsi: Add abort and hardware reset procedures

2018-07-02 Thread Wolfram Sang

Hi Eddie,

> > I think this is a way too aggressive recovery. Your are doing the 9
> > pulse toggles basically on any error while this is only when the device
> > keeps SDA low and you want to recover from that. If SDA is not stuck
> > low, sending a STOP should do. Or do you have a known case where this is
> > not going to work?
> 
> It is aggressive, but I don't see the harm in doing this on every error.

Well, as it happens, I just fixed such a case. Please check these patch
series and elinux wiki pages:

===

(new fault injector)
[PATCH v2 0/2] i2c: gpio: fault-injector: add new injector

(actual recovery fix)
[PATCH 0/2] i2c: recovery: make sure pulses are not misinterpreted

===

And here is the new elinux wiki page to describe my findings:

https://elinux.org/Tests:I2C-bus-recovery-write-byte-fix

Also, the previous pages have been updated to reflect the latest status:

https://elinux.org/Tests:I2C-fault-injection
https://elinux.org/Tests:I2C-bus-recovery

To sum it up: This is a proven case where uncontrolled bus recovery can
result into a bogus write!

> There are some other error conditions with this hardware which may require
> the clock toggling, such as "bus arbitration lost." I think this is the

Why is that? In my understanding, recovery is *only* needed when SDA is
stuck low. If SDA is high, sending STOP should do. If not, it needs to
be researched why.

> safest option for this hardware, and this routine has been tested for many
> years.

I remember having a similar argument with Joakim Tjernlund a while ago.
I recently re-read our argument, yet I still keep my position: I don't
want to do $random things to recover, just a tested and well understood
procedure. And in that thread, I was never given a test case.

> > 
> > Also, you implement the pulse toggling manually. Can't you just populate
> > {get|set}_{scl|sda} and use the generic routine we have in the core?
> 
> I see that the generic implementation breaks the loop if it sees the clock
> isn't high after setting it, or if SDA goes high. I think it's safer to
> finish the reset for our hardware. Plus, we actually have different

Why do you think it is safer? What is the test case for that? I think
one really should do check SDA! See above, you might trigger a write
otherwise. If this breaks something for you, I am looking forward to
discuss it.

> registers for setting 0 or 1 to the clock/data, so we save some cpu cycles
> by doing it directly instead of implementing set_scl/sda and having to check
> val every time :)

Correctness comes above all here. And I am afraid your implementation is
not correct.

> If you feel very strongly that this recovery procedure needs to be reduced,
> then I will work on that and have to do some extensive testing.

I am open for discussion, yet I also feel strong about it. The reason
why the recovery procedure is moved into the core is to have one working
and understood bit-banging algorithm which all drivers can rely on. If
all drivers implement their custom version, they might miss gory details
like the above write_byte fix.

I do understand this might cause testing effort for you, I am sorry for
the delay it causes. However, my goal as a maintainer is to have a
reliable recovery mechanism, for your driver as well as for all drivers.

I hope this is understandable. BTW if you want this driver upstream
soon, then it may be an idea to resend it without any bus recovery and
then we can work on it incrementally.

Kind regards and thanks,

   Wolfram


signature.asc
Description: PGP signature

[PATCH v2 1/8] kernel/jump_label: abstract jump_entry member accessors

2018-07-02 Thread Ard Biesheuvel

In preparation of allowing architectures to use relative references
in jump_label entries [which can dramatically reduce the memory
footprint], introduce abstractions for references to the 'code',
'target' and 'key' members of struct jump_entry.

Signed-off-by: Ard Biesheuvel 
---
 include/linux/jump_label.h | 34 +
 kernel/jump_label.c| 40 
 2 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index b46b541c67c4..4603a1c88e48 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -119,6 +119,40 @@ struct static_key {
 
 #ifdef HAVE_JUMP_LABEL
 #include 
+
+#ifndef __ASSEMBLY__
+
+static inline unsigned long jump_entry_code(const struct jump_entry *entry)
+{
+   return entry->code;
+}
+
+static inline unsigned long jump_entry_target(const struct jump_entry *entry)
+{
+   return entry->target;
+}
+
+static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
+{
+   return (struct static_key *)((unsigned long)entry->key & ~1UL);
+}
+
+static inline bool jump_entry_is_branch(const struct jump_entry *entry)
+{
+   return (unsigned long)entry->key & 1UL;
+}
+
+static inline bool jump_entry_is_init(const struct jump_entry *entry)
+{
+   return entry->code == 0;
+}
+
+static inline void jump_entry_set_init(struct jump_entry *entry)
+{
+   entry->code = 0;
+}
+
+#endif
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 01ebdf1f9f40..8a3ac4f5f490 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -38,10 +38,10 @@ static int jump_label_cmp(const void *a, const void *b)
const struct jump_entry *jea = a;
const struct jump_entry *jeb = b;
 
-   if (jea->key < jeb->key)
+   if (jump_entry_key(jea) < jump_entry_key(jeb))
return -1;
 
-   if (jea->key > jeb->key)
+   if (jump_entry_key(jea) > jump_entry_key(jeb))
return 1;
 
return 0;
@@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(jump_label_rate_limit);
 
 static int addr_conflict(struct jump_entry *entry, void *start, void *end)
 {
-   if (entry->code <= (unsigned long)end &&
-   entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
+   if (jump_entry_code(entry) <= (unsigned long)end &&
+   jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
return 1;
 
return 0;
@@ -321,16 +321,6 @@ static inline void static_key_set_linked(struct static_key 
*key)
key->type |= JUMP_TYPE_LINKED;
 }
 
-static inline struct static_key *jump_entry_key(struct jump_entry *entry)
-{
-   return (struct static_key *)((unsigned long)entry->key & ~1UL);
-}
-
-static bool jump_entry_branch(struct jump_entry *entry)
-{
-   return (unsigned long)entry->key & 1UL;
-}
-
 /***
  * A 'struct static_key' uses a union such that it either points directly
  * to a table of 'struct jump_entry' or to a linked list of modules which in
@@ -355,7 +345,7 @@ static enum jump_label_type jump_label_type(struct 
jump_entry *entry)
 {
struct static_key *key = jump_entry_key(entry);
bool enabled = static_key_enabled(key);
-   bool branch = jump_entry_branch(entry);
+   bool branch = jump_entry_is_branch(entry);
 
/* See the comment in linux/jump_label.h */
return enabled ^ branch;
@@ -370,12 +360,12 @@ static void __jump_label_update(struct static_key *key,
 * An entry->code of 0 indicates an entry which has been
 * disabled because it was in an init text area.
 */
-   if (entry->code) {
-   if (kernel_text_address(entry->code))
+   if (!jump_entry_is_init(entry)) {
+   if (kernel_text_address(jump_entry_code(entry)))
arch_jump_label_transform(entry, 
jump_label_type(entry));
else
WARN_ONCE(1, "can't patch jump_label at %pS",
- (void *)(unsigned long)entry->code);
+ (void *)jump_entry_code(entry));
}
}
 }
@@ -430,8 +420,8 @@ void __init jump_label_invalidate_initmem(void)
struct jump_entry *iter;
 
for (iter = iter_start; iter < iter_stop; iter++) {
-   if (init_section_contains((void *)(unsigned long)iter->code, 1))
-   iter->code = 0;
+   if (init_section_contains((void *)jump_entry_code(iter), 1))
+   jump_entry_set_init(iter);
}
 }
 
@@ -441,7 +431,7 @@ static enum jump_label_type jump_label_init_type(struct 
jump_entry *entry)
 {
struct static_key *key = jump_entry_key(entry);
bool type = static_key_type(key);
-   bool branch = jump_entry_branch(entry);
+   bool branch =

Re: [PATCH 1/2] Revert "UBIFS: Fix potential integer overflow in allocation"

2018-07-02 Thread Kees Cook

On Mon, Jul 2, 2018 at 10:50 AM, Richard Weinberger  wrote:
> Am Montag, 2. Juli 2018, 18:00:05 CEST schrieb Kees Cook:
>> On Sun, Jul 1, 2018 at 2:20 PM, Richard Weinberger  wrote:
>> > This reverts commit 353748a359f1821ee934afc579cf04572406b420.
>> > It bypassed the linux-mtd review process and fixes the issue not as it
>> > should.
>>
>> Ah, sorry, I thought you were CCed on the original report.
>
> No big deal. I was just "surprised".

Yeah, totally my mistake. There were other overflow patches that went
out pubically and I thought this one had too.

>> > Cc: Kees Cook 
>> > Cc: Silvio Cesare 
>> > Cc: sta...@vger.kernel.org
>> > Signed-off-by: Richard Weinberger 
>> > ---
>> >  fs/ubifs/journal.c | 5 ++---
>> >  1 file changed, 2 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
>> > index 07b4956e0425..da8afdfccaa6 100644
>> > --- a/fs/ubifs/journal.c
>> > +++ b/fs/ubifs/journal.c
>> > @@ -1282,11 +1282,10 @@ static int truncate_data_node(const struct 
>> > ubifs_info *c, const struct inode *in
>> >   int *new_len)
>> >  {
>> > void *buf;
>> > -   int err, compr_type;
>> > -   u32 dlen, out_len, old_dlen;
>> > +   int err, dlen, compr_type, out_len, old_dlen;
>>
>> What's wrong with making these unsigned?
>
> Well, what is the benefit?
> In ubifs a data node carries at most 4k of bytes.
> WORST_COMPR_FACTOR is 2.
> So the computed lengths are always in a range where a natural int does work 
> just fine.

Just a robustness preference: it keeps it from going negative. But I
don't feel strongly. :)

>> > out_len = le32_to_cpu(dn->size);
>> > -   buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS);
>> > +   buf = kmalloc(out_len * WORST_COMPR_FACTOR, GFP_NOFS);
>> > if (!buf)
>> > return -ENOMEM;
>>
>> Please leave the kmalloc() -> kmalloc_array() change, as that has
>> happened treewide already. We don't want to have any multiplications
>> in the size argument for the allocators (i.e. they should use 2-factor
>> arg version like here, or use array_size() for things like vmalloc()).
>
> Let's queue another patch for the next merge window which converts
> kmalloc() -> kmalloc_array().

I'd prefer to leave it as-is for 4.18 because it would be the only
unconverted kmalloc()-with-multiplication in the entire tree. We did
treewide conversions and a revert would be undoing that here. (The
scripts that check for this case would run "clean" for 4.18.)

So, this gets back to the question of the int vs u32: if you just
didn't revert this patch, then the kmalloc_array() would stand too.
Easy! :)

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v4 2/6] dt: qcom: 8996: thermal: Move to DT initialisation

2018-07-02 Thread Bjorn Andersson

On Mon 02 Jul 05:44 PDT 2018, Amit Kucheria wrote:

> We also split up the regmap address space into two, one for the TM
> registers, the other for the SROT registers. This was required to deal with
> different address offsets for the TM and SROT registers across different
> SoC families.
> 
> Since tsens-common.c/init_common() currently only registers one address
> space, the order is important (TM before SROT). This is OK since the code
> doesn't really use the SROT functionality yet.
> 
> Signed-off-by: Amit Kucheria 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> ---
>  arch/arm64/boot/dts/qcom/msm8996.dtsi | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi 
> b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> index 8c7f9ca..6c8a857 100644
> --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
> +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> @@ -461,7 +461,17 @@
>  
>   tsens0: thermal-sensor@4a8000 {
>   compatible = "qcom,msm8996-tsens";
> - reg = <0x4a8000 0x2000>;
> + reg = <0x4a9000 0x1000>, /* TM */
> +   <0x4a8000 0x1000>; /* SROT */
> + #qcom,sensors = <13>;
> + #thermal-sensor-cells = <1>;
> + };
> +
> + tsens1: thermal-sensor@4ac000 {
> + compatible = "qcom,msm8996-tsens";
> + reg = <0x4ad000 0x1000>, /* TM */
> +   <0x4ac000 0x1000>; /* SROT */
> + #qcom,sensors = <8>;
>   #thermal-sensor-cells = <1>;
>   };
>  
> -- 
> 2.7.4
>

[PATCH] Drivers: hv: hv_kvp: Mark expected switch fall-through

2018-07-02 Thread Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/hv/hv_kvp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 5eed1e7..9b1adcb 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -352,6 +352,7 @@ static void process_ib_ipinfo(void *in_msg, void *out_msg, 
int op)
MAX_IP_ADDR_SIZE);
 
out->body.kvp_ip_val.dhcp_enabled = in->kvp_ip_val.dhcp_enabled;
+   /* fall through */
 
default:
utf16s_to_utf8s((wchar_t *)in->kvp_ip_val.adapter_id,
-- 
2.7.4

Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

2018-07-02 Thread Mathieu Desnoyers

- On Jul 2, 2018, at 1:11 PM, Andy Lutomirski l...@amacapital.net wrote:
> 
> But I think that the limited solution of changing
> instruction_pointer_set() really is a sufficient
> architecture-dependent change to fully solve your problem.

So let me recap with the changes I gather for 4.18 and 4.19:

4.18:

* Change struct rseq_cs field types from LINUX_FIELD_u32_u64() to __u64 in
  uapi/linux/rseq.h,
* Compare rseq->rseq_cs->abort_ip with TASK_SIZE before using it. Kill offending
  process if its value is over TASK_SIZE,
* Explicitly check that padding of rseq->rseq_cs is zero on 32-bit kernels
  (#ifndef __LP64__).

4.19:

* Introduce instruction_pointer_set() with input validation, use it when setting
  IP to abort_ip in rseq. This replaces the comparison of abort_ip with 
TASK_SIZE.

Is that consistent with what you have in mind ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

[PATCH] staging: rtl8188eu: remove rtw_mp_phy_regdef.h

2018-07-02 Thread Michael Straube

The header rtw_mp_phy_regdef.h is not used anywhere.
'git grep rtw_mp_phy_regdef.h' returns nothing, remove the file.

Signed-off-by: Michael Straube 
---
 .../rtl8188eu/include/rtw_mp_phy_regdef.h | 1078 -
 1 file changed, 1078 deletions(-)
 delete mode 100644 drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h

diff --git a/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h 
b/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h
deleted file mode 100644
index 9276e2321f2a..
--- a/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h
+++ /dev/null
@@ -1,1078 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/**
- *
- * Copyright(c) 2007 - 2011 Realtek Corporation. All rights reserved.
- *
- 
**/
-/*
- *
- * Module: __RTW_MP_PHY_REGDEF_H_
- *
- *
- * Note:   1. Define PMAC/BB register map
- * 2. Define RF register map
- * 3. PMAC/BB register bit mask.
- * 4. RF reg bit mask.
- * 5. Other BB/RF relative definition.
- *
- *
- * Export: Constants, macro, functions(API), global variables(None).
- *
- * Abbrev:
- *
- * History:
- * DataWho Remark
- * 08/07/2007  MHC 1. Porting from 9x series PHYCFG.h.
- * 2. Reorganize code architecture.
- * 09/25/2008  MH  1. Add RL6052 register definition
- *
- */
-#ifndef __RTW_MP_PHY_REGDEF_H_
-#define __RTW_MP_PHY_REGDEF_H_
-
-
-/*--Define Parameters---*/
-
-/*  */
-/* 8192S Regsiter offset definition */
-/*  */
-
-/*  */
-/*  BB-PHY register PMAC 0x100 PHY 0x800 - 0xEFF */
-/*  1. PMAC duplicate register due to connection: RF_Mode, TRxRN, NumOf L-STF 
*/
-/*  2. 0x800/0x900/0xA00/0xC00/0xD00/0xE00 */
-/*  3. RF register 0x00-2E */
-/*  4. Bit Mask for BB/RF register */
-/*  5. Other definition for BB/RF R/W */
-/*  */
-
-
-/*  */
-/*  1. PMAC duplicate register due to connection: RF_Mode, TRxRN, NumOf L-STF 
*/
-/*  1. Page1(0x100) */
-/*  */
-#definerPMAC_Reset 0x100
-#definerPMAC_TxStart   0x104
-#definerPMAC_TxLegacySIG   0x108
-#definerPMAC_TxHTSIG1  0x10c
-#definerPMAC_TxHTSIG2  0x110
-#definerPMAC_PHYDebug  0x114
-#definerPMAC_TxPacketNum   0x118
-#definerPMAC_TxIdle0x11c
-#definerPMAC_TxMACHeader0  0x120
-#definerPMAC_TxMACHeader1  0x124
-#definerPMAC_TxMACHeader2  0x128
-#definerPMAC_TxMACHeader3  0x12c
-#definerPMAC_TxMACHeader4  0x130
-#definerPMAC_TxMACHeader5  0x134
-#definerPMAC_TxDataType0x138
-#definerPMAC_TxRandomSeed  0x13c
-#definerPMAC_CCKPLCPPreamble   0x140
-#definerPMAC_CCKPLCPHeader 0x144
-#definerPMAC_CCKCRC16  0x148
-#definerPMAC_OFDMRxCRC32OK 0x170
-#definerPMAC_OFDMRxCRC32Er 0x174
-#definerPMAC_OFDMRxParityEr0x178
-#definerPMAC_OFDMRxCRC8Er  0x17c
-#definerPMAC_CCKCRxRC16Er  0x180
-#definerPMAC_CCKCRxRC32Er  0x184
-#definerPMAC_CCKCRxRC32OK  0x188
-#definerPMAC_TxStatus  0x18c
-
-/*  */
-/*  2. Page2(0x200) */
-/*  */
-/*  The following two definition are only used for USB interface. */
-/* define  RF_BB_CMD_ADDR  0x02c0   RF/BB read/write command address. */
-/* define  RF_BB_CMD_DATA  0x02c4   RF/BB read/write command data. */
-
-/*  */
-/*  3. Page8(0x800) */
-/*  */
-#definerFPGA0_RFMOD0x800   /* RF mode & CCK TxSC RF BW 
Setting?? */
-
-#definerFPGA0_TxInfo   0x804   /*  Status report?? */
-#definerFPGA0_PSDFunction  0x808
-
-#definerFPGA0_TxGainStage  0x80c   /*  Set TX PWR init gain? */
-
-#definerFPGA0_RFTiming10x810   /*  Useless now */
-#definerFPGA0_RFTiming20x814
-/* define rFPGA0_XC_RFTiming   0x818 */
-/* define rFPGA0_XD_RFTiming   0x81c */
-
-#define rFPGA0_XA_HSSIParameter1   0x820   /*  RF 3 wire register */
-#define rFPGA0_XA_HSSIParameter2   0x824
-#define rFPGA0_XB_HSSIParameter1   0x828
-#define rFPGA0_XB_HSSIParameter2   0x82c
-#define rFPGA0_XC_HSSIParameter1   0x830
-#define rFPGA0_XC_HSSIParameter2   0x834
-#define rFPGA0_XD_HSSIParameter1   0x838
-#define rFPGA0_XD_HSSIParameter2   0x83c
-#definerFPGA0_XA_LSSIParameter 0x840
-#definerFPGA0_XB_LSSIParameter 0x844
-#define

Re: [v7,07/10] dt-bindings: counter: Document stm32 quadrature encoder

2018-07-02 Thread David Lechner


On 06/21/2018 04:08 PM, William Breathitt Gray wrote:

From: Benjamin Gaignard 

Add bindings for STM32 Timer quadrature encoder.
It is a sub-node of STM32 Timer which implement the
quadratic encoder part of the hardware.


quadradic? or quadrature?



Cc: Rob Herring 
Cc: Mark Rutland 
Signed-off-by: Benjamin Gaignard 
Signed-off-by: William Breathitt Gray 
---
  .../bindings/counter/stm32-timer-cnt.txt  | 31 +++
  .../devicetree/bindings/mfd/stm32-timers.txt  |  7 +
  2 files changed, 38 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt

diff --git a/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt 
b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt
new file mode 100644
index ..c52fcdd4bf6c
--- /dev/null
+++ b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt
@@ -0,0 +1,31 @@
+STMicroelectronics STM32 Timer quadrature encoder
+
+STM32 Timer provides quadrature encoder to detect
+angular position and direction of rotary elements,
+from IN1 and IN2 input signals.
+
+Must be a sub-node of an STM32 Timer device tree node.
+See ../mfd/stm32-timers.txt for details about the parent node.
+
+Required properties:
+- compatible:  Must be "st,stm32-timer-counter".
+- pinctrl-names:   Set to "default".
+- pinctrl-0:   List of phandles pointing to pin configuration nodes,
+   to set CH1/CH2 pins in mode of operation for STM32
+   Timer input on external pin.
+
+Example:
+   timers@4001 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "st,stm32-timers";
+   reg = <0x4001 0x400>;
+   clocks = < 0 160>;
+   clock-names = "int";
+
+   counter {
+   compatible = "st,stm32-timer-counter";
+   pinctrl-names = "default";
+   pinctrl-0 = <_in_pins>;
+   };
+   };
diff --git a/Documentation/devicetree/bindings/mfd/stm32-timers.txt 
b/Documentation/devicetree/bindings/mfd/stm32-timers.txt
index 1db6e0057a63..ff9c14ada30b 100644
--- a/Documentation/devicetree/bindings/mfd/stm32-timers.txt
+++ b/Documentation/devicetree/bindings/mfd/stm32-timers.txt
@@ -23,6 +23,7 @@ Optional parameters:
  Optional subnodes:
  - pwm:See ../pwm/pwm-stm32.txt
  - timer:  See ../iio/timer/stm32-timer-trigger.txt
+- counter: See ../counter/stm32-timer-cnt.txt
  
  Example:

timers@4001 {
@@ -43,4 +44,10 @@ Example:
compatible = "st,stm32-timer-trigger";
reg = <0>;
};
+
+   counter {
+   compatible = "st,stm32-timer-counter";
+   pinctrl-names = "default";
+   pinctrl-0 = <_in_pins>;
+   };
};

[PATCH v5] add param that allows bootline control of hardened usercopy

2018-07-02 Thread Chris von Recklinghausen

From: Chris von Recklinghausen  

Enabling HARDENED_USERCOPY causes measurable regressions in
 networking performance, up to 8% under UDP flood.

I'm running an a small packet UDP flood using pktgen vs. a host b2b
connected. On the receiver side the UDP packets are processed by a
simple user space process that just reads and drops them:

https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c

Not very useful from a functional PoV, but it helps to pin-point
bottlenecks in the networking stack.

When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
regression in the receive tput, compared to the same kernel without
this option enabled.

With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).

The call-chain is:

__GI___libc_recvfrom
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_recvfrom
__sys_recvfrom
inet_recvmsg
udp_recvmsg
__check_object_size

udp_recvmsg() actually calls copy_to_iter() (inlined) and the latters
calls check_copy_size() (again, inlined).

A generic distro may want to enable HARDENED_USERCOPY in their default
kernel config, but at the same time, such distro may want to be able to
avoid the performance penalties in with the default configuration and
disable the stricter check on a per-boot basis.

This change adds a boot parameter that conditionally disables
HARDENED_USERCOPY at boot time.

This feature is not available on platforms that don't have CONFIG_JUMP_LABEL
set.

v4->v5:
key off of CONFIG_JUMP_LABEL, not CONFIG_SMP_BROKEN.

v3->v4:
fix a couple of nits in commit comments
declaration of bypass_usercopy_checks moved inside mm/usercopy.c and
made static
add blurb to commit comments about not enabling this functionality on
platforms with CONFIG_BROKEN_ON_SMP set.
v2->v3:
add benchmark details to commit comments
Don't add new item to Documentation/admin-guide/kernel-parameters.rst
rename boot param to "hardened_usercopy="
update description in Documentation/admin-guide/kernel-parameters.txt
static_branch_likely -> static_branch_unlikely
add __ro_after_init versions of DEFINE_STATIC_KEY_FALSE,
DEFINE_STATIC_KEY_TRUE
disable_huc_atboot -> enable_checks (strtobool "on" == true)

v1->v2:
remove CONFIG_HUC_DEFAULT_OFF
default is now enabled, boot param disables
move check to __check_object_size so as to not break optimization of
__builtin_constant_p()
include linux/atomic.h before linux/jump_label.h

Signed-off-by: Chris von Recklinghausen 
---
 Documentation/admin-guide/kernel-parameters.txt | 11 
 include/linux/jump_label.h  |  6 +
 mm/usercopy.c   | 35 +
 3 files changed, 52 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7..560d4dc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -816,6 +816,17 @@
disable=[IPV6]
See Documentation/networking/ipv6.txt.
 
+   hardened_usercopy=
+[KNL] Under CONFIG_HARDENED_USERCOPY, whether
+hardening is enabled for this boot. Hardened
+usercopy checking is used to protect the kernel
+from reading or writing beyond known memory
+allocation boundaries as a proactive defense
+against bounds-checking flaws in the kernel's
+copy_to_user()/copy_from_user() interface.
+on  Perform hardened usercopy checks (default).
+off Disable hardened usercopy checks.
+
disable_radix   [PPC]
Disable RADIX MMU mode on POWER9
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index b46b541..1a0b6f1 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -299,12 +299,18 @@ struct static_key_false {
 #define DEFINE_STATIC_KEY_TRUE(name)   \
struct static_key_true name = STATIC_KEY_TRUE_INIT
 
+#define DEFINE_STATIC_KEY_TRUE_RO(name)\
+   struct static_key_true name __ro_after_init = STATIC_KEY_TRUE_INIT
+
 #define DECLARE_STATIC_KEY_TRUE(name)  \
extern struct static_key_true name
 
 #define DEFINE_STATIC_KEY_FALSE(name)  \
struct static_key_false name = STATIC_KEY_FALSE_INIT
 
+#define DEFINE_STATIC_KEY_FALSE_RO(name)   \
+   struct static_key_false name __ro_after_init = STATIC_KEY_FALSE_INIT
+
 #define DECLARE_STATIC_KEY_FALSE(name) \
extern struct static_key_false name
 
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325..baa0193 100644
--- a/mm/usercopy.c

Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig

2018-07-02 Thread Randy Dunlap

On 07/02/18 13:03, Randy Dunlap wrote:
> On 07/02/18 07:47, Christoph Hellwig wrote:
>> Instead of duplicating the source statements in every architecture just
>> do it once in the toplevel Kconfig file.
>>
>> Signed-off-by: Christoph Hellwig 
>> ---
>>  Kconfig | 22 ++
>>  arch/alpha/Kconfig  | 20 
>>  arch/arc/Kconfig| 16 
>>  arch/arm/Kconfig| 25 -
>>  arch/arm64/Kconfig  | 23 ---
>>  arch/c6x/Kconfig| 24 
>>  arch/h8300/Kconfig  | 24 
>>  arch/hexagon/Kconfig| 16 
>>  arch/ia64/Kconfig   | 20 
>>  arch/m68k/Kconfig   | 24 
>>  arch/microblaze/Kconfig | 24 
>>  arch/mips/Kconfig   | 24 
>>  arch/nds32/Kconfig  | 16 
>>  arch/nios2/Kconfig  | 24 
>>  arch/openrisc/Kconfig   | 23 ---
>>  arch/parisc/Kconfig | 24 
>>  arch/powerpc/Kconfig| 19 ---
>>  arch/riscv/Kconfig  | 24 
>>  arch/s390/Kconfig   | 24 
>>  arch/sh/Kconfig | 24 
>>  arch/sparc/Kconfig  | 24 
>>  arch/unicore32/Kconfig  | 24 
>>  arch/x86/Kconfig| 22 +-
>>  arch/xtensa/Kconfig | 25 -
>>  24 files changed, 23 insertions(+), 512 deletions(-)
>>
>> diff --git a/Kconfig b/Kconfig
>> index a90d9f9e268b..5499b1273ba5 100644
>> --- a/Kconfig
>> +++ b/Kconfig
>> @@ -10,3 +10,25 @@ comment "Compiler: $(CC_VERSION_TEXT)"
>>  source "scripts/Kconfig.include"
>>  
>>  source "arch/$(SRCARCH)/Kconfig"
>> +
>> +source "init/Kconfig"
> 
> Hi Christoph,
> 
> Looks good overall.  I'm still doing some testing on it.
> 
> I would prefer to have init/Kconfig before arch/$(SRCARCH)/Kconfig.

Ugh, that won't get this set correctly on x86_64:
CONFIG_PGTABLE_LEVELS=2

> Is there a reason that you chose the ordering above?
> Any known dependencies?
> 
> Thanks.
> 
>> +
>> +source "kernel/Kconfig.freezer"
>> +
>> +menu "Executable file formats"
>> +source "fs/Kconfig.binfmt"
>> +endmenu
>> +
>> +source "mm/Kconfig"
>> +
>> +source "net/Kconfig"
>> +
>> +source "drivers/Kconfig"
>> +
>> +source "fs/Kconfig"
>> +
>> +source "security/Kconfig"
>> +
>> +source "crypto/Kconfig"
>> +
>> +source "lib/Kconfig"
> 
> 


-- 
~Randy

Re: [PATCH v4] ata: Remove depends on HAS_DMA in case of platform dependency

2018-07-02 Thread Tejun Heo

On Fri, Jun 22, 2018 at 01:03:07PM +0200, Geert Uytterhoeven wrote:
> Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
> symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
> In most cases this other symbol is an architecture or platform specific
> symbol, or PCI.
> 
> Generic symbols and drivers without platform dependencies keep their
> dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
> cannot work anyway.
> 
> This simplifies the dependencies, and allows to improve compile-testing.
> 
> Signed-off-by: Geert Uytterhoeven 
> Reviewed-by: Mark Brown 
> Acked-by: Robin Murphy 

Applied to libata/for-4.18-fixes.

Thanks.

-- 
tejun

[PATCH v2 3/7] tracing: Generalize hist trigger onmax and save action

2018-07-02 Thread Tom Zanussi

From: Tom Zanussi 

The action refactor code allowed actions and handlers to be separated,
but the existing onmax handler and save action code is still not
flexible enough to handle arbitrary coupling.  This change generalizes
them and in the process makes additional handlers and actions easier
to implement.

The onmax action can be broken up and thought of as two separate
components - a variable to be tracked (the parameter given to the
onmax($var_to_track) function) and an invisible variable created to
save the ongoing result of doing something with that variable, such as
saving the max value of that variable so far seen.

Separating it out like this and renaming it appropriately allows us to
use the same code for similar tracking functions such as
onchange($var_to_track), which would just track the last value seen
rather than the max seen so far, which is useful in some situations.

Additionally, because different handlers and actions may want to save
and access data differently e.g. save and retrieve tracking values as
local variables vs something more global, save_val() and get_val()
interface functions are introduced and max-specific implementations
are used instead.

The same goes for the code that checks whether a maximum has been hit
- a generic check_val() interface and max-checking implementation is
used instead, which allows future patches to make use of he same code
using their own implemetations of similar functionality.

Signed-off-by: Tom Zanussi 
---
 kernel/trace/trace_events_hist.c | 213 +--
 1 file changed, 139 insertions(+), 74 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 61a3a8ca2f76..f54dc8782d91 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -328,6 +328,15 @@ typedef void (*action_fn_t) (struct hist_trigger_data 
*hist_data,
 struct ring_buffer_event *rbe,
 struct action_data *data, u64 *var_ref_vals);
 
+typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val);
+typedef bool (*save_track_val_fn_t) (struct hist_trigger_data *hist_data,
+struct tracing_map_elt *elt,
+struct action_data *data,
+unsigned int track_var_idx, u64 var_val);
+typedef u64 (*get_track_val_fn_t) (struct hist_trigger_data *hist_data,
+  struct tracing_map_elt *elt,
+  struct action_data *data);
+
 enum handler_id {
HANDLER_ONMATCH = 1,
HANDLER_ONMAX,
@@ -358,14 +367,18 @@ struct action_data {
 
struct {
char*var_str;
-   unsigned intmax_var_ref_idx;
-   struct hist_field   *max_var;
-   struct hist_field   *var;
-   } onmax;
+   struct hist_field   *var_ref;
+   unsigned intvar_ref_idx;
+
+   struct hist_field   *track_var;
+
+   check_track_val_fn_tcheck_val;
+   save_track_val_fn_t save_val;
+   get_track_val_fn_t  get_val;
+   } track_data;
};
 };
 
-
 static char last_hist_cmd[MAX_FILTER_STR_VAL];
 static char hist_err_str[MAX_FILTER_STR_VAL];
 
@@ -3122,10 +3135,10 @@ static void update_field_vars(struct hist_trigger_data 
*hist_data,
hist_data->n_field_vars, 0);
 }
 
-static void update_max_vars(struct hist_trigger_data *hist_data,
-   struct tracing_map_elt *elt,
-   struct ring_buffer_event *rbe,
-   void *rec)
+static void update_save_vars(struct hist_trigger_data *hist_data,
+struct tracing_map_elt *elt,
+struct ring_buffer_event *rbe,
+void *rec)
 {
__update_field_vars(elt, rbe, rec, hist_data->save_vars,
hist_data->n_save_vars, hist_data->n_field_var_str);
@@ -3263,15 +3276,68 @@ create_target_field_var(struct hist_trigger_data 
*target_hist_data,
return create_field_var(target_hist_data, file, var_name);
 }
 
-static void onmax_print(struct seq_file *m,
-   struct hist_trigger_data *hist_data,
-   struct tracing_map_elt *elt,
-   struct action_data *data)
+static bool check_track_val_max(u64 track_val, u64 var_val)
+{
+   if (var_val <= track_val)
+   return false;
+
+   return true;
+}
+
+static u64 get_track_val_local(struct hist_trigger_data *hist_data,
+  struct tracing_map_elt *elt,
+  struct action_data *data)
+{
+   unsigned int

[PATCH v2 1/7] tracing: Refactor hist trigger action code

2018-07-02 Thread Tom Zanussi

From: Tom Zanussi 

The hist trigger action code currently implements two essentially
hard-coded pairs of 'actions' - onmax(), which tracks a variable and
saves some event fields when a max is hit, and onmatch(), which is
hard-coded to generate a synthetic event.

These hardcoded pairs (track max/save fields and detect match/generate
synthetic event) should really be decoupled into separate components
that can then be arbitrarily combined.  The first component of each
pair (track max/detect match) is called a 'handler' in the new code,
while the second component (save fields/generate synthetic event) is
called an 'action' in this scheme.

This change refactors the action code to reflect this split by adding
two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two
actions, ACTION_SAVE and ACTION_TRACE.

The new code combines them to produce the existing ONMATCH/TRACE and
ONMAX/SAVE functionality, but doesn't implement the other combinations
now possible.  Future patches will expand these to further useful
cases, such as ONMAX/TRACE, as well as add additional handlers and
actions such as ONCHANGE and SNAPSHOT.

Signed-off-by: Tom Zanussi 
---
 kernel/trace/trace_events_hist.c | 387 +++
 1 file changed, 230 insertions(+), 157 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 046c716a6536..ede7a27fa52b 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -296,9 +296,9 @@ struct hist_trigger_data {
struct field_var_hist   *field_var_hists[SYNTH_FIELDS_MAX];
unsigned intn_field_var_hists;
 
-   struct field_var*max_vars[SYNTH_FIELDS_MAX];
-   unsigned intn_max_vars;
-   unsigned intn_max_var_str;
+   struct field_var*save_vars[SYNTH_FIELDS_MAX];
+   unsigned intn_save_vars;
+   unsigned intn_save_var_str;
 };
 
 struct synth_field {
@@ -328,8 +328,22 @@ typedef void (*action_fn_t) (struct hist_trigger_data 
*hist_data,
 struct ring_buffer_event *rbe,
 struct action_data *data, u64 *var_ref_vals);
 
+enum handler_id {
+   HANDLER_ONMATCH = 1,
+   HANDLER_ONMAX,
+};
+
+enum action_id {
+   ACTION_SAVE = 1,
+   ACTION_TRACE,
+};
+
 struct action_data {
+   enum handler_id handler;
+   enum action_id  action;
+   char*action_name;
action_fn_t fn;
+
unsigned intn_params;
char*params[SYNTH_FIELDS_MAX];
 
@@ -338,13 +352,11 @@ struct action_data {
unsigned intvar_ref_idx;
char*match_event;
char*match_event_system;
-   char*synth_event_name;
struct synth_event  *synth_event;
} onmatch;
 
struct {
char*var_str;
-   char*fn_name;
unsigned intmax_var_ref_idx;
struct hist_field   *max_var;
struct hist_field   *var;
@@ -1560,7 +1572,7 @@ find_match_var(struct hist_trigger_data *hist_data, char 
*var_name)
for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *data = hist_data->actions[i];
 
-   if (data->fn == action_trace) {
+   if (data->handler == HANDLER_ONMATCH) {
char *system = data->onmatch.match_event_system;
char *event_name = data->onmatch.match_event;
 
@@ -1992,7 +2004,7 @@ static int hist_trigger_elt_data_alloc(struct 
tracing_map_elt *elt)
}
}
 
-   n_str = hist_data->n_field_var_str + hist_data->n_max_var_str;
+   n_str = hist_data->n_field_var_str + hist_data->n_save_var_str;
 
size = STR_VAR_LEN_MAX;
 
@@ -2934,7 +2946,7 @@ create_field_var_hist(struct hist_trigger_data 
*target_hist_data,
int ret;
 
if (target_hist_data->n_field_var_hists >= SYNTH_FIELDS_MAX) {
-   hist_err_event("onmatch: Too many field variables defined: ",
+   hist_err_event("trace action: Too many field variables defined: 
",
   subsys_name, event_name, field_name);
return ERR_PTR(-EINVAL);
}
@@ -2942,7 +2954,7 @@ create_field_var_hist(struct hist_trigger_data 
*target_hist_data,
file = event_file(tr, subsys_name, event_name);
 
if (IS_ERR(file)) {
-   hist_err_event("onmatch: Event file not found: ",
+   hist_err_event("trace action: Event file not found: ",

Re: [PATCH] leds: max8997: use mode when calling max8997_led_set_mode

2018-07-02 Thread Jacek Anaszewski


Hi Colin,

Thank you for the patch.

On 07/02/2018 06:50 PM, Colin King wrote:

From: Colin Ian King 

Variable mode is assigned to pdata->led_pdata->mode[led->id] and yet
is not being used when calling function max8997_led_set_mode. Fix
this by using mode when calling max8997_led_set_mode.

Cleans up clang warning:
warning: variable 'mode' set but not used [-Wunused-but-set-variable]

Fixes: 8584cb82f151 ("leds: Add suuport for MAX8997-LED driver")
Signed-off-by: Colin Ian King 
---
  drivers/leds/leds-max8997.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/leds-max8997.c b/drivers/leds/leds-max8997.c
index 4edf74f1d6d4..8c019c28f9f5 100644
--- a/drivers/leds/leds-max8997.c
+++ b/drivers/leds/leds-max8997.c
@@ -268,7 +268,7 @@ static int max8997_led_probe(struct platform_device *pdev)
mode = pdata->led_pdata->mode[led->id];
brightness = pdata->led_pdata->brightness[led->id];
  
-		max8997_led_set_mode(led, pdata->led_pdata->mode[led->id]);

+   max8997_led_set_mode(led, mode);
  
  		if (brightness > led->cdev.max_brightness)

brightness = led->cdev.max_brightness;



Applied.

--
Best regards,
Jacek Anaszewski

Re: [PATCH] security: CONFIG_HARDENED_USERCOPY does not need to select BUG

2018-07-02 Thread Kees Cook

Hi Kamal,

On Mon, Jul 2, 2018 at 1:14 PM, Kamal Mostafa  wrote:
> On Fri, Jun 29, 2018 at 01:27:08PM -0700, Kees Cook wrote:
>> Do the lkdtm tests for usercopy correctly halt the kernel thread if
>> CONFIG_BUG is removed?
>
> Yes, they do...

Perfect, thanks for double-checking! I'll apply this to my tree.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2 2/2] arm64: dts: qcom: pm8998: Add pm8998 thermal zone

2018-07-02 Thread Matthias Kaehlcke

On Mon, Jul 02, 2018 at 12:53:44PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Mon, Jul 2, 2018 at 11:10 AM, Matthias Kaehlcke  wrote:
> > The thermal zone uses spmi-temp-alarm as sensor. If the sensor is
> > configured without an IIO input it always reports 37°C for temperatures
> > below the first hardware trip point at 105°C. This hardware trip point
> > is configured as critical trip point, to initiate a system shutdown
> > before the temperature reaches the next hardware trip point at 125°C,
> > where the PMIC performs a partial shutdown.
> >
> > The temperature of the critical trip point can be raised after adding
> > the die temperature ADC as IIO input for spmi-temp-alarm, which
> > significantly increases the precision of the temperature measurements.
> >
> > Signed-off-by: Matthias Kaehlcke 
> > ---
> > Changes in v2:
> > - defined 'thermal-zones' node in pm8998.dtsi instead of using a label
> >   to refer to it
> > - use 105°C hardware trip point as critical trip point
> 
> I'm not sure this was right.  I guess you're trying to avoid
> Temperature Stage 2?

Indeed

> From Davi'd email in response to v1:
> 
> > The PMIC TEMP_ALARM hardware peripheral will perform an automatic partial
> > PMIC shutdown upon hitting over-temperature stage 2 (125 C).  This turns
> > off peripherals within the PMIC that are expected to draw significant
> > current.  The set of peripherals included varies between PMICs.  This
> > partial shutdown will occur simultaneously with the triggering of an
> > interrupt to the APPS processor that informs the qcom-spmi-temp-alarm
> > driver that an over-temperature threshold has been crossed.
> 
> I think it's actually OK to use Temperature Stage 2 as the "critical"
> point, which is why it still interrupts the CPU.  At "critical" the
> system will shut down, right?  ...so presumably it's OK if the drivers
> can't recover from having the power yanked out from underneath them as
> long as they don't hang/crash the system in this case.  If I had to
> guess the whole point of this stage is to give the system shutdown a
> better chance of succeeding without getting to stage 3.

That was my starting point, however in my tests the system reset
several times when the temperature got close to 125°C, not allowing
for a proper shutdown. Apparently the partial shutdown of the PMIC can
result in a full reset at least on some systems.

> I do agree, however, that removing the "145" from the device tree was
> the right thing to do since software will never see that.  The system
> will just shut down.
> 
> 
> > - reduced number of trip points to 2
> > - lowered temperature of passive trip point
> 
> This won't actually do anything until the ADC gets hooked up, right?

Correct

> I guess I would have expected:
> - 105: passive
> - 125: critical
> 
> ...and then we could add (if we wanted) a "hot" between passive and
> critical once we have the ADC hooked up?

Exactly, once we have more granularity we could add (an)other trip point(s).

> > - updated trip point names and added labels
> > - updated commit message
> >
> >  arch/arm64/boot/dts/qcom/pm8998.dtsi | 25 +
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> > b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > index 2f4989e7ef68..e7caa334e6c7 100644
> > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> > @@ -3,6 +3,7 @@
> >
> >  #include 
> >  #include 
> > +#include 
> >
> >  _bus {
> > pm8998_lsid0: pmic@0 {
> > @@ -60,3 +61,27 @@
> > #size-cells = <0>;
> > };
> >  };
> > +
> > +/ {
> > +   thermal-zones {
> > +   pm8998 {
> > +   polling-delay-passive = <250>;
> > +   polling-delay = <1000>;
> > +
> > +   thermal-sensors = <_temp>;
> > +
> > +   trips {
> > +   pm8998_alert0: pm8998-alert0 {
> > +   temperature = <95000>;
> > +   hysteresis = <2000>;
> > +   type = "passive";
> > +   };
> > +   pm8998_crit: pm8998-crit {
> > +   temperature = <105000>;
> > +   hysteresis = <2000>;
> > +   type = "critical";
> > +   };
> > +   };
> > +   };
> > +   };
> > +};
> 
> A nit, but I think convention is to actually put additions straight to
> the root node before reference to phandles, so I would have put this
> above the "_bus" part.

Ok, thanks, will do in v3

Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig

2018-07-02 Thread Randy Dunlap

On 07/02/18 13:41, Randy Dunlap wrote:

> --- linux-next-20180702.orig/init/Kconfig
> +++ linux-next-20180702/init/Kconfig
> @@ -1717,6 +1717,12 @@ config PROFILING
>  config TRACEPOINTS
>   bool
>  
> +# Note:  arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig
> +# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS
> +# before the default value is found in arch/Kconfig.
> +
> +source "arch/$(SRCARCH)/Kconfig"
> +
>  source "arch/Kconfig"
>  
>  endmenu  # General setup
> 

except that the endmenu should be moved up a few lines so that the
Processor type and features menu is not part of the General setup menu.

v2 patch is below.

---
From: Randy Dunlap 

Present "General setup" before "Processor type and features".
This is done by sourcing arch/$(SRCARCH)/Kconfig before arch/Kconfig
inside init/Kconfig.

Signed-off-by: Randy Dunlap 
---
v2: move General setup's endmenu before the $ARCH Kconfigs.

 Kconfig  |2 --
 init/Kconfig |   10 --
 2 files changed, 8 insertions(+), 4 deletions(-)

--- linux-next-20180702.orig/Kconfig
+++ linux-next-20180702/Kconfig
@@ -9,8 +9,6 @@ comment "Compiler: $(CC_VERSION_TEXT)"
 
 source "scripts/Kconfig.include"
 
-source "arch/$(SRCARCH)/Kconfig"
-
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
--- linux-next-20180702.orig/init/Kconfig
+++ linux-next-20180702/init/Kconfig
@@ -1717,10 +1717,16 @@ config PROFILING
 config TRACEPOINTS
bool
 
-source "arch/Kconfig"
-
 endmenu# General setup
 
+# Note:  arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig
+# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS
+# before the default value is found in arch/Kconfig.
+
+source "arch/$(SRCARCH)/Kconfig"
+
+source "arch/Kconfig"
+
 config RT_MUTEXES
bool

Re: [PATCH] ima: Use tpm_default_chip() and call TPM functions with a tpm_chip

2018-07-02 Thread Mimi Zohar

On Mon, 2018-07-02 at 11:24 -0400, Stefan Berger wrote:
> Rather than accessing the TPM functions by passing a NULL pointer for
> the tpm_chip, which causes a lookup for a suitable chip every time, get a
> hold of a tpm_chip and access the TPM functions using it. Also get rid of
> the ima_used_chip variable and use the new ima_tpm_chip variable instead
> for determining whether to call TPM functions.
> 
> Signed-off-by: Stefan Berger 
> Acked-by: Jarkko Sakkinen 

Signed-off-by: Mimi Zohar 

Jarkko, would you mind staging this patch with the rest of the patch
set?

Mimi


> ---
>  security/integrity/ima/ima.h|  2 +-
>  security/integrity/ima/ima_crypto.c |  4 ++--
>  security/integrity/ima/ima_init.c   | 16 +---
>  security/integrity/ima/ima_queue.c  |  4 ++--
>  4 files changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 354bb5716ce3..2ab1aa36 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -53,9 +53,9 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 };
>  extern int ima_policy_flag;
> 
>  /* set during initialization */
> -extern int ima_used_chip;
>  extern int ima_hash_algo;
>  extern int ima_appraise;
> +extern struct tpm_chip *ima_tpm_chip;
> 
>  /* IMA event related data */
>  struct ima_event_data {
> diff --git a/security/integrity/ima/ima_crypto.c 
> b/security/integrity/ima/ima_crypto.c
> index 4e085a17124f..7e7e7e7c250a 100644
> --- a/security/integrity/ima/ima_crypto.c
> +++ b/security/integrity/ima/ima_crypto.c
> @@ -631,10 +631,10 @@ int ima_calc_buffer_hash(const void *buf, loff_t len,
> 
>  static void __init ima_pcrread(int idx, u8 *pcr)
>  {
> - if (!ima_used_chip)
> + if (!ima_tpm_chip)
>   return;
> 
> - if (tpm_pcr_read(NULL, idx, pcr) != 0)
> + if (tpm_pcr_read(ima_tpm_chip, idx, pcr) != 0)
>   pr_err("Error Communicating to TPM chip\n");
>  }
> 
> diff --git a/security/integrity/ima/ima_init.c 
> b/security/integrity/ima/ima_init.c
> index 29b72cd2502e..faac9ecaa0ae 100644
> --- a/security/integrity/ima/ima_init.c
> +++ b/security/integrity/ima/ima_init.c
> @@ -26,7 +26,7 @@
> 
>  /* name for boot aggregate entry */
>  static const char *boot_aggregate_name = "boot_aggregate";
> -int ima_used_chip;
> +struct tpm_chip *ima_tpm_chip;
> 
>  /* Add the boot aggregate to the IMA measurement list and extend
>   * the PCR register.
> @@ -64,7 +64,7 @@ static int __init ima_add_boot_aggregate(void)
>   iint->ima_hash->algo = HASH_ALGO_SHA1;
>   iint->ima_hash->length = SHA1_DIGEST_SIZE;
> 
> - if (ima_used_chip) {
> + if (ima_tpm_chip) {
>   result = ima_calc_boot_aggregate();
>   if (result < 0) {
>   audit_cause = "hashing_error";
> @@ -106,17 +106,11 @@ void __init ima_load_x509(void)
> 
>  int __init ima_init(void)
>  {
> - u8 pcr_i[TPM_DIGEST_SIZE];
>   int rc;
> 
> - ima_used_chip = 0;
> - rc = tpm_pcr_read(NULL, 0, pcr_i);
> - if (rc == 0)
> - ima_used_chip = 1;
> -
> - if (!ima_used_chip)
> - pr_info("No TPM chip found, activating TPM-bypass! (rc=%d)\n",
> - rc);
> + ima_tpm_chip = tpm_default_chip();
> + if (!ima_tpm_chip)
> + pr_info("No TPM chip found, activating TPM-bypass!\n");
> 
>   rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA);
>   if (rc)
> diff --git a/security/integrity/ima/ima_queue.c 
> b/security/integrity/ima/ima_queue.c
> index 418f35e38015..b186819bd5aa 100644
> --- a/security/integrity/ima/ima_queue.c
> +++ b/security/integrity/ima/ima_queue.c
> @@ -142,10 +142,10 @@ static int ima_pcr_extend(const u8 *hash, int pcr)
>  {
>   int result = 0;
> 
> - if (!ima_used_chip)
> + if (!ima_tpm_chip)
>   return result;
> 
> - result = tpm_pcr_extend(NULL, pcr, hash);
> + result = tpm_pcr_extend(ima_tpm_chip, pcr, hash);
>   if (result != 0)
>   pr_err("Error Communicating to TPM chip, result: %d\n", result);
>   return result;

Re: [PATCH 2/2] PCI: imx: Initial imx7d pm support

2018-07-02 Thread Leonard Crestez

On Fri, 2018-06-08 at 16:33 +0200, Lucas Stach wrote:
> Am Dienstag, den 29.05.2018, 22:39 +0300 schrieb Leonard Crestez:
> > On imx7d the phy is turned off in suspend and must be reset on resume.
> > Right now lspci -v fails after a suspend/resume cycle, fix this by
> > adding minimal suspend/resume code from the nxp vendor tree.
> > 
> > This is currently only enabled for imx7 but the same sequence can be
> > applied to other imx pcie variants.

> > +#ifdef CONFIG_PM_SLEEP
> > +static int imx6_pcie_suspend_noirq(struct device *dev)
> > +{
> > +   struct imx6_pcie *imx6_pcie = dev_get_drvdata(dev);
> > +
> > +   if (imx6_pcie->variant == IMX7D) {
> 
> Instead of this large indented block, please just have an early return
> at the start of this function, like:
> 
> if (imx6_pcie->variant != IMX7D)
>   return 0;
> 
> Same for the resume function.

OK. The resume sequence is mostly the same for all SOCs where it
applies.

> > +static int imx6_pcie_resume_noirq(struct device *dev)
> > +{
> > +   int ret = 0;
> > +   struct imx6_pcie *imx6_pcie = dev_get_drvdata(dev);
> > +   struct pcie_port *pp = _pcie->pci->pp;
> > 
> > +
> > +   if (imx6_pcie->variant == IMX7D) {
> > +   imx6_pcie_ltssm_disable(dev);
> 
> The LTSSM disable seems misplaced here. I would have expected it to be
> in the suspend function.

This is a requirement for reinitializing the core: LTSSM needs to be
turned off during initialization.

> > +   /*
> > +* controller maybe turn off, re-configure again
> > +*/
> > +   dw_pcie_setup_rc(pp);
> > +
> > +   imx6_pcie_ltssm_enable(dev);
> > +
> > +   ret = imx6_pcie_wait_for_link(imx6_pcie);
> > +   if (ret < 0)
> > +   pr_info("pcie link is down after resume.\n");
> 
> If the PHY has been powered down and LTSSM stopped I guess we need to
> do the full imx6_pcie_establish_link() dance again here to reliably re-
> establish the link. The above seems unsafe.

What imx6_pcie_establish_link does additionally is some workaround for
link gen detection. I agree that it should be included.

This would make resume mostly the same as imx_pcie_host_init except for
dw_pcie_msi_init; that needs to be saved/restored separately.

Another issue that should be discussed here is that on 6sx and 7d the
gpc PCIE power domain is not defined correctly: the PCIE block is in
the DISPMIX domain on both SOCs and it is only PCIE_PHY which has a
separate power domain.

This matters: enabling power-gating for displays will break pcie if
this relationship is not taken into account. Here are some options:

1) Make _pcie a child of _disp by hacking into gpc probe
functions and calling pm_genpd_add_subdomain. Not very nice.

2) Support nesting PGCs in GPC code? Lots of code and still an
incorrect representation of hardware.

3) Set power-domains = <_disp> and enable runtime PM on _pcie?

4) Add separate devices for the pcie-phy. These would be mostly empty
but still different, for example on imx6sx the PHY is not even
accessible on the bus but only though PCIE registers.

Solutions 1-3 seem like workarounds while 4 seems excessive, would
appreciate some feedback.

--
Regards,
Leonard

Re: [GIT PULL] tee driver for v4.18

2018-07-02 Thread Olof Johansson

Hi Jens,


On Mon, Jul 2, 2018 at 5:10 AM, Jens Wiklander
 wrote:
> Hello arm-soc maintainers,
>
> Please pull these small tee driver enhancements. There's a new config
> option for the OP-TEE driver, OPTEE_SHM_NUM_PRIV_PAGES. Also the OP-TEE
> driver reads current time with ktime_get_real_ts64() from now on.
>
> Thanks,
> Jens
>
> The following changes since commit 29dcea88779c856c7dc92040a0c01233263101d4:
>
>   Linux 4.17 (2018-06-03 14:15:21 -0700)
>
> are available in the Git repository at:
>
>   git://git.linaro.org/people/jens.wiklander/linux-tee.git 
> tags/tee-drv-for-4.18
>
> for you to fetch changes up to 3249527f19d660c5adfb2b6f4ffd4ca0506b8755:
>
>   tee: optee: making OPTEE_SHM_NUM_PRIV_PAGES configurable via Kconfig 
> (2018-06-20 11:20:36 +0200)
>
> 
> Misc enhancement for tee driver subsystem
>
> * Replaces getnstimeofday64() with ktime_get_real_ts64()
> * Adds OPTEE_SHM_NUM_PRIV_PAGES to configure how many pages should be
>   statically reserved for driver private allocations
>
> 
> Arnd Bergmann (1):
>   tee: replace getnstimeofday64() with ktime_get_real_ts64()
>
> Sahil Malhotra (1):
>   tee: optee: making OPTEE_SHM_NUM_PRIV_PAGES configurable via Kconfig

This doesn't seem to be bugfixes (and not regression fixes in
particular), so I took the liberty of queuing these for 4.19 (in
next/drivers) instead of fixes for v4.18.

Let me know if this was an inaccurate assessment and I can move them over.


Thanks!

-Olof

Re: iwlwifi problem with iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()

2018-07-02 Thread Fabio Coatti

On Monday 2 July 2018 15:19:51 CEST Christoph Hellwig wrote:
> > By reverting this commit the card works again, tested in 4.17.3 . I've
> > noticed that the corresponding amd commit (
> > b468620f2a1dfdcfddfd6fa54367b8bcc1b51248) has been reverted in linus tree
> > (e16c4790de39dc861b749674c2a9319507f6f64f), and 4.16.X stable tree iirc, 
> > but the intel one has not been reverted.
> > 
> > hw: Lenovo P50 Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
> > 
> > Please let me know if more details are needed, but CC: me as I'm not
> > currently subscribed to LKML.
> 
> It probablty is the same issue.  Did you check if you can just revert
> the commit cleanly on top of current mainline and that fixes the issue
> for you?

Yep, it does. I issued a 
git revert d657c5c73ca987214a6f9436e435b34fc60f332a
on mainline (4.18-rc3) and recompiled, the card works just fine.

Re: [PATCH] mm/sparse: Make sparse_init_one_section void and remove check

2018-07-02 Thread Pavel Tatashin

On Mon, Jul 2, 2018 at 11:43 AM  wrote:
>
> From: Oscar Salvador 
>
> sparse_init_one_section() is being called from two sites:
> sparse_init() and sparse_add_one_section().
> The former calls it from a for_each_present_section_nr() loop,
> and the latter marks the section as present before calling it.
> This means that when sparse_init_one_section() gets called, we already know
> that the section is present.
> So there is no point to double check that in the function.
>
> This removes the check and makes the function void.
>
> Signed-off-by: Oscar Salvador 

Thank you Oscar.

Reviewed-by: Pavel Tatashin 

> ---
>  mm/sparse.c | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)

Re: [PATCH] arm64: Clear the stack

2018-07-02 Thread Laura Abbott


On 07/02/2018 06:02 AM, Alexander Popov wrote:

Hello Laura,

Thanks for your work!
Please see my comments below.

On 29.06.2018 22:05, Laura Abbott wrote:

Implementation of stackleak based heavily on the x86 version

Signed-off-by: Laura Abbott 
---
Changes since last time:
- Minor name change in entry.S
- Converted to use the generic interfaces so there's minimal additions.
- Added the fast syscall path.
- Addition of on_thread_stack and current_top_of_stack
- Disable stackleak on hyp per suggestion
- Added a define for check_alloca. I'm still not sure about keeping it
   since the x86 version got reworked?

I've mostly kept this as one patch with a minimal commit text. I can
split it up and elaborate more before final merging.
---
  arch/arm64/Kconfig|  1 +
  arch/arm64/include/asm/processor.h|  9 +
  arch/arm64/kernel/entry.S |  7 +++
  arch/arm64/kernel/process.c   | 16 
  arch/arm64/kvm/hyp/Makefile   |  3 ++-
  drivers/firmware/efi/libstub/Makefile |  3 ++-
  include/linux/stackleak.h |  1 +
  scripts/Makefile.gcc-plugins  |  5 -
  8 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf4938f6d..b0221db95dc9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -92,6 +92,7 @@ config ARM64
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
+   select HAVE_ARCH_STACKLEAK
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 767598932549..73914fd7e142 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -248,5 +248,14 @@ void cpu_clear_disr(const struct arm64_cpu_capabilities 
*__unused);
  #define SVE_SET_VL(arg)   sve_set_current_vl(arg)
  #define SVE_GET_VL()  sve_get_current_vl()
  
+/*

+ * For stackleak


May I ask you to call it STACKLEAK here and in other places for easier grep?



Sure


+ *
+ * These need to be macros because otherwise we get stuck in a nightmare
+ * of header definitions for the use of task_stack_page.
+ */
+#define current_top_of_stack() (task_stack_page(current) + THREAD_SIZE)
+#define on_thread_stack()  (on_task_stack(current, current_stack_pointer))
+
  #endif /* __ASSEMBLY__ */
  #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ec2ee720e33e..31c9da7d401e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -401,6 +401,11 @@ tsk.reqx28 // current thread_info
  
  	.text
  
+	.macro	stackleak_erase


Could you rename the macro to STACKLEAK_ERASE for similarity with x86?



Mark Rutland had previously asked for this to be lowercase.
I really don't care one way or the other so I'll defer to
someone else to have the final word.


+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+   bl  stackleak_erase_kstack
+#endif
+   .endm
  /*
   * Exception vectors.
   */
@@ -880,6 +885,7 @@ ret_fast_syscall:
and x2, x1, #_TIF_WORK_MASK
cbnzx2, work_pending
enable_step_tsk x1, x2
+   stackleak_erase
kernel_exit 0
  ret_fast_syscall_trace:
enable_daif
@@ -906,6 +912,7 @@ ret_to_user:
cbnzx2, work_pending
  finish_ret_to_user:
enable_step_tsk x1, x2
+   stackleak_erase
kernel_exit 0
  ENDPROC(ret_to_user)
  
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c

index f08a2ed9db0d..9f0f135f8b66 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -493,3 +493,19 @@ void arch_setup_new_exec(void)
  {
current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
  }
+
+#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
+#define MIN_STACK_LEFT 256
+
+void __used stackleak_check_alloca(unsigned long size)
+{
+   unsigned long sp, stack_left;
+
+   sp = current_stack_pointer;
+
+   stack_left = sp & (THREAD_SIZE - 1);
+   BUG_ON(stack_left < MIN_STACK_LEFT ||
+   size >= stack_left - MIN_STACK_LEFT);
+}
+EXPORT_SYMBOL(stackleak_check_alloca);
+#endif


This code should be updated.
You may remember the troubles I had with MIN_STACK_LEFT:
http://openwall.com/lists/kernel-hardening/2018/05/11/12
Please see that thread where Mark Rutland and I worked out the solution.



Ah yeah, I missed the details in that thread. Thanks for
that pointer.


By the way, different stacks on x86_64 have different sizes. Is it false for 
arm64?



Currently everything except the overflow stack looks to be
the same size but there's also another stack I missed. It might
be cleaner just to use on_accessible_stack and then another
function to get the top of stack. This also might just be
reimplementing what x86

Re: [GIT PULL] x86 fixes

2018-07-02 Thread Andy Lutomirski

On Sat, Jun 30, 2018 at 12:01 PM, Linus Torvalds
 wrote:
> On Sat, Jun 30, 2018 at 1:49 AM Ingo Molnar  wrote:
>>
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -477,7 +477,7 @@ ENTRY(entry_SYSENTER_32)
>>  * whereas POPF does not.)
>>  */
>> addl$PT_EFLAGS-PT_DS, %esp  /* point esp at pt_regs->flags */
>> -   btr $X86_EFLAGS_IF_BIT, (%esp)
>> +   btrl$X86_EFLAGS_IF_BIT, (%esp)
>> popfl
>
> Ho humm. Just looking at this patch, my reaction was "why isn't this
> an 'andl $~X86_EFLAGS_IF' instead"?
>
> Yeah, I guess the 'andl' is two bytes longer (due to the 32-bit
> constant - because IF is bit 9, you can't use a byte constant, and you
> don't want to get a partial word write just before the popfl).
>
> But btr is really pretty heavy operation for older CPU's (it's gotten
> better, but 32-bit code presumably cares more about the older CPUs).
>
> It really doesn't matter, I guess. The btr goes back to commit
> c2c9b52fab0d ("x86/entry/32: Restore FLAGS on SYSEXIT").
>
> Andy?
>

BTR is way more leet than AND!

Seriously, though, I've never really tried to shave cycles off the
32-bit code, and BTR is shorter, and I didn't spend more than about
one brain cycle thinking about it.  I guess that BTR has a more
complicated flags pipeline (the output flags depend on the input, not
just the output) and probably uses some more complicated ALU circuit
as compared to ANDL.

--Andy

Re: [PATCH v3] add param that allows bootline control of hardened usercopy

2018-07-02 Thread Christoph von Recklinghausen

On 07/02/2018 02:43 PM, Kees Cook wrote:
> On Sat, Jun 30, 2018 at 1:43 PM, Christoph von Recklinghausen
>  wrote:
>> The last issue I'm chasing is build failures on ARCH=m68k. The error is
>> atomic_read and friends needed by the jump label code not being found.
>> The config has CONFIG_BROKEN_ON_SMP=y, so the jump label calls I added
>> will only be made #ifndef CONFIG_BROKEN_ON_SMP. Do you think that's
>> worth a mention in the blurb that's added to
>> Documentation/admin-guide/kernel-parameters.txt?
> Uhm, that's weird -- I think the configs on m68k need fixing then? I
> don't want to have to sprinkle that ifdef in generic code.
>
> How are other users of static keys and jump labels dealing with m68k 
> weirdness?
>
> -Kees
>
There's also CONFIG_JUMP_LABEL which is defined in x86_64 but not
defined in the m68k configs. I'll use that instead. In hindsight I
should have spotted that but didn't.

Thanks,

Chris

Re: [PATCH 2/3] pinctrl: msm: Mux out gpio function with gpio_request()

2018-07-02 Thread Bjorn Andersson

On Thu 28 Jun 10:14 PDT 2018, Stephen Boyd wrote:

> Quoting Linus Walleij (2018-06-28 07:25:46)
> > On Fri, Jun 22, 2018 at 8:29 PM Bjorn Andersson
> >  wrote:
> > > On Fri 22 Jun 10:58 PDT 2018, Bjorn Andersson wrote:
> > > > On Mon 18 Jun 13:52 PDT 2018, Stephen Boyd wrote:
> > > >
> > > > > We rely on devices to use pinmuxing configurations in DT to select the
> > > > > GPIO function (function 0) if they're going to use the gpio in GPIO
> > > > > mode. Let's simplify things for driver authors by implementing
> > > > > gpio_request_enable() for this pinctrl driver to mux out the GPIO
> > > > > function when the gpio is use from gpiolib.
> > > > >
> > > > > Cc: Bjorn Andersson 
> > > >
> > > > Reviewed-by: Bjorn Andersson 
> > (...)
> > > While both patch 2 and 3 are convenient ways to get around the annoyance
> > > of having to specify a pinmux state both patches then ends up relying on
> > > some default pinconf state; which I think is bad.
> 
> What default state are we relying on? The reset state of the pins? I'm
> very confused by this statement. These last two patches are making sure
> that when a GPIO is requested as a GPIO or IRQ, it's really in GPIO
> mode.

Yes and this is convenient, as the TLMM is both multiplexor and gpio
controller this is probably what people would expect. However looking at
the downstream code people don't think this way (i.e. many drivers calls
gpio_request() to get some sort of exclusive access to its pins - not
to request gpio to be the muxed function).

But my concern is related to pinconf, not pinmux - automating pinmuxing
doesn't change the fact that the systems integrator should make sure to
configure appropriate pull properties on the pins.

> If this code is not in place, then we'll allow drivers to request
> a GPIO but pinmuxing won't be called to actually mux that pin to GPIO
> mode unless they have a DT conf for function = "gpio". That seems
> entirely unexpected and easy to mess up.
> 

I do agree with this, but the opposite doesn't feel crystal clear
either.

> > 
> > Nothing stops you from setting up a default conf in
> > this callback though.
> > 
> > But admittedly this call was added for simpler hardware.
> > 
> > > Further more in situations like i2c-qup (downstream), where the pins are
> > > requested as gpios in order to "bitbang" a reset this would mean that
> > > the driver has to counter the convenience; by either switching in the
> > > default pinmux at the end of probe or postponing the gpio_request() to
> > > the invocation of reset and then, after issuing the gpio_release,
> > > switching in the default pinmux explicitly again.
> > 
> > That's a bigger problem. If the system is using device and GPIO
> > mode orthogonally, it'd be good to leave like this.
> > 
> 
> Doesn't that driver need to explicitly mux GPIO mode vs. device mode
> right now? So having gpio_request() do the muxing to GPIO mode and then
> explicit muxing of the pin to device mode would be what we have after
> this patch, while before this patch we would have mux to GPIO, request
> GPIO (nop), mux to device. We saved an explicit pinmux call?
> 

It's currently possible to call gpio_request() in the i2c driver's probe
function and then mux when needed. With this patch you would have to
either follow the gpio_request with a mux-to-default or defer the
gpio_request until the point where they today would have an explicit
mux-to-gpio.

Regards,
Bjorn

Re: [PATCH 12/15] arm: dts: highbank: Add missing OPP properties for CPUs

2018-07-02 Thread Rob Herring

On Fri, May 25, 2018 at 4:32 AM Viresh Kumar  wrote:
>
> The OPP properties, like "operating-points", should either be present
> for all the CPUs of a cluster or none. If these are present only for a
> subset of CPUs of a cluster then things will start falling apart as soon
> as the CPUs are brought online in a different order. For example, this
> will happen because the operating system looks for such properties in
> the CPU node it is trying to bring up, so that it can create an OPP
> table.
>
> Add such missing properties.
>
> Fix other missing property (clock latency) as well to make it all
> work.
>
> Signed-off-by: Viresh Kumar 
> ---
>  arch/arm/boot/dts/highbank.dts | 30 ++
>  1 file changed, 30 insertions(+)

Acked-by: Rob Herring 

ARM-SOC maintainers, Please apply this.

Rob

[RFC PATCH for 4.18 2/2] rseq: validate rseq->rseq_cs padding to be zero

2018-07-02 Thread Mathieu Desnoyers

On 32-bit kernels, the rseq->rseq_cs_padding field is never read by the
kernel. However, 64-bit kernels dealing with 32-bit compat tasks read the
full 64-bit in its entirety, and terminates the offending process with
a segmentation fault if the upper 32 bits are set due to failure of
copy_from_user().

Ensure that both 32-bit and 64-bit kernels dealing with 32-bit tasks end
up terminating offending tasks with a segmentation fault if the upper
32-bit padding bits (rseq->rseq_cs_padding) are set by explicitly ensuring
that padding is zero on 32-bit kernels.

Signed-off-by: Mathieu Desnoyers 
CC: "Paul E. McKenney" 
CC: Peter Zijlstra 
CC: Paul Turner 
CC: Thomas Gleixner 
CC: Andy Lutomirski 
CC: Andi Kleen 
CC: Dave Watson 
CC: Chris Lameter 
CC: Ingo Molnar 
CC: "H. Peter Anvin" 
CC: Ben Maurer 
CC: Steven Rostedt 
CC: Josh Triplett 
CC: Linus Torvalds 
CC: Andrew Morton 
CC: Russell King 
CC: Catalin Marinas 
CC: Will Deacon 
CC: Michael Kerrisk 
CC: Boqun Feng 
CC: linux-...@vger.kernel.org
---
 kernel/rseq.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 2e5d88f09baf..c4c48157198f 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -112,6 +112,29 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
return 0;
 }
 
+#ifndef __LP64__
+/*
+ * Ensure that padding is zero.
+ */
+static int check_rseq_cs_padding(struct task_struct *t)
+{
+   unsigned long pad;
+   int ret;
+
+   ret = __get_user(pad, >rseq->rseq_cs_padding);
+   if (ret)
+   return ret;
+   if (pad)
+   return -EFAULT;
+   return 0;
+}
+#else
+static int check_rseq_cs_padding(struct task_struct *t)
+{
+   return 0;
+}
+#endif
+
 static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
 {
struct rseq_cs __user *urseq_cs;
@@ -123,6 +146,9 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct 
rseq_cs *rseq_cs)
ret = __get_user(ptr, >rseq->rseq_cs);
if (ret)
return ret;
+   ret = check_rseq_cs_padding(t);
+   if (ret)
+   return ret;
if (!ptr) {
memset(rseq_cs, 0, sizeof(*rseq_cs));
return 0;
-- 
2.11.0

[RFC PATCH for 4.18 1/2] rseq: use __u64 for rseq_cs fields, validate abort_ip < TASK_SIZE

2018-07-02 Thread Mathieu Desnoyers

Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip
fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather
that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a
consistent behavior for a 32-bit binary executed on 32-bit kernels and in
compat mode on 64-bit kernels.

Validating the value of abort_ip field to be below TASK_SIZE ensures the
kernel don't return to an invalid address when returning to userspace
after an abort. I don't fully trust each architecture code to consistently
deal with invalid return addresses.

If validation fails, the process is killed with a segmentation fault.

Signed-off-by: Mathieu Desnoyers 
CC: "Paul E. McKenney" 
CC: Peter Zijlstra 
CC: Paul Turner 
CC: Thomas Gleixner 
CC: Andy Lutomirski 
CC: Andi Kleen 
CC: Dave Watson 
CC: Chris Lameter 
CC: Ingo Molnar 
CC: "H. Peter Anvin" 
CC: Ben Maurer 
CC: Steven Rostedt 
CC: Josh Triplett 
CC: Linus Torvalds 
CC: Andrew Morton 
CC: Russell King 
CC: Catalin Marinas 
CC: Will Deacon 
CC: Michael Kerrisk 
CC: Boqun Feng 
CC: linux-...@vger.kernel.org
---
 include/uapi/linux/rseq.h | 6 +++---
 kernel/rseq.c | 5 +++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index d620fa43756c..519ad6e176d1 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -52,10 +52,10 @@ struct rseq_cs {
__u32 version;
/* enum rseq_cs_flags */
__u32 flags;
-   LINUX_FIELD_u32_u64(start_ip);
+   __u64 start_ip;
/* Offset from start_ip. */
-   LINUX_FIELD_u32_u64(post_commit_offset);
-   LINUX_FIELD_u32_u64(abort_ip);
+   __u64 post_commit_offset;
+   __u64 abort_ip;
 } __attribute__((aligned(4 * sizeof(__u64;
 
 /*
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 22b6acf1ad63..2e5d88f09baf 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -128,7 +128,8 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct 
rseq_cs *rseq_cs)
return 0;
}
urseq_cs = (struct rseq_cs __user *)ptr;
-   if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)))
+   if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)) ||
+   rseq_cs->abort_ip >= TASK_SIZE)
return -EFAULT;
if (rseq_cs->version > 0)
return -EINVAL;
@@ -137,7 +138,7 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct 
rseq_cs *rseq_cs)
if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset)
return -EINVAL;
 
-   usig = (u32 __user *)(rseq_cs->abort_ip - sizeof(u32));
+   usig = (u32 __user *)(unsigned long)(rseq_cs->abort_ip - sizeof(u32));
ret = get_user(sig, usig);
if (ret)
return ret;
-- 
2.11.0

Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig

2018-07-02 Thread Randy Dunlap

On 07/02/18 07:47, Christoph Hellwig wrote:
> Instead of duplicating the source statements in every architecture just
> do it once in the toplevel Kconfig file.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  Kconfig | 22 ++
>  arch/alpha/Kconfig  | 20 
>  arch/arc/Kconfig| 16 
>  arch/arm/Kconfig| 25 -
>  arch/arm64/Kconfig  | 23 ---
>  arch/c6x/Kconfig| 24 
>  arch/h8300/Kconfig  | 24 
>  arch/hexagon/Kconfig| 16 
>  arch/ia64/Kconfig   | 20 
>  arch/m68k/Kconfig   | 24 
>  arch/microblaze/Kconfig | 24 
>  arch/mips/Kconfig   | 24 
>  arch/nds32/Kconfig  | 16 
>  arch/nios2/Kconfig  | 24 
>  arch/openrisc/Kconfig   | 23 ---
>  arch/parisc/Kconfig | 24 
>  arch/powerpc/Kconfig| 19 ---
>  arch/riscv/Kconfig  | 24 
>  arch/s390/Kconfig   | 24 
>  arch/sh/Kconfig | 24 
>  arch/sparc/Kconfig  | 24 
>  arch/unicore32/Kconfig  | 24 
>  arch/x86/Kconfig| 22 +-
>  arch/xtensa/Kconfig | 25 -
>  24 files changed, 23 insertions(+), 512 deletions(-)
> 
> diff --git a/Kconfig b/Kconfig
> index a90d9f9e268b..5499b1273ba5 100644
> --- a/Kconfig
> +++ b/Kconfig
> @@ -10,3 +10,25 @@ comment "Compiler: $(CC_VERSION_TEXT)"
>  source "scripts/Kconfig.include"
>  
>  source "arch/$(SRCARCH)/Kconfig"
> +
> +source "init/Kconfig"
> +
> +source "kernel/Kconfig.freezer"
> +
> +menu "Executable file formats"
> +source "fs/Kconfig.binfmt"
> +endmenu
> +
> +source "mm/Kconfig"
> +
> +source "net/Kconfig"
> +
> +source "drivers/Kconfig"
> +
> +source "fs/Kconfig"
> +
> +source "security/Kconfig"
> +
> +source "crypto/Kconfig"
> +
> +source "lib/Kconfig"

FWIW, I prefer this modification, but it's not a deal breaker.

---
From: Randy Dunlap 

Present "General setup" before "Processor type and features".
This is done by sourcing arch/$(SRCARCH)/Kconfig before arch/Kconfig
inside init/Kconfig.

Signed-off-by: Randy Dunlap 
---
 Kconfig  |2 --
 init/Kconfig |6 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

--- linux-next-20180702.orig/Kconfig
+++ linux-next-20180702/Kconfig
@@ -9,8 +9,6 @@ comment "Compiler: $(CC_VERSION_TEXT)"
 
 source "scripts/Kconfig.include"
 
-source "arch/$(SRCARCH)/Kconfig"
-
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
--- linux-next-20180702.orig/init/Kconfig
+++ linux-next-20180702/init/Kconfig
@@ -1717,6 +1717,12 @@ config PROFILING
 config TRACEPOINTS
bool
 
+# Note:  arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig
+# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS
+# before the default value is found in arch/Kconfig.
+
+source "arch/$(SRCARCH)/Kconfig"
+
 source "arch/Kconfig"
 
 endmenu# General setup

Re: [PATCH] mmc: dw_mmc: fix card threshold control configuration

2018-07-02 Thread Doug Anderson

Hi,

On Mon, Jun 11, 2018 at 7:17 AM, Shawn Lin  wrote:
> On 2018/6/11 20:20, Ulf Hansson wrote:
>>
>> + Shawn Lin, Evgeniy Didin, Doug Andersson
>>
>> On 29 May 2018 at 12:38, Qing Xia  wrote:
>>>
>>> From: x00270170 
>>>
>>> Card write threshold control is supposed to be set since controller
>>> version 2.80a for data write in HS400 mode and data read in
>>> HS200/HS400/SDR104 mode. However the current code returns without
>>> configuring it in the case of data writing in HS400 mode.
>>> Meanwhile the patch fixes that the current code goes to
>>> 'disable' when doing data reading in HS400 mode.
>>>
>
> I'm more or less unable to review this, as I don't have 2.80a databook,
> nor a such platform to verify it. :(

Sorry for not responding earlier.  I didn't have a lot of context here
so reviewing it never made it to the top of my queue.  ...but quickly
checking what I can...

I'm in almost the same boat as Shawn.  I don't have any HS400 hardware
using dw_mmc available to me, but I do seem to have the 2.80a databook
hanging around and it agrees that we need to enable the card write
threshold for HS400 writes.  That also matches the comments, so the
patch seems right to me.  Probably Jaehoon would make a better
reviewer since he submitted the original code and also has more
familiarity with HS400 on dw_mmc.

I don't think I have enough context to give a full Reviewed-by, but in
the very least I can confirm that the patch seems sane.

-Doug

Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

2018-07-02 Thread Andy Lutomirski

On Mon, Jul 2, 2018 at 7:32 AM Mathieu Desnoyers
 wrote:
>
> - On Jun 29, 2018, at 4:39 PM, Andy Lutomirski l...@amacapital.net wrote:
>
> > On Fri, Jun 29, 2018 at 12:48 PM, Mathieu Desnoyers
> >  wrote:
> >> There are two aspects I'm concerned about here:
> >>
> >> 1) security: we don't want 32-bit user-space to feed a 64-bit value over 
> >> 4GB
> >>as abort_ip that may end up causing OOPSes on architectures that would
> >>lack proper validation of those values on return to userspace.
> >
> > I'm not too worried about this.  As long as you're doing it from
> > signal-delivery context (which you are AFAICT) you're fine.
>
> No, it's not just signal-delivery context. It's _also_ called from
> return to usermode loop, which can by called on return from
> interrupt/trap/syscall.
>

TIF_NOTIFY_RESUME context in the exit slowpath is fine, too.

> >
> > But I re-read the code and I think I have a really straightforward
> > solution.  Two choices:
> >
> > (1) Change instruction_pointer_set() to return an error code if the
> > address passed in is garbage in a way that could cause unexpected
> > behavior (like >=2^32 on x86_64 if regs->cs is 32-bit).  It has very
> > very few callers.
>
> This would take care of my security concern wrt abort_ip, but would not
> provide consistent behavior for the other fields. Also, perhaps this
> kind of change should aim the next merge window ?

It's not about security.  The idea is that instruction_pointer_set()
should return some indication of whether it actually set the
instruction pointer to the requested value.  On x86, if you have
!user_64bit_mode(regs) and you call instruction_pointer_set() to set
ip to 0xbaadc0de12345678, then you end up with a state where we will
probably execute user code at the address 0x12345678.  Conversely, if
you have user_64bit_mode(regs) == true and you set ip to
0xbaadc0de12345678, then you will end up sending a signal to the task
because 0xbaadc0de12345678 is not executable (and, in fact, is highly
likely to be noncanonical).

So I would argue that the semantics *should* be:

/*
 * Attempts to modify @regs such that the next user instruction to be
executed is
 * the instruction at @addr.  instruction_pointer_set() may return
false to indicate
 * that addr was invalid in the sense that the next user instruction executed
 * might be some other address instead.  The most likely cause is that
 * regs refers to a 32-bit compat context, addr != (u32)addr, and the
architecture
 * might silently truncate the address on the next return to user code.
 *
 * instruction_pointer_set() must only be called from a context in
which the architecture
 * allows arbitrary modifications of @regs.
 *
 * Architecture implementations promise that calling
instruction_pointer_set() will not
 * crash or otherwise corrupt the kernel when called from a valid
context, regardless
 * of what value is passed in @addr.
 */
bool instruction_pointer_set(struct pt_regs *regs, unsigned long addr);

>
> >
> > (2) Add instruction_pointer_validate() to go along with
> > instruction_pointer_set().
> >
> > That should be enough to solve the problem, right?
>
> This would only handle the "security" part of the matter, which
> is specifically related to rseq->rseq_cs->abort_ip.
>
> What is left is ensuring that we have consistent behavior for
> other fields:
>
> [ Note: we have introduced this helper macro: LINUX_FIELD_u32_u64
> which defines a field which is 64-bit for 64-bit processes, and 32-bit
> with 32-bit of padding for 32-bit processes. ]
>
> * rseq->rseq_cs: (userspace pointer to user-space, updated by user-space
>   with single-copy atomicity): current type: LINUX_FIELD_u32_u64,
>   cannot be changed to __u64 due to single-copy atomicity requirement,
>
> * rseq->rseq_cs->start_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->post_commit_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> * rseq->rseq_cs->abort_ip: currently a LINUX_FIELD_u32_u64,
>   could become a __u64,
>
> For abort_ip, changing the type to __u64 and using the
> instruction_pointer_validate() approach you propose would work.
>
> For start_ip and post_commit_ip, we need to decide whether we
> want to kill a 32-bit process setting the high bits or if we just
> accept and use the full __u64 content on both 32-bit and 64-bit
> kernels. Those two fields are only used for arithmetic comparison.
> Using the full __u64 content means using 64-bit arithmetic on
> 32-bit native kernels though.

Just use the 64-bit values, I think.  I see no point in killing the task.

>
> For rseq->rseq_cs, we cannot use __u64 due to single-copy atomicity
> update requirement for 32-bit processes. However, we are using this
> field in a copy_from_user(), so it will EFAULT if the high-bits are
> set by a compat 32-bit task on a 64-bit kernel. We can therefore check
> that the padding is zeroed explicitly on a native 32-bit kernel to
> provide a consistent behavior.

Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping

2018-07-02 Thread Yang Shi





On 7/2/18 5:33 AM, Kirill A. Shutemov wrote:

On Sat, Jun 30, 2018 at 06:39:44AM +0800, Yang Shi wrote:

When running some mmap/munmap scalability tests with large memory (i.e.

300GB), the below hung task issue may happen occasionally.

INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
  ps  D0 14018  1 0x0004
   885582f84000 885e8682f000 880972943000 885ebf499bc0
   8828ee12 c900349bfca8 817154d0 0040
   00ff812f872a 885ebf499bc0 024000d000948300 880972943000
  Call Trace:
   [] ? __schedule+0x250/0x730
   [] schedule+0x36/0x80
   [] rwsem_down_read_failed+0xf0/0x150
   [] call_rwsem_down_read_failed+0x18/0x30
   [] down_read+0x20/0x40
   [] proc_pid_cmdline_read+0xd9/0x4e0
   [] ? do_filp_open+0xa5/0x100
   [] __vfs_read+0x37/0x150
   [] ? security_file_permission+0x9b/0xc0
   [] vfs_read+0x96/0x130
   [] SyS_read+0x55/0xc0
   [] entry_SYSCALL_64_fastpath+0x1a/0xc5

It is because munmap holds mmap_sem from very beginning to all the way
down to the end, and doesn't release it in the middle. When unmapping
large mapping, it may take long time (take ~18 seconds to unmap 320GB
mapping with every single page mapped on an idle machine).

It is because munmap holds mmap_sem from very beginning to all the way
down to the end, and doesn't release it in the middle. When unmapping
large mapping, it may take long time (take ~18 seconds to unmap 320GB
mapping with every single page mapped on an idle machine).

Zapping pages is the most time consuming part, according to the
suggestion from Michal Hock [1], zapping pages can be done with holding
read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write
mmap_sem to cleanup vmas. All zapped vmas will have VM_DEAD flag set,
the page fault to VM_DEAD vma will trigger SIGSEGV.

Define large mapping size thresh as PUD size or 1GB, just zap pages with
read mmap_sem for mappings which are >= thresh value.

If the vma has VM_LOCKED | VM_HUGETLB | VM_PFNMAP or uprobe, then just
fallback to regular path since unmapping those mappings need acquire
write mmap_sem.

For the time being, just do this in munmap syscall path. Other
vm_munmap() or do_munmap() call sites remain intact for stability
reason.

The below is some regression and performance data collected on a machine
with 32 cores of E5-2680 @ 2.70GHz and 384GB memory.

With the patched kernel, write mmap_sem hold time is dropped to us level
from second.

[1] https://lwn.net/Articles/753269/

Cc: Michal Hocko 
Cc: Matthew Wilcox 
Cc: Laurent Dufour 
Cc: Andrew Morton 
Signed-off-by: Yang Shi 
---
  mm/mmap.c | 136 +-
  1 file changed, 134 insertions(+), 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 87dcf83..d61e08b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2763,6 +2763,128 @@ static int munmap_lookup_vma(struct mm_struct *mm, 
struct vm_area_struct **vma,
return 1;
  }
  
+/* Consider PUD size or 1GB mapping as large mapping */

+#ifdef HPAGE_PUD_SIZE
+#define LARGE_MAP_THRESH   HPAGE_PUD_SIZE
+#else
+#define LARGE_MAP_THRESH   (1 * 1024 * 1024 * 1024)
+#endif

PUD_SIZE is defined everywhere.


If THP is defined, otherwise it is:

#define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })




+
+/* Unmap large mapping early with acquiring read mmap_sem */
+static int do_munmap_zap_early(struct mm_struct *mm, unsigned long start,
+  size_t len, struct list_head *uf)
+{
+   unsigned long end = 0;
+   struct vm_area_struct *vma = NULL, *prev, *tmp;
+   bool success = false;
+   int ret = 0;
+
+   if (!munmap_addr_sanity(start, len))
+   return -EINVAL;
+
+   len = PAGE_ALIGN(len);
+
+   end = start + len;
+
+   /* Just deal with uf in regular path */
+   if (unlikely(uf))
+   goto regular_path;
+
+   if (len >= LARGE_MAP_THRESH) {
+   /*
+* need write mmap_sem to split vma and set VM_DEAD flag
+* splitting vma up-front to save PITA to clean if it is failed

What errors do you talk about? ENOMEM on VMA split? Anything else?


Yes, ENOMEM on vma split.




+*/
+   down_write(>mmap_sem);
+   ret = munmap_lookup_vma(mm, , , start, end);
+   if (ret != 1) {
+   up_write(>mmap_sem);
+   return ret;
+   }
+   /* This ret value might be returned, so reset it */
+   ret = 0;
+
+   /*
+* Unmapping vmas, which has VM_LOCKED|VM_HUGETLB|VM_PFNMAP
+* flag set or has uprobes set, need acquire write map_sem,
+* so skip them in early zap. Just deal with such mapping in
+* regular path.
+

Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality

2018-07-02 Thread Jakub Kicinski

On Sun, 1 Jul 2018 11:31:47 +0100, Okash Khawaja wrote:
> On Wed, Jun 27, 2018 at 02:56:49PM +0200, Daniel Borkmann wrote:
> > On 06/27/2018 01:47 PM, Okash Khawaja wrote:  
> > > On Wed, Jun 27, 2018 at 12:34:35PM +0200, Daniel Borkmann wrote:  
> > >> On 06/27/2018 12:35 AM, Jakub Kicinski wrote:  
> > >>> On Tue, 26 Jun 2018 15:27:09 -0700, Martin KaFai Lau wrote:  
> >  On Tue, Jun 26, 2018 at 01:31:33PM -0700, Jakub Kicinski wrote:  
> > >> [...]  
> > > Implementing both outputs in one series will help you structure your
> > > code to best suit both of the formats up front.
> >  hex and "formatted" are the only things missing?  As always, things
> >  can be refactored when new use case comes up.  Lets wait for
> >  Okash input.
> > 
> >  Regardless, plaintext is our current use case.  Having the current
> >  patchset in does not stop us or others from contributing other use
> >  cases (json, "bpftool map find"...etc),  and IMO it is actually
> >  the opposite.  Others may help us get there faster than us alone.
> >  We should not stop making forward progress and take this patch
> >  as hostage because "abc" and "xyz" are not done together.  
> > >>>
> > >>> Parity between JSON and plain text output is non negotiable.  
> > >>
> > >> Longish discussion and some confusion in this thread. :-) First of all
> > >> thanks a lot for working on it, very useful!   
> > > Thanks :)
> > >   
> > >> My $0.02 on it is that so far
> > >> great care has been taken in bpftool to indeed have feature parity 
> > >> between
> > >> JSON and plain text, so it would be highly desirable to keep continuing
> > >> this practice if the consensus is that it indeed is feasible and makes
> > >> sense wrt BTF data. There has been mentioned that given BTF data can be
> > >> dynamic depending on what the user loads via bpf(2) so a potential JSON
> > >> output may look different/break each time anyway. This however could all 
> > >> be
> > >> embedded under a container object that has a fixed key like 'formatted'
> > >> where tools like jq(1) can query into it. I think this would be fine 
> > >> since
> > >> the rest of the (non-dynamic) output is still retained as-is and then
> > >> wouldn't confuse or collide with existing users, and anyone 
> > >> programmatically
> > >> parsing deeper into the BTF data under such JSON container object needs
> > >> to have awareness of what specific data it wants to query from it; so
> > >> there's no conflict wrt breaking anything here. Imho, both outputs would
> > >> be very valuable.  
> > > Okay I can add "formatted" object under json output.
> > > 
> > > One thing to note here is that the fixed output will change if the map
> > > itself changes. So someone writing a program that consumes that fixed
> > > output will have to account for his program breaking in future, thus  
> > 
> > Yes, that aspect is fine though, any program/script parsing this would need
> > to be aware of the underlying map type to make sense of it (e.g. per-cpu vs
> > non per-cpu maps to name one). But that info it could query/verify already
> > beforehand via bpftool as well (via normal map info dump for a given id).
> >   
> > > breaking backward compatibility anyway as far as the developer is
> > > concerned :)
> > > 
> > > I will go ahead with work on "formatted" object.  
> > 
> > Cool, thanks,
> > Daniel  
> 
> 
> hi,
> 
> couple of questions:
> 
> 1. just to be sure, formatted section will be on the same level as "key"
> and "value"? so something like following:
> 
> 
> $ bpftool map dump -p id 8
> [{
> "key": ["0x00","0x00","0x00","0x00"
> ],
> "value": [...
> ],
> "formatted": {
> "key": 0,
> "value": {
> "int_field":  3,
> "pointerfield": 2152930552,
> ...
> }
> }
> }]

Looks good, yes!

> 2. i noticed that the ouput in v1 has all the keys and values on the
> same level. in v2, i'll change them so that each key-value pair is a
> separate object. let me know what you think.

For non-JSON output?  No preference, whatever looks better :)  Empty
line between key/value pairs to visually separate them could also
work.  But up to you.

> finally, i noticed there is a map lookup command which also prints map
> entries. do want that to also be btf-printed in this patchset?

It would be nice to share the printing code for the two, yes.

[PATCH] iscsi_ibft: Mark expected switch fall-through

2018-07-02 Thread Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/firmware/iscsi_ibft.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c
index 6bc8e66..ba64568 100644
--- a/drivers/firmware/iscsi_ibft.c
+++ b/drivers/firmware/iscsi_ibft.c
@@ -542,6 +542,7 @@ static umode_t __init ibft_check_tgt_for(void *data, int 
type)
case ISCSI_BOOT_TGT_NIC_ASSOC:
case ISCSI_BOOT_TGT_CHAP_TYPE:
rc = S_IRUGO;
+   /* fall through */
case ISCSI_BOOT_TGT_NAME:
if (tgt->tgt_name_len)
rc = S_IRUGO;
-- 
2.7.4

[PATCH] liquidio: make timeout HZ independent and readable

2018-07-02 Thread Nicholas Mc Guire

schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent. So define
a constant with (hopefully) meaningful name and pass it through
msecs_to_jiffies() to fix the HZ dependency.

Signed-off-by: Nicholas Mc Guire 
commit f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters")
---

Problem found by experimental coccinelle script

The current wait time can vary by a factor 10 depending on the HZ
setting chose, which does not seem reasonable here.

The below patch sets the timeout to 1s - which is the current duration
assuming a setting of HZ== 100. It is though not clear if this is the
intent or if it should be shorter as it is not clear what HZ setting
was assumed during design and used for testing.

This needs an ack by someone who knows the device and can confirm that
waiting 1s for in-flight requests on device removal is reasonable.

Patch was compile tested with: x86_64_defconfig (implies
CONFIG_NET_VENDOR_CAVIUM=y)
(with a large number of sparse warnings though unrelated to the
proposed change)

Patch is against 4.18-rc2 (localversion-next is -next-20180702)

 drivers/net/ethernet/cavium/liquidio/lio_main.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 7cb4e75..b2d0598 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -91,6 +91,9 @@ static int octeon_console_debug_enabled(u32 console)
  */
 #define LIO_SYNC_OCTEON_TIME_INTERVAL_MS 6
 
+/* time to wait for possible in-flight requests in milliseconds */
+#define WAIT_INFLIGHT_REQUEST  msecs_to_jiffies(1000)
+
 struct lio_trusted_vf_ctx {
struct completion complete;
int status;
@@ -259,7 +262,7 @@ static inline void pcierror_quiesce_device(struct 
octeon_device *oct)
force_io_queues_off(oct);
 
/* To allow for in-flight requests */
-   schedule_timeout_uninterruptible(100);
+   schedule_timeout_uninterruptible(WAIT_INFLIGHT_REQUEST);
 
if (wait_for_pending_requests(oct))
dev_err(>pci_dev->dev, "There were pending requests\n");
-- 
2.1.4

Re: [PATCH v10 7/7] i2c: fsi: Add bus recovery

2018-07-02 Thread Wolfram Sang


> > This all won't have any effect since you never call i2c_recover_bus
> > which calls back into i2c_bus_recovery_info callbacks.
> 
> Ah, I thought there would be some use of this in the core or in client
> drivers, or some ioctl interface. Would there be any outside users of these
> callbacks in the future?

No. i2c_recover_bus() is only to recover from stalled SDA and the only
place where this can be detected is inside the I2C master driver. It is
not an event which can be controlled by a user.



signature.asc
Description: PGP signature

Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

2018-07-02 Thread Linus Torvalds

On Mon, Jul 2, 2018 at 12:02 PM Andy Lutomirski  wrote:
>
> Works for me.  Linus, any objection?

I think the 4.19 stage may be overkill, but I don't hate it, so no
real objections.

If the main reason for this is that we silently clear the upper bits
when returning to compat mode, I actually think that a better fix
would be to just fix that. We shouldn't silently ignore bogus data in
the return path.

But I don't care enough, I think.

Linus

Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-02 Thread Pavel Tatashin

On 07/02/2018 03:47 PM, Dave Hansen wrote:
> On 07/01/2018 07:04 PM, Pavel Tatashin wrote:
>> +for_each_present_section_nr(pnum_begin + 1, pnum_end) {
>> +int nid = sparse_early_nid(__nr_to_section(pnum_end));
>>  
>> +if (nid == nid_begin) {
>> +map_count++;
>>  continue;
>>  }
> 
>> +sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
>> +nid_begin = nid;
>> +pnum_begin = pnum_end;
>> +map_count = 1;
>>  }
> 
> Ugh, this is really hard to read.  Especially because the pnum "counter"
> is called "pnum_end".

I called it pnum_end, because that is what is passed to sparse_init_nid(), but 
I see your point, and I can rename pnum_end to simply pnum if that will make 
things look better.

> 
> So, this is basically a loop that collects all of the adjacent sections
> in a given single nid and then calls sparse_init_nid().  pnum_end in
> this case is non-inclusive, so the sparse_init_nid() call is actually
> for the *previous* nid that pnum_end is pointing _past_.
> 
> This *really* needs commenting.

There is a comment before sparse_init_nid() about inclusiveness:

434 /*
435  * Initialize sparse on a specific node. The node spans [pnum_begin, 
pnum_end)
436  * And number of present sections in this node is map_count.
437  */
438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
439unsigned long pnum_end,
440unsigned long map_count)

Thank you,
Pavel

Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()

2018-07-02 Thread Dave Hansen

> @@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page 
> **map_map,
>  unsigned long pnum_end,
>  unsigned long map_count,
>  int nodeid);
> +struct page * sparse_populate_node(unsigned long pnum_begin,

CodingStyle: put the "*" next to the function name, no space, please.

> +unsigned long pnum_end,
> +unsigned long map_count,
> +int nid);
> +struct page * sparse_populate_node_section(struct page *map_base,
> +unsigned long map_index,
> +unsigned long pnum,
> +int nid);

These two functions are named in very similar ways.  Do they do similar
things?

>  struct page *sparse_mem_map_populate(unsigned long pnum, int nid,
>   struct vmem_altmap *altmap);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index e1a54ba411ec..b3e325962306 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -311,3 +311,52 @@ void __init sparse_mem_maps_populate_node(struct page 
> **map_map,
>   vmemmap_buf_end = NULL;
>   }
>  }
> +
> +struct page * __init sparse_populate_node(unsigned long pnum_begin,
> +   unsigned long pnum_end,
> +   unsigned long map_count,
> +   int nid)
> +{

Could you comment what the function is doing, please?

> + unsigned long size = sizeof(struct page) * PAGES_PER_SECTION;
> + unsigned long pnum, map_index = 0;
> + void *vmemmap_buf_start;
> +
> + size = ALIGN(size, PMD_SIZE) * map_count;
> + vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size,
> +   PMD_SIZE,
> +   __pa(MAX_DMA_ADDRESS));

Let's not repeat the mistakes of the previous version of the code.
Please explain why we are aligning this.  Also,
__earlyonly_bootmem_alloc()->memblock_virt_alloc_try_nid_raw() claims to
be aligning the size.  Do we also need to do it here?

Yes, I know the old code did this, but this is the cost of doing a
rewrite. :)

> + if (vmemmap_buf_start) {
> + vmemmap_buf = vmemmap_buf_start;
> + vmemmap_buf_end = vmemmap_buf_start + size;
> + }

It would be nice to call out that these are globals that other code
picks up.

> + for (pnum = pnum_begin; map_index < map_count; pnum++) {
> + if (!present_section_nr(pnum))
> + continue;
> + if (!sparse_mem_map_populate(pnum, nid, NULL))
> + break;

^ This consumes "vmemmap_buf", right?  That seems like a really nice
thing to point out here if so.

> + map_index++;
> + BUG_ON(pnum >= pnum_end);
> + }
> +
> + if (vmemmap_buf_start) {
> + /* need to free left buf */
> + memblock_free_early(__pa(vmemmap_buf),
> + vmemmap_buf_end - vmemmap_buf);
> + vmemmap_buf = NULL;
> + vmemmap_buf_end = NULL;
> + }
> + return pfn_to_page(section_nr_to_pfn(pnum_begin));
> +}
> +
> +/*
> + * Return map for pnum section. sparse_populate_node() has populated memory 
> map
> + * in this node, we simply do pnum to struct page conversion.
> + */
> +struct page * __init sparse_populate_node_section(struct page *map_base,
> +   unsigned long map_index,
> +   unsigned long pnum,
> +   int nid)
> +{
> + return pfn_to_page(section_nr_to_pfn(pnum));
> +}

What is up with all of the unused arguments to this function?

> diff --git a/mm/sparse.c b/mm/sparse.c
> index d18e2697a781..c18d92b8ab9b 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page 
> **map_map,
>  __func__);
>   }
>  }
> +
> +static unsigned long section_map_size(void)
> +{
> + return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
> +}

Seems like if we have this, we should use it wherever possible, like
sparse_populate_node().


> +/*
> + * Try to allocate all struct pages for this node, if this fails, we will
> + * be allocating one section at a time in sparse_populate_node_section().
> + */
> +struct page * __init sparse_populate_node(unsigned long pnum_begin,
> +   unsigned long pnum_end,
> +   unsigned long map_count,
> +   int nid)
> +{
> + return memblock_virt_alloc_try_nid_raw(section_map_size() * map_count,
> +PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
> +

Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-02 Thread Dave Hansen

On 07/02/2018 12:54 PM, Pavel Tatashin wrote:
> 
> 
> On 07/02/2018 03:47 PM, Dave Hansen wrote:
>> On 07/01/2018 07:04 PM, Pavel Tatashin wrote:
>>> +   for_each_present_section_nr(pnum_begin + 1, pnum_end) {
>>> +   int nid = sparse_early_nid(__nr_to_section(pnum_end));
>>>  
>>> +   if (nid == nid_begin) {
>>> +   map_count++;
>>> continue;
>>> }
>>
>>> +   sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
>>> +   nid_begin = nid;
>>> +   pnum_begin = pnum_end;
>>> +   map_count = 1;
>>> }
>>
>> Ugh, this is really hard to read.  Especially because the pnum "counter"
>> is called "pnum_end".
> 
> I called it pnum_end, because that is what is passed to
> sparse_init_nid(), but I see your point, and I can rename pnum_end to
> simply pnum if that will make things look better.

Could you just make it a helper that takes a beginning pnum and returns
the number of consecutive sections?

>> So, this is basically a loop that collects all of the adjacent sections
>> in a given single nid and then calls sparse_init_nid().  pnum_end in
>> this case is non-inclusive, so the sparse_init_nid() call is actually
>> for the *previous* nid that pnum_end is pointing _past_.
>>
>> This *really* needs commenting.
> 
> There is a comment before sparse_init_nid() about inclusiveness:
> 
> 434 /*
> 435  * Initialize sparse on a specific node. The node spans [pnum_begin, 
> pnum_end)
> 436  * And number of present sections in this node is map_count.
> 437  */
> 438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
> 439unsigned long pnum_end,
> 440unsigned long map_count)

Which I totally missed.  Could you comment the code, please?

Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code

2018-07-02 Thread Pavel Tatashin

On Mon, Jul 2, 2018 at 4:00 PM Dave Hansen  wrote:
>
> On 07/02/2018 12:54 PM, Pavel Tatashin wrote:
> >
> >
> > On 07/02/2018 03:47 PM, Dave Hansen wrote:
> >> On 07/01/2018 07:04 PM, Pavel Tatashin wrote:
> >>> +   for_each_present_section_nr(pnum_begin + 1, pnum_end) {
> >>> +   int nid = sparse_early_nid(__nr_to_section(pnum_end));
> >>>
> >>> +   if (nid == nid_begin) {
> >>> +   map_count++;
> >>> continue;
> >>> }
> >>
> >>> +   sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count);
> >>> +   nid_begin = nid;
> >>> +   pnum_begin = pnum_end;
> >>> +   map_count = 1;
> >>> }
> >>
> >> Ugh, this is really hard to read.  Especially because the pnum "counter"
> >> is called "pnum_end".
> >
> > I called it pnum_end, because that is what is passed to
> > sparse_init_nid(), but I see your point, and I can rename pnum_end to
> > simply pnum if that will make things look better.
>
> Could you just make it a helper that takes a beginning pnum and returns
> the number of consecutive sections?

But sections do not have to be consequent. Some nodes may have
sections that are not present. So we are looking for two values:
map_count -> which is number of present sections and node_end for the
current node i.e. the first section of the next node. So the helper
would need to return two things, and would basically repeat the same
code that is done in this function.

>
> >> So, this is basically a loop that collects all of the adjacent sections
> >> in a given single nid and then calls sparse_init_nid().  pnum_end in
> >> this case is non-inclusive, so the sparse_init_nid() call is actually
> >> for the *previous* nid that pnum_end is pointing _past_.
> >>
> >> This *really* needs commenting.
> >
> > There is a comment before sparse_init_nid() about inclusiveness:
> >
> > 434 /*
> > 435  * Initialize sparse on a specific node. The node spans [pnum_begin, 
> > pnum_end)
> > 436  * And number of present sections in this node is map_count.
> > 437  */
> > 438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin,
> > 439unsigned long pnum_end,
> > 440unsigned long map_count)
>
> Which I totally missed.  Could you comment the code, please?

Sure, I will add a comment into sparse_init() as well.

Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

2018-07-02 Thread Andy Lutomirski

On Mon, Jul 2, 2018 at 12:31 PM, Linus Torvalds
 wrote:
> On Mon, Jul 2, 2018 at 12:02 PM Andy Lutomirski  wrote:
>>
>> Works for me.  Linus, any objection?
>
> I think the 4.19 stage may be overkill, but I don't hate it, so no
> real objections.
>
> If the main reason for this is that we silently clear the upper bits
> when returning to compat mode, I actually think that a better fix
> would be to just fix that. We shouldn't silently ignore bogus data in
> the return path.
>
> But I don't care enough, I think.

Like this:

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b81918..ec40223c8856 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -170,6 +170,26 @@ static void exit_to_usermode_loop(struct pt_regs
*regs, u32 cached_flags)
 if (cached_flags & _TIF_USER_RETURN_NOTIFY)
 fire_user_return_notifiers();

+if (unlikely(!user_64bit_mode(regs) &&
+ (regs->ip & 0xull))) {
+siginfo_t info;
+struct task_struct *tsk = current;
+
+/* I haven't thought about this *that* hard. */
+clear_siginfo();
+tsk->thread.cr2= regs->ip;
+tsk->thread.trap_nr = X86_TRAP_PF;
+tsk->thread.error_code = X86_PF_USER | X86_PF_INSTR;
+info.si_signo = SIGSEGV;
+info.si_errno = 0;
+info.si_code = SEGV_MAPERR;
+info.si_addr = (void __user *)regs->ip;
+/* si_addr_lsb? */
+force_sig_info(SIGSEGV, , tsk);
+
+/* And we'll go through the loop again. */
+}
+
 /* Disable IRQs and retry */
 local_irq_disable();

It's whitespace damaged and barely tested, but it seems to at least
not be completely busted.  I don't really love doing this, though.

Re: [PATCH v2 1/2] arm64: dts: qcom: pm8998: Add spmi-temp-alarm node

2018-07-02 Thread Doug Anderson

Hi,

On Mon, Jul 2, 2018 at 11:10 AM, Matthias Kaehlcke  wrote:
> This adds the spmi-temp-alarm node to pm8998 based on the examples in the
> bindings.
>
> Signed-off-by: Matthias Kaehlcke 
> ---
> Changes in v2:
> - none
>
>  arch/arm64/boot/dts/qcom/pm8998.dtsi | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
> b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> index 92bed1e7d4bb..2f4989e7ef68 100644
> --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
> +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
> @@ -11,6 +11,13 @@
> #address-cells = <1>;
> #size-cells = <0>;
>
> +   pm8998_temp: qcom,temp-alarm@2400 {

Remove "qcom," from the node name (AKA please change to
"temp-alarm@2400").  Someone internal in Qualcomm seems to have
started this trend so you see it on all downstream kernels, but
upstream device tree isn't supposed to have it.


> +   compatible = "qcom,spmi-temp-alarm";
> +   reg = <0x2400 0x100>;

Why are there two numbers for the "reg"?  Should just be 0x2400.


-Doug

Re: [PATCH V4 4/7] PCI: Unify try slot and bus reset API

2018-07-02 Thread Sinan Kaya

On 6/29/2018 5:43 PM, Andy Shevchenko wrote:
>> +int pci_try_reset_bus(struct pci_dev *pdev)
>> +{
>> +   bool slot = false;
>> +
>> +   if (!pci_probe_reset_slot(pdev->slot))
>> +   slot = true;
>> +
>> +   return slot ? __pci_try_reset_slot(pdev->slot) :
>> +   __pci_try_reset_bus(pdev->bus);
> This can be as simple as
> 
>   return pci_probe_reset_slot(pdev->slot) ?
> __pci_try_reset_bus(pdev->slot) : __pci_try_reset_slot(pdev->bus);
> 
> 

done

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Re: [PATCH v4 0/6] thermal: tsens: Refactoring for TSENSv2 IP

2018-07-02 Thread Eduardo Valentin

On Mon, Jul 02, 2018 at 06:14:03PM +0530, Amit Kucheria wrote:
> This series is a mixed bag:
> - Some code moves to allow code sharing between various v2.x.y versions of
>   the TSENS IP,
> - new qcom,tsens-v2.4.0 DT property for SDM845 and a generic qcom,tsens-v2
>   property as a fallback compatible for all v2.x.y platforms,
> - new platform support (sdm845)
> - a cleanup patch and
> - a DT change to have a common way to deal with the SROT and TM registers
>   despite slightly different features across the IP family and different
>   register offsets.
> 
> Rob mentioned offline that we should expose the full version string of the
> TSENS IP (x.y.z) and have a fallback compatible. I hope patch 4 does what
> you were looking for.

Applied patches 1, 3, and 4. Patch 5 needs a description. Patches 2 and
6 go via your arch tree.

> 
> Regards,
> Amit
> 
> Amit Kucheria (6):
>   thermal: tsens: Get rid of unused fields in structure
>   dt: qcom: 8996: thermal: Move to DT initialisation
>   thermal: tsens: Rename tsens-8996 to tsens-v2 for reuse
>   thermal: tsens: Add support for SDM845
>   thermal: tsens: Check if we have valid data before reading
>   arm64: dts: sdm845: Add tsens nodes
> 
>  .../devicetree/bindings/thermal/qcom-tsens.txt |  2 ++
>  arch/arm64/boot/dts/qcom/msm8996.dtsi  | 12 ++-
>  arch/arm64/boot/dts/qcom/sdm845.dtsi   | 16 +
>  drivers/thermal/qcom/Makefile  |  2 +-
>  drivers/thermal/qcom/{tsens-8996.c => tsens-v2.c}  | 39 
> --
>  drivers/thermal/qcom/tsens.c   |  6 
>  drivers/thermal/qcom/tsens.h   |  7 ++--
>  7 files changed, 62 insertions(+), 22 deletions(-)
>  rename drivers/thermal/qcom/{tsens-8996.c => tsens-v2.c} (66%)
> 
> -- 
> 2.7.4
>

[PATCH v2 2/2] arm64: dts: qcom: pm8998: Add pm8998 thermal zone

2018-07-02 Thread Matthias Kaehlcke

The thermal zone uses spmi-temp-alarm as sensor. If the sensor is
configured without an IIO input it always reports 37°C for temperatures
below the first hardware trip point at 105°C. This hardware trip point
is configured as critical trip point, to initiate a system shutdown
before the temperature reaches the next hardware trip point at 125°C,
where the PMIC performs a partial shutdown.

The temperature of the critical trip point can be raised after adding
the die temperature ADC as IIO input for spmi-temp-alarm, which
significantly increases the precision of the temperature measurements.

Signed-off-by: Matthias Kaehlcke 
---
Changes in v2:
- defined 'thermal-zones' node in pm8998.dtsi instead of using a label
  to refer to it
- use 105°C hardware trip point as critical trip point
- reduced number of trip points to 2
- lowered temperature of passive trip point
- updated trip point names and added labels
- updated commit message

 arch/arm64/boot/dts/qcom/pm8998.dtsi | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 2f4989e7ef68..e7caa334e6c7 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 _bus {
pm8998_lsid0: pmic@0 {
@@ -60,3 +61,27 @@
#size-cells = <0>;
};
 };
+
+/ {
+   thermal-zones {
+   pm8998 {
+   polling-delay-passive = <250>;
+   polling-delay = <1000>;
+
+   thermal-sensors = <_temp>;
+
+   trips {
+   pm8998_alert0: pm8998-alert0 {
+   temperature = <95000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+   pm8998_crit: pm8998-crit {
+   temperature = <105000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+   };
+   };
+};
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH] drbd: mark expected switch fall-throughs

2018-07-02 Thread Gustavo A. R. Silva

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Warning level 2 was used in this case: -Wimplicit-fallthrough=2

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/block/drbd/drbd_receiver.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index be9450f..a36a307 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -2790,6 +2790,7 @@ static int receive_DataRequest(struct drbd_connection 
*connection, struct packet
   then we would do something smarter here than reading
   the block... */
peer_req->flags |= EE_RS_THIN_REQ;
+   /* fall through */
case P_RS_DATA_REQUEST:
peer_req->w.cb = w_e_end_rsdata_req;
fault_type = DRBD_FAULT_RS_RD;
@@ -2968,6 +2969,7 @@ static int drbd_asb_recover_0p(struct drbd_peer_device 
*peer_device) __must_hold
/* Else fall through to one of the other strategies... */
drbd_warn(device, "Discard younger/older primary did not find a 
decision\n"
 "Using discard-least-changes instead\n");
+   /* fall through */
case ASB_DISCARD_ZERO_CHG:
if (ch_peer == 0 && ch_self == 0) {
rv = test_bit(RESOLVE_CONFLICTS, 
_device->connection->flags)
@@ -2979,6 +2981,7 @@ static int drbd_asb_recover_0p(struct drbd_peer_device 
*peer_device) __must_hold
}
if (after_sb_0p == ASB_DISCARD_ZERO_CHG)
break;
+   /* else: fall through */
case ASB_DISCARD_LEAST_CHG:
if  (ch_self < ch_peer)
rv = -1;
-- 
2.7.4

Re: [RFC v3 PATCH 5/5] x86: check VM_DEAD flag in page fault

2018-07-02 Thread Yang Shi





On 7/2/18 10:57 AM, Michal Hocko wrote:

On Mon 02-07-18 10:24:27, Yang Shi wrote:


On 7/2/18 6:37 AM, Michal Hocko wrote:

On Mon 02-07-18 15:33:11, Laurent Dufour wrote:

On 02/07/2018 14:45, Michal Hocko wrote:

On Mon 02-07-18 14:26:09, Laurent Dufour wrote:

On 02/07/2018 14:15, Michal Hocko wrote:

[...]

We already do have a model for that. Have a look at MMF_UNSTABLE.

MMF_UNSTABLE is a mm's flag, here this is a VMA's flag which is checked.

Yeah, and we have the VMA ready for all places where we do check the
flag. check_stable_address_space can be made to get vma rather than mm.

Yeah, this would have been more efficient to check that flag at the beginning
of the page fault handler rather than the end, but this way it will be easier
to handle the speculative page fault too ;)

The thing is that it doesn't really need to be called earlier. You are
not risking data corruption on file backed mappings.

OK, I just think it could save a few cycles to check the flag earlier.

This should be an extremely rare case. Just think about it. It should
only ever happen when an access races with munmap which itself is
questionable if not an outright bug.


If nobody think it is necessary, we definitely could re-use
check_stable_address_space(),

If we really need this whole VM_DEAD thing then it should be better
handled at the same place rather than some ad-hoc places.


just return VM_FAULT_SIGSEGV for VM_DEAD vma,
and check for both shared and non-shared.

Why would you even care about shared mappings?


Just thought about we are dealing with VM_DEAD, which means the vma will 
be tore down soon regardless it is shared or non-shared.


MMF_UNSTABLE doesn't care about !shared case.

[PATCH v2 1/2] arm64: dts: qcom: pm8998: Add spmi-temp-alarm node

2018-07-02 Thread Matthias Kaehlcke

This adds the spmi-temp-alarm node to pm8998 based on the examples in the
bindings.

Signed-off-by: Matthias Kaehlcke 
---
Changes in v2:
- none

 arch/arm64/boot/dts/qcom/pm8998.dtsi | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi 
b/arch/arm64/boot/dts/qcom/pm8998.dtsi
index 92bed1e7d4bb..2f4989e7ef68 100644
--- a/arch/arm64/boot/dts/qcom/pm8998.dtsi
+++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi
@@ -11,6 +11,13 @@
#address-cells = <1>;
#size-cells = <0>;
 
+   pm8998_temp: qcom,temp-alarm@2400 {
+   compatible = "qcom,spmi-temp-alarm";
+   reg = <0x2400 0x100>;
+   interrupts = <0x0 0x24 0x0 IRQ_TYPE_EDGE_RISING>;
+   #thermal-sensor-cells = <0>;
+   };
+
pm8998_gpio: gpios@c000 {
compatible = "qcom,pm8998-gpio", "qcom,spmi-gpio";
reg = <0xc000>;
-- 
2.18.0.399.gad0ab374a1-goog

Re: [PATCH v3] add param that allows bootline control of hardened usercopy

2018-07-02 Thread Kees Cook

On Sat, Jun 30, 2018 at 1:43 PM, Christoph von Recklinghausen
 wrote:
> The last issue I'm chasing is build failures on ARCH=m68k. The error is
> atomic_read and friends needed by the jump label code not being found.
> The config has CONFIG_BROKEN_ON_SMP=y, so the jump label calls I added
> will only be made #ifndef CONFIG_BROKEN_ON_SMP. Do you think that's
> worth a mention in the blurb that's added to
> Documentation/admin-guide/kernel-parameters.txt?

Uhm, that's weird -- I think the configs on m68k need fixing then? I
don't want to have to sprinkle that ifdef in generic code.

How are other users of static keys and jump labels dealing with m68k weirdness?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH 0/2] tools/memory-model: remove ACCESS_ONCE()

2018-07-02 Thread Paul E. McKenney

On Thu, Jun 28, 2018 at 06:51:11PM +0200, Andrea Parri wrote:
> > 1bc179880fba docs: atomic_ops: Describe atomic_set as a write operation
> > 
> > The above patches need at least one additional Acked-by
> > or Reviewed-by.  If any of you gets a chance, please do
> > look them over.
> 
> Glad this came out. ;-)
> 
> No objection to the patch: feel free to add my Reviewed-by: tag.

Done, thank you!

> (BTW, atomic_set() would be better mapped to WRITE_ONCE()... in fact, to
>  be fair, some archs do it the __asm__ __volatile__() way).
> 
> I do however have some suggestions concerning "the process":  searching
> LKML for the patch and the related discussion, I could only find:
> 
>   [PATCH] docs: atomic_ops: atomic_set is a write (not read) operation
> 
> and I realize that none of the person Cc:-ed in this thread, except you,
> were Cc:-ed in that discussion (in compliance with get_maintainer.pl).
> 
> My suggestions:
> 
>   1) Merge the file touched by that patch into (the recently created):
>   
> Documentation/atomic_t.txt
> 
>  (FWIW, queued in my TODO list).

Some consolidation of documentation would be good.  ;-)

Thoughts from others?

>   2) Add the entry:
> 
>   F: Documentation/atomic_t.txt
> 
>  to the "ATOMIC INFRASTRUCTURE" subsystem in the MAINTAINERS file so
>  that developers can easily find (the intended?) reviewers for their
>  patch. (Of course, this will need ACK from the ATOMIC people).

If the merging will take awhile, it might also be good to put
Documentation/core-api/atomic_ops.rst somewhere as well.

Thanx, Paul

Re: [PATCH v2] mm: teach dump_page() to correctly output poisoned struct pages

2018-07-02 Thread Michal Hocko

On Mon 02-07-18 14:05:36, Pavel Tatashin wrote:
[...]
>  void __dump_page(struct page *page, const char *reason)
>  {
> + bool page_poisoned = PagePoisoned(page);
> + int mapcount;
> +
> + /*
> +  * If struct page is poisoned don't access Page*() functions as that
> +  * leads to recursive loop. Page*() check for poisoned pages, and calls
> +  * dump_page() when detected.
> +  */
> + if (page_poisoned) {
> + pr_emerg("page:%px is uninitialized and poisoned", page);
> + goto hex_only;
> + }

Thanks for the updated comment. Exactly what I was looking for!
-- 
Michal Hocko
SUSE Labs

Re: [v7, 02/10] counter: Documentation: Add Generic Counter sysfs documentation

2018-07-02 Thread David Lechner


On 06/21/2018 04:07 PM, William Breathitt Gray wrote:

This patch adds standard documentation for the userspace sysfs
attributes of the Generic Counter interface.

Reviewed-by: Jonathan Cameron 
Signed-off-by: William Breathitt Gray 
---
  Documentation/ABI/testing/sysfs-bus-counter | 230 
  MAINTAINERS |   1 +
  2 files changed, 231 insertions(+)
  create mode 100644 Documentation/ABI/testing/sysfs-bus-counter

diff --git a/Documentation/ABI/testing/sysfs-bus-counter 
b/Documentation/ABI/testing/sysfs-bus-counter
new file mode 100644
index ..0bb0dce67bf8
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-counter
@@ -0,0 +1,230 @@
+What:  /sys/bus/counter/devices/counterX/countY/count
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Count data of Count Y represented as a string.
+
+What:  /sys/bus/counter/devices/counterX/countY/ceiling
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Count value ceiling for Count Y. This is the upper limit for the
+   respective counter.
+
+What:  /sys/bus/counter/devices/counterX/countY/floor
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Count value floor for Count Y. This is the lower limit for the
+   respective counter.
+
+What:  /sys/bus/counter/devices/counterX/countY/count_mode
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Count mode for channel Y. The ceiling and floor values for
+   Count Y are used by the count mode where required. The following
+   count modes are available:
+
+   Normal:
+   Counting is continuous in either direction.
+
+   Range Limit:
+   An upper or lower limit is set, mimicking limit switches
+   in the mechanical counterpart. The upper limit is set to
+   the Count Y ceiling value, while the lower limit is set
+   to the Count Y floor value. The counter freezes at
+   count = ceiling when counting up, and at count = floor
+   when counting down. At either of these limits, the
+   counting is resumed only when the count direction is
+   reversed.
+
+   Non-Recycle:
+   The counter is disabled whenever a counter overflow or
+   underflow takes place. The counter is re-enabled when a
+   new count value is loaded to the counter via a preset
+   operation or direct write.
+
+   Modulo-N:
+   A count value boundary is set between the Count Y floor
+   value and the Count Y ceiling value. The counter is
+   reset to the Cunt Y floor value at count = ceiling when
+   counting up, while the counter is set to the Count Y
+   ceiling value at count = floor when counting down; the
+   counter does not freeze at the boundary points, but
+   counts continuously throughout.
+


This might be a bit misleading since the values returned are all lower case,
e.g. "rising edge", whereas they are listed here in Title Case.


+What:  /sys/bus/counter/devices/counterX/countY/count_mode_available
+What:  /sys/bus/counter/devices/counterX/countY/error_noise_available
+What:  /sys/bus/counter/devices/counterX/countY/function_available
+What:  
/sys/bus/counter/devices/counterX/countY/signalZ_action_available
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Discrete set of available values for the respective Count Y
+   configuration are listed in this file. Values are delineated by
+   newline characters.
+
+What:  /sys/bus/counter/devices/counterX/countY/direction
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Read-only attribute that indicates the count direction of Count
+   Y. Two count directions are available: forward and backward.
+
+   Some counter devices are able to determine the direction of
+   their counting. For example, quadrature encoding counters can
+   determine the direction of movement by evaluating the leading
+   phase of the respective A and B quadrature encoding signals.
+   This attribute exposes such count directions.
+
+What:  /sys/bus/counter/devices/counterX/countY/enable
+KernelVersion: 4.19
+Contact:   linux-...@vger.kernel.org
+Description:
+   Whether channel Y counter is enabled. Valid attribute values are
+

Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

2018-07-02 Thread Linus Torvalds

On Mon, Jul 2, 2018 at 3:23 AM Benjamin Herrenschmidt
 wrote:
>
> Let me try one last ditch attempt to convince you using maybe a
> different perspective: this is how sysfs is intended to work and how
> the device model already does everywhere else except the gluedirs.

So don't get me wrong. I don't think that patch is _wrong_. It may
well be the best thing we can do now.

I just think some of the arguments for the patch are bogus.

I still think that the auto-cleanup is actually a good thing in
general. Not because it simplifies things (which it can do), but
because it fundamentally *allows* you to use less locking.

And that ends up being my real reason to not like your patch all that
much. It depends on

 (a) an extra reference count

 (b) on fairly heavy-handed locking (ie the whole "lock on release
too, not just on allocation").

and I think both of those are non-optimal.

So:

> The actual lifetime of the struct device is handled *separately* by
> device_get/put which are just wrappers on kobject_get/put.

I agree that that is the end result of your patch (and perhaps the
buggy intent of the original code).

I just don't necessarily agree it's a great thing in general. It
basically uses sysfs visibility as an argument for why lifetimes
should differ from the refcounted lifetimes.

And that may be a practical argument, but I don't think it's a very
good argument in general. I think it would arguably be much better to
be less serialized, and depend on refcounting more, and make less of a
deal about the sysfs visibility.

For example, if we *really* add back the exact same device immediately
after removing it, and the device was still in use somehow (ie the
refcount never went down to zero), maybe we really should not have
reused the same name? Or if we do re-use the same name, maybe we
should have re-used the device node itself?

See what I'm saying? I understand where your patch is coming from, but
I am suspicious of the fact that it seems to want to re-use a name
(but not the object) that is by definition still in use.

Maybe that's the right thing in this case. Considering that we have a
real oops and a real problem, and I don't have an alternative patch, I
guess the "patches talk, bullshit walks" rule applies.

Linus

Re: hrtimer become inaccurate with RT patch

2018-07-02 Thread John Stultz

On Mon, Jul 2, 2018 at 2:34 AM, gengdongjiu  wrote:
> Hi Thomas/Anna/John,
>
>   Recently I found that the hrtimer become inaccurate when there is a RT
> process runs on the same cpu core, and the kernel has applied preempt_rt
> patch.
>   The Linux kernel version is v4.1.46, and the preempt_rt patch is
> patch-4.1.46-rt52.patch.
>   I know that in the preempt_rt environment the interrupt handlers no
> longer run in interrupt context but in process context, so that RT
> process will not be interrupt. But if the hrtimer is also runs in
> process context the timer is useless when it's inaccurate. so I want to
> consult you whether this is expected behavior? whether is reasonable to move 
> the timer IRQ
> handling to a thread?

I've not looked at the PREEMPT_RT code in a long time, but years ago
there was a tension in that there is not an easy way track ownership
of timers. Thus timers all fired at the same priority of the hrtimer
irq thread. This thread could be moved up or down in priority, but the
problem was all timers would fire with the same priority. So either
the thread priority was so high that low-priority process could
generate a bunch of timers which would interrupt higher priority
tasks, or the thread priority was lower, so a high priority task could
block all timers.

There was some handwavy talk of trying to keep per-process timer
lists, so the hrtimer irq could still be in irq context but the firing
logic it didn't do anything but mark its task as runnable and do the
the actual timer firing logic before we eventually run the task (in
proper rt priority order), in a fashion similar to signals. But I'm
not sure if any attempts were made in that direction. I also think it
was an open question if there's any logic in kernel that depend on
strict in-order kernel timer processing, so its possible there could
be odd inversion issues where high priority timer logic is waiting on
/expecting lower priority timers to fire, etc, so its probably an area
of research.

thanks
-john

Re: [PATCH v3 0/8] gnss: add new GNSS subsystem

2018-07-02 Thread Pavel Machek

On Fri 2018-06-29 14:09:14, Johan Hovold wrote:
> On Fri, Jun 29, 2018 at 02:05:54PM +0200, Pavel Machek wrote:
> > On Fri 2018-06-29 13:46:46, Johan Hovold wrote:
> > > On Fri, Jun 29, 2018 at 11:46:07AM +0200, Pavel Machek wrote:
> > > > 
> > > > > > Finally, note that documentation (including kerneldoc) remains to be
> > > > > > written, but hopefully this will not hinder review given that the
> > > > > > current interfaces are fairly self-describing.
> > > > > 
> > > > > This all looks great.  Thanks for doing this work and adding a new
> > > > > subsystem for something that has been asked for for many years.
> > > > > 
> > > > > All now merged in my tree, nice job!
> > > > 
> > > > I don't think discussion was finished on this one.
> > > > 
> > > > In particular, we agreed that /dev/gnssrawX would be better device
> > > > name, so that we still have place where to put proper abstraction
> > > > layer in future.
> > > 
> > > I did not agree with you on that. I said we could consider that name if
> > > this was to be changed at all, which I do not think is necessary for
> > > the reasons spelled out in this thread.
> > 
> > So, again: there's nothing gnss specific in those patches. It does not
> > know about the format of the data passed around. (Best you can claim
> > that somehow data flow characteristics are unique to gnss.) And this
> > takes namespace needed for real gnss subsystem. Please don't do it.
> 
> This is the real gnss subsystem. Get over it.

Congratulations. You have created gnss subsystem that has 0 lines of
code that are gnss-specific.

This is not real gnss subsystem. This is pipe that passes data,
similar to /dev/psaux or mouse on /dev/ttyS0. Sooner or later, real
gnss subsystem (with unified interface) will be needed, as it was for
input, and this "pipe and gpio" thing should not hog required
namespace.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: hrtimer become inaccurate with RT patch

2018-07-02 Thread Sebastian Andrzej Siewior

On 2018-07-02 19:19:07 [+0800], gengdongjiu wrote:
> Hi Sebastian ,
Hi gengdongjiu,

> > the 4.1 series is no longer supported (neither RT wise nor non-RT,
> > https://www.kernel.org/category/releases.html). I suggest to move away.
> > If you notice this problem now it is hardly a long running project.
> yes, I Know, but we found the latest RT 4.14 series also has the same problem,
> so this is common issue.
This does not change what I wrote regarding the v4.1 series. Also you
could have mention v4.14 instead v4.1 if you really tested on v4.14.

> >> process will not be interrupt. But if the hrtimer is also runs in
> >> process context the timer is useless when it's inaccurate. so I want to
> >> consult you whether this is expected behavior? whether is reasonable to 
> >> move the timer IRQ
> >> handling to a thread?
> > 
> > This depends on your expectations. The timer is defined not to fire
> > before the programmed time. So it fires as soon as possible _after_ the
> > programmed time.
> It is reasonable that the timer is defined not to fire before the programmed 
> time.
> but we found it fires long _after_ the programmed time. For example, we 
> define it to
> fire after 2s, but it will fire after 5s, so it is very later than the 
> expectations. 

under normal circumstances I would expect to have a few µs delay due to
wakeup of the softirq thread. Not seconds. This is either broken HW or a
long running RT thread which blocks the expected execution.

> I think the reason may be
> that the timer handler thread is preempted by another higher priority thread. 
> so from for this issue,
> the timer handler should be in IRQ context instead of the process context or 
> increase the timer handler thread priority, right?

speculating on what is going on and acting based on speculation is one
way to handle situation. You could also enable tracing to see
- when does the timer fire
- when does the thread wake up
- when does is timer's function start / complete

and then you know what *really* causes the delay. The hrtimer and sched
tracepoints should provide enough information. Based on that you can
figure out if it is wise the toggle the irqsave flag or change something
else so that the system does not run ~3sec RT secs without a break.

Sebastian

[PATCH] fixup! firmware: raspberrypi: Remove VLA usage

2018-07-02 Thread Eric Anholt

Kees - with this fix to your patch, the kernel boots again (otherwise,
the FW would try to parse the uninitialized bits of stack and throw
errors).  If you're good with me squashing this in, I'll do so and
send it to -next.

Signed-off-by: Eric Anholt 
---
 drivers/firmware/raspberrypi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c
index b80f15214b73..a200a2174611 100644
--- a/drivers/firmware/raspberrypi.c
+++ b/drivers/firmware/raspberrypi.c
@@ -162,7 +162,7 @@ int rpi_firmware_property(struct rpi_firmware *fw,
memcpy(data + sizeof(struct rpi_firmware_property_tag_header),
   tag_data, buf_size);
 
-   ret = rpi_firmware_property_list(fw, , sizeof(data));
+   ret = rpi_firmware_property_list(fw, , buf_size + sizeof(*header));
memcpy(tag_data,
   data + sizeof(struct rpi_firmware_property_tag_header),
   buf_size);
-- 
2.18.0

Re: [PATCH v2] mm: teach dump_page() to correctly output poisoned struct pages

2018-07-02 Thread Andrew Morton

On Mon,  2 Jul 2018 14:05:36 -0400 Pavel Tatashin  
wrote:

> If struct page is poisoned, and uninitialized access is detected via
> PF_POISONED_CHECK(page) dump_page() is called to output the page. But,
> the dump_page() itself accesses struct page to determine how to print
> it, and therefore gets into a recursive loop.
> 
> For example:
> dump_page()
>  __dump_page()
>   PageSlab(page)
>PF_POISONED_CHECK(page)
> VM_BUG_ON_PGFLAGS(PagePoisoned(page), page)
>  dump_page() recursion loop.
> 
> Fixes: f165b378bbdf ("mm: uninitialized struct page poisoning sanity 
> checking")
> 
> Signed-off-by: Pavel Tatashin 
> Acked-by: Michal Hocko 

Thanks.  I added a cc:stable to make sure this gets into 4.17.x.

Re: [PATCH v4] ata: Remove depends on HAS_DMA in case of platform dependency

2018-07-02 Thread Geert Uytterhoeven

Hi Tejun,

On Mon, Jul 2, 2018 at 6:12 PM Tejun Heo  wrote:
> On Fri, Jun 22, 2018 at 01:03:07PM +0200, Geert Uytterhoeven wrote:
> > Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
> > symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
> > In most cases this other symbol is an architecture or platform specific
> > symbol, or PCI.
> >
> > Generic symbols and drivers without platform dependencies keep their
> > dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
> > cannot work anyway.
> >
> > This simplifies the dependencies, and allows to improve compile-testing.
> >
> > Signed-off-by: Geert Uytterhoeven 
> > Reviewed-by: Mark Brown 
> > Acked-by: Robin Murphy 
>
> This looks fine to me but how do you wanna route it?  Should I apply
> it to for-4.18-fixes or should it go with arch changes in some other
> tree?

Please queue it in the ata tree. There are no related arch changes.
Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v3] x86/e820: put !E820_TYPE_RAM regions into memblock.reserved

2018-07-02 Thread Pavel Tatashin

> So I expect this patch needs a cc:stable, which I'll add.
>
> The optimiation patch seems less important and I'd like to hold that
> off for 4.19-rc1?

Hi Andrew,

Should I resend the optimization patch [1] once 4.18 is released, or
will you include it, and I do not need to do anything?

[1] http://lkml.kernel.org/r/20180615155733.1175-1-pasha.tatas...@oracle.com

Thank you,
Pavel

Re: [RFC PATCH] ata: ahci: Enable DEVSLP by default on x86 modern standby platform

2018-07-02 Thread Tejun Heo

On Mon, Jul 02, 2018 at 12:08:45PM -0700, Srinivas Pandruvada wrote:
> From: Srinivas 
> 
> One of the requirement for modern x86 system to enter lowest power mode
> (SLP_S0) is SATA IP block to be off. This is true even during when
> platform is suspended to idle and not only in opportunistic (runtime)
> suspend.
> 
> Several of these system don't have traditional ACPI S3, so it is
> important that they enter SLP_S0 state, to avoid draining battery even
> during suspend even with out of the box Linux installation.
> 
> SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here
> user has to either use scsi-host sysfs or tools like powertop to set
> the sata-host link_power_management_policy to min_power.
> 
> This change sets by default link power management policy to min_power
> for some platforms.  To avoid regressions, the following conditions
> are used:
> - The kernel config is already set to use med_power_with_dipm or deeper
> - System is a modern standby system using ACPI low power idle flag
> - The platform is not blacklisted for Suspend to Idle and suspend
> to idle is used instead of S3
> This combination will make sure that systems are fairly recent and
> since getting shipped with modern standby standby, the DEVSLP function
> is already validated.
> 
> Signed-off-by: Srinivas Pandruvada 

Seems sane to me.  Hans, what do you think?

Thanks.

-- 
tejun

[PATCH v2 2/7] tracing: Split up onmatch action data

2018-07-02 Thread Tom Zanussi

From: Tom Zanussi 

Currently, the onmatch action data binds the onmatch action to data
related to synthetic event generation.  Since we want to allow the
onmatch handler to potentially invoke a different action, and because
we expect other handlers to generate synthetic events, we need to
separate the data related to these two functions.

Also rename the onmatch data to something more descriptive.

Signed-off-by: Tom Zanussi 
---
 kernel/trace/trace_events_hist.c | 59 
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index ede7a27fa52b..61a3a8ca2f76 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -347,13 +347,14 @@ struct action_data {
unsigned intn_params;
char*params[SYNTH_FIELDS_MAX];
 
+   unsigned intvar_ref_idx;
+   struct synth_event  *synth_event;
+
union {
struct {
-   unsigned intvar_ref_idx;
-   char*match_event;
-   char*match_event_system;
-   struct synth_event  *synth_event;
-   } onmatch;
+   char*event;
+   char*event_system;
+   } match_data;
 
struct {
char*var_str;
@@ -1001,9 +1002,9 @@ static void action_trace(struct hist_trigger_data 
*hist_data,
 struct ring_buffer_event *rbe,
 struct action_data *data, u64 *var_ref_vals)
 {
-   struct synth_event *event = data->onmatch.synth_event;
+   struct synth_event *event = data->synth_event;
 
-   trace_synth(event, var_ref_vals, data->onmatch.var_ref_idx);
+   trace_synth(event, var_ref_vals, data->var_ref_idx);
 }
 
 struct hist_var_data {
@@ -1573,8 +1574,8 @@ find_match_var(struct hist_trigger_data *hist_data, char 
*var_name)
struct action_data *data = hist_data->actions[i];
 
if (data->handler == HANDLER_ONMATCH) {
-   char *system = data->onmatch.match_event_system;
-   char *event_name = data->onmatch.match_event;
+   char *system = data->match_data.event_system;
+   char *event_name = data->match_data.event;
 
file = find_var_file(tr, system, event_name, var_name);
if (!file)
@@ -3521,15 +3522,15 @@ static void onmatch_destroy(struct action_data *data)
 
mutex_lock(_event_mutex);
 
-   kfree(data->onmatch.match_event);
-   kfree(data->onmatch.match_event_system);
+   kfree(data->match_data.event);
+   kfree(data->match_data.event_system);
kfree(data->action_name);
 
for (i = 0; i < data->n_params; i++)
kfree(data->params[i]);
 
-   if (data->onmatch.synth_event)
-   data->onmatch.synth_event->ref--;
+   if (data->synth_event)
+   data->synth_event->ref--;
 
kfree(data);
 
@@ -3611,8 +3612,8 @@ trace_action_find_var(struct hist_trigger_data *hist_data,
hist_field = find_target_event_var(hist_data, system, event, var);
if (!hist_field) {
if (!system && data->handler == HANDLER_ONMATCH) {
-   system = data->onmatch.match_event_system;
-   event = data->onmatch.match_event;
+   system = data->match_data.event_system;
+   event = data->match_data.event;
}
 
hist_field = find_event_var(hist_data, system, event, var);
@@ -3651,8 +3652,8 @@ trace_action_create_field_var(struct hist_trigger_data 
*hist_data,
 * event.
 */
if (!system && data->handler == HANDLER_ONMATCH) {
-   system = data->onmatch.match_event_system;
-   event = data->onmatch.match_event;
+   system = data->match_data.event_system;
+   event = data->match_data.event;
}
 
/*
@@ -3764,8 +3765,8 @@ static int trace_action_create(struct hist_trigger_data 
*hist_data,
goto err;
}
 
-   data->onmatch.synth_event = event;
-   data->onmatch.var_ref_idx = var_ref_idx;
+   data->synth_event = event;
+   data->var_ref_idx = var_ref_idx;
  out:
return ret;
  err:
@@ -3855,14 +3856,14 @@ static struct action_data *onmatch_parse(struct 
trace_array *tr, char *str)
goto free;
}
 
-   data->onmatch.match_event = kstrdup(match_event, GFP_KERNEL);
-   if (!data->onmatch.match_event) {
+   data->match_data.event = kstrdup(match_event, GFP_KERNEL);

Re: [PATCH v2 5/6] mm: track gup pages with page->dma_pinned_* fields

2018-07-02 Thread John Hubbard

On 07/02/2018 02:53 AM, Jan Kara wrote:
> On Sun 01-07-18 17:56:53, john.hubb...@gmail.com wrote:
>> From: John Hubbard 
>>
> ...
> 
>> @@ -904,12 +907,24 @@ static inline void get_page(struct page *page)
>>   */
>>  VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page);
>>  page_ref_inc(page);
>> +
>> +if (unlikely(PageDmaPinned(page)))
>> +__get_page_for_pinned_dma(page);
>>  }
>>  
>>  static inline void put_page(struct page *page)
>>  {
>>  page = compound_head(page);
>>  
>> +/* Because the page->dma_pinned_* fields are unioned with
>> + * page->lru, there is no way to do classical refcount-style
>> + * decrement-and-test-for-zero. Instead, PageDmaPinned(page) must
>> + * be checked, in order to safely check if we are allowed to decrement
>> + * page->dma_pinned_count at all.
>> + */
>> +if (unlikely(PageDmaPinned(page)))
>> +__put_page_for_pinned_dma(page);
>> +
> 
> These two are just wrong. You cannot make any page reference for
> PageDmaPinned() account against a pin count. First, it is just conceptually
> wrong as these references need not be long term pins, second, you can
> easily race like:
> 
> PinnerRandom process
>   get_page(page)
> pin_page_for_dma()
>   put_page(page)
>-> oops, page gets unpinned too early
> 

I'll drop this approach, without mentioning any of the locking that is hiding in
there, since that was probably breaking other rules anyway. :) Thanks for your
patience in reviewing this.

> So you really have to create counterpart to get_user_pages() - like
> put_user_page() or whatever... It is inconvenient to have to modify all GUP
> users but I don't see a way around that. 

OK, there will be a long-ish pause, while I go visit all the gup sites. I count 
about
88 callers, which is not nearly as crazy as my first casual grep showed, but 
still
quite a chunk, since I have to track down where each one does its put_page 
call(s).

It's definitely worth the effort, though. These pins just plain need some 
special
handling in order to get everything correct.


thanks,
-- 
John Hubbard
NVIDIA

Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping

2018-07-02 Thread Andrew Morton

On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko  wrote:

> On Fri 29-06-18 20:15:47, Andrew Morton wrote:
> [...]
> > Would one of your earlier designs have addressed all usecases?  I
> > expect the dumb unmap-a-little-bit-at-a-time approach would have?
> 
> It has been already pointed out that this will not work.

I said "one of".  There were others.

> You simply
> cannot drop the mmap_sem during unmap because another thread could
> change the address space under your feet. So you need some form of
> VM_DEAD and handle concurrent and conflicting address space operations.

Unclear that this is a problem.  If a thread does an unmap of a range
of virtual address space, there's no guarantee that upon return some
other thread has not already mapped new stuff into that address range. 
So what's changed?

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Andy Lutomirski




> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers 
>  wrote:
> 
> - On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers 
> mathieu.desnoy...@efficios.com wrote:
> 
>> - On Jul 2, 2018, at 7:06 PM, Linus Torvalds 
>> torva...@linux-foundation.org
>> wrote:
>> 
>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>>  wrote:
 
 Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
 with single-copy atomicity. Therefore, we want 32-bit user-space to 
 initialize
 the padding with 0, and only update the low bits with single-copy 
 atomicity.
>>> 
>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>> 
>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>> writing the same value to them (zero) anyway.
>>> 
>>> So who cares if the write ends up being two instructions, because the
>>> write to the upper bits doesn't actually *do* anything.
>>> 
>>> Hmm?
>> 
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>> 
>> I don't worry whether the upper bits get updated or how, but I really care
>> about not having store tearing of the low bits update.
> 
> For the records, most updates of those low bits are done in assembly
> from critical sections, for which we control exactly how the update is
> performed.
> 
> However, there is one helper function in user-space that updates that value
> from C through a volatile store, e.g.:
> 
> static inline void rseq_prepare_unload(void)
> {
>__rseq_abi.rseq_cs = 0;
> }

How about making the field be:

union {
 __u64 rseq_cs;
 struct {
   __u32 rseq_cs_low;
   __u32 rseq_cs_high;
 };
};

32-bit user code that cares about performance can just write to rseq_cs_low 
because it already knows that rseq_cs_high == 0.

The header could even supply a static inline helper write_rseq_cs() that 
atomically writes a pointer and just does the right thing for 64-bit, for 
32-bit BE, and for 32-bit LE.

I think the union really is needed because we can’t rely on user code being 
built with -fno-strict-aliasing.  Or the helper could use inline asm.

Anyway, the point is that we get optimal code generation (a single instruction 
write of the correct number of bits) without any compat magic in the kernel.

> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Christopher Lameter

On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:

> >
> > Platforms with 32 bit word size only guarantee atomicity of a 32 bit
> > write or RMV instruction.
> >
> > Special instructions may exist on a platform to perform 64 bit atomic
> > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
> > atomicity8.
> >
> > So use the macros that we have to guarantee 64 bit ops and you should be
> > fine. See linux/arch/x86/include/asm/atomic64_32.h
>
> We are talking about user-space here. What we need is a single instruction
> atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
> is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
> on a 32-bit architecture should be considered to provide single-copy atomicity
> on the low 32 bits.

Right. You would need to make this work for userspace. atomic64_32.h is a
good reference as to which instructions provide 64 bit atomicity on 32
bit platforms.

Re: Build regressions/improvements in v4.18-rc3

2018-07-02 Thread Michael Ellerman

Helge Deller  writes:

> On 02.07.2018 16:09, Geert Uytterhoeven wrote:
>> On Mon, Jul 2, 2018 at 4:01 PM Geert Uytterhoeven  
>> wrote:
>>> JFYI, when comparing v4.18-rc3[1] to v4.18-rc2[3], the summaries are:
>> ...
>
> Both of the following are simply happening because of old compiler which is 
> being used:
>
>>> [Deleted 26903 lines about "warning: ... [-Wpointer-sign]" on 
>>> parisc-allmodconfig]

It's GCC 4.6.3. Are you saying that's not supported anymore?

I can update to 8.1.0 if that's more useful.

cheers

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Mathieu Desnoyers

- On Jul 2, 2018, at 8:35 PM, Chris Lameter c...@linux.com wrote:

> On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:
> 
>> >
>> > Platforms with 32 bit word size only guarantee atomicity of a 32 bit
>> > write or RMV instruction.
>> >
>> > Special instructions may exist on a platform to perform 64 bit atomic
>> > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
>> > atomicity8.
>> >
>> > So use the macros that we have to guarantee 64 bit ops and you should be
>> > fine. See linux/arch/x86/include/asm/atomic64_32.h
>>
>> We are talking about user-space here. What we need is a single instruction
>> atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
>> is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
>> on a 32-bit architecture should be considered to provide single-copy 
>> atomicity
>> on the low 32 bits.
> 
> Right. You would need to make this work for userspace. atomic64_32.h is a
> good reference as to which instructions provide 64 bit atomicity on 32
> bit platforms.

We only need to update a pointer, so we don't need 64-bit atomicity on
32-bit processes.

What we need is to ensure single-copy atomicity of the 32-bit pointer update
on the 32-bit process in a field read from the kernel as a __u64.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

[PATCH 2/2] x86/build/vdso: simplify cmd_vdso2c

2018-07-02 Thread Masahiro Yamada

No reason to use 'define' directive here.  Just use the = operator.

Signed-off-by: Masahiro Yamada 
---

 arch/x86/entry/vdso/Makefile | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 261802b..b9ed1aa 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -58,9 +58,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include 
-I$(srctree)/include/uapi -I$(src
 hostprogs-y+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@
-define cmd_vdso2c
-   $(obj)/vdso2c $< $(<:%.dbg=%) $@
-endef
+  cmd_vdso2c = $(obj)/vdso2c $< $(<:%.dbg=%) $@
 
 $(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE
$(call if_changed,vdso2c)
-- 
2.7.4

[PATCH 0/2] x86/build/vdso: a little more Makefile cleanups

2018-07-02 Thread Masahiro Yamada



At first, I sent the first patch to UML ML,
but they did not pick it up.
Instead, I was able to get Acked-by from Richard,
one of the UML maintainers.

https://patchwork.kernel.org/patch/10399787/

I am resending it to x86 ML.



Masahiro Yamada (2):
  x86/build/vdso: remove unused vdso-syms.lds
  x86/build/vdso: simplify cmd_vdso2c

 arch/x86/entry/vdso/Makefile |  4 +---
 arch/x86/um/vdso/.gitignore  |  1 -
 arch/x86/um/vdso/Makefile| 16 
 3 files changed, 1 insertion(+), 20 deletions(-)

-- 
2.7.4

[PATCH 4/6] x86/efi: Free existing memory map before installing new memory map

2018-07-02 Thread Sai Praneeth Prakhya

From: Sai Praneeth 

efi_memmap_install(), unmaps the existing memory map and installs a new
memory map but doesn't free the memory allocated to the existing
memory map. Fortunately, the details about the existing memory map (like
the physical address, number of entries and type of memory) are
stored in efi.memmap. Hence, use them to free the memory.

In __efi_enter_virtual_mode(), we don't use efi_memmap_install() to
install a new memory map, instead we use efi_memmap_init_late(). Hence,
free existing memory map there too before installing a new memory map.

Generally, memory for new memory map is allocated using
efi_memmap_alloc() but in __efi_enter_virtual_mode() it's done using
realloc_pages() [please see efi_map_regions()]. So, it's OK to free this
memory using efi_memmap_free() in efi_free_boot_services().

Also, note that the first time efi_free_memmap() is called either from
efi_fake_memmap() or efi_arch_mem_reserve() [depending on the boot
sequence], we are actually freeing memblock_reserved memory which isn't
allocated by efi_memmap_alloc(). So, there are two outliers where we use
efi_free_memmap() to free memory allocated through other sources
rather than efi_memmap_alloc().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c | 3 +++
 arch/x86/platform/efi/quirks.c  | 6 ++
 drivers/firmware/efi/fake_mem.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index cda54abf25a6..7756426e93b5 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -952,6 +952,9 @@ static void __init __efi_enter_virtual_mode(void)
 * firmware via SetVirtualAddressMap().
 */
efi_memmap_unmap();
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
 
if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) {
pr_err("Failed to remap late EFI memory map\n");
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 11fa6ac9f0c2..11800f3cbb93 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -292,6 +292,9 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memmap_insert(, new, );
early_memunmap(new, new_size);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_phys, num_entries, alloc_type);
 }
 
@@ -452,6 +455,9 @@ void __init efi_free_boot_services(void)
 
memunmap(new);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
if (efi_memmap_install(new_phys, num_entries, alloc_type)) {
pr_err("Could not install new EFI memmap\n");
return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 82dcfa1c340b..a47754efb796 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -90,6 +90,9 @@ void __init efi_fake_memmap(void)
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type);
 
/* print new EFI memmap */
-- 
2.7.4

[PATCH 1/2] x86/build/vdso: remove unused vdso-syms.lds

2018-07-02 Thread Masahiro Yamada

This file contains symbol values, and was originally linked into
vmlinux, but I have no idea what it was actually used for.

Since commit 827880ec260b ("x86/um: thin archives build fix"), it is
not even linked.  Now it is completely orphan, and no problem has
been reported.  It is a proof that this file was not needed in the
first place.

Signed-off-by: Masahiro Yamada 
Acked-by: Richard Weinberger 
Acked-by: Ingo Molnar 
---

 arch/x86/um/vdso/.gitignore |  1 -
 arch/x86/um/vdso/Makefile   | 16 
 2 files changed, 17 deletions(-)

diff --git a/arch/x86/um/vdso/.gitignore b/arch/x86/um/vdso/.gitignore
index 9cac6d0..f8b69d8 100644
--- a/arch/x86/um/vdso/.gitignore
+++ b/arch/x86/um/vdso/.gitignore
@@ -1,2 +1 @@
-vdso-syms.lds
 vdso.lds
diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index b2d6967..822ccdb 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -53,22 +53,6 @@ $(vobjs): KBUILD_CFLAGS += $(CFL)
 CFLAGS_REMOVE_vdso-note.o = -pg -fprofile-arcs -ftest-coverage
 CFLAGS_REMOVE_um_vdso.o = -pg -fprofile-arcs -ftest-coverage
 
-targets += vdso-syms.lds
-extra-$(VDSO64-y)  += vdso-syms.lds
-
-#
-# Match symbols in the DSO that look like VDSO*; produce a file of constants.
-#
-sed-vdsosym := -e 's/^00*/0/' \
-   -e 's/^\([0-9a-fA-F]*\) . \(VDSO[a-zA-Z0-9_]*\)$$/\2 = 0x\1;/p'
-quiet_cmd_vdsosym = VDSOSYM $@
-define cmd_vdsosym
-   $(NM) $< | LC_ALL=C sed -n $(sed-vdsosym) | LC_ALL=C sort > $@
-endef
-
-$(obj)/%-syms.lds: $(obj)/%.so.dbg FORCE
-   $(call if_changed,vdsosym)
-
 #
 # The DSO images are built using a special linker script.
 #
-- 
2.7.4

Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64

2018-07-02 Thread Jia He




On 7/2/2018 7:40 PM, Michal Hocko Wrote:
> On Fri 29-06-18 10:29:17, Jia He wrote:
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") tried to optimize the loop in memmap_init_zone(). But
>> there is still some room for improvement.
> 
> It would be great to shortly describe those optimization from high level
> POV.
> 
>>
>> Patch 1 introduce new config to make codes more generic
>> Patch 2 remain the memblock_next_valid_pfn on arm and arm64
>> Patch 3 optimizes the memblock_next_valid_pfn()
>> Patch 4~6 optimizes the early_pfn_valid()
>>
>> As for the performance improvement, after this set, I can see the time
>> overhead of memmap_init() is reduced from 27956us to 13537us in my
>> armv8a server(QDF2400 with 96G memory, pagesize 64k).
> 
> So this is 13ms saving when booting 96G machine. Is this really worth
> the additional code? Are there any other benefits?
Sorry, Michal
I missed one thing.
This 13ms optimization is merely the result of my patch 3~6
Patch 1 is originated by Paul Burton in commit b92df1de5d289.
In its description,
===
James said "I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map.  The kernel boot time drops from 109 to
62 seconds. "
===

-- 
Cheers,
Jia
> [...]
>>  arch/arm/Kconfig  |  4 +++
>>  arch/arm/mm/init.c|  1 +
>>  arch/arm64/Kconfig|  4 +++
>>  arch/arm64/mm/init.c  |  1 +
>>  include/linux/early_pfn.h | 79 
>> +++
>>  include/linux/memblock.h  |  2 ++
>>  include/linux/mmzone.h| 18 ++-
>>  mm/Kconfig|  3 ++
>>  mm/memblock.c |  9 ++
>>  mm/page_alloc.c   |  5 ++-
>>  10 files changed, 124 insertions(+), 2 deletions(-)
>>  create mode 100644 include/linux/early_pfn.h
>

RE: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio struct

2018-07-02 Thread Anson Huang

Hi, Linus

Anson Huang
Best Regards!


> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Monday, July 2, 2018 7:42 PM
> To: Anson Huang 
> Cc: Ulf Hansson ; Masahiro Yamada
> ; Adrian Hunter
> ; evgr...@chromium.org; Shawn Lin
> ; Fabio Estevam ;
> linux-mmc ; linux-kernel@vger.kernel.org;
> dl-linux-imx 
> Subject: Re: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio
> struct
> 
> On Mon, Jul 2, 2018 at 3:32 AM Anson Huang 
> wrote:
> 
> > commit bfd694d5e21c ("mmc: core: Add tunable delay before detecting
> > card after card is inserted") adds
> > "u32 cd_debounce_delay_ms" to the last of mmc_gpio struct and cause
> > "char cd_label[0]" NOT work as string pointer of card detect label,
> > when "cat /proc/interrupts", the devname for card detect gpio is
> > incorrect as below:
> >
> > 144:  0  gpio-mxc  22 Edge  ▒
> > 161:  0  gpio-mxc   7 Edge  ▒
> >
> > Move the cd_label field down to fix this, and drop the zero from the
> > array size to prevent future similar bugs, the result is correct as
> > below:
> >
> > 144:  0  gpio-mxc  22 Edge  2198000.mmc cd
> > 161:  0  gpio-mxc   7 Edge  219.mmc cd
> >
> > Fixes: bfd694d5e21c ("mmc: core: Add tunable delay before detecting
> > card after card is inserted")
> > Signed-off-by: Anson Huang 
> > Tested-by: Fabio Estevam 
> 
> Curious.
> 
> > --- a/drivers/mmc/core/slot-gpio.c
> > +++ b/drivers/mmc/core/slot-gpio.c
> > @@ -27,8 +27,8 @@ struct mmc_gpio {
> > bool override_cd_active_level;
> > irqreturn_t (*cd_gpio_isr)(int irq, void *dev_id);
> > char *ro_label;
> > -   char cd_label[0];
> > u32 cd_debounce_delay_ms;
> > +   char cd_label[];
> 
> Not that I am the smartest in the world when it comes to how the C compilers
> work this out.
> 
> But isn't that equivalent to:
> char *cd_label;
> ?

I think C compiler will check structure if it has flexible array, the flexible 
array much be at end
of a structure, and its size can be dynamic changed by mallocing a memory for 
this structure,
there are many such case in kernel. If the flexible array is NOT at the end of 
a struct,
error " error: flexible array member not at end of struct" will come out by C 
compiler.

> 
> Look at this from drivers/mmc/core/slot-gpio.c:
> 
> ctx->ro_label = ctx->cd_label + len;
> ctx->cd_debounce_delay_ms = 200;
> snprintf(ctx->cd_label, len, "%s cd", dev_name(host->parent));
> snprintf(ctx->ro_label, len, "%s ro", dev_name(host->parent));
> host->slot.handler_priv = ctx;
> host->slot.cd_irq = -EINVAL;
> 
> So now that ro_label is char * that points len chars into the struct and that 
> is all
> fine as the ctx is allocated like that:
> 
> struct mmc_gpio *ctx = devm_kzalloc(host->parent,
>sizeof(*ctx) + 2 * len, GFP_KERNEL);
> 
> And I've seen this being done to allocate a state with some dynamic strings
> after it.
> 
> But I just don't like this, it seems fragile and now it broke.
 
I think the difference of current implementation and using two string pointers 
are,
current implementation can only alloc memory for mmc_gpio structure once, the 
ro_label's address
can be got by cd_label's address. If using two string pointers for ro_label and 
cd_label,
we have to alloc twice for them separately.

> 
> What about this:
> 
> From facb3799f479bfd4dad99c25c9c5d4c77b14c9b0 Mon Sep 17 00:00:00
> 2001
> From: Linus Walleij 
> Date: Mon, 2 Jul 2018 13:35:01 +0200
> Subject: [PATCH] mmc: slot-gpio: Allocate GPIO labels dynamically
> 
> The use of string pointers in the MMC slot GPIO context is pretty dubious,
> allocating some 2*len extra bytes for each label of the ro and wp pins.
> 
> Tidy this up using kasprintf() with dynamic allocation of labels for these 
> strings.
> 
> Signed-off-by: Linus Walleij 
> ---
>  drivers/mmc/core/slot-gpio.c | 15 ++-
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c index
> ef05e0039378..c417801b366e 100644
> --- a/drivers/mmc/core/slot-gpio.c
> +++ b/drivers/mmc/core/slot-gpio.c
> @@ -27,7 +27,7 @@ struct mmc_gpio {
>  bool override_cd_active_level;
>  irqreturn_t (*cd_gpio_isr)(int irq, void *dev_id);
>  char *ro_label;
> -char cd_label[0];
> +char *cd_label;
>  u32 cd_debounce_delay_ms;
>  };
> 
> @@ -47,13 +47,18 @@ int mmc_gpio_alloc(struct mmc_host *host)  {
>  size_t len = strlen(dev_name(host->parent)) + 4;
 
len is no need any more.

>  struct mmc_gpio *ctx = devm_kzalloc(host->parent,
> -sizeof(*ctx) + 2 * len,GFP_KERNEL);
> +sizeof(*ctx), GFP_KERNEL);
> 
>  if (ctx) {
> -ctx->ro_label = ctx->cd_label + len;
>  ctx->cd_debounce_delay_ms = 200;
> -snprintf(ctx->cd_label, len, "%s cd", dev_name(host->parent));
> -snprintf(ctx->ro_label, len, "%s ro",

Re: [PATCH 0/2] serial: 8250_dw: add fractional divisor support

2018-07-02 Thread Jisheng Zhang

Hi,

On Mon, 2 Jul 2018 14:51:03 +0300 Andy Shevchenko wrote:

> On Mon, 2018-07-02 at 13:18 +0300, Andy Shevchenko wrote:
> > On Mon, 2018-07-02 at 18:04 +0800, Jisheng Zhang wrote:  
> > > For Synopsys DesignWare 8250 uart which version >= 4.00a, there's a
> > > valid divisor latch fraction register. The fractional divisor width
> > > is
> > > 4bits ~ 6bits.
> > >   
> > 
> > There are several serial IPs that have fractional divider built-in.
> > None
> > is using any specific hooks. Why do you need in your case, esp. taking
> > into consideration that we have a custom ->set_termios() callback?  
> 
> Okay, I see that in 8250 we have hooks embedded into 8250_port.c which
> is not the best solution.
> 
> For example it prevents better splitting Exar code.
> So, we would need these hooks, but better to integrate them in the same
> way like it's done for the rest of 8250 ones, i.e.
> - rename existing to have a "do" word
> - create new functions which would be a replacement that choose between
> "do" variant and custom one
> - not sure if we need to export "do" variants (at least for now)

So you mean add the support as following:

1.rename current serial8250_set_divisor as serial8250_do_set_divisor

2.add a new serial8250_set_divisor which will be as simple as

static void serial8250_set_divisor(struct uart_port *port, ...)
{
struct uart_8250_port *up = up_to_u8250p(port);

if (up->set_divisor)
up->set_divisor(...);
else
serial8250_do_set_divisor(...);
}

could you please confirm?

Another issue is I'm not sure which struct to add the hook,
struct uart_port or struct uart_8250_port? Currently, it seems that only
uart_8250_port needs this hook, sure, adding the hook to struct uart_port
is fine either. Could you please kindly give some suggestions?

Thanks,
Jisheng

Re: [PATCH] gcc-plugins: remove unused GCC_PLUGIN_SUBDIR

2018-07-02 Thread Masahiro Yamada

Hi Kees,

2018-07-03 11:18 GMT+09:00 Kees Cook :
> On Mon, Jul 2, 2018 at 5:39 PM, Masahiro Yamada
>  wrote:
>> GCC_PLUGIN_SUBDIR has never been used.  If you really need this in
>> the future, please re-add it then.
>>
>> For now, the code is unused. Remove.
>>
>> 'export HOSTLIBS' is not necessary either.
>>
>> Signed-off-by: Masahiro Yamada 
>
> Acked-by: Kees Cook 
>
> Is this going via your tree, or should I take it via the gcc-plugins tree?
>
> Thanks!
>
> -Kees

Could you pick it up to your tree?

Thanks.




-- 
Best Regards
Masahiro Yamada

Re: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio struct

2018-07-02 Thread Fabio Estevam

On Mon, Jul 2, 2018 at 11:13 PM, Anson Huang  wrote:

> I think either way is OK, since flexible array is used in kernel code quite 
> commonly,
> so I prefer to make code change as small as possible, the original patch can 
> also prevent
> similar bug in future. And like below commit Fabio pointed out, it also has 
> same kind
> of fix: a158531f3c92 ("gpio: 74x164: Fix crash during .remove()"). Thanks.

I am also fine with this patch or the one from Linus.

Maybe Anson's patch could be applied to 4.18-rc as a bug fix and
Linus' patch could be applied to 4.19-rc1 as a cleanup/improvement?

Ulf,

What would you prefer?

Thanks

Re: [PATCH] gcc-plugins: remove unused GCC_PLUGIN_SUBDIR

2018-07-02 Thread Kees Cook

On Mon, Jul 2, 2018 at 7:23 PM, Masahiro Yamada
 wrote:
> Hi Kees,
>
> 2018-07-03 11:18 GMT+09:00 Kees Cook :
>> On Mon, Jul 2, 2018 at 5:39 PM, Masahiro Yamada
>>  wrote:
>>> GCC_PLUGIN_SUBDIR has never been used.  If you really need this in
>>> the future, please re-add it then.
>>>
>>> For now, the code is unused. Remove.
>>>
>>> 'export HOSTLIBS' is not necessary either.
>>>
>>> Signed-off-by: Masahiro Yamada 
>>
>> Acked-by: Kees Cook 
>>
>> Is this going via your tree, or should I take it via the gcc-plugins tree?
>>
>> Thanks!
>>
>> -Kees
>
> Could you pick it up to your tree?

Happily; thanks!

-Kees

-- 
Kees Cook
Pixel Security

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Mathieu Desnoyers

- On Jul 2, 2018, at 10:18 PM, Linus Torvalds torva...@linux-foundation.org 
wrote:

> On Mon, Jul 2, 2018 at 7:01 PM Mathieu Desnoyers
>  wrote:
>>
>> One thing to consider is how we will implement the load of that pointer
>> on the kernel side.
> 
> Use "get_user()". It works for 64-bit objects too, and it will be
> atomic in the 32-bit sub-parts on a 32-bit architecture.

Is it really ? Last time we had this discussion, not all architectures
guaranteed that reading a 64-bit integer would happen in two atomic
32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
macro as it stands today (rather than using a union).

> 
> Again: there is no point in trying to be atomic in the full 64 bits
> (when you're running on 32-bit). The upper bits don't have to "match"
> the lower bits. They just have to be zero. So doing it as two loads is
> fine - the same way it's perfectly fine to do it as two stores (since
> the store to the upper bits will always be zero).

I'd be fine with two atomic loads, but I'd rather have a strong
confirmation about this, because last time around there were
architectures where it was not true as far as I recall.

Thanks,

Mathieu

> 
> Linus

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Linus Torvalds

On Mon, Jul 2, 2018 at 7:30 PM Mathieu Desnoyers
 wrote:
>
>
> Is it really ? Last time we had this discussion, not all architectures
> guaranteed that reading a 64-bit integer would happen in two atomic
> 32-bit sub-parts.

All architectures that matter do.

Please don't overdesign this, or try to make a problem out of
something that isn't a problem.

Sure, maybe some toy architecture does a 8-byte "get_user()" as a
"copy_from_user()" one byte at a time, because that's the best way to
do unaligned accesses.

But nobody will ever care about rseq on such a thing anyway. Let it go.

   Linus

[lkp-robot] ee410f15b1 BUG: kernel hang in boot stage

2018-07-02 Thread kernel test robot


Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit ee410f15b1418f2f4428e79980674c979081bcb7
Author: Thierry Escande 
AuthorDate: Thu Jun 14 15:28:15 2018 -0700
Commit: Linus Torvalds 
CommitDate: Fri Jun 15 07:55:25 2018 +0900

lib/test_printf.c: call wait_for_random_bytes() before plain %p tests

If the test_printf module is loaded before the crng is initialized, the
plain 'p' tests will fail because the printed address will not be hashed
and the buffer will contain '(ptrval)' instead.

This patch adds a call to wait_for_random_bytes() before plain 'p' tests
to make sure the crng is initialized.

Link: 
http://lkml.kernel.org/r/20180604113708.11554-1-thierry.esca...@linaro.org
Signed-off-by: Thierry Escande 
Acked-by: Tobin C. Harding 
Reviewed-by: Andrew Morton 
Cc: David Miller 
Cc: Rasmus Villemoes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

608dbdfb1f  hexagon: drop the unused variable zero_page_mask
ee410f15b1  lib/test_printf.c: call wait_for_random_bytes() before plain %p 
tests
883c9ab9eb  Merge branch 'parisc-4.18-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
e3c7283c19  Add linux-next specific files for 20180629
+---++++---+
|   | 608dbdfb1f | ee410f15b1 | 883c9ab9eb | 
next-20180629 |
+---++++---+
| boot_successes| 35 | 0  | 19 | 13 
   |
| boot_failures | 0  | 15 ||
   |
| BUG:kernel_hang_in_boot_stage | 0  | 15 ||
   |
+---++++---+

[9.488584] -
[9.491008] Testing concurrent rhashtable access from 10 threads
[   21.577749] test 3125 add/delete pairs into rhlist
[   21.734553] test 3125 random rhlist add/delete operations
[   21.813107] Started 10 threads, 0 failed, rhltable test returns 0
BUG: kernel hang in boot stage


  # HH:MM RESULT GOOD 
BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 7daf201d7fe8334e2d2364d4e8ed3394ec9af819 v4.17 --
git bisect good a16afaf7928b74c30a4727cdcaa67bd10675a55d  # 08:00  G 11 
00   0  Merge tag 'for-v4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
git bisect good dc594c39f7a9dcdfd5dbb1a446ac6d06182e2472  # 08:13  G 11 
00   0  Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client
git bisect  bad 81e97f01371f4e1701feeafe484665112cd9ddc2  # 08:33  B  0 
1   15   0  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect  bad 35773c93817c5f2df264d013978e7551056a063a  # 08:55  B  0 
1   15   0  Merge branch 'afs-proc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect  bad 8949170cf48e91da7e4e69a59e2842d81d9a5885  # 09:26  B  0 
1   15   0  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect  bad becfc5e97cbab00b25a592aabc36838ec7217d1f  # 09:49  B  0
10   24   0  Merge tag 'drm-next-2018-06-15' of 
git://anongit.freedesktop.org/drm/drm
git bisect good 7a932516f55cdf430c7cce78df2010ff7db6b874  # 10:21  G 11 
00   0  Merge tag 'vfs-timespec64' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
git bisect  bad b5d903c2d656e9bc54bc76554a477d796a63120d  # 10:44  B  0 
1   15   0  Merge branch 'akpm' (patches from Andrew)
git bisect good 3fb3894b84c2e0f83cb1e4f4e960243742e6b3a6  # 11:06  G 10 
00   0  kernel/relay.c: change return type to vm_fault_t
git bisect good 14f28f5776927be30717986f86b765d49eec392c  # 11:20  G 10 
00   0  ipc: use new return type vm_fault_t
git bisect good fe6bdfc8e1e131720abbe77a2eb990c94c9024cb  # 11:44  G 10 
00   0  mm: fix oom_kill event handling
git bisect good 608dbdfb1f0299f4500e56d62b0d84c44dcfa3be  # 11:56  G 11 
00   0  hexagon: drop the unused variable zero_page_mask
git bisect  bad ee410f15b1418f2f4428e79980674c979081bcb7  # 12:16  B  0 
1   15   0  lib/test_printf.c: call wait_for_random_bytes() before plain %p 
tests
# first bad commit: [ee410f15b1418f2f4428e79980674c979081bcb7] 
lib/test_printf.c: call wait_for_random_bytes() before plain %p tests
git bisect good 608dbdfb1f0299f4500e56d62b0d84c44dcfa3be  # 12:42  G 30 
00   0  hexagon: drop the unused variable zero_page_mask
# extra tests with debug options
git bisect  bad ee410f15b1418f2f4428e79980674c979081bcb7  # 13:00  B  0
11   25   0  lib/test_printf.c: call

Re: [PATCH 0/2] serial: 8250_dw: add fractional divisor support

2018-07-02 Thread Jisheng Zhang

On Tue, 3 Jul 2018 10:22:57 +0800 Jisheng Zhang wrote:

> Hi,
> 
> On Mon, 2 Jul 2018 14:51:03 +0300 Andy Shevchenko wrote:
> 
> > On Mon, 2018-07-02 at 13:18 +0300, Andy Shevchenko wrote:  
> > > On Mon, 2018-07-02 at 18:04 +0800, Jisheng Zhang wrote:
> > > > For Synopsys DesignWare 8250 uart which version >= 4.00a, there's a
> > > > valid divisor latch fraction register. The fractional divisor width
> > > > is
> > > > 4bits ~ 6bits.
> > > > 
> > > 
> > > There are several serial IPs that have fractional divider built-in.
> > > None
> > > is using any specific hooks. Why do you need in your case, esp. taking
> > > into consideration that we have a custom ->set_termios() callback?
> > 
> > Okay, I see that in 8250 we have hooks embedded into 8250_port.c which
> > is not the best solution.
> > 
> > For example it prevents better splitting Exar code.
> > So, we would need these hooks, but better to integrate them in the same
> > way like it's done for the rest of 8250 ones, i.e.
> > - rename existing to have a "do" word
> > - create new functions which would be a replacement that choose between
> > "do" variant and custom one
> > - not sure if we need to export "do" variants (at least for now)  
> 
> So you mean add the support as following:
> 
> 1.rename current serial8250_set_divisor as serial8250_do_set_divisor
> 
> 2.add a new serial8250_set_divisor which will be as simple as
> 
> static void serial8250_set_divisor(struct uart_port *port, ...)
> {
>   struct uart_8250_port *up = up_to_u8250p(port);
> 
>   if (up->set_divisor)
>   up->set_divisor(...);
>   else
>   serial8250_do_set_divisor(...);
> }
> 
> could you please confirm?
> 
> Another issue is I'm not sure which struct to add the hook,
> struct uart_port or struct uart_8250_port? Currently, it seems that only
> uart_8250_port needs this hook, sure, adding the hook to struct uart_port
> is fine either. Could you please kindly give some suggestions?

patching struct uart_port seems a bit overhead. After reading the code
again, I propose another solution, similar as what dl_write() is used in
8250 core:

1.introduce the hook to struct uart_8250_port as my previous patches do, 

2.rename current serial8250_set_divisor() as default_serial_get_divisor()
then introduce a new serial8250_set_divisor() as:
static inline void serial8250_set_divisor(struct uart_8250_port *up,)
{
up->set_divisor();
}

and point up->set-divisor to default_serial_get_divisor

what do you think about this solution?

Thanks

> 
> Thanks,
> Jisheng

Re: [PATCH V2 02/19] csky: defconfig

2018-07-02 Thread Rob Herring

On Sun, Jul 1, 2018 at 11:34 AM Guo Ren  wrote:
>

Needs a commit msg. Perhaps some overview of what's in each config.

> Signed-off-by: Guo Ren 
> ---
>  arch/csky/configs/gx66xx_defconfig | 549 
> +
>  arch/csky/configs/qemu_ck807_defconfig | 541 
>  2 files changed, 1090 insertions(+)
>  create mode 100644 arch/csky/configs/gx66xx_defconfig
>  create mode 100644 arch/csky/configs/qemu_ck807_defconfig

Are these configs mutually exclusive? We try to have one kernel build
serve many platforms. So you'd probably want to divide things between
the 2 ABIs.

It looks like you still have lots of options enabled that I wouldn't
expect you to need. Start with something more minimal for what you
need to boot and support upstream.

For a full config, you can use allmodconfig at least to build test.

Rob

Re: Linux 3.18.111

2018-07-02 Thread Seung-Woo Kim

Hello,

On 2018년 05월 30일 16:32, Greg KH wrote:
> I'm announcing the release of the 3.18.111 kernel.
> 
> All users of the 3.18 kernel series must upgrade.
> 
> The updated 3.18.y git tree can be found at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
> linux-3.18.y
> and can be browsed at the normal kernel.org git web browser:
>   
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
> 
> thanks,
> 
> greg k-h
> 
> 



>   do d_instantiate/unlock_new_inode combinations safely

Recent my test in 3.18.113 kernel with security smack showed following
crash during mkdir on ext4 fs.

Unable to handle kernel paging request at virtual address ff98
pgd = ffc012411000
[ff98] *pgd=, *pud=
[ cut here ]
Kernel BUG at ffc0007d9430 [verbose debug info unavailable]
Internal error: Oops - BUG: 9605 [#1] PREEMPT SMP
CPU: 0 MPIDR: 8000 PID: 1237 Comm: mkdir Not tainted
3.18.113-00083-g1bfc02f-dirty #29-Tizen
task: ffc02cbc2340 ti: ffc02b7fc000 task.ti: ffc02b7fc000
PC is at down_read+0x24/0x54
LR is at down_read+0x24/0x54
[...]
Call trace:
[] down_read+0x24/0x54
[] ext4_xattr_get+0x74/0x1f4
[] ext4_xattr_security_get+0x28/0x38
[] generic_getxattr+0x4c/0x60
[] smk_fetch.isra.6+0x8c/0xe0
[] smack_d_instantiate+0x194/0x324
[] security_d_instantiate+0x24/0x30
[] d_instantiate_new+0x34/0x94
[] ext4_mkdir+0x284/0x354
[] vfs_mkdir+0xc0/0x150
[] SyS_mkdirat+0x88/0xb8
[] SyS_mkdir+0x18/0x20
Code: aa0003f3 b00017c0 912e1000 97e38943 (c85f7e60)
---[ end trace b1ad797d63dae9c5 ]---

It is because d_instantiate_new() added from above commit calls
security_d_instantiate() before calling __d_instantiate() and
dentry->d_inode is not yet set and null. In 3.18.113 kernel,
inode->i_op_getxattr() of ext4 is still generic_getxattr() and it only
has dentry parameter without inode, so it tries to access dentry->d_inode.

I did not test with selinux, but selinux also calls
inode->i_op_getxattr() from selinux_d_instantiate(), so maybe there is
also same issue.

Best Regards,
- Seung-Woo Kim

-- 
Seung-Woo Kim
Samsung Research
--

Re: [PATCH V2 01/19] csky: Build infrastructure

2018-07-02 Thread Rob Herring

On Sun, Jul 1, 2018 at 11:36 AM Guo Ren  wrote:
>
> Signed-off-by: Guo Ren 
> ---

[...]

> +config CSKY_BUILTIN_DTB
> +   bool "Use kernel builtin dtb"
> +
> +config CSKY_BUILTIN_DTB_NAME
> +   string "kernel builtin dtb name"
> +   depends on CSKY_BUILTIN_DTB
> +endmenu

These options generally exist for backwards compatibility with legacy
bootloaders that don't support DT which shouldn't apply here given
this is a new arch. If we need this for other reasons, it should not
be an architecture specific option.

Rob

Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device

2018-07-02 Thread Kevin Hilman

Hi Sudeep,

On Wed, May 9, 2018 at 9:02 AM Sudeep Holla  wrote:
>
> Checking the equality of cpumask for both new and old tick device doesn't
> ensure that it's CPU local device. This will cause issue if a low rating
> clockevent tick device is registered first followed by the registration
> of higher rating clockevent tick device.
>
> In such case, clockevents_released list will never get emptied as both
> the devices get selected as preferred one and we will loop forever in
> clockevents_notify_released.
>
> Cc: Frederic Weisbecker 
> Cc: Thomas Gleixner 
> Signed-off-by: Sudeep Holla 

I've got a arm32 board (meson8b-odroidc1) that's been failing in
kernelCI.org since the merge window (boot log[1]), and I finally got
around to bisecting it[2].  Unfortunately, the bisect pointed at a
merge commit, but with some trial and error (and a suggestion by Arnd)
I was able to test that revering $SUBJECT commit[3], my problem goes
away.

Another interesting data point is that disabling SMP (either by
"nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
away, without needing to revert this patch.

AFAICT, this platform, is using a single timer as a clocksource
("amlogic,meson6-timer") which is not a per-CPU timer.

I ran out of time to keep digging on this issue, and I'm still not
sure exactly what's going on, but I wanted to report it in case anyone
else has any ideas, and so we can hopefully get it fixed during the
-rc cycle.

Kevin

[1] 
https://storage.kernelci.org/mainline/master/v4.18-rc2-357-gd3bc0e67f852/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-meson8b-odroidc1.html
[2] http://termbin.com/mk07
[3] in mainline as: 1332a9055801 tick: Prefer a lower rating device
only if it's CPU local device

> ---
>  kernel/time/tick-common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Hi Thomas,
>
> I am seeing this issue on my Juno devboard, where system wide timers
> with rating 300 and 400 are registered in same order and we get stuck in
> a loop in clockevents_notify_released. Let me know if this looks sane or
> you have any suggestions that I can try out.
>
> Regards,
> Sudeep
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index 49edc1c4f3e6..78e598334007 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -277,7 +277,8 @@ static bool tick_check_preferred(struct 
> clock_event_device *curdev,
>  */
> return !curdev ||
> newdev->rating > curdev->rating ||
> -  !cpumask_equal(curdev->cpumask, newdev->cpumask);
> +  (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
> +   !tick_check_percpu(curdev, newdev, smp_processor_id()));
>  }
>
>  /*
> --
> 2.7.4
>

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Christopher Lameter

On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:

> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
> won't be torn into something daft like byte-per-byte stores when performed
> from C code ?
>
> I don't worry whether the upper bits get updated or how, but I really care
> about not having store tearing of the low bits update.

Platforms with 32 bit word size only guarantee atomicity of a 32 bit
write or RMV instruction.

Special instructions may exist on a platform to perform 64 bit atomic
updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
atomicity8.

So use the macros that we have to guarantee 64 bit ops and you should be
fine. See linux/arch/x86/include/asm/atomic64_32.h

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Mathieu Desnoyers

- On Jul 2, 2018, at 8:19 PM, Chris Lameter c...@linux.com wrote:

> On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:
> 
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>>
>> I don't worry whether the upper bits get updated or how, but I really care
>> about not having store tearing of the low bits update.
> 
> Platforms with 32 bit word size only guarantee atomicity of a 32 bit
> write or RMV instruction.
> 
> Special instructions may exist on a platform to perform 64 bit atomic
> updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
> atomicity8.
> 
> So use the macros that we have to guarantee 64 bit ops and you should be
> fine. See linux/arch/x86/include/asm/atomic64_32.h

We are talking about user-space here. What we need is a single instruction
atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
on a 32-bit architecture should be considered to provide single-copy atomicity
on the low 32 bits.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

RE: [PATCH] bindings: add clocks optional binding for imx gpio

2018-07-02 Thread Anson Huang

Hi, Linus

Anson Huang
Best Regards!


> -Original Message-
> From: Linus Walleij [mailto:linus.wall...@linaro.org]
> Sent: Monday, July 2, 2018 9:46 PM
> To: Anson Huang ; Fabio Estevam
> 
> Cc: Rob Herring ; Mark Rutland
> ; open list:GPIO SUBSYSTEM
> ; open list:OPEN FIRMWARE AND FLATTENED
> DEVICE TREE BINDINGS ;
> linux-kernel@vger.kernel.org; dl-linux-imx 
> Subject: Re: [PATCH] bindings: add clocks optional binding for imx gpio
> 
> On Fri, Jun 29, 2018 at 5:34 AM Anson Huang 
> wrote:
> 
> > Some i.MX SoCs have GPIO clock gate in CCM, accessing GPIO registers
> > needs to enable GPIO clock gate first, i.MX GPIO driver will enable
> > clock gate if there is clock property in GPIO node of dtb, add
> > optional property to i.MX GPIO binding doc.
> >
> > Signed-off-by: Anson Huang 
> 
> Make sense since the gpio-mxc driver already supports this :)
> 
> > +Optional properties:
> > +- clocks: the clocks used by gpio bank
> 
> Should the text be "the clock for clocking the GPIO silicon"
> I guess that is what it is. And singularis?

Yes, it is singularis, I will improve the text.

> 
> Does it hurt to give the clock a name? Like the common "pclk" for peripheral
> clock or something similar that other i.MX silicon uses?

It is just because GPIO only needs one clock, and the driver does NOT get the
clock using clock name, so the GPIO node in dtb also has no clock name 
specified,
if we add a clock name here, dtb also need to be updated? And I saw other i.MX
modules which have only one clock, they also have no clock name specified, like 
I2C
on i.MX6QDL, I will send a V2 patch with text improved, thanks.

Anson.

> 
> Fabio: can we have your ACK on this too.
> 
> Yours,
> Linus Walleij

Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

2018-07-02 Thread Mathieu Desnoyers

- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski l...@amacapital.net wrote:

>> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers 
>> 
>> wrote:
>> 
>> - On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers
>> mathieu.desnoy...@efficios.com wrote:
>> 
>>> - On Jul 2, 2018, at 7:06 PM, Linus Torvalds 
>>> torva...@linux-foundation.org
>>> wrote:
>>> 
 On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
  wrote:
> 
> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
> with single-copy atomicity. Therefore, we want 32-bit user-space to 
> initialize
> the padding with 0, and only update the low bits with single-copy 
> atomicity.
 
 Well... It's actually still single-copy atomicity as a 64-bit value.
 
 Why? Because it doesn't matter how you write the upper bits. You'll be
 writing the same value to them (zero) anyway.
 
 So who cares if the write ends up being two instructions, because the
 write to the upper bits doesn't actually *do* anything.
 
 Hmm?
>>> 
>>> Are there any kind of guarantees that a __u64 update on a 32-bit 
>>> architecture
>>> won't be torn into something daft like byte-per-byte stores when performed
>>> from C code ?
>>> 
>>> I don't worry whether the upper bits get updated or how, but I really care
>>> about not having store tearing of the low bits update.
>> 
>> For the records, most updates of those low bits are done in assembly
>> from critical sections, for which we control exactly how the update is
>> performed.
>> 
>> However, there is one helper function in user-space that updates that value
>> from C through a volatile store, e.g.:
>> 
>> static inline void rseq_prepare_unload(void)
>> {
>>__rseq_abi.rseq_cs = 0;
>> }
> 
> How about making the field be:
> 
> union {
> __u64 rseq_cs;
> struct {
>   __u32 rseq_cs_low;
>   __u32 rseq_cs_high;
> };
> };
> 
> 32-bit user code that cares about performance can just write to rseq_cs_low
> because it already knows that rseq_cs_high == 0.
> 
> The header could even supply a static inline helper write_rseq_cs() that
> atomically writes a pointer and just does the right thing for 64-bit, for
> 32-bit BE, and for 32-bit LE.
> 
> I think the union really is needed because we can’t rely on user code being
> built with -fno-strict-aliasing.  Or the helper could use inline asm.
> 
> Anyway, the point is that we get optimal code generation (a single instruction
> write of the correct number of bits) without any compat magic in the kernel.

That works for me! Any objection from anyone else for this approach ?

Thanks,

Mathieu

> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>>> 
>>> Thanks,
>>> 
>>> Mathieu
>>> 
>>> 
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH upstream] KASAN: slab-out-of-bounds Read in getname_kernel

2018-07-02 Thread Ian Kent

On Mon, 2018-07-02 at 14:15 +0200, Dmitry Vyukov wrote:
> On Mon, Jul 2, 2018 at 1:55 PM, tomas  wrote:
> > Yes, thanks. Please use my full name, Tomas Bortoli.
> 
> 
> Please also include:
> 
> Reported-by: syzbot+60c837b428dc84e83...@syzkaller.appspotmail.com

Done.

> 
> from the original bug report. This this help to keep automatic testing
> process running.

Umm ... should I include this email address on the actual cc when
submitting the patch?

> 
> Thanks
> 
> 
> > 
> > On 07/02/2018 12:20 PM, Ian Kent wrote:
> > > On Mon, 2018-07-02 at 10:31 +0200, tomas wrote:
> > > > Hi Ian,
> > > > 
> > > > you are welcome!
> > > > 
> > > > yes your patch is much better. You should just put the "_IOC_NR" macro
> > > > around "cmd" in the lines added to "validate_dev_ioctl" to make it work.
> > > 
> > > LOL, yes, that was a dumb mistake.
> > > 
> > > I'll send it to Andrew Morton, after some fairly simple sanity testing,
> > > with both our Signed-off-by added.
> > > 
> > > > Tomas
> > > > 
> > > > 
> > > > On 07/02/2018 03:42 AM, Ian Kent wrote:
> > > > > On Mon, 2018-07-02 at 09:10 +0800, Ian Kent wrote:
> > > > > > On Mon, 2018-07-02 at 00:04 +0200, tomas wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > I've looked into this issue found by Syzbot and I made a patch:
> > > > > > > 
> > > > > > > https://syzkaller.appspot.com/bug?id=d03abd8b42847f7f69b1d1d7f9720
> > > > > > > 8ae425
> > > > > > > b116
> > > > > > > 3
> > > > > > 
> > > > > > Umm ... oops!
> > > > > > 
> > > > > > Thanks for looking into this Tomas.
> > > > > > 
> > > > > > > The autofs subsystem does not check that the "path" parameter is
> > > > > > > present
> > > > > > > within the "param" struct passed by the userspace in case the
> > > > > > > AUTOFS_DEV_IOCTL_OPENMOUNT_CMD command is passed. Indeed, it
> > > > > > > assumes a
> > > > > > > path is always provided (though a path is not always present, as
> > > > > > > per how
> > > > > > > the struct is defined:
> > > > > > > https://github.com/torvalds/linux/blob/master/include/uapi/linux/a
> > > > > > > uto_de
> > > > > > > v-io
> > > > > > > ct
> > > > > > > l.h#L89).
> > > > > > > Skipping the check provokes an oob read in "strlen", called by
> > > > > > > "getname_kernel", in turn called by the autofs to assess the
> > > > > > > length of
> > > > > > > the non-existing path.
> > > > > > > 
> > > > > > > To solve it, modify the "validate_dev_ioctl" function to check
> > > > > > > also that
> > > > > > > a path has been provided if the command is
> > > > > > > AUTOFS_DEV_IOCTL_OPENMOUNT_CMD.
> > > > > > > 
> > > > > > > 
> > > > > > > --- b/fs/autofs/dev-ioctl.c2018-07-01 23:10:16.059728621
> > > > > > > +0200around
> > > > > > > +++ a/fs/autofs/dev-ioctl.c2018-07-01 23:10:24.311792133 +0200
> > > > > > > @@ -136,6 +136,9 @@ static int validate_dev_ioctl(int cmd, s
> > > > > > >  goto out;
> > > > > > >  }
> > > > > > >  }
> > > > > > > +/* AUTOFS_DEV_IOCTL_OPENMOUNT_CMD without path */
> > > > > > > +else if(_IOC_NR(cmd) == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD)
> > > > > > > +return -EINVAL;
> > > > > > 
> > > > > > My preference is to put the comment inside the else but ...
> > > > > > 
> > > > > > There's another question, should the check be done in
> > > > > > autofs_dev_ioctl_openmount() in the same way it's checked in other
> > > > > > ioctls that need a path, such as in autofs_dev_ioctl_requester()
> > > > > > and autofs_dev_ioctl_ismountpoint()?
> > > > > > 
> > > > > > For consistency I'd say it should.
> > > > > > 
> > > > > > > 
> > > > > > >  err = 0;You should just put the "_IOC_NR" directive around
> > > > > > > "cmd" in
> > > > > > > the lines added to "validate_dev_ioctl" to make it work.
> > > > > > >  out:
> > > > > > > 
> > > > > > > 
> > > > > > > Tested and solves the issue on Linus' main git tree.
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > Or perhaps this (not even compile tested) patch would be better?
> > > > > 
> > > > > autofs - fix slab out of bounds read in getname_kernel()
> > > > > 
> > > > > From: Ian Kent 
> > > > > 
> > > > > The autofs subsystem does not check that the "path" parameter is
> > > > > present for all cases where it is required when it is passed in
> > > > > via the "param" struct.
> > > > > 
> > > > > In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD
> > > > > ioctl command.
> > > > > 
> > > > > To solve it, modify validate_dev_ioctl() function to check that a
> > > > > path has been provided for ioctl commands that require it.
> > > > > ---
> > > > >  fs/autofs/dev-ioctl.c |   15 +++
> > > > >  1 file changed, 7 insertions(+), 8 deletions(-)
> > > > > 
> > > > > diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c
> > > > > index ea4ca1445ab7..61c63715c3fb 100644
> > > > > --- a/fs/autofs/dev-ioctl.c
> > > > > +++ b/fs/autofs/dev-ioctl.c
> > > > > @@ -135,6 +135,11 @@ static int validate_dev_ioctl(int cmd, struct
> > > > > autofs_dev_ioctl

Re: [PATCH 02/15] arm: dts: ls1021a: Add missing cooling device properties for CPUs

2018-07-02 Thread Shawn Guo

On Fri, May 25, 2018 at 04:01:48PM +0530, Viresh Kumar wrote:
> The cooling device properties, like "#cooling-cells" and
> "dynamic-power-coefficient", should either be present for all the CPUs
> of a cluster or none. If these are present only for a subset of CPUs of
> a cluster then things will start falling apart as soon as the CPUs are
> brought online in a different order. For example, this will happen
> because the operating system looks for such properties in the CPU node
> it is trying to bring up, so that it can register a cooling device.
> 
> Add such missing properties.
> 
> Signed-off-by: Viresh Kumar 

Applied, thanks.

Re: [RFC PATCH 0/4] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory.

2018-07-02 Thread Chao Fan

On Mon, Jul 02, 2018 at 08:18:55PM +0800, Baoquan He wrote:
>Hi Chao,
>
>On 06/12/18 at 04:10pm, Chao Fan wrote:
>> *** Issues need be discussed
>> There are several issues I am not quite sure, please help review and
>> give suggestions:
>> 
>> 1) In PATCH 1, I copy the structures and functions from ACPI head file,
>>so that ACPI head file will never been used here. I am not sure
>>whether it's good to include ACPI head file or use the method in
>>PATCH 1. If people think we can use ACPI head files directely, I
>>will remove the PATCH 1.
>
>Usaully we try to reuse code even though they are located in different
>life space. I think it applies to this SRAT handling. Copying ACPI codes

Yes, you are right. But as you said, they are in different life space,
so I don't have a better method to dig the SRAT table.
If anyone have a good way or suggestion, please tell me.

Thanks,
Chao Fan

>to kernel decompressing looks messy. Is there better way to reuse ACPI
>code and read SRAT table, then get what we need in a simple way?
>
>Thanks
>Baoquan
>> 
>> ***Test results:
>>  - I did a very simple test, and it can get the memory information in
>>bios and efi KVM guest machine, and put it by early printk. But no
>>more tests, so it's with RFC tag.
>> 
>> Any comments will be welcome.
>> 
>> 
>> Chao Fan (4):
>>   x86/boot: Add acpitb.h to help parse acpi tables
>>   x86/boot: Add acpitb.c to parse acpi tables
>>   x86/boot/KASLR: Walk srat tables to filter immovable memory
>>   x86/boot/KASLR: Limit kaslr to choosing the immovable memory
>> 
>>  arch/x86/boot/compressed/Makefile |   1 +
>>  arch/x86/boot/compressed/acpitb.c | 245 ++
>>  arch/x86/boot/compressed/acpitb.h | 175 +
>>  arch/x86/boot/compressed/kaslr.c  | 120 +--
>>  4 files changed, 530 insertions(+), 11 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/acpitb.c
>>  create mode 100644 arch/x86/boot/compressed/acpitb.h
>> 
>> -- 
>> 2.17.0
>> 
>> 
>> 
>
>

< 1 2 3 4 5 6 7 8 9 10 >

501 - 600 of 1656 matches

Mail list logo