Re: [RFC v3 PATCH 5/5] x86: check VM_DEAD flag in page fault
On Mon 02-07-18 10:24:27, Yang Shi wrote: > > > On 7/2/18 6:37 AM, Michal Hocko wrote: > > On Mon 02-07-18 15:33:11, Laurent Dufour wrote: > > > > > > On 02/07/2018 14:45, Michal Hocko wrote: > > > > On Mon 02-07-18 14:26:09, Laurent Dufour wrote: > > > > > On 02/07/2018 14:15, Michal Hocko wrote: > > [...] > > > > > > We already do have a model for that. Have a look at MMF_UNSTABLE. > > > > > MMF_UNSTABLE is a mm's flag, here this is a VMA's flag which is > > > > > checked. > > > > Yeah, and we have the VMA ready for all places where we do check the > > > > flag. check_stable_address_space can be made to get vma rather than mm. > > > Yeah, this would have been more efficient to check that flag at the > > > beginning > > > of the page fault handler rather than the end, but this way it will be > > > easier > > > to handle the speculative page fault too ;) > > The thing is that it doesn't really need to be called earlier. You are > > not risking data corruption on file backed mappings. > > OK, I just think it could save a few cycles to check the flag earlier. This should be an extremely rare case. Just think about it. It should only ever happen when an access races with munmap which itself is questionable if not an outright bug. > If nobody think it is necessary, we definitely could re-use > check_stable_address_space(), If we really need this whole VM_DEAD thing then it should be better handled at the same place rather than some ad-hoc places. > just return VM_FAULT_SIGSEGV for VM_DEAD vma, > and check for both shared and non-shared. Why would you even care about shared mappings? -- Michal Hocko SUSE Labs
[PATCH] /dev/mem: Mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- drivers/char/mem.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/char/mem.c b/drivers/char/mem.c index ffeb60d..4b00d6a 100644 --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -765,6 +765,7 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig) switch (orig) { case SEEK_CUR: offset += file->f_pos; + /* fall through */ case SEEK_SET: /* to avoid userland mistaking f_pos=-9 as -EBADF=-9 */ if ((unsigned long long)offset >= -MAX_ERRNO) { -- 2.7.4
Re: [PATCH] ARC: prevent showing irrelevant exception info in signal message
+CC Al On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote: > We process signals in the end of syscall/exception handler. > It the signal is fatal we print register's content using > show_regs function. show_regs() also prints information about > last exception happened. > > In case of multicore system we can catch the situation when we > will print wrong information about exception. See the example: > __ > CPU-0: started to handle page fault > CPU-1: sent signal to process, which is executed on CPU-0 > CPU-0: ended page fault handle. Started to process signal before >returnig to userspace. Process signal, which is send from >CPU-0. As th signal is fatal we call show_regs(). >show_regs() will show information about last exception >which is *page fault* (instead of "trap" which is used for >signals and happened on CPU-0) > > So we will get message like this: > /home/waitpid02 > potentially unexpected fatal signal 8. > Path: /home/waitpid02 > CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2 > task: 9f11c200 task.stack: 9f3ae000 > > [ECR ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000123ec > [EFA ]: 0x > [BLINK ]: 0x123ea > [ERET ]: 0x123ec > @off 0x123ec in [/home/waitpid02] > VMA: 0x0001 to 0x00016000 > [STAT32]: 0x80080882 : IE U > BTA: 0x000123ea SP: 0x5ffd3db0 FP: 0x > LPS: 0x20031684 LPE: 0x2003169a LPC: 0x0006 > [-other-info-] > > This message is confusing because it show information about page fault > ( [ECR ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant > to signal. Agreed this is misleading. @Al, is there a way to identify process termination from signals because it did something wrong vs. say unhandled signal. For former, we want to dump additional info in show_regs() such as PC / Fault addres etc and not in other scenario. > This situation was reproduced with waitpid02 LTP test. > _ > > So remove printing information about exceptions from show_regs() > to avoid confusing messages. Print information about exceptions > only in required places instead of show_regs() > > Now we don't print information about exceptions if signal is simply > send by another userspace app. So in case of waitpid02 we will print > next message: > _ > ./waitpid02 > potentially unexpected fatal signal 8. > [STAT32]: 0x80080082 : IE U > BTA: 0x2fc4 SP: 0x5ff8bd64 FP: 0x > LPS: 0x200524a0 LPE: 0x200524b6 LPC: 0x0006 > [-other-info-] > _ The prints I'm seeing now, for a segv from NULL pointer access is even more confusing ! There's a mixup of prints >8 Path: /segv CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412 [ECR ]: 0x00050200 => Invalid Write @ 0x by insn @ 0x000103ac [EFA ]: 0x [BLINK ]: 0x20047bb0 [ERET ]: 0x103ac @off 0x103ac in [/segv] VMA: 0x0001 to 0x00012000 potentially unexpected fatal signal 11. [STAT32]: 0x80080882 : IE U BTA: 0x00010398 SP: 0x5fc95e1c FP: 0x5fc95e20 LPS: 0x20039ffc LPE: 0x2003a000 LPC: 0x r00: 0x0001 r01: 0x5fc95e94 r02: 0x r03: 0x0064 r04: 0x80808080 r05: 0x2f2f2f2f ... >8 and for the process killed by signal 8, we get below. >8 [ARCLinux]# kill -8 71 [ARCLinux]# potentially unexpected fatal signal 8. [STAT32]: 0x80080882 : IE U BTA: 0x20020660 SP: 0x5fbcddec FP: 0x5fbcde1c LPS: 0x20039ffc LPE: 0x2003a000 LPC: 0x r00: 0xfdfc r01: 0x5fbcddf0 r02: 0x r03: 0x0008 r04: 0x80808080 r05: 0x2f2f2f2f r06: 0x7a2f5f4a r07: 0x r08: 0x0065 ... [1]+ Floating point exception ./sleep >8 I'm not sure whats the improvement here vs. the status quo. For signal based kill, we don't want to dump the extra registers and if any, we might still want to print the PC where the process was last seen in user mode to give user of idea what it was doing at the time. -Vineet
Re: [RFC v3 PATCH 3/5] mm: refactor do_munmap() to extract the common part
On Mon 02-07-18 09:59:06, Yang Shi wrote: > > > On 7/2/18 6:42 AM, Michal Hocko wrote: > > On Sat 30-06-18 06:39:43, Yang Shi wrote: > > > Introduces two new helper functions: > > >* munmap_addr_sanity() > > >* munmap_lookup_vma() > > > > > > They will be used by do_munmap() and the new do_munmap with zapping > > > large mapping early in the later patch. > > > > > > There is no functional change, just code refactor. > > There are whitespace changes which make the code much harder to review > > than necessary. > > > +static inline bool munmap_addr_sanity(unsigned long start, size_t len) > > > { > > > - unsigned long end; > > > - struct vm_area_struct *vma, *prev, *last; > > > + if ((offset_in_page(start)) || start > TASK_SIZE || len > TASK_SIZE - > > > start) > > > + return false; > > > - if ((offset_in_page(start)) || start > TASK_SIZE || len > > > > TASK_SIZE-start) > > > - return -EINVAL; > > e.g. here. > > Oh, yes. I did some coding style cleanup too. If you want to do some coding style cleanups make them a separate patch. The resulting diff would be much easier to review. -- Michal Hocko SUSE Labs
[PATCH v2 3/8] arm64/kernel: jump_label: switch to relative references
On a randomly chosen distro kernel build for arm64, vmlinux.o shows the following sections, containing jump label entries, and the associated RELA relocation records, respectively: ... [38088] __jump_table PROGBITS 00e19f30 0002ea10 WA 0 0 8 [38089] .rela__jump_table RELA 01fd8bb0 0008be30 0018 I 38178 38088 8 ... In other words, we have 190 KB worth of 'struct jump_entry' instances, and 573 KB worth of RELA entries to relocate each entry's code, target and key members. This means the RELA section occupies 10% of the .init segment, and the two sections combined represent 5% of vmlinux's entire memory footprint. So let's switch from 64-bit absolute references to 32-bit relative references for the code and target field, and a 64-bit relative reference for the 'key' field (which may reside in another module or the core kernel, which may be more than 4 GB way on arm64 when running with KASLR enable): this reduces the size of the __jump_table by 33%, and gets rid of the RELA section entirely. Acked-by: Will Deacon Signed-off-by: Ard Biesheuvel --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/jump_label.h | 36 +--- arch/arm64/kernel/jump_label.c | 6 ++-- 3 files changed, 20 insertions(+), 23 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1940c6405d04..d17aa9614e69 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -91,6 +91,7 @@ config ARM64 select HAVE_ARCH_BITREVERSE select HAVE_ARCH_HUGE_VMAP select HAVE_ARCH_JUMP_LABEL + select HAVE_ARCH_JUMP_LABEL_RELATIVE select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48) select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h index 1b5e0e843c3a..bb6d15b34668 100644 --- a/arch/arm64/include/asm/jump_label.h +++ b/arch/arm64/include/asm/jump_label.h @@ -26,13 +26,15 @@ #define JUMP_LABEL_NOP_SIZEAARCH64_INSN_SIZE -static __always_inline bool arch_static_branch(struct static_key *key, bool branch) +static __always_inline bool arch_static_branch(struct static_key *key, + bool branch) { - asm goto("1: nop\n\t" -".pushsection __jump_table, \"aw\"\n\t" -".align 3\n\t" -".quad 1b, %l[l_yes], %c0\n\t" -".popsection\n\t" + asm goto("1:nop \n\t" +" .pushsection__jump_table, \"aw\"\n\t" +" .align 3 \n\t" +" .long 1b - ., %l[l_yes] - . \n\t" +" .quad %c0 - . \n\t" +" .popsection \n\t" : : "i"(&((char *)key)[branch]) : : l_yes); return false; @@ -40,13 +42,15 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran return true; } -static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch) +static __always_inline bool arch_static_branch_jump(struct static_key *key, + bool branch) { - asm goto("1: b %l[l_yes]\n\t" -".pushsection __jump_table, \"aw\"\n\t" -".align 3\n\t" -".quad 1b, %l[l_yes], %c0\n\t" -".popsection\n\t" + asm goto("1:b %l[l_yes] \n\t" +" .pushsection__jump_table, \"aw\"\n\t" +" .align 3 \n\t" +" .long 1b - ., %l[l_yes] - . \n\t" +" .quad %c0 - . \n\t" +" .popsection \n\t" : : "i"(&((char *)key)[branch]) : : l_yes); return false; @@ -54,13 +58,5 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool return true; } -typedef u64 jump_label_t; - -struct jump_entry { - jump_label_t code; - jump_label_t target; - jump_label_t key; -}; - #endif /* __ASSEMBLY__ */ #endif /* __ASM_JUMP_LABEL_H */ diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c index c2dd1ad3e648..903d17023d77 100644 --- a/arch/arm64/kernel/jump_label.c +++ b/arch/arm64/kernel/jump_label.c @@ -25,12 +25,12 @@ void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type) { - void *addr = (void *)entry->code; + void *addr = (void *)jump_entry_code(entry); u32 insn; if (type ==
Re: [PATCH v10 4/7] i2c: fsi: Add abort and hardware reset procedures
Hi Eddie, > > I think this is a way too aggressive recovery. Your are doing the 9 > > pulse toggles basically on any error while this is only when the device > > keeps SDA low and you want to recover from that. If SDA is not stuck > > low, sending a STOP should do. Or do you have a known case where this is > > not going to work? > > It is aggressive, but I don't see the harm in doing this on every error. Well, as it happens, I just fixed such a case. Please check these patch series and elinux wiki pages: === (new fault injector) [PATCH v2 0/2] i2c: gpio: fault-injector: add new injector (actual recovery fix) [PATCH 0/2] i2c: recovery: make sure pulses are not misinterpreted === And here is the new elinux wiki page to describe my findings: https://elinux.org/Tests:I2C-bus-recovery-write-byte-fix Also, the previous pages have been updated to reflect the latest status: https://elinux.org/Tests:I2C-fault-injection https://elinux.org/Tests:I2C-bus-recovery To sum it up: This is a proven case where uncontrolled bus recovery can result into a bogus write! > There are some other error conditions with this hardware which may require > the clock toggling, such as "bus arbitration lost." I think this is the Why is that? In my understanding, recovery is *only* needed when SDA is stuck low. If SDA is high, sending STOP should do. If not, it needs to be researched why. > safest option for this hardware, and this routine has been tested for many > years. I remember having a similar argument with Joakim Tjernlund a while ago. I recently re-read our argument, yet I still keep my position: I don't want to do $random things to recover, just a tested and well understood procedure. And in that thread, I was never given a test case. > > > > Also, you implement the pulse toggling manually. Can't you just populate > > {get|set}_{scl|sda} and use the generic routine we have in the core? > > I see that the generic implementation breaks the loop if it sees the clock > isn't high after setting it, or if SDA goes high. I think it's safer to > finish the reset for our hardware. Plus, we actually have different Why do you think it is safer? What is the test case for that? I think one really should do check SDA! See above, you might trigger a write otherwise. If this breaks something for you, I am looking forward to discuss it. > registers for setting 0 or 1 to the clock/data, so we save some cpu cycles > by doing it directly instead of implementing set_scl/sda and having to check > val every time :) Correctness comes above all here. And I am afraid your implementation is not correct. > If you feel very strongly that this recovery procedure needs to be reduced, > then I will work on that and have to do some extensive testing. I am open for discussion, yet I also feel strong about it. The reason why the recovery procedure is moved into the core is to have one working and understood bit-banging algorithm which all drivers can rely on. If all drivers implement their custom version, they might miss gory details like the above write_byte fix. I do understand this might cause testing effort for you, I am sorry for the delay it causes. However, my goal as a maintainer is to have a reliable recovery mechanism, for your driver as well as for all drivers. I hope this is understandable. BTW if you want this driver upstream soon, then it may be an idea to resend it without any bus recovery and then we can work on it incrementally. Kind regards and thanks, Wolfram signature.asc Description: PGP signature
[PATCH v2 1/8] kernel/jump_label: abstract jump_entry member accessors
In preparation of allowing architectures to use relative references in jump_label entries [which can dramatically reduce the memory footprint], introduce abstractions for references to the 'code', 'target' and 'key' members of struct jump_entry. Signed-off-by: Ard Biesheuvel --- include/linux/jump_label.h | 34 + kernel/jump_label.c| 40 2 files changed, 49 insertions(+), 25 deletions(-) diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index b46b541c67c4..4603a1c88e48 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -119,6 +119,40 @@ struct static_key { #ifdef HAVE_JUMP_LABEL #include + +#ifndef __ASSEMBLY__ + +static inline unsigned long jump_entry_code(const struct jump_entry *entry) +{ + return entry->code; +} + +static inline unsigned long jump_entry_target(const struct jump_entry *entry) +{ + return entry->target; +} + +static inline struct static_key *jump_entry_key(const struct jump_entry *entry) +{ + return (struct static_key *)((unsigned long)entry->key & ~1UL); +} + +static inline bool jump_entry_is_branch(const struct jump_entry *entry) +{ + return (unsigned long)entry->key & 1UL; +} + +static inline bool jump_entry_is_init(const struct jump_entry *entry) +{ + return entry->code == 0; +} + +static inline void jump_entry_set_init(struct jump_entry *entry) +{ + entry->code = 0; +} + +#endif #endif #ifndef __ASSEMBLY__ diff --git a/kernel/jump_label.c b/kernel/jump_label.c index 01ebdf1f9f40..8a3ac4f5f490 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -38,10 +38,10 @@ static int jump_label_cmp(const void *a, const void *b) const struct jump_entry *jea = a; const struct jump_entry *jeb = b; - if (jea->key < jeb->key) + if (jump_entry_key(jea) < jump_entry_key(jeb)) return -1; - if (jea->key > jeb->key) + if (jump_entry_key(jea) > jump_entry_key(jeb)) return 1; return 0; @@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(jump_label_rate_limit); static int addr_conflict(struct jump_entry *entry, void *start, void *end) { - if (entry->code <= (unsigned long)end && - entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start) + if (jump_entry_code(entry) <= (unsigned long)end && + jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE > (unsigned long)start) return 1; return 0; @@ -321,16 +321,6 @@ static inline void static_key_set_linked(struct static_key *key) key->type |= JUMP_TYPE_LINKED; } -static inline struct static_key *jump_entry_key(struct jump_entry *entry) -{ - return (struct static_key *)((unsigned long)entry->key & ~1UL); -} - -static bool jump_entry_branch(struct jump_entry *entry) -{ - return (unsigned long)entry->key & 1UL; -} - /*** * A 'struct static_key' uses a union such that it either points directly * to a table of 'struct jump_entry' or to a linked list of modules which in @@ -355,7 +345,7 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry) { struct static_key *key = jump_entry_key(entry); bool enabled = static_key_enabled(key); - bool branch = jump_entry_branch(entry); + bool branch = jump_entry_is_branch(entry); /* See the comment in linux/jump_label.h */ return enabled ^ branch; @@ -370,12 +360,12 @@ static void __jump_label_update(struct static_key *key, * An entry->code of 0 indicates an entry which has been * disabled because it was in an init text area. */ - if (entry->code) { - if (kernel_text_address(entry->code)) + if (!jump_entry_is_init(entry)) { + if (kernel_text_address(jump_entry_code(entry))) arch_jump_label_transform(entry, jump_label_type(entry)); else WARN_ONCE(1, "can't patch jump_label at %pS", - (void *)(unsigned long)entry->code); + (void *)jump_entry_code(entry)); } } } @@ -430,8 +420,8 @@ void __init jump_label_invalidate_initmem(void) struct jump_entry *iter; for (iter = iter_start; iter < iter_stop; iter++) { - if (init_section_contains((void *)(unsigned long)iter->code, 1)) - iter->code = 0; + if (init_section_contains((void *)jump_entry_code(iter), 1)) + jump_entry_set_init(iter); } } @@ -441,7 +431,7 @@ static enum jump_label_type jump_label_init_type(struct jump_entry *entry) { struct static_key *key = jump_entry_key(entry); bool type = static_key_type(key); - bool branch = jump_entry_branch(entry); + bool branch =
Re: [PATCH 1/2] Revert "UBIFS: Fix potential integer overflow in allocation"
On Mon, Jul 2, 2018 at 10:50 AM, Richard Weinberger wrote: > Am Montag, 2. Juli 2018, 18:00:05 CEST schrieb Kees Cook: >> On Sun, Jul 1, 2018 at 2:20 PM, Richard Weinberger wrote: >> > This reverts commit 353748a359f1821ee934afc579cf04572406b420. >> > It bypassed the linux-mtd review process and fixes the issue not as it >> > should. >> >> Ah, sorry, I thought you were CCed on the original report. > > No big deal. I was just "surprised". Yeah, totally my mistake. There were other overflow patches that went out pubically and I thought this one had too. >> > Cc: Kees Cook >> > Cc: Silvio Cesare >> > Cc: sta...@vger.kernel.org >> > Signed-off-by: Richard Weinberger >> > --- >> > fs/ubifs/journal.c | 5 ++--- >> > 1 file changed, 2 insertions(+), 3 deletions(-) >> > >> > diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c >> > index 07b4956e0425..da8afdfccaa6 100644 >> > --- a/fs/ubifs/journal.c >> > +++ b/fs/ubifs/journal.c >> > @@ -1282,11 +1282,10 @@ static int truncate_data_node(const struct >> > ubifs_info *c, const struct inode *in >> > int *new_len) >> > { >> > void *buf; >> > - int err, compr_type; >> > - u32 dlen, out_len, old_dlen; >> > + int err, dlen, compr_type, out_len, old_dlen; >> >> What's wrong with making these unsigned? > > Well, what is the benefit? > In ubifs a data node carries at most 4k of bytes. > WORST_COMPR_FACTOR is 2. > So the computed lengths are always in a range where a natural int does work > just fine. Just a robustness preference: it keeps it from going negative. But I don't feel strongly. :) >> > out_len = le32_to_cpu(dn->size); >> > - buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS); >> > + buf = kmalloc(out_len * WORST_COMPR_FACTOR, GFP_NOFS); >> > if (!buf) >> > return -ENOMEM; >> >> Please leave the kmalloc() -> kmalloc_array() change, as that has >> happened treewide already. We don't want to have any multiplications >> in the size argument for the allocators (i.e. they should use 2-factor >> arg version like here, or use array_size() for things like vmalloc()). > > Let's queue another patch for the next merge window which converts > kmalloc() -> kmalloc_array(). I'd prefer to leave it as-is for 4.18 because it would be the only unconverted kmalloc()-with-multiplication in the entire tree. We did treewide conversions and a revert would be undoing that here. (The scripts that check for this case would run "clean" for 4.18.) So, this gets back to the question of the int vs u32: if you just didn't revert this patch, then the kmalloc_array() would stand too. Easy! :) -Kees -- Kees Cook Pixel Security
Re: [PATCH v4 2/6] dt: qcom: 8996: thermal: Move to DT initialisation
On Mon 02 Jul 05:44 PDT 2018, Amit Kucheria wrote: > We also split up the regmap address space into two, one for the TM > registers, the other for the SROT registers. This was required to deal with > different address offsets for the TM and SROT registers across different > SoC families. > > Since tsens-common.c/init_common() currently only registers one address > space, the order is important (TM before SROT). This is OK since the code > doesn't really use the SROT functionality yet. > > Signed-off-by: Amit Kucheria Reviewed-by: Bjorn Andersson Regards, Bjorn > --- > arch/arm64/boot/dts/qcom/msm8996.dtsi | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi > b/arch/arm64/boot/dts/qcom/msm8996.dtsi > index 8c7f9ca..6c8a857 100644 > --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi > +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi > @@ -461,7 +461,17 @@ > > tsens0: thermal-sensor@4a8000 { > compatible = "qcom,msm8996-tsens"; > - reg = <0x4a8000 0x2000>; > + reg = <0x4a9000 0x1000>, /* TM */ > + <0x4a8000 0x1000>; /* SROT */ > + #qcom,sensors = <13>; > + #thermal-sensor-cells = <1>; > + }; > + > + tsens1: thermal-sensor@4ac000 { > + compatible = "qcom,msm8996-tsens"; > + reg = <0x4ad000 0x1000>, /* TM */ > + <0x4ac000 0x1000>; /* SROT */ > + #qcom,sensors = <8>; > #thermal-sensor-cells = <1>; > }; > > -- > 2.7.4 >
[PATCH] Drivers: hv: hv_kvp: Mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- drivers/hv/hv_kvp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c index 5eed1e7..9b1adcb 100644 --- a/drivers/hv/hv_kvp.c +++ b/drivers/hv/hv_kvp.c @@ -352,6 +352,7 @@ static void process_ib_ipinfo(void *in_msg, void *out_msg, int op) MAX_IP_ADDR_SIZE); out->body.kvp_ip_val.dhcp_enabled = in->kvp_ip_val.dhcp_enabled; + /* fall through */ default: utf16s_to_utf8s((wchar_t *)in->kvp_ip_val.adapter_id, -- 2.7.4
Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
- On Jul 2, 2018, at 1:11 PM, Andy Lutomirski l...@amacapital.net wrote: > > But I think that the limited solution of changing > instruction_pointer_set() really is a sufficient > architecture-dependent change to fully solve your problem. So let me recap with the changes I gather for 4.18 and 4.19: 4.18: * Change struct rseq_cs field types from LINUX_FIELD_u32_u64() to __u64 in uapi/linux/rseq.h, * Compare rseq->rseq_cs->abort_ip with TASK_SIZE before using it. Kill offending process if its value is over TASK_SIZE, * Explicitly check that padding of rseq->rseq_cs is zero on 32-bit kernels (#ifndef __LP64__). 4.19: * Introduce instruction_pointer_set() with input validation, use it when setting IP to abort_ip in rseq. This replaces the comparison of abort_ip with TASK_SIZE. Is that consistent with what you have in mind ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
[PATCH] staging: rtl8188eu: remove rtw_mp_phy_regdef.h
The header rtw_mp_phy_regdef.h is not used anywhere. 'git grep rtw_mp_phy_regdef.h' returns nothing, remove the file. Signed-off-by: Michael Straube --- .../rtl8188eu/include/rtw_mp_phy_regdef.h | 1078 - 1 file changed, 1078 deletions(-) delete mode 100644 drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h diff --git a/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h b/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h deleted file mode 100644 index 9276e2321f2a.. --- a/drivers/staging/rtl8188eu/include/rtw_mp_phy_regdef.h +++ /dev/null @@ -1,1078 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/** - * - * Copyright(c) 2007 - 2011 Realtek Corporation. All rights reserved. - * - **/ -/* - * - * Module: __RTW_MP_PHY_REGDEF_H_ - * - * - * Note: 1. Define PMAC/BB register map - * 2. Define RF register map - * 3. PMAC/BB register bit mask. - * 4. RF reg bit mask. - * 5. Other BB/RF relative definition. - * - * - * Export: Constants, macro, functions(API), global variables(None). - * - * Abbrev: - * - * History: - * DataWho Remark - * 08/07/2007 MHC 1. Porting from 9x series PHYCFG.h. - * 2. Reorganize code architecture. - * 09/25/2008 MH 1. Add RL6052 register definition - * - */ -#ifndef __RTW_MP_PHY_REGDEF_H_ -#define __RTW_MP_PHY_REGDEF_H_ - - -/*--Define Parameters---*/ - -/* */ -/* 8192S Regsiter offset definition */ -/* */ - -/* */ -/* BB-PHY register PMAC 0x100 PHY 0x800 - 0xEFF */ -/* 1. PMAC duplicate register due to connection: RF_Mode, TRxRN, NumOf L-STF */ -/* 2. 0x800/0x900/0xA00/0xC00/0xD00/0xE00 */ -/* 3. RF register 0x00-2E */ -/* 4. Bit Mask for BB/RF register */ -/* 5. Other definition for BB/RF R/W */ -/* */ - - -/* */ -/* 1. PMAC duplicate register due to connection: RF_Mode, TRxRN, NumOf L-STF */ -/* 1. Page1(0x100) */ -/* */ -#definerPMAC_Reset 0x100 -#definerPMAC_TxStart 0x104 -#definerPMAC_TxLegacySIG 0x108 -#definerPMAC_TxHTSIG1 0x10c -#definerPMAC_TxHTSIG2 0x110 -#definerPMAC_PHYDebug 0x114 -#definerPMAC_TxPacketNum 0x118 -#definerPMAC_TxIdle0x11c -#definerPMAC_TxMACHeader0 0x120 -#definerPMAC_TxMACHeader1 0x124 -#definerPMAC_TxMACHeader2 0x128 -#definerPMAC_TxMACHeader3 0x12c -#definerPMAC_TxMACHeader4 0x130 -#definerPMAC_TxMACHeader5 0x134 -#definerPMAC_TxDataType0x138 -#definerPMAC_TxRandomSeed 0x13c -#definerPMAC_CCKPLCPPreamble 0x140 -#definerPMAC_CCKPLCPHeader 0x144 -#definerPMAC_CCKCRC16 0x148 -#definerPMAC_OFDMRxCRC32OK 0x170 -#definerPMAC_OFDMRxCRC32Er 0x174 -#definerPMAC_OFDMRxParityEr0x178 -#definerPMAC_OFDMRxCRC8Er 0x17c -#definerPMAC_CCKCRxRC16Er 0x180 -#definerPMAC_CCKCRxRC32Er 0x184 -#definerPMAC_CCKCRxRC32OK 0x188 -#definerPMAC_TxStatus 0x18c - -/* */ -/* 2. Page2(0x200) */ -/* */ -/* The following two definition are only used for USB interface. */ -/* define RF_BB_CMD_ADDR 0x02c0 RF/BB read/write command address. */ -/* define RF_BB_CMD_DATA 0x02c4 RF/BB read/write command data. */ - -/* */ -/* 3. Page8(0x800) */ -/* */ -#definerFPGA0_RFMOD0x800 /* RF mode & CCK TxSC RF BW Setting?? */ - -#definerFPGA0_TxInfo 0x804 /* Status report?? */ -#definerFPGA0_PSDFunction 0x808 - -#definerFPGA0_TxGainStage 0x80c /* Set TX PWR init gain? */ - -#definerFPGA0_RFTiming10x810 /* Useless now */ -#definerFPGA0_RFTiming20x814 -/* define rFPGA0_XC_RFTiming 0x818 */ -/* define rFPGA0_XD_RFTiming 0x81c */ - -#define rFPGA0_XA_HSSIParameter1 0x820 /* RF 3 wire register */ -#define rFPGA0_XA_HSSIParameter2 0x824 -#define rFPGA0_XB_HSSIParameter1 0x828 -#define rFPGA0_XB_HSSIParameter2 0x82c -#define rFPGA0_XC_HSSIParameter1 0x830 -#define rFPGA0_XC_HSSIParameter2 0x834 -#define rFPGA0_XD_HSSIParameter1 0x838 -#define rFPGA0_XD_HSSIParameter2 0x83c -#definerFPGA0_XA_LSSIParameter 0x840 -#definerFPGA0_XB_LSSIParameter 0x844 -#define
Re: [v7,07/10] dt-bindings: counter: Document stm32 quadrature encoder
On 06/21/2018 04:08 PM, William Breathitt Gray wrote: From: Benjamin Gaignard Add bindings for STM32 Timer quadrature encoder. It is a sub-node of STM32 Timer which implement the quadratic encoder part of the hardware. quadradic? or quadrature? Cc: Rob Herring Cc: Mark Rutland Signed-off-by: Benjamin Gaignard Signed-off-by: William Breathitt Gray --- .../bindings/counter/stm32-timer-cnt.txt | 31 +++ .../devicetree/bindings/mfd/stm32-timers.txt | 7 + 2 files changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt diff --git a/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt new file mode 100644 index ..c52fcdd4bf6c --- /dev/null +++ b/Documentation/devicetree/bindings/counter/stm32-timer-cnt.txt @@ -0,0 +1,31 @@ +STMicroelectronics STM32 Timer quadrature encoder + +STM32 Timer provides quadrature encoder to detect +angular position and direction of rotary elements, +from IN1 and IN2 input signals. + +Must be a sub-node of an STM32 Timer device tree node. +See ../mfd/stm32-timers.txt for details about the parent node. + +Required properties: +- compatible: Must be "st,stm32-timer-counter". +- pinctrl-names: Set to "default". +- pinctrl-0: List of phandles pointing to pin configuration nodes, + to set CH1/CH2 pins in mode of operation for STM32 + Timer input on external pin. + +Example: + timers@4001 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "st,stm32-timers"; + reg = <0x4001 0x400>; + clocks = < 0 160>; + clock-names = "int"; + + counter { + compatible = "st,stm32-timer-counter"; + pinctrl-names = "default"; + pinctrl-0 = <_in_pins>; + }; + }; diff --git a/Documentation/devicetree/bindings/mfd/stm32-timers.txt b/Documentation/devicetree/bindings/mfd/stm32-timers.txt index 1db6e0057a63..ff9c14ada30b 100644 --- a/Documentation/devicetree/bindings/mfd/stm32-timers.txt +++ b/Documentation/devicetree/bindings/mfd/stm32-timers.txt @@ -23,6 +23,7 @@ Optional parameters: Optional subnodes: - pwm:See ../pwm/pwm-stm32.txt - timer: See ../iio/timer/stm32-timer-trigger.txt +- counter: See ../counter/stm32-timer-cnt.txt Example: timers@4001 { @@ -43,4 +44,10 @@ Example: compatible = "st,stm32-timer-trigger"; reg = <0>; }; + + counter { + compatible = "st,stm32-timer-counter"; + pinctrl-names = "default"; + pinctrl-0 = <_in_pins>; + }; };
[PATCH v5] add param that allows bootline control of hardened usercopy
From: Chris von Recklinghausen Enabling HARDENED_USERCOPY causes measurable regressions in networking performance, up to 8% under UDP flood. I'm running an a small packet UDP flood using pktgen vs. a host b2b connected. On the receiver side the UDP packets are processed by a simple user space process that just reads and drops them: https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c Not very useful from a functional PoV, but it helps to pin-point bottlenecks in the networking stack. When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8% regression in the receive tput, compared to the same kernel without this option enabled. With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%). The call-chain is: __GI___libc_recvfrom entry_SYSCALL_64_after_hwframe do_syscall_64 __x64_sys_recvfrom __sys_recvfrom inet_recvmsg udp_recvmsg __check_object_size udp_recvmsg() actually calls copy_to_iter() (inlined) and the latters calls check_copy_size() (again, inlined). A generic distro may want to enable HARDENED_USERCOPY in their default kernel config, but at the same time, such distro may want to be able to avoid the performance penalties in with the default configuration and disable the stricter check on a per-boot basis. This change adds a boot parameter that conditionally disables HARDENED_USERCOPY at boot time. This feature is not available on platforms that don't have CONFIG_JUMP_LABEL set. v4->v5: key off of CONFIG_JUMP_LABEL, not CONFIG_SMP_BROKEN. v3->v4: fix a couple of nits in commit comments declaration of bypass_usercopy_checks moved inside mm/usercopy.c and made static add blurb to commit comments about not enabling this functionality on platforms with CONFIG_BROKEN_ON_SMP set. v2->v3: add benchmark details to commit comments Don't add new item to Documentation/admin-guide/kernel-parameters.rst rename boot param to "hardened_usercopy=" update description in Documentation/admin-guide/kernel-parameters.txt static_branch_likely -> static_branch_unlikely add __ro_after_init versions of DEFINE_STATIC_KEY_FALSE, DEFINE_STATIC_KEY_TRUE disable_huc_atboot -> enable_checks (strtobool "on" == true) v1->v2: remove CONFIG_HUC_DEFAULT_OFF default is now enabled, boot param disables move check to __check_object_size so as to not break optimization of __builtin_constant_p() include linux/atomic.h before linux/jump_label.h Signed-off-by: Chris von Recklinghausen --- Documentation/admin-guide/kernel-parameters.txt | 11 include/linux/jump_label.h | 6 + mm/usercopy.c | 35 + 3 files changed, 52 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index efc7aa7..560d4dc 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -816,6 +816,17 @@ disable=[IPV6] See Documentation/networking/ipv6.txt. + hardened_usercopy= +[KNL] Under CONFIG_HARDENED_USERCOPY, whether +hardening is enabled for this boot. Hardened +usercopy checking is used to protect the kernel +from reading or writing beyond known memory +allocation boundaries as a proactive defense +against bounds-checking flaws in the kernel's +copy_to_user()/copy_from_user() interface. +on Perform hardened usercopy checks (default). +off Disable hardened usercopy checks. + disable_radix [PPC] Disable RADIX MMU mode on POWER9 diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index b46b541..1a0b6f1 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -299,12 +299,18 @@ struct static_key_false { #define DEFINE_STATIC_KEY_TRUE(name) \ struct static_key_true name = STATIC_KEY_TRUE_INIT +#define DEFINE_STATIC_KEY_TRUE_RO(name)\ + struct static_key_true name __ro_after_init = STATIC_KEY_TRUE_INIT + #define DECLARE_STATIC_KEY_TRUE(name) \ extern struct static_key_true name #define DEFINE_STATIC_KEY_FALSE(name) \ struct static_key_false name = STATIC_KEY_FALSE_INIT +#define DEFINE_STATIC_KEY_FALSE_RO(name) \ + struct static_key_false name __ro_after_init = STATIC_KEY_FALSE_INIT + #define DECLARE_STATIC_KEY_FALSE(name) \ extern struct static_key_false name diff --git a/mm/usercopy.c b/mm/usercopy.c index e9e9325..baa0193 100644 --- a/mm/usercopy.c
Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig
On 07/02/18 13:03, Randy Dunlap wrote: > On 07/02/18 07:47, Christoph Hellwig wrote: >> Instead of duplicating the source statements in every architecture just >> do it once in the toplevel Kconfig file. >> >> Signed-off-by: Christoph Hellwig >> --- >> Kconfig | 22 ++ >> arch/alpha/Kconfig | 20 >> arch/arc/Kconfig| 16 >> arch/arm/Kconfig| 25 - >> arch/arm64/Kconfig | 23 --- >> arch/c6x/Kconfig| 24 >> arch/h8300/Kconfig | 24 >> arch/hexagon/Kconfig| 16 >> arch/ia64/Kconfig | 20 >> arch/m68k/Kconfig | 24 >> arch/microblaze/Kconfig | 24 >> arch/mips/Kconfig | 24 >> arch/nds32/Kconfig | 16 >> arch/nios2/Kconfig | 24 >> arch/openrisc/Kconfig | 23 --- >> arch/parisc/Kconfig | 24 >> arch/powerpc/Kconfig| 19 --- >> arch/riscv/Kconfig | 24 >> arch/s390/Kconfig | 24 >> arch/sh/Kconfig | 24 >> arch/sparc/Kconfig | 24 >> arch/unicore32/Kconfig | 24 >> arch/x86/Kconfig| 22 +- >> arch/xtensa/Kconfig | 25 - >> 24 files changed, 23 insertions(+), 512 deletions(-) >> >> diff --git a/Kconfig b/Kconfig >> index a90d9f9e268b..5499b1273ba5 100644 >> --- a/Kconfig >> +++ b/Kconfig >> @@ -10,3 +10,25 @@ comment "Compiler: $(CC_VERSION_TEXT)" >> source "scripts/Kconfig.include" >> >> source "arch/$(SRCARCH)/Kconfig" >> + >> +source "init/Kconfig" > > Hi Christoph, > > Looks good overall. I'm still doing some testing on it. > > I would prefer to have init/Kconfig before arch/$(SRCARCH)/Kconfig. Ugh, that won't get this set correctly on x86_64: CONFIG_PGTABLE_LEVELS=2 > Is there a reason that you chose the ordering above? > Any known dependencies? > > Thanks. > >> + >> +source "kernel/Kconfig.freezer" >> + >> +menu "Executable file formats" >> +source "fs/Kconfig.binfmt" >> +endmenu >> + >> +source "mm/Kconfig" >> + >> +source "net/Kconfig" >> + >> +source "drivers/Kconfig" >> + >> +source "fs/Kconfig" >> + >> +source "security/Kconfig" >> + >> +source "crypto/Kconfig" >> + >> +source "lib/Kconfig" > > -- ~Randy
Re: [PATCH v4] ata: Remove depends on HAS_DMA in case of platform dependency
On Fri, Jun 22, 2018 at 01:03:07PM +0200, Geert Uytterhoeven wrote: > Remove dependencies on HAS_DMA where a Kconfig symbol depends on another > symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST". > In most cases this other symbol is an architecture or platform specific > symbol, or PCI. > > Generic symbols and drivers without platform dependencies keep their > dependencies on HAS_DMA, to prevent compiling subsystems or drivers that > cannot work anyway. > > This simplifies the dependencies, and allows to improve compile-testing. > > Signed-off-by: Geert Uytterhoeven > Reviewed-by: Mark Brown > Acked-by: Robin Murphy Applied to libata/for-4.18-fixes. Thanks. -- tejun
[PATCH v2 3/7] tracing: Generalize hist trigger onmax and save action
From: Tom Zanussi The action refactor code allowed actions and handlers to be separated, but the existing onmax handler and save action code is still not flexible enough to handle arbitrary coupling. This change generalizes them and in the process makes additional handlers and actions easier to implement. The onmax action can be broken up and thought of as two separate components - a variable to be tracked (the parameter given to the onmax($var_to_track) function) and an invisible variable created to save the ongoing result of doing something with that variable, such as saving the max value of that variable so far seen. Separating it out like this and renaming it appropriately allows us to use the same code for similar tracking functions such as onchange($var_to_track), which would just track the last value seen rather than the max seen so far, which is useful in some situations. Additionally, because different handlers and actions may want to save and access data differently e.g. save and retrieve tracking values as local variables vs something more global, save_val() and get_val() interface functions are introduced and max-specific implementations are used instead. The same goes for the code that checks whether a maximum has been hit - a generic check_val() interface and max-checking implementation is used instead, which allows future patches to make use of he same code using their own implemetations of similar functionality. Signed-off-by: Tom Zanussi --- kernel/trace/trace_events_hist.c | 213 +-- 1 file changed, 139 insertions(+), 74 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 61a3a8ca2f76..f54dc8782d91 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -328,6 +328,15 @@ typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, struct ring_buffer_event *rbe, struct action_data *data, u64 *var_ref_vals); +typedef bool (*check_track_val_fn_t) (u64 track_val, u64 var_val); +typedef bool (*save_track_val_fn_t) (struct hist_trigger_data *hist_data, +struct tracing_map_elt *elt, +struct action_data *data, +unsigned int track_var_idx, u64 var_val); +typedef u64 (*get_track_val_fn_t) (struct hist_trigger_data *hist_data, + struct tracing_map_elt *elt, + struct action_data *data); + enum handler_id { HANDLER_ONMATCH = 1, HANDLER_ONMAX, @@ -358,14 +367,18 @@ struct action_data { struct { char*var_str; - unsigned intmax_var_ref_idx; - struct hist_field *max_var; - struct hist_field *var; - } onmax; + struct hist_field *var_ref; + unsigned intvar_ref_idx; + + struct hist_field *track_var; + + check_track_val_fn_tcheck_val; + save_track_val_fn_t save_val; + get_track_val_fn_t get_val; + } track_data; }; }; - static char last_hist_cmd[MAX_FILTER_STR_VAL]; static char hist_err_str[MAX_FILTER_STR_VAL]; @@ -3122,10 +3135,10 @@ static void update_field_vars(struct hist_trigger_data *hist_data, hist_data->n_field_vars, 0); } -static void update_max_vars(struct hist_trigger_data *hist_data, - struct tracing_map_elt *elt, - struct ring_buffer_event *rbe, - void *rec) +static void update_save_vars(struct hist_trigger_data *hist_data, +struct tracing_map_elt *elt, +struct ring_buffer_event *rbe, +void *rec) { __update_field_vars(elt, rbe, rec, hist_data->save_vars, hist_data->n_save_vars, hist_data->n_field_var_str); @@ -3263,15 +3276,68 @@ create_target_field_var(struct hist_trigger_data *target_hist_data, return create_field_var(target_hist_data, file, var_name); } -static void onmax_print(struct seq_file *m, - struct hist_trigger_data *hist_data, - struct tracing_map_elt *elt, - struct action_data *data) +static bool check_track_val_max(u64 track_val, u64 var_val) +{ + if (var_val <= track_val) + return false; + + return true; +} + +static u64 get_track_val_local(struct hist_trigger_data *hist_data, + struct tracing_map_elt *elt, + struct action_data *data) +{ + unsigned int
[PATCH v2 1/7] tracing: Refactor hist trigger action code
From: Tom Zanussi The hist trigger action code currently implements two essentially hard-coded pairs of 'actions' - onmax(), which tracks a variable and saves some event fields when a max is hit, and onmatch(), which is hard-coded to generate a synthetic event. These hardcoded pairs (track max/save fields and detect match/generate synthetic event) should really be decoupled into separate components that can then be arbitrarily combined. The first component of each pair (track max/detect match) is called a 'handler' in the new code, while the second component (save fields/generate synthetic event) is called an 'action' in this scheme. This change refactors the action code to reflect this split by adding two handlers, HANDLER_ONMATCH and HANDLER_ONMAX, along with two actions, ACTION_SAVE and ACTION_TRACE. The new code combines them to produce the existing ONMATCH/TRACE and ONMAX/SAVE functionality, but doesn't implement the other combinations now possible. Future patches will expand these to further useful cases, such as ONMAX/TRACE, as well as add additional handlers and actions such as ONCHANGE and SNAPSHOT. Signed-off-by: Tom Zanussi --- kernel/trace/trace_events_hist.c | 387 +++ 1 file changed, 230 insertions(+), 157 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 046c716a6536..ede7a27fa52b 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -296,9 +296,9 @@ struct hist_trigger_data { struct field_var_hist *field_var_hists[SYNTH_FIELDS_MAX]; unsigned intn_field_var_hists; - struct field_var*max_vars[SYNTH_FIELDS_MAX]; - unsigned intn_max_vars; - unsigned intn_max_var_str; + struct field_var*save_vars[SYNTH_FIELDS_MAX]; + unsigned intn_save_vars; + unsigned intn_save_var_str; }; struct synth_field { @@ -328,8 +328,22 @@ typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, struct ring_buffer_event *rbe, struct action_data *data, u64 *var_ref_vals); +enum handler_id { + HANDLER_ONMATCH = 1, + HANDLER_ONMAX, +}; + +enum action_id { + ACTION_SAVE = 1, + ACTION_TRACE, +}; + struct action_data { + enum handler_id handler; + enum action_id action; + char*action_name; action_fn_t fn; + unsigned intn_params; char*params[SYNTH_FIELDS_MAX]; @@ -338,13 +352,11 @@ struct action_data { unsigned intvar_ref_idx; char*match_event; char*match_event_system; - char*synth_event_name; struct synth_event *synth_event; } onmatch; struct { char*var_str; - char*fn_name; unsigned intmax_var_ref_idx; struct hist_field *max_var; struct hist_field *var; @@ -1560,7 +1572,7 @@ find_match_var(struct hist_trigger_data *hist_data, char *var_name) for (i = 0; i < hist_data->n_actions; i++) { struct action_data *data = hist_data->actions[i]; - if (data->fn == action_trace) { + if (data->handler == HANDLER_ONMATCH) { char *system = data->onmatch.match_event_system; char *event_name = data->onmatch.match_event; @@ -1992,7 +2004,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt) } } - n_str = hist_data->n_field_var_str + hist_data->n_max_var_str; + n_str = hist_data->n_field_var_str + hist_data->n_save_var_str; size = STR_VAR_LEN_MAX; @@ -2934,7 +2946,7 @@ create_field_var_hist(struct hist_trigger_data *target_hist_data, int ret; if (target_hist_data->n_field_var_hists >= SYNTH_FIELDS_MAX) { - hist_err_event("onmatch: Too many field variables defined: ", + hist_err_event("trace action: Too many field variables defined: ", subsys_name, event_name, field_name); return ERR_PTR(-EINVAL); } @@ -2942,7 +2954,7 @@ create_field_var_hist(struct hist_trigger_data *target_hist_data, file = event_file(tr, subsys_name, event_name); if (IS_ERR(file)) { - hist_err_event("onmatch: Event file not found: ", + hist_err_event("trace action: Event file not found: ",
Re: [PATCH] leds: max8997: use mode when calling max8997_led_set_mode
Hi Colin, Thank you for the patch. On 07/02/2018 06:50 PM, Colin King wrote: From: Colin Ian King Variable mode is assigned to pdata->led_pdata->mode[led->id] and yet is not being used when calling function max8997_led_set_mode. Fix this by using mode when calling max8997_led_set_mode. Cleans up clang warning: warning: variable 'mode' set but not used [-Wunused-but-set-variable] Fixes: 8584cb82f151 ("leds: Add suuport for MAX8997-LED driver") Signed-off-by: Colin Ian King --- drivers/leds/leds-max8997.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/leds/leds-max8997.c b/drivers/leds/leds-max8997.c index 4edf74f1d6d4..8c019c28f9f5 100644 --- a/drivers/leds/leds-max8997.c +++ b/drivers/leds/leds-max8997.c @@ -268,7 +268,7 @@ static int max8997_led_probe(struct platform_device *pdev) mode = pdata->led_pdata->mode[led->id]; brightness = pdata->led_pdata->brightness[led->id]; - max8997_led_set_mode(led, pdata->led_pdata->mode[led->id]); + max8997_led_set_mode(led, mode); if (brightness > led->cdev.max_brightness) brightness = led->cdev.max_brightness; Applied. -- Best regards, Jacek Anaszewski
Re: [PATCH] security: CONFIG_HARDENED_USERCOPY does not need to select BUG
Hi Kamal, On Mon, Jul 2, 2018 at 1:14 PM, Kamal Mostafa wrote: > On Fri, Jun 29, 2018 at 01:27:08PM -0700, Kees Cook wrote: >> Do the lkdtm tests for usercopy correctly halt the kernel thread if >> CONFIG_BUG is removed? > > Yes, they do... Perfect, thanks for double-checking! I'll apply this to my tree. -Kees -- Kees Cook Pixel Security
Re: [PATCH v2 2/2] arm64: dts: qcom: pm8998: Add pm8998 thermal zone
On Mon, Jul 02, 2018 at 12:53:44PM -0700, Doug Anderson wrote: > Hi, > > On Mon, Jul 2, 2018 at 11:10 AM, Matthias Kaehlcke wrote: > > The thermal zone uses spmi-temp-alarm as sensor. If the sensor is > > configured without an IIO input it always reports 37°C for temperatures > > below the first hardware trip point at 105°C. This hardware trip point > > is configured as critical trip point, to initiate a system shutdown > > before the temperature reaches the next hardware trip point at 125°C, > > where the PMIC performs a partial shutdown. > > > > The temperature of the critical trip point can be raised after adding > > the die temperature ADC as IIO input for spmi-temp-alarm, which > > significantly increases the precision of the temperature measurements. > > > > Signed-off-by: Matthias Kaehlcke > > --- > > Changes in v2: > > - defined 'thermal-zones' node in pm8998.dtsi instead of using a label > > to refer to it > > - use 105°C hardware trip point as critical trip point > > I'm not sure this was right. I guess you're trying to avoid > Temperature Stage 2? Indeed > From Davi'd email in response to v1: > > > The PMIC TEMP_ALARM hardware peripheral will perform an automatic partial > > PMIC shutdown upon hitting over-temperature stage 2 (125 C). This turns > > off peripherals within the PMIC that are expected to draw significant > > current. The set of peripherals included varies between PMICs. This > > partial shutdown will occur simultaneously with the triggering of an > > interrupt to the APPS processor that informs the qcom-spmi-temp-alarm > > driver that an over-temperature threshold has been crossed. > > I think it's actually OK to use Temperature Stage 2 as the "critical" > point, which is why it still interrupts the CPU. At "critical" the > system will shut down, right? ...so presumably it's OK if the drivers > can't recover from having the power yanked out from underneath them as > long as they don't hang/crash the system in this case. If I had to > guess the whole point of this stage is to give the system shutdown a > better chance of succeeding without getting to stage 3. That was my starting point, however in my tests the system reset several times when the temperature got close to 125°C, not allowing for a proper shutdown. Apparently the partial shutdown of the PMIC can result in a full reset at least on some systems. > I do agree, however, that removing the "145" from the device tree was > the right thing to do since software will never see that. The system > will just shut down. > > > > - reduced number of trip points to 2 > > - lowered temperature of passive trip point > > This won't actually do anything until the ADC gets hooked up, right? Correct > I guess I would have expected: > - 105: passive > - 125: critical > > ...and then we could add (if we wanted) a "hot" between passive and > critical once we have the ADC hooked up? Exactly, once we have more granularity we could add (an)other trip point(s). > > - updated trip point names and added labels > > - updated commit message > > > > arch/arm64/boot/dts/qcom/pm8998.dtsi | 25 + > > 1 file changed, 25 insertions(+) > > > > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi > > b/arch/arm64/boot/dts/qcom/pm8998.dtsi > > index 2f4989e7ef68..e7caa334e6c7 100644 > > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi > > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi > > @@ -3,6 +3,7 @@ > > > > #include > > #include > > +#include > > > > _bus { > > pm8998_lsid0: pmic@0 { > > @@ -60,3 +61,27 @@ > > #size-cells = <0>; > > }; > > }; > > + > > +/ { > > + thermal-zones { > > + pm8998 { > > + polling-delay-passive = <250>; > > + polling-delay = <1000>; > > + > > + thermal-sensors = <_temp>; > > + > > + trips { > > + pm8998_alert0: pm8998-alert0 { > > + temperature = <95000>; > > + hysteresis = <2000>; > > + type = "passive"; > > + }; > > + pm8998_crit: pm8998-crit { > > + temperature = <105000>; > > + hysteresis = <2000>; > > + type = "critical"; > > + }; > > + }; > > + }; > > + }; > > +}; > > A nit, but I think convention is to actually put additions straight to > the root node before reference to phandles, so I would have put this > above the "_bus" part. Ok, thanks, will do in v3
Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig
On 07/02/18 13:41, Randy Dunlap wrote: > --- linux-next-20180702.orig/init/Kconfig > +++ linux-next-20180702/init/Kconfig > @@ -1717,6 +1717,12 @@ config PROFILING > config TRACEPOINTS > bool > > +# Note: arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig > +# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS > +# before the default value is found in arch/Kconfig. > + > +source "arch/$(SRCARCH)/Kconfig" > + > source "arch/Kconfig" > > endmenu # General setup > except that the endmenu should be moved up a few lines so that the Processor type and features menu is not part of the General setup menu. v2 patch is below. --- From: Randy Dunlap Present "General setup" before "Processor type and features". This is done by sourcing arch/$(SRCARCH)/Kconfig before arch/Kconfig inside init/Kconfig. Signed-off-by: Randy Dunlap --- v2: move General setup's endmenu before the $ARCH Kconfigs. Kconfig |2 -- init/Kconfig | 10 -- 2 files changed, 8 insertions(+), 4 deletions(-) --- linux-next-20180702.orig/Kconfig +++ linux-next-20180702/Kconfig @@ -9,8 +9,6 @@ comment "Compiler: $(CC_VERSION_TEXT)" source "scripts/Kconfig.include" -source "arch/$(SRCARCH)/Kconfig" - source "init/Kconfig" source "kernel/Kconfig.freezer" --- linux-next-20180702.orig/init/Kconfig +++ linux-next-20180702/init/Kconfig @@ -1717,10 +1717,16 @@ config PROFILING config TRACEPOINTS bool -source "arch/Kconfig" - endmenu# General setup +# Note: arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig +# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS +# before the default value is found in arch/Kconfig. + +source "arch/$(SRCARCH)/Kconfig" + +source "arch/Kconfig" + config RT_MUTEXES bool
Re: [PATCH] ima: Use tpm_default_chip() and call TPM functions with a tpm_chip
On Mon, 2018-07-02 at 11:24 -0400, Stefan Berger wrote: > Rather than accessing the TPM functions by passing a NULL pointer for > the tpm_chip, which causes a lookup for a suitable chip every time, get a > hold of a tpm_chip and access the TPM functions using it. Also get rid of > the ima_used_chip variable and use the new ima_tpm_chip variable instead > for determining whether to call TPM functions. > > Signed-off-by: Stefan Berger > Acked-by: Jarkko Sakkinen Signed-off-by: Mimi Zohar Jarkko, would you mind staging this patch with the rest of the patch set? Mimi > --- > security/integrity/ima/ima.h| 2 +- > security/integrity/ima/ima_crypto.c | 4 ++-- > security/integrity/ima/ima_init.c | 16 +--- > security/integrity/ima/ima_queue.c | 4 ++-- > 4 files changed, 10 insertions(+), 16 deletions(-) > > diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h > index 354bb5716ce3..2ab1aa36 100644 > --- a/security/integrity/ima/ima.h > +++ b/security/integrity/ima/ima.h > @@ -53,9 +53,9 @@ enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8 }; > extern int ima_policy_flag; > > /* set during initialization */ > -extern int ima_used_chip; > extern int ima_hash_algo; > extern int ima_appraise; > +extern struct tpm_chip *ima_tpm_chip; > > /* IMA event related data */ > struct ima_event_data { > diff --git a/security/integrity/ima/ima_crypto.c > b/security/integrity/ima/ima_crypto.c > index 4e085a17124f..7e7e7e7c250a 100644 > --- a/security/integrity/ima/ima_crypto.c > +++ b/security/integrity/ima/ima_crypto.c > @@ -631,10 +631,10 @@ int ima_calc_buffer_hash(const void *buf, loff_t len, > > static void __init ima_pcrread(int idx, u8 *pcr) > { > - if (!ima_used_chip) > + if (!ima_tpm_chip) > return; > > - if (tpm_pcr_read(NULL, idx, pcr) != 0) > + if (tpm_pcr_read(ima_tpm_chip, idx, pcr) != 0) > pr_err("Error Communicating to TPM chip\n"); > } > > diff --git a/security/integrity/ima/ima_init.c > b/security/integrity/ima/ima_init.c > index 29b72cd2502e..faac9ecaa0ae 100644 > --- a/security/integrity/ima/ima_init.c > +++ b/security/integrity/ima/ima_init.c > @@ -26,7 +26,7 @@ > > /* name for boot aggregate entry */ > static const char *boot_aggregate_name = "boot_aggregate"; > -int ima_used_chip; > +struct tpm_chip *ima_tpm_chip; > > /* Add the boot aggregate to the IMA measurement list and extend > * the PCR register. > @@ -64,7 +64,7 @@ static int __init ima_add_boot_aggregate(void) > iint->ima_hash->algo = HASH_ALGO_SHA1; > iint->ima_hash->length = SHA1_DIGEST_SIZE; > > - if (ima_used_chip) { > + if (ima_tpm_chip) { > result = ima_calc_boot_aggregate(); > if (result < 0) { > audit_cause = "hashing_error"; > @@ -106,17 +106,11 @@ void __init ima_load_x509(void) > > int __init ima_init(void) > { > - u8 pcr_i[TPM_DIGEST_SIZE]; > int rc; > > - ima_used_chip = 0; > - rc = tpm_pcr_read(NULL, 0, pcr_i); > - if (rc == 0) > - ima_used_chip = 1; > - > - if (!ima_used_chip) > - pr_info("No TPM chip found, activating TPM-bypass! (rc=%d)\n", > - rc); > + ima_tpm_chip = tpm_default_chip(); > + if (!ima_tpm_chip) > + pr_info("No TPM chip found, activating TPM-bypass!\n"); > > rc = integrity_init_keyring(INTEGRITY_KEYRING_IMA); > if (rc) > diff --git a/security/integrity/ima/ima_queue.c > b/security/integrity/ima/ima_queue.c > index 418f35e38015..b186819bd5aa 100644 > --- a/security/integrity/ima/ima_queue.c > +++ b/security/integrity/ima/ima_queue.c > @@ -142,10 +142,10 @@ static int ima_pcr_extend(const u8 *hash, int pcr) > { > int result = 0; > > - if (!ima_used_chip) > + if (!ima_tpm_chip) > return result; > > - result = tpm_pcr_extend(NULL, pcr, hash); > + result = tpm_pcr_extend(ima_tpm_chip, pcr, hash); > if (result != 0) > pr_err("Error Communicating to TPM chip, result: %d\n", result); > return result;
Re: [PATCH 2/2] PCI: imx: Initial imx7d pm support
On Fri, 2018-06-08 at 16:33 +0200, Lucas Stach wrote: > Am Dienstag, den 29.05.2018, 22:39 +0300 schrieb Leonard Crestez: > > On imx7d the phy is turned off in suspend and must be reset on resume. > > Right now lspci -v fails after a suspend/resume cycle, fix this by > > adding minimal suspend/resume code from the nxp vendor tree. > > > > This is currently only enabled for imx7 but the same sequence can be > > applied to other imx pcie variants. > > +#ifdef CONFIG_PM_SLEEP > > +static int imx6_pcie_suspend_noirq(struct device *dev) > > +{ > > + struct imx6_pcie *imx6_pcie = dev_get_drvdata(dev); > > + > > + if (imx6_pcie->variant == IMX7D) { > > Instead of this large indented block, please just have an early return > at the start of this function, like: > > if (imx6_pcie->variant != IMX7D) > return 0; > > Same for the resume function. OK. The resume sequence is mostly the same for all SOCs where it applies. > > +static int imx6_pcie_resume_noirq(struct device *dev) > > +{ > > + int ret = 0; > > + struct imx6_pcie *imx6_pcie = dev_get_drvdata(dev); > > + struct pcie_port *pp = _pcie->pci->pp; > > > > + > > + if (imx6_pcie->variant == IMX7D) { > > + imx6_pcie_ltssm_disable(dev); > > The LTSSM disable seems misplaced here. I would have expected it to be > in the suspend function. This is a requirement for reinitializing the core: LTSSM needs to be turned off during initialization. > > + /* > > +* controller maybe turn off, re-configure again > > +*/ > > + dw_pcie_setup_rc(pp); > > + > > + imx6_pcie_ltssm_enable(dev); > > + > > + ret = imx6_pcie_wait_for_link(imx6_pcie); > > + if (ret < 0) > > + pr_info("pcie link is down after resume.\n"); > > If the PHY has been powered down and LTSSM stopped I guess we need to > do the full imx6_pcie_establish_link() dance again here to reliably re- > establish the link. The above seems unsafe. What imx6_pcie_establish_link does additionally is some workaround for link gen detection. I agree that it should be included. This would make resume mostly the same as imx_pcie_host_init except for dw_pcie_msi_init; that needs to be saved/restored separately. Another issue that should be discussed here is that on 6sx and 7d the gpc PCIE power domain is not defined correctly: the PCIE block is in the DISPMIX domain on both SOCs and it is only PCIE_PHY which has a separate power domain. This matters: enabling power-gating for displays will break pcie if this relationship is not taken into account. Here are some options: 1) Make _pcie a child of _disp by hacking into gpc probe functions and calling pm_genpd_add_subdomain. Not very nice. 2) Support nesting PGCs in GPC code? Lots of code and still an incorrect representation of hardware. 3) Set power-domains = <_disp> and enable runtime PM on _pcie? 4) Add separate devices for the pcie-phy. These would be mostly empty but still different, for example on imx6sx the PHY is not even accessible on the bus but only though PCIE registers. Solutions 1-3 seem like workarounds while 4 seems excessive, would appreciate some feedback. -- Regards, Leonard
Re: [GIT PULL] tee driver for v4.18
Hi Jens, On Mon, Jul 2, 2018 at 5:10 AM, Jens Wiklander wrote: > Hello arm-soc maintainers, > > Please pull these small tee driver enhancements. There's a new config > option for the OP-TEE driver, OPTEE_SHM_NUM_PRIV_PAGES. Also the OP-TEE > driver reads current time with ktime_get_real_ts64() from now on. > > Thanks, > Jens > > The following changes since commit 29dcea88779c856c7dc92040a0c01233263101d4: > > Linux 4.17 (2018-06-03 14:15:21 -0700) > > are available in the Git repository at: > > git://git.linaro.org/people/jens.wiklander/linux-tee.git > tags/tee-drv-for-4.18 > > for you to fetch changes up to 3249527f19d660c5adfb2b6f4ffd4ca0506b8755: > > tee: optee: making OPTEE_SHM_NUM_PRIV_PAGES configurable via Kconfig > (2018-06-20 11:20:36 +0200) > > > Misc enhancement for tee driver subsystem > > * Replaces getnstimeofday64() with ktime_get_real_ts64() > * Adds OPTEE_SHM_NUM_PRIV_PAGES to configure how many pages should be > statically reserved for driver private allocations > > > Arnd Bergmann (1): > tee: replace getnstimeofday64() with ktime_get_real_ts64() > > Sahil Malhotra (1): > tee: optee: making OPTEE_SHM_NUM_PRIV_PAGES configurable via Kconfig This doesn't seem to be bugfixes (and not regression fixes in particular), so I took the liberty of queuing these for 4.19 (in next/drivers) instead of fixes for v4.18. Let me know if this was an inaccurate assessment and I can move them over. Thanks! -Olof
Re: iwlwifi problem with iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent()
On Monday 2 July 2018 15:19:51 CEST Christoph Hellwig wrote: > > By reverting this commit the card works again, tested in 4.17.3 . I've > > noticed that the corresponding amd commit ( > > b468620f2a1dfdcfddfd6fa54367b8bcc1b51248) has been reverted in linus tree > > (e16c4790de39dc861b749674c2a9319507f6f64f), and 4.16.X stable tree iirc, > > but the intel one has not been reverted. > > > > hw: Lenovo P50 Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz > > > > Please let me know if more details are needed, but CC: me as I'm not > > currently subscribed to LKML. > > It probablty is the same issue. Did you check if you can just revert > the commit cleanly on top of current mainline and that fixes the issue > for you? Yep, it does. I issued a git revert d657c5c73ca987214a6f9436e435b34fc60f332a on mainline (4.18-rc3) and recompiled, the card works just fine.
Re: [PATCH] mm/sparse: Make sparse_init_one_section void and remove check
On Mon, Jul 2, 2018 at 11:43 AM wrote: > > From: Oscar Salvador > > sparse_init_one_section() is being called from two sites: > sparse_init() and sparse_add_one_section(). > The former calls it from a for_each_present_section_nr() loop, > and the latter marks the section as present before calling it. > This means that when sparse_init_one_section() gets called, we already know > that the section is present. > So there is no point to double check that in the function. > > This removes the check and makes the function void. > > Signed-off-by: Oscar Salvador Thank you Oscar. Reviewed-by: Pavel Tatashin > --- > mm/sparse.c | 12 +++- > 1 file changed, 3 insertions(+), 9 deletions(-)
Re: [PATCH] arm64: Clear the stack
On 07/02/2018 06:02 AM, Alexander Popov wrote: Hello Laura, Thanks for your work! Please see my comments below. On 29.06.2018 22:05, Laura Abbott wrote: Implementation of stackleak based heavily on the x86 version Signed-off-by: Laura Abbott --- Changes since last time: - Minor name change in entry.S - Converted to use the generic interfaces so there's minimal additions. - Added the fast syscall path. - Addition of on_thread_stack and current_top_of_stack - Disable stackleak on hyp per suggestion - Added a define for check_alloca. I'm still not sure about keeping it since the x86 version got reworked? I've mostly kept this as one patch with a minimal commit text. I can split it up and elaborate more before final merging. --- arch/arm64/Kconfig| 1 + arch/arm64/include/asm/processor.h| 9 + arch/arm64/kernel/entry.S | 7 +++ arch/arm64/kernel/process.c | 16 arch/arm64/kvm/hyp/Makefile | 3 ++- drivers/firmware/efi/libstub/Makefile | 3 ++- include/linux/stackleak.h | 1 + scripts/Makefile.gcc-plugins | 5 - 8 files changed, 42 insertions(+), 3 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index eb2cf4938f6d..b0221db95dc9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -92,6 +92,7 @@ config ARM64 select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT select HAVE_ARCH_SECCOMP_FILTER + select HAVE_ARCH_STACKLEAK select HAVE_ARCH_THREAD_STRUCT_WHITELIST select HAVE_ARCH_TRACEHOOK select HAVE_ARCH_TRANSPARENT_HUGEPAGE diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 767598932549..73914fd7e142 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -248,5 +248,14 @@ void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused); #define SVE_SET_VL(arg) sve_set_current_vl(arg) #define SVE_GET_VL() sve_get_current_vl() +/* + * For stackleak May I ask you to call it STACKLEAK here and in other places for easier grep? Sure + * + * These need to be macros because otherwise we get stuck in a nightmare + * of header definitions for the use of task_stack_page. + */ +#define current_top_of_stack() (task_stack_page(current) + THREAD_SIZE) +#define on_thread_stack() (on_task_stack(current, current_stack_pointer)) + #endif /* __ASSEMBLY__ */ #endif /* __ASM_PROCESSOR_H */ diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index ec2ee720e33e..31c9da7d401e 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -401,6 +401,11 @@ tsk.reqx28 // current thread_info .text + .macro stackleak_erase Could you rename the macro to STACKLEAK_ERASE for similarity with x86? Mark Rutland had previously asked for this to be lowercase. I really don't care one way or the other so I'll defer to someone else to have the final word. +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK + bl stackleak_erase_kstack +#endif + .endm /* * Exception vectors. */ @@ -880,6 +885,7 @@ ret_fast_syscall: and x2, x1, #_TIF_WORK_MASK cbnzx2, work_pending enable_step_tsk x1, x2 + stackleak_erase kernel_exit 0 ret_fast_syscall_trace: enable_daif @@ -906,6 +912,7 @@ ret_to_user: cbnzx2, work_pending finish_ret_to_user: enable_step_tsk x1, x2 + stackleak_erase kernel_exit 0 ENDPROC(ret_to_user) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index f08a2ed9db0d..9f0f135f8b66 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -493,3 +493,19 @@ void arch_setup_new_exec(void) { current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0; } + +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK +#define MIN_STACK_LEFT 256 + +void __used stackleak_check_alloca(unsigned long size) +{ + unsigned long sp, stack_left; + + sp = current_stack_pointer; + + stack_left = sp & (THREAD_SIZE - 1); + BUG_ON(stack_left < MIN_STACK_LEFT || + size >= stack_left - MIN_STACK_LEFT); +} +EXPORT_SYMBOL(stackleak_check_alloca); +#endif This code should be updated. You may remember the troubles I had with MIN_STACK_LEFT: http://openwall.com/lists/kernel-hardening/2018/05/11/12 Please see that thread where Mark Rutland and I worked out the solution. Ah yeah, I missed the details in that thread. Thanks for that pointer. By the way, different stacks on x86_64 have different sizes. Is it false for arm64? Currently everything except the overflow stack looks to be the same size but there's also another stack I missed. It might be cleaner just to use on_accessible_stack and then another function to get the top of stack. This also might just be reimplementing what x86
Re: [GIT PULL] x86 fixes
On Sat, Jun 30, 2018 at 12:01 PM, Linus Torvalds wrote: > On Sat, Jun 30, 2018 at 1:49 AM Ingo Molnar wrote: >> >> --- a/arch/x86/entry/entry_32.S >> +++ b/arch/x86/entry/entry_32.S >> @@ -477,7 +477,7 @@ ENTRY(entry_SYSENTER_32) >> * whereas POPF does not.) >> */ >> addl$PT_EFLAGS-PT_DS, %esp /* point esp at pt_regs->flags */ >> - btr $X86_EFLAGS_IF_BIT, (%esp) >> + btrl$X86_EFLAGS_IF_BIT, (%esp) >> popfl > > Ho humm. Just looking at this patch, my reaction was "why isn't this > an 'andl $~X86_EFLAGS_IF' instead"? > > Yeah, I guess the 'andl' is two bytes longer (due to the 32-bit > constant - because IF is bit 9, you can't use a byte constant, and you > don't want to get a partial word write just before the popfl). > > But btr is really pretty heavy operation for older CPU's (it's gotten > better, but 32-bit code presumably cares more about the older CPUs). > > It really doesn't matter, I guess. The btr goes back to commit > c2c9b52fab0d ("x86/entry/32: Restore FLAGS on SYSEXIT"). > > Andy? > BTR is way more leet than AND! Seriously, though, I've never really tried to shave cycles off the 32-bit code, and BTR is shorter, and I didn't spend more than about one brain cycle thinking about it. I guess that BTR has a more complicated flags pipeline (the output flags depend on the input, not just the output) and probably uses some more complicated ALU circuit as compared to ANDL. --Andy
Re: [PATCH v3] add param that allows bootline control of hardened usercopy
On 07/02/2018 02:43 PM, Kees Cook wrote: > On Sat, Jun 30, 2018 at 1:43 PM, Christoph von Recklinghausen > wrote: >> The last issue I'm chasing is build failures on ARCH=m68k. The error is >> atomic_read and friends needed by the jump label code not being found. >> The config has CONFIG_BROKEN_ON_SMP=y, so the jump label calls I added >> will only be made #ifndef CONFIG_BROKEN_ON_SMP. Do you think that's >> worth a mention in the blurb that's added to >> Documentation/admin-guide/kernel-parameters.txt? > Uhm, that's weird -- I think the configs on m68k need fixing then? I > don't want to have to sprinkle that ifdef in generic code. > > How are other users of static keys and jump labels dealing with m68k > weirdness? > > -Kees > There's also CONFIG_JUMP_LABEL which is defined in x86_64 but not defined in the m68k configs. I'll use that instead. In hindsight I should have spotted that but didn't. Thanks, Chris
Re: [PATCH 2/3] pinctrl: msm: Mux out gpio function with gpio_request()
On Thu 28 Jun 10:14 PDT 2018, Stephen Boyd wrote: > Quoting Linus Walleij (2018-06-28 07:25:46) > > On Fri, Jun 22, 2018 at 8:29 PM Bjorn Andersson > > wrote: > > > On Fri 22 Jun 10:58 PDT 2018, Bjorn Andersson wrote: > > > > On Mon 18 Jun 13:52 PDT 2018, Stephen Boyd wrote: > > > > > > > > > We rely on devices to use pinmuxing configurations in DT to select the > > > > > GPIO function (function 0) if they're going to use the gpio in GPIO > > > > > mode. Let's simplify things for driver authors by implementing > > > > > gpio_request_enable() for this pinctrl driver to mux out the GPIO > > > > > function when the gpio is use from gpiolib. > > > > > > > > > > Cc: Bjorn Andersson > > > > > > > > Reviewed-by: Bjorn Andersson > > (...) > > > While both patch 2 and 3 are convenient ways to get around the annoyance > > > of having to specify a pinmux state both patches then ends up relying on > > > some default pinconf state; which I think is bad. > > What default state are we relying on? The reset state of the pins? I'm > very confused by this statement. These last two patches are making sure > that when a GPIO is requested as a GPIO or IRQ, it's really in GPIO > mode. Yes and this is convenient, as the TLMM is both multiplexor and gpio controller this is probably what people would expect. However looking at the downstream code people don't think this way (i.e. many drivers calls gpio_request() to get some sort of exclusive access to its pins - not to request gpio to be the muxed function). But my concern is related to pinconf, not pinmux - automating pinmuxing doesn't change the fact that the systems integrator should make sure to configure appropriate pull properties on the pins. > If this code is not in place, then we'll allow drivers to request > a GPIO but pinmuxing won't be called to actually mux that pin to GPIO > mode unless they have a DT conf for function = "gpio". That seems > entirely unexpected and easy to mess up. > I do agree with this, but the opposite doesn't feel crystal clear either. > > > > Nothing stops you from setting up a default conf in > > this callback though. > > > > But admittedly this call was added for simpler hardware. > > > > > Further more in situations like i2c-qup (downstream), where the pins are > > > requested as gpios in order to "bitbang" a reset this would mean that > > > the driver has to counter the convenience; by either switching in the > > > default pinmux at the end of probe or postponing the gpio_request() to > > > the invocation of reset and then, after issuing the gpio_release, > > > switching in the default pinmux explicitly again. > > > > That's a bigger problem. If the system is using device and GPIO > > mode orthogonally, it'd be good to leave like this. > > > > Doesn't that driver need to explicitly mux GPIO mode vs. device mode > right now? So having gpio_request() do the muxing to GPIO mode and then > explicit muxing of the pin to device mode would be what we have after > this patch, while before this patch we would have mux to GPIO, request > GPIO (nop), mux to device. We saved an explicit pinmux call? > It's currently possible to call gpio_request() in the i2c driver's probe function and then mux when needed. With this patch you would have to either follow the gpio_request with a mux-to-default or defer the gpio_request until the point where they today would have an explicit mux-to-gpio. Regards, Bjorn
Re: [PATCH 12/15] arm: dts: highbank: Add missing OPP properties for CPUs
On Fri, May 25, 2018 at 4:32 AM Viresh Kumar wrote: > > The OPP properties, like "operating-points", should either be present > for all the CPUs of a cluster or none. If these are present only for a > subset of CPUs of a cluster then things will start falling apart as soon > as the CPUs are brought online in a different order. For example, this > will happen because the operating system looks for such properties in > the CPU node it is trying to bring up, so that it can create an OPP > table. > > Add such missing properties. > > Fix other missing property (clock latency) as well to make it all > work. > > Signed-off-by: Viresh Kumar > --- > arch/arm/boot/dts/highbank.dts | 30 ++ > 1 file changed, 30 insertions(+) Acked-by: Rob Herring ARM-SOC maintainers, Please apply this. Rob
[RFC PATCH for 4.18 2/2] rseq: validate rseq->rseq_cs padding to be zero
On 32-bit kernels, the rseq->rseq_cs_padding field is never read by the kernel. However, 64-bit kernels dealing with 32-bit compat tasks read the full 64-bit in its entirety, and terminates the offending process with a segmentation fault if the upper 32 bits are set due to failure of copy_from_user(). Ensure that both 32-bit and 64-bit kernels dealing with 32-bit tasks end up terminating offending tasks with a segmentation fault if the upper 32-bit padding bits (rseq->rseq_cs_padding) are set by explicitly ensuring that padding is zero on 32-bit kernels. Signed-off-by: Mathieu Desnoyers CC: "Paul E. McKenney" CC: Peter Zijlstra CC: Paul Turner CC: Thomas Gleixner CC: Andy Lutomirski CC: Andi Kleen CC: Dave Watson CC: Chris Lameter CC: Ingo Molnar CC: "H. Peter Anvin" CC: Ben Maurer CC: Steven Rostedt CC: Josh Triplett CC: Linus Torvalds CC: Andrew Morton CC: Russell King CC: Catalin Marinas CC: Will Deacon CC: Michael Kerrisk CC: Boqun Feng CC: linux-...@vger.kernel.org --- kernel/rseq.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/kernel/rseq.c b/kernel/rseq.c index 2e5d88f09baf..c4c48157198f 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -112,6 +112,29 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t) return 0; } +#ifndef __LP64__ +/* + * Ensure that padding is zero. + */ +static int check_rseq_cs_padding(struct task_struct *t) +{ + unsigned long pad; + int ret; + + ret = __get_user(pad, >rseq->rseq_cs_padding); + if (ret) + return ret; + if (pad) + return -EFAULT; + return 0; +} +#else +static int check_rseq_cs_padding(struct task_struct *t) +{ + return 0; +} +#endif + static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs) { struct rseq_cs __user *urseq_cs; @@ -123,6 +146,9 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs) ret = __get_user(ptr, >rseq->rseq_cs); if (ret) return ret; + ret = check_rseq_cs_padding(t); + if (ret) + return ret; if (!ptr) { memset(rseq_cs, 0, sizeof(*rseq_cs)); return 0; -- 2.11.0
[RFC PATCH for 4.18 1/2] rseq: use __u64 for rseq_cs fields, validate abort_ip < TASK_SIZE
Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a consistent behavior for a 32-bit binary executed on 32-bit kernels and in compat mode on 64-bit kernels. Validating the value of abort_ip field to be below TASK_SIZE ensures the kernel don't return to an invalid address when returning to userspace after an abort. I don't fully trust each architecture code to consistently deal with invalid return addresses. If validation fails, the process is killed with a segmentation fault. Signed-off-by: Mathieu Desnoyers CC: "Paul E. McKenney" CC: Peter Zijlstra CC: Paul Turner CC: Thomas Gleixner CC: Andy Lutomirski CC: Andi Kleen CC: Dave Watson CC: Chris Lameter CC: Ingo Molnar CC: "H. Peter Anvin" CC: Ben Maurer CC: Steven Rostedt CC: Josh Triplett CC: Linus Torvalds CC: Andrew Morton CC: Russell King CC: Catalin Marinas CC: Will Deacon CC: Michael Kerrisk CC: Boqun Feng CC: linux-...@vger.kernel.org --- include/uapi/linux/rseq.h | 6 +++--- kernel/rseq.c | 5 +++-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h index d620fa43756c..519ad6e176d1 100644 --- a/include/uapi/linux/rseq.h +++ b/include/uapi/linux/rseq.h @@ -52,10 +52,10 @@ struct rseq_cs { __u32 version; /* enum rseq_cs_flags */ __u32 flags; - LINUX_FIELD_u32_u64(start_ip); + __u64 start_ip; /* Offset from start_ip. */ - LINUX_FIELD_u32_u64(post_commit_offset); - LINUX_FIELD_u32_u64(abort_ip); + __u64 post_commit_offset; + __u64 abort_ip; } __attribute__((aligned(4 * sizeof(__u64; /* diff --git a/kernel/rseq.c b/kernel/rseq.c index 22b6acf1ad63..2e5d88f09baf 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -128,7 +128,8 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs) return 0; } urseq_cs = (struct rseq_cs __user *)ptr; - if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs))) + if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)) || + rseq_cs->abort_ip >= TASK_SIZE) return -EFAULT; if (rseq_cs->version > 0) return -EINVAL; @@ -137,7 +138,7 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs) if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset) return -EINVAL; - usig = (u32 __user *)(rseq_cs->abort_ip - sizeof(u32)); + usig = (u32 __user *)(unsigned long)(rseq_cs->abort_ip - sizeof(u32)); ret = get_user(sig, usig); if (ret) return ret; -- 2.11.0
Re: [PATCH 1/5] kconfig: include common Kconfig files from top-level Kconfig
On 07/02/18 07:47, Christoph Hellwig wrote: > Instead of duplicating the source statements in every architecture just > do it once in the toplevel Kconfig file. > > Signed-off-by: Christoph Hellwig > --- > Kconfig | 22 ++ > arch/alpha/Kconfig | 20 > arch/arc/Kconfig| 16 > arch/arm/Kconfig| 25 - > arch/arm64/Kconfig | 23 --- > arch/c6x/Kconfig| 24 > arch/h8300/Kconfig | 24 > arch/hexagon/Kconfig| 16 > arch/ia64/Kconfig | 20 > arch/m68k/Kconfig | 24 > arch/microblaze/Kconfig | 24 > arch/mips/Kconfig | 24 > arch/nds32/Kconfig | 16 > arch/nios2/Kconfig | 24 > arch/openrisc/Kconfig | 23 --- > arch/parisc/Kconfig | 24 > arch/powerpc/Kconfig| 19 --- > arch/riscv/Kconfig | 24 > arch/s390/Kconfig | 24 > arch/sh/Kconfig | 24 > arch/sparc/Kconfig | 24 > arch/unicore32/Kconfig | 24 > arch/x86/Kconfig| 22 +- > arch/xtensa/Kconfig | 25 - > 24 files changed, 23 insertions(+), 512 deletions(-) > > diff --git a/Kconfig b/Kconfig > index a90d9f9e268b..5499b1273ba5 100644 > --- a/Kconfig > +++ b/Kconfig > @@ -10,3 +10,25 @@ comment "Compiler: $(CC_VERSION_TEXT)" > source "scripts/Kconfig.include" > > source "arch/$(SRCARCH)/Kconfig" > + > +source "init/Kconfig" > + > +source "kernel/Kconfig.freezer" > + > +menu "Executable file formats" > +source "fs/Kconfig.binfmt" > +endmenu > + > +source "mm/Kconfig" > + > +source "net/Kconfig" > + > +source "drivers/Kconfig" > + > +source "fs/Kconfig" > + > +source "security/Kconfig" > + > +source "crypto/Kconfig" > + > +source "lib/Kconfig" FWIW, I prefer this modification, but it's not a deal breaker. --- From: Randy Dunlap Present "General setup" before "Processor type and features". This is done by sourcing arch/$(SRCARCH)/Kconfig before arch/Kconfig inside init/Kconfig. Signed-off-by: Randy Dunlap --- Kconfig |2 -- init/Kconfig |6 ++ 2 files changed, 6 insertions(+), 2 deletions(-) --- linux-next-20180702.orig/Kconfig +++ linux-next-20180702/Kconfig @@ -9,8 +9,6 @@ comment "Compiler: $(CC_VERSION_TEXT)" source "scripts/Kconfig.include" -source "arch/$(SRCARCH)/Kconfig" - source "init/Kconfig" source "kernel/Kconfig.freezer" --- linux-next-20180702.orig/init/Kconfig +++ linux-next-20180702/init/Kconfig @@ -1717,6 +1717,12 @@ config PROFILING config TRACEPOINTS bool +# Note: arch/$(SRCARCH)/Kconfig needs to be before arch/Kconfig +# so that each $ARCH can specify its values for CONFIG_PGTABLE_LEVELS +# before the default value is found in arch/Kconfig. + +source "arch/$(SRCARCH)/Kconfig" + source "arch/Kconfig" endmenu# General setup
Re: [PATCH] mmc: dw_mmc: fix card threshold control configuration
Hi, On Mon, Jun 11, 2018 at 7:17 AM, Shawn Lin wrote: > On 2018/6/11 20:20, Ulf Hansson wrote: >> >> + Shawn Lin, Evgeniy Didin, Doug Andersson >> >> On 29 May 2018 at 12:38, Qing Xia wrote: >>> >>> From: x00270170 >>> >>> Card write threshold control is supposed to be set since controller >>> version 2.80a for data write in HS400 mode and data read in >>> HS200/HS400/SDR104 mode. However the current code returns without >>> configuring it in the case of data writing in HS400 mode. >>> Meanwhile the patch fixes that the current code goes to >>> 'disable' when doing data reading in HS400 mode. >>> > > I'm more or less unable to review this, as I don't have 2.80a databook, > nor a such platform to verify it. :( Sorry for not responding earlier. I didn't have a lot of context here so reviewing it never made it to the top of my queue. ...but quickly checking what I can... I'm in almost the same boat as Shawn. I don't have any HS400 hardware using dw_mmc available to me, but I do seem to have the 2.80a databook hanging around and it agrees that we need to enable the card write threshold for HS400 writes. That also matches the comments, so the patch seems right to me. Probably Jaehoon would make a better reviewer since he submitted the original code and also has more familiarity with HS400 on dw_mmc. I don't think I have enough context to give a full Reviewed-by, but in the very least I can confirm that the patch seems sane. -Doug
Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
On Mon, Jul 2, 2018 at 7:32 AM Mathieu Desnoyers wrote: > > - On Jun 29, 2018, at 4:39 PM, Andy Lutomirski l...@amacapital.net wrote: > > > On Fri, Jun 29, 2018 at 12:48 PM, Mathieu Desnoyers > > wrote: > >> There are two aspects I'm concerned about here: > >> > >> 1) security: we don't want 32-bit user-space to feed a 64-bit value over > >> 4GB > >>as abort_ip that may end up causing OOPSes on architectures that would > >>lack proper validation of those values on return to userspace. > > > > I'm not too worried about this. As long as you're doing it from > > signal-delivery context (which you are AFAICT) you're fine. > > No, it's not just signal-delivery context. It's _also_ called from > return to usermode loop, which can by called on return from > interrupt/trap/syscall. > TIF_NOTIFY_RESUME context in the exit slowpath is fine, too. > > > > But I re-read the code and I think I have a really straightforward > > solution. Two choices: > > > > (1) Change instruction_pointer_set() to return an error code if the > > address passed in is garbage in a way that could cause unexpected > > behavior (like >=2^32 on x86_64 if regs->cs is 32-bit). It has very > > very few callers. > > This would take care of my security concern wrt abort_ip, but would not > provide consistent behavior for the other fields. Also, perhaps this > kind of change should aim the next merge window ? It's not about security. The idea is that instruction_pointer_set() should return some indication of whether it actually set the instruction pointer to the requested value. On x86, if you have !user_64bit_mode(regs) and you call instruction_pointer_set() to set ip to 0xbaadc0de12345678, then you end up with a state where we will probably execute user code at the address 0x12345678. Conversely, if you have user_64bit_mode(regs) == true and you set ip to 0xbaadc0de12345678, then you will end up sending a signal to the task because 0xbaadc0de12345678 is not executable (and, in fact, is highly likely to be noncanonical). So I would argue that the semantics *should* be: /* * Attempts to modify @regs such that the next user instruction to be executed is * the instruction at @addr. instruction_pointer_set() may return false to indicate * that addr was invalid in the sense that the next user instruction executed * might be some other address instead. The most likely cause is that * regs refers to a 32-bit compat context, addr != (u32)addr, and the architecture * might silently truncate the address on the next return to user code. * * instruction_pointer_set() must only be called from a context in which the architecture * allows arbitrary modifications of @regs. * * Architecture implementations promise that calling instruction_pointer_set() will not * crash or otherwise corrupt the kernel when called from a valid context, regardless * of what value is passed in @addr. */ bool instruction_pointer_set(struct pt_regs *regs, unsigned long addr); > > > > > (2) Add instruction_pointer_validate() to go along with > > instruction_pointer_set(). > > > > That should be enough to solve the problem, right? > > This would only handle the "security" part of the matter, which > is specifically related to rseq->rseq_cs->abort_ip. > > What is left is ensuring that we have consistent behavior for > other fields: > > [ Note: we have introduced this helper macro: LINUX_FIELD_u32_u64 > which defines a field which is 64-bit for 64-bit processes, and 32-bit > with 32-bit of padding for 32-bit processes. ] > > * rseq->rseq_cs: (userspace pointer to user-space, updated by user-space > with single-copy atomicity): current type: LINUX_FIELD_u32_u64, > cannot be changed to __u64 due to single-copy atomicity requirement, > > * rseq->rseq_cs->start_ip: currently a LINUX_FIELD_u32_u64, > could become a __u64, > > * rseq->rseq_cs->post_commit_ip: currently a LINUX_FIELD_u32_u64, > could become a __u64, > > * rseq->rseq_cs->abort_ip: currently a LINUX_FIELD_u32_u64, > could become a __u64, > > For abort_ip, changing the type to __u64 and using the > instruction_pointer_validate() approach you propose would work. > > For start_ip and post_commit_ip, we need to decide whether we > want to kill a 32-bit process setting the high bits or if we just > accept and use the full __u64 content on both 32-bit and 64-bit > kernels. Those two fields are only used for arithmetic comparison. > Using the full __u64 content means using 64-bit arithmetic on > 32-bit native kernels though. Just use the 64-bit values, I think. I see no point in killing the task. > > For rseq->rseq_cs, we cannot use __u64 due to single-copy atomicity > update requirement for 32-bit processes. However, we are using this > field in a copy_from_user(), so it will EFAULT if the high-bits are > set by a compat 32-bit task on a 64-bit kernel. We can therefore check > that the padding is zeroed explicitly on a native 32-bit kernel to > provide a consistent behavior.
Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
On 7/2/18 5:33 AM, Kirill A. Shutemov wrote: On Sat, Jun 30, 2018 at 06:39:44AM +0800, Yang Shi wrote: When running some mmap/munmap scalability tests with large memory (i.e. 300GB), the below hung task issue may happen occasionally. INFO: task ps:14018 blocked for more than 120 seconds. Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ps D0 14018 1 0x0004 885582f84000 885e8682f000 880972943000 885ebf499bc0 8828ee12 c900349bfca8 817154d0 0040 00ff812f872a 885ebf499bc0 024000d000948300 880972943000 Call Trace: [] ? __schedule+0x250/0x730 [] schedule+0x36/0x80 [] rwsem_down_read_failed+0xf0/0x150 [] call_rwsem_down_read_failed+0x18/0x30 [] down_read+0x20/0x40 [] proc_pid_cmdline_read+0xd9/0x4e0 [] ? do_filp_open+0xa5/0x100 [] __vfs_read+0x37/0x150 [] ? security_file_permission+0x9b/0xc0 [] vfs_read+0x96/0x130 [] SyS_read+0x55/0xc0 [] entry_SYSCALL_64_fastpath+0x1a/0xc5 It is because munmap holds mmap_sem from very beginning to all the way down to the end, and doesn't release it in the middle. When unmapping large mapping, it may take long time (take ~18 seconds to unmap 320GB mapping with every single page mapped on an idle machine). It is because munmap holds mmap_sem from very beginning to all the way down to the end, and doesn't release it in the middle. When unmapping large mapping, it may take long time (take ~18 seconds to unmap 320GB mapping with every single page mapped on an idle machine). Zapping pages is the most time consuming part, according to the suggestion from Michal Hock [1], zapping pages can be done with holding read mmap_sem, like what MADV_DONTNEED does. Then re-acquire write mmap_sem to cleanup vmas. All zapped vmas will have VM_DEAD flag set, the page fault to VM_DEAD vma will trigger SIGSEGV. Define large mapping size thresh as PUD size or 1GB, just zap pages with read mmap_sem for mappings which are >= thresh value. If the vma has VM_LOCKED | VM_HUGETLB | VM_PFNMAP or uprobe, then just fallback to regular path since unmapping those mappings need acquire write mmap_sem. For the time being, just do this in munmap syscall path. Other vm_munmap() or do_munmap() call sites remain intact for stability reason. The below is some regression and performance data collected on a machine with 32 cores of E5-2680 @ 2.70GHz and 384GB memory. With the patched kernel, write mmap_sem hold time is dropped to us level from second. [1] https://lwn.net/Articles/753269/ Cc: Michal Hocko Cc: Matthew Wilcox Cc: Laurent Dufour Cc: Andrew Morton Signed-off-by: Yang Shi --- mm/mmap.c | 136 +- 1 file changed, 134 insertions(+), 2 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 87dcf83..d61e08b 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2763,6 +2763,128 @@ static int munmap_lookup_vma(struct mm_struct *mm, struct vm_area_struct **vma, return 1; } +/* Consider PUD size or 1GB mapping as large mapping */ +#ifdef HPAGE_PUD_SIZE +#define LARGE_MAP_THRESH HPAGE_PUD_SIZE +#else +#define LARGE_MAP_THRESH (1 * 1024 * 1024 * 1024) +#endif PUD_SIZE is defined everywhere. If THP is defined, otherwise it is: #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; }) + +/* Unmap large mapping early with acquiring read mmap_sem */ +static int do_munmap_zap_early(struct mm_struct *mm, unsigned long start, + size_t len, struct list_head *uf) +{ + unsigned long end = 0; + struct vm_area_struct *vma = NULL, *prev, *tmp; + bool success = false; + int ret = 0; + + if (!munmap_addr_sanity(start, len)) + return -EINVAL; + + len = PAGE_ALIGN(len); + + end = start + len; + + /* Just deal with uf in regular path */ + if (unlikely(uf)) + goto regular_path; + + if (len >= LARGE_MAP_THRESH) { + /* +* need write mmap_sem to split vma and set VM_DEAD flag +* splitting vma up-front to save PITA to clean if it is failed What errors do you talk about? ENOMEM on VMA split? Anything else? Yes, ENOMEM on vma split. +*/ + down_write(>mmap_sem); + ret = munmap_lookup_vma(mm, , , start, end); + if (ret != 1) { + up_write(>mmap_sem); + return ret; + } + /* This ret value might be returned, so reset it */ + ret = 0; + + /* +* Unmapping vmas, which has VM_LOCKED|VM_HUGETLB|VM_PFNMAP +* flag set or has uprobes set, need acquire write map_sem, +* so skip them in early zap. Just deal with such mapping in +* regular path. +
Re: [PATCH bpf-next 2/3] bpf: btf: add btf json print functionality
On Sun, 1 Jul 2018 11:31:47 +0100, Okash Khawaja wrote: > On Wed, Jun 27, 2018 at 02:56:49PM +0200, Daniel Borkmann wrote: > > On 06/27/2018 01:47 PM, Okash Khawaja wrote: > > > On Wed, Jun 27, 2018 at 12:34:35PM +0200, Daniel Borkmann wrote: > > >> On 06/27/2018 12:35 AM, Jakub Kicinski wrote: > > >>> On Tue, 26 Jun 2018 15:27:09 -0700, Martin KaFai Lau wrote: > > On Tue, Jun 26, 2018 at 01:31:33PM -0700, Jakub Kicinski wrote: > > >> [...] > > > Implementing both outputs in one series will help you structure your > > > code to best suit both of the formats up front. > > hex and "formatted" are the only things missing? As always, things > > can be refactored when new use case comes up. Lets wait for > > Okash input. > > > > Regardless, plaintext is our current use case. Having the current > > patchset in does not stop us or others from contributing other use > > cases (json, "bpftool map find"...etc), and IMO it is actually > > the opposite. Others may help us get there faster than us alone. > > We should not stop making forward progress and take this patch > > as hostage because "abc" and "xyz" are not done together. > > >>> > > >>> Parity between JSON and plain text output is non negotiable. > > >> > > >> Longish discussion and some confusion in this thread. :-) First of all > > >> thanks a lot for working on it, very useful! > > > Thanks :) > > > > > >> My $0.02 on it is that so far > > >> great care has been taken in bpftool to indeed have feature parity > > >> between > > >> JSON and plain text, so it would be highly desirable to keep continuing > > >> this practice if the consensus is that it indeed is feasible and makes > > >> sense wrt BTF data. There has been mentioned that given BTF data can be > > >> dynamic depending on what the user loads via bpf(2) so a potential JSON > > >> output may look different/break each time anyway. This however could all > > >> be > > >> embedded under a container object that has a fixed key like 'formatted' > > >> where tools like jq(1) can query into it. I think this would be fine > > >> since > > >> the rest of the (non-dynamic) output is still retained as-is and then > > >> wouldn't confuse or collide with existing users, and anyone > > >> programmatically > > >> parsing deeper into the BTF data under such JSON container object needs > > >> to have awareness of what specific data it wants to query from it; so > > >> there's no conflict wrt breaking anything here. Imho, both outputs would > > >> be very valuable. > > > Okay I can add "formatted" object under json output. > > > > > > One thing to note here is that the fixed output will change if the map > > > itself changes. So someone writing a program that consumes that fixed > > > output will have to account for his program breaking in future, thus > > > > Yes, that aspect is fine though, any program/script parsing this would need > > to be aware of the underlying map type to make sense of it (e.g. per-cpu vs > > non per-cpu maps to name one). But that info it could query/verify already > > beforehand via bpftool as well (via normal map info dump for a given id). > > > > > breaking backward compatibility anyway as far as the developer is > > > concerned :) > > > > > > I will go ahead with work on "formatted" object. > > > > Cool, thanks, > > Daniel > > > hi, > > couple of questions: > > 1. just to be sure, formatted section will be on the same level as "key" > and "value"? so something like following: > > > $ bpftool map dump -p id 8 > [{ > "key": ["0x00","0x00","0x00","0x00" > ], > "value": [... > ], > "formatted": { > "key": 0, > "value": { > "int_field": 3, > "pointerfield": 2152930552, > ... > } > } > }] Looks good, yes! > 2. i noticed that the ouput in v1 has all the keys and values on the > same level. in v2, i'll change them so that each key-value pair is a > separate object. let me know what you think. For non-JSON output? No preference, whatever looks better :) Empty line between key/value pairs to visually separate them could also work. But up to you. > finally, i noticed there is a map lookup command which also prints map > entries. do want that to also be btf-printed in this patchset? It would be nice to share the printing code for the two, yes.
[PATCH] iscsi_ibft: Mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva --- drivers/firmware/iscsi_ibft.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/firmware/iscsi_ibft.c b/drivers/firmware/iscsi_ibft.c index 6bc8e66..ba64568 100644 --- a/drivers/firmware/iscsi_ibft.c +++ b/drivers/firmware/iscsi_ibft.c @@ -542,6 +542,7 @@ static umode_t __init ibft_check_tgt_for(void *data, int type) case ISCSI_BOOT_TGT_NIC_ASSOC: case ISCSI_BOOT_TGT_CHAP_TYPE: rc = S_IRUGO; + /* fall through */ case ISCSI_BOOT_TGT_NAME: if (tgt->tgt_name_len) rc = S_IRUGO; -- 2.7.4
[PATCH] liquidio: make timeout HZ independent and readable
schedule_timeout_* takes a timeout in jiffies but the code currently is passing in a constant which makes this timeout HZ dependent. So define a constant with (hopefully) meaningful name and pass it through msecs_to_jiffies() to fix the HZ dependency. Signed-off-by: Nicholas Mc Guire commit f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") --- Problem found by experimental coccinelle script The current wait time can vary by a factor 10 depending on the HZ setting chose, which does not seem reasonable here. The below patch sets the timeout to 1s - which is the current duration assuming a setting of HZ== 100. It is though not clear if this is the intent or if it should be shorter as it is not clear what HZ setting was assumed during design and used for testing. This needs an ack by someone who knows the device and can confirm that waiting 1s for in-flight requests on device removal is reasonable. Patch was compile tested with: x86_64_defconfig (implies CONFIG_NET_VENDOR_CAVIUM=y) (with a large number of sparse warnings though unrelated to the proposed change) Patch is against 4.18-rc2 (localversion-next is -next-20180702) drivers/net/ethernet/cavium/liquidio/lio_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index 7cb4e75..b2d0598 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -91,6 +91,9 @@ static int octeon_console_debug_enabled(u32 console) */ #define LIO_SYNC_OCTEON_TIME_INTERVAL_MS 6 +/* time to wait for possible in-flight requests in milliseconds */ +#define WAIT_INFLIGHT_REQUEST msecs_to_jiffies(1000) + struct lio_trusted_vf_ctx { struct completion complete; int status; @@ -259,7 +262,7 @@ static inline void pcierror_quiesce_device(struct octeon_device *oct) force_io_queues_off(oct); /* To allow for in-flight requests */ - schedule_timeout_uninterruptible(100); + schedule_timeout_uninterruptible(WAIT_INFLIGHT_REQUEST); if (wait_for_pending_requests(oct)) dev_err(>pci_dev->dev, "There were pending requests\n"); -- 2.1.4
Re: [PATCH v10 7/7] i2c: fsi: Add bus recovery
> > This all won't have any effect since you never call i2c_recover_bus > > which calls back into i2c_bus_recovery_info callbacks. > > Ah, I thought there would be some use of this in the core or in client > drivers, or some ioctl interface. Would there be any outside users of these > callbacks in the future? No. i2c_recover_bus() is only to recover from stalled SDA and the only place where this can be detected is inside the I2C master driver. It is not an event which can be controlled by a user. signature.asc Description: PGP signature
Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
On Mon, Jul 2, 2018 at 12:02 PM Andy Lutomirski wrote: > > Works for me. Linus, any objection? I think the 4.19 stage may be overkill, but I don't hate it, so no real objections. If the main reason for this is that we silently clear the upper bits when returning to compat mode, I actually think that a better fix would be to just fix that. We shouldn't silently ignore bogus data in the return path. But I don't care enough, I think. Linus
Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code
On 07/02/2018 03:47 PM, Dave Hansen wrote: > On 07/01/2018 07:04 PM, Pavel Tatashin wrote: >> +for_each_present_section_nr(pnum_begin + 1, pnum_end) { >> +int nid = sparse_early_nid(__nr_to_section(pnum_end)); >> >> +if (nid == nid_begin) { >> +map_count++; >> continue; >> } > >> +sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count); >> +nid_begin = nid; >> +pnum_begin = pnum_end; >> +map_count = 1; >> } > > Ugh, this is really hard to read. Especially because the pnum "counter" > is called "pnum_end". I called it pnum_end, because that is what is passed to sparse_init_nid(), but I see your point, and I can rename pnum_end to simply pnum if that will make things look better. > > So, this is basically a loop that collects all of the adjacent sections > in a given single nid and then calls sparse_init_nid(). pnum_end in > this case is non-inclusive, so the sparse_init_nid() call is actually > for the *previous* nid that pnum_end is pointing _past_. > > This *really* needs commenting. There is a comment before sparse_init_nid() about inclusiveness: 434 /* 435 * Initialize sparse on a specific node. The node spans [pnum_begin, pnum_end) 436 * And number of present sections in this node is map_count. 437 */ 438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin, 439unsigned long pnum_end, 440unsigned long map_count) Thank you, Pavel
Re: [PATCH v3 1/2] mm/sparse: add sparse_init_nid()
> @@ -2651,6 +2651,14 @@ void sparse_mem_maps_populate_node(struct page > **map_map, > unsigned long pnum_end, > unsigned long map_count, > int nodeid); > +struct page * sparse_populate_node(unsigned long pnum_begin, CodingStyle: put the "*" next to the function name, no space, please. > +unsigned long pnum_end, > +unsigned long map_count, > +int nid); > +struct page * sparse_populate_node_section(struct page *map_base, > +unsigned long map_index, > +unsigned long pnum, > +int nid); These two functions are named in very similar ways. Do they do similar things? > struct page *sparse_mem_map_populate(unsigned long pnum, int nid, > struct vmem_altmap *altmap); > diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c > index e1a54ba411ec..b3e325962306 100644 > --- a/mm/sparse-vmemmap.c > +++ b/mm/sparse-vmemmap.c > @@ -311,3 +311,52 @@ void __init sparse_mem_maps_populate_node(struct page > **map_map, > vmemmap_buf_end = NULL; > } > } > + > +struct page * __init sparse_populate_node(unsigned long pnum_begin, > + unsigned long pnum_end, > + unsigned long map_count, > + int nid) > +{ Could you comment what the function is doing, please? > + unsigned long size = sizeof(struct page) * PAGES_PER_SECTION; > + unsigned long pnum, map_index = 0; > + void *vmemmap_buf_start; > + > + size = ALIGN(size, PMD_SIZE) * map_count; > + vmemmap_buf_start = __earlyonly_bootmem_alloc(nid, size, > + PMD_SIZE, > + __pa(MAX_DMA_ADDRESS)); Let's not repeat the mistakes of the previous version of the code. Please explain why we are aligning this. Also, __earlyonly_bootmem_alloc()->memblock_virt_alloc_try_nid_raw() claims to be aligning the size. Do we also need to do it here? Yes, I know the old code did this, but this is the cost of doing a rewrite. :) > + if (vmemmap_buf_start) { > + vmemmap_buf = vmemmap_buf_start; > + vmemmap_buf_end = vmemmap_buf_start + size; > + } It would be nice to call out that these are globals that other code picks up. > + for (pnum = pnum_begin; map_index < map_count; pnum++) { > + if (!present_section_nr(pnum)) > + continue; > + if (!sparse_mem_map_populate(pnum, nid, NULL)) > + break; ^ This consumes "vmemmap_buf", right? That seems like a really nice thing to point out here if so. > + map_index++; > + BUG_ON(pnum >= pnum_end); > + } > + > + if (vmemmap_buf_start) { > + /* need to free left buf */ > + memblock_free_early(__pa(vmemmap_buf), > + vmemmap_buf_end - vmemmap_buf); > + vmemmap_buf = NULL; > + vmemmap_buf_end = NULL; > + } > + return pfn_to_page(section_nr_to_pfn(pnum_begin)); > +} > + > +/* > + * Return map for pnum section. sparse_populate_node() has populated memory > map > + * in this node, we simply do pnum to struct page conversion. > + */ > +struct page * __init sparse_populate_node_section(struct page *map_base, > + unsigned long map_index, > + unsigned long pnum, > + int nid) > +{ > + return pfn_to_page(section_nr_to_pfn(pnum)); > +} What is up with all of the unused arguments to this function? > diff --git a/mm/sparse.c b/mm/sparse.c > index d18e2697a781..c18d92b8ab9b 100644 > --- a/mm/sparse.c > +++ b/mm/sparse.c > @@ -456,6 +456,43 @@ void __init sparse_mem_maps_populate_node(struct page > **map_map, > __func__); > } > } > + > +static unsigned long section_map_size(void) > +{ > + return PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION); > +} Seems like if we have this, we should use it wherever possible, like sparse_populate_node(). > +/* > + * Try to allocate all struct pages for this node, if this fails, we will > + * be allocating one section at a time in sparse_populate_node_section(). > + */ > +struct page * __init sparse_populate_node(unsigned long pnum_begin, > + unsigned long pnum_end, > + unsigned long map_count, > + int nid) > +{ > + return memblock_virt_alloc_try_nid_raw(section_map_size() * map_count, > +PAGE_SIZE, __pa(MAX_DMA_ADDRESS), > +
Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code
On 07/02/2018 12:54 PM, Pavel Tatashin wrote: > > > On 07/02/2018 03:47 PM, Dave Hansen wrote: >> On 07/01/2018 07:04 PM, Pavel Tatashin wrote: >>> + for_each_present_section_nr(pnum_begin + 1, pnum_end) { >>> + int nid = sparse_early_nid(__nr_to_section(pnum_end)); >>> >>> + if (nid == nid_begin) { >>> + map_count++; >>> continue; >>> } >> >>> + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count); >>> + nid_begin = nid; >>> + pnum_begin = pnum_end; >>> + map_count = 1; >>> } >> >> Ugh, this is really hard to read. Especially because the pnum "counter" >> is called "pnum_end". > > I called it pnum_end, because that is what is passed to > sparse_init_nid(), but I see your point, and I can rename pnum_end to > simply pnum if that will make things look better. Could you just make it a helper that takes a beginning pnum and returns the number of consecutive sections? >> So, this is basically a loop that collects all of the adjacent sections >> in a given single nid and then calls sparse_init_nid(). pnum_end in >> this case is non-inclusive, so the sparse_init_nid() call is actually >> for the *previous* nid that pnum_end is pointing _past_. >> >> This *really* needs commenting. > > There is a comment before sparse_init_nid() about inclusiveness: > > 434 /* > 435 * Initialize sparse on a specific node. The node spans [pnum_begin, > pnum_end) > 436 * And number of present sections in this node is map_count. > 437 */ > 438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin, > 439unsigned long pnum_end, > 440unsigned long map_count) Which I totally missed. Could you comment the code, please?
Re: [PATCH v3 2/2] mm/sparse: start using sparse_init_nid(), and remove old code
On Mon, Jul 2, 2018 at 4:00 PM Dave Hansen wrote: > > On 07/02/2018 12:54 PM, Pavel Tatashin wrote: > > > > > > On 07/02/2018 03:47 PM, Dave Hansen wrote: > >> On 07/01/2018 07:04 PM, Pavel Tatashin wrote: > >>> + for_each_present_section_nr(pnum_begin + 1, pnum_end) { > >>> + int nid = sparse_early_nid(__nr_to_section(pnum_end)); > >>> > >>> + if (nid == nid_begin) { > >>> + map_count++; > >>> continue; > >>> } > >> > >>> + sparse_init_nid(nid_begin, pnum_begin, pnum_end, map_count); > >>> + nid_begin = nid; > >>> + pnum_begin = pnum_end; > >>> + map_count = 1; > >>> } > >> > >> Ugh, this is really hard to read. Especially because the pnum "counter" > >> is called "pnum_end". > > > > I called it pnum_end, because that is what is passed to > > sparse_init_nid(), but I see your point, and I can rename pnum_end to > > simply pnum if that will make things look better. > > Could you just make it a helper that takes a beginning pnum and returns > the number of consecutive sections? But sections do not have to be consequent. Some nodes may have sections that are not present. So we are looking for two values: map_count -> which is number of present sections and node_end for the current node i.e. the first section of the next node. So the helper would need to return two things, and would basically repeat the same code that is done in this function. > > >> So, this is basically a loop that collects all of the adjacent sections > >> in a given single nid and then calls sparse_init_nid(). pnum_end in > >> this case is non-inclusive, so the sparse_init_nid() call is actually > >> for the *previous* nid that pnum_end is pointing _past_. > >> > >> This *really* needs commenting. > > > > There is a comment before sparse_init_nid() about inclusiveness: > > > > 434 /* > > 435 * Initialize sparse on a specific node. The node spans [pnum_begin, > > pnum_end) > > 436 * And number of present sections in this node is map_count. > > 437 */ > > 438 static void __init sparse_init_nid(int nid, unsigned long pnum_begin, > > 439unsigned long pnum_end, > > 440unsigned long map_count) > > Which I totally missed. Could you comment the code, please? Sure, I will add a comment into sparse_init() as well.
Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE
On Mon, Jul 2, 2018 at 12:31 PM, Linus Torvalds wrote: > On Mon, Jul 2, 2018 at 12:02 PM Andy Lutomirski wrote: >> >> Works for me. Linus, any objection? > > I think the 4.19 stage may be overkill, but I don't hate it, so no > real objections. > > If the main reason for this is that we silently clear the upper bits > when returning to compat mode, I actually think that a better fix > would be to just fix that. We shouldn't silently ignore bogus data in > the return path. > > But I don't care enough, I think. Like this: diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 3b2490b81918..ec40223c8856 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -170,6 +170,26 @@ static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) if (cached_flags & _TIF_USER_RETURN_NOTIFY) fire_user_return_notifiers(); +if (unlikely(!user_64bit_mode(regs) && + (regs->ip & 0xull))) { +siginfo_t info; +struct task_struct *tsk = current; + +/* I haven't thought about this *that* hard. */ +clear_siginfo(); +tsk->thread.cr2= regs->ip; +tsk->thread.trap_nr = X86_TRAP_PF; +tsk->thread.error_code = X86_PF_USER | X86_PF_INSTR; +info.si_signo = SIGSEGV; +info.si_errno = 0; +info.si_code = SEGV_MAPERR; +info.si_addr = (void __user *)regs->ip; +/* si_addr_lsb? */ +force_sig_info(SIGSEGV, , tsk); + +/* And we'll go through the loop again. */ +} + /* Disable IRQs and retry */ local_irq_disable(); It's whitespace damaged and barely tested, but it seems to at least not be completely busted. I don't really love doing this, though.
Re: [PATCH v2 1/2] arm64: dts: qcom: pm8998: Add spmi-temp-alarm node
Hi, On Mon, Jul 2, 2018 at 11:10 AM, Matthias Kaehlcke wrote: > This adds the spmi-temp-alarm node to pm8998 based on the examples in the > bindings. > > Signed-off-by: Matthias Kaehlcke > --- > Changes in v2: > - none > > arch/arm64/boot/dts/qcom/pm8998.dtsi | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi > b/arch/arm64/boot/dts/qcom/pm8998.dtsi > index 92bed1e7d4bb..2f4989e7ef68 100644 > --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi > +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi > @@ -11,6 +11,13 @@ > #address-cells = <1>; > #size-cells = <0>; > > + pm8998_temp: qcom,temp-alarm@2400 { Remove "qcom," from the node name (AKA please change to "temp-alarm@2400"). Someone internal in Qualcomm seems to have started this trend so you see it on all downstream kernels, but upstream device tree isn't supposed to have it. > + compatible = "qcom,spmi-temp-alarm"; > + reg = <0x2400 0x100>; Why are there two numbers for the "reg"? Should just be 0x2400. -Doug
Re: [PATCH V4 4/7] PCI: Unify try slot and bus reset API
On 6/29/2018 5:43 PM, Andy Shevchenko wrote: >> +int pci_try_reset_bus(struct pci_dev *pdev) >> +{ >> + bool slot = false; >> + >> + if (!pci_probe_reset_slot(pdev->slot)) >> + slot = true; >> + >> + return slot ? __pci_try_reset_slot(pdev->slot) : >> + __pci_try_reset_bus(pdev->bus); > This can be as simple as > > return pci_probe_reset_slot(pdev->slot) ? > __pci_try_reset_bus(pdev->slot) : __pci_try_reset_slot(pdev->bus); > > done -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH v4 0/6] thermal: tsens: Refactoring for TSENSv2 IP
On Mon, Jul 02, 2018 at 06:14:03PM +0530, Amit Kucheria wrote: > This series is a mixed bag: > - Some code moves to allow code sharing between various v2.x.y versions of > the TSENS IP, > - new qcom,tsens-v2.4.0 DT property for SDM845 and a generic qcom,tsens-v2 > property as a fallback compatible for all v2.x.y platforms, > - new platform support (sdm845) > - a cleanup patch and > - a DT change to have a common way to deal with the SROT and TM registers > despite slightly different features across the IP family and different > register offsets. > > Rob mentioned offline that we should expose the full version string of the > TSENS IP (x.y.z) and have a fallback compatible. I hope patch 4 does what > you were looking for. Applied patches 1, 3, and 4. Patch 5 needs a description. Patches 2 and 6 go via your arch tree. > > Regards, > Amit > > Amit Kucheria (6): > thermal: tsens: Get rid of unused fields in structure > dt: qcom: 8996: thermal: Move to DT initialisation > thermal: tsens: Rename tsens-8996 to tsens-v2 for reuse > thermal: tsens: Add support for SDM845 > thermal: tsens: Check if we have valid data before reading > arm64: dts: sdm845: Add tsens nodes > > .../devicetree/bindings/thermal/qcom-tsens.txt | 2 ++ > arch/arm64/boot/dts/qcom/msm8996.dtsi | 12 ++- > arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 + > drivers/thermal/qcom/Makefile | 2 +- > drivers/thermal/qcom/{tsens-8996.c => tsens-v2.c} | 39 > -- > drivers/thermal/qcom/tsens.c | 6 > drivers/thermal/qcom/tsens.h | 7 ++-- > 7 files changed, 62 insertions(+), 22 deletions(-) > rename drivers/thermal/qcom/{tsens-8996.c => tsens-v2.c} (66%) > > -- > 2.7.4 >
[PATCH v2 2/2] arm64: dts: qcom: pm8998: Add pm8998 thermal zone
The thermal zone uses spmi-temp-alarm as sensor. If the sensor is configured without an IIO input it always reports 37°C for temperatures below the first hardware trip point at 105°C. This hardware trip point is configured as critical trip point, to initiate a system shutdown before the temperature reaches the next hardware trip point at 125°C, where the PMIC performs a partial shutdown. The temperature of the critical trip point can be raised after adding the die temperature ADC as IIO input for spmi-temp-alarm, which significantly increases the precision of the temperature measurements. Signed-off-by: Matthias Kaehlcke --- Changes in v2: - defined 'thermal-zones' node in pm8998.dtsi instead of using a label to refer to it - use 105°C hardware trip point as critical trip point - reduced number of trip points to 2 - lowered temperature of passive trip point - updated trip point names and added labels - updated commit message arch/arm64/boot/dts/qcom/pm8998.dtsi | 25 + 1 file changed, 25 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi b/arch/arm64/boot/dts/qcom/pm8998.dtsi index 2f4989e7ef68..e7caa334e6c7 100644 --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi @@ -3,6 +3,7 @@ #include #include +#include _bus { pm8998_lsid0: pmic@0 { @@ -60,3 +61,27 @@ #size-cells = <0>; }; }; + +/ { + thermal-zones { + pm8998 { + polling-delay-passive = <250>; + polling-delay = <1000>; + + thermal-sensors = <_temp>; + + trips { + pm8998_alert0: pm8998-alert0 { + temperature = <95000>; + hysteresis = <2000>; + type = "passive"; + }; + pm8998_crit: pm8998-crit { + temperature = <105000>; + hysteresis = <2000>; + type = "critical"; + }; + }; + }; + }; +}; -- 2.18.0.399.gad0ab374a1-goog
[PATCH] drbd: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 2 was used in this case: -Wimplicit-fallthrough=2 Signed-off-by: Gustavo A. R. Silva --- drivers/block/drbd/drbd_receiver.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index be9450f..a36a307 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -2790,6 +2790,7 @@ static int receive_DataRequest(struct drbd_connection *connection, struct packet then we would do something smarter here than reading the block... */ peer_req->flags |= EE_RS_THIN_REQ; + /* fall through */ case P_RS_DATA_REQUEST: peer_req->w.cb = w_e_end_rsdata_req; fault_type = DRBD_FAULT_RS_RD; @@ -2968,6 +2969,7 @@ static int drbd_asb_recover_0p(struct drbd_peer_device *peer_device) __must_hold /* Else fall through to one of the other strategies... */ drbd_warn(device, "Discard younger/older primary did not find a decision\n" "Using discard-least-changes instead\n"); + /* fall through */ case ASB_DISCARD_ZERO_CHG: if (ch_peer == 0 && ch_self == 0) { rv = test_bit(RESOLVE_CONFLICTS, _device->connection->flags) @@ -2979,6 +2981,7 @@ static int drbd_asb_recover_0p(struct drbd_peer_device *peer_device) __must_hold } if (after_sb_0p == ASB_DISCARD_ZERO_CHG) break; + /* else: fall through */ case ASB_DISCARD_LEAST_CHG: if (ch_self < ch_peer) rv = -1; -- 2.7.4
Re: [RFC v3 PATCH 5/5] x86: check VM_DEAD flag in page fault
On 7/2/18 10:57 AM, Michal Hocko wrote: On Mon 02-07-18 10:24:27, Yang Shi wrote: On 7/2/18 6:37 AM, Michal Hocko wrote: On Mon 02-07-18 15:33:11, Laurent Dufour wrote: On 02/07/2018 14:45, Michal Hocko wrote: On Mon 02-07-18 14:26:09, Laurent Dufour wrote: On 02/07/2018 14:15, Michal Hocko wrote: [...] We already do have a model for that. Have a look at MMF_UNSTABLE. MMF_UNSTABLE is a mm's flag, here this is a VMA's flag which is checked. Yeah, and we have the VMA ready for all places where we do check the flag. check_stable_address_space can be made to get vma rather than mm. Yeah, this would have been more efficient to check that flag at the beginning of the page fault handler rather than the end, but this way it will be easier to handle the speculative page fault too ;) The thing is that it doesn't really need to be called earlier. You are not risking data corruption on file backed mappings. OK, I just think it could save a few cycles to check the flag earlier. This should be an extremely rare case. Just think about it. It should only ever happen when an access races with munmap which itself is questionable if not an outright bug. If nobody think it is necessary, we definitely could re-use check_stable_address_space(), If we really need this whole VM_DEAD thing then it should be better handled at the same place rather than some ad-hoc places. just return VM_FAULT_SIGSEGV for VM_DEAD vma, and check for both shared and non-shared. Why would you even care about shared mappings? Just thought about we are dealing with VM_DEAD, which means the vma will be tore down soon regardless it is shared or non-shared. MMF_UNSTABLE doesn't care about !shared case.
[PATCH v2 1/2] arm64: dts: qcom: pm8998: Add spmi-temp-alarm node
This adds the spmi-temp-alarm node to pm8998 based on the examples in the bindings. Signed-off-by: Matthias Kaehlcke --- Changes in v2: - none arch/arm64/boot/dts/qcom/pm8998.dtsi | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/pm8998.dtsi b/arch/arm64/boot/dts/qcom/pm8998.dtsi index 92bed1e7d4bb..2f4989e7ef68 100644 --- a/arch/arm64/boot/dts/qcom/pm8998.dtsi +++ b/arch/arm64/boot/dts/qcom/pm8998.dtsi @@ -11,6 +11,13 @@ #address-cells = <1>; #size-cells = <0>; + pm8998_temp: qcom,temp-alarm@2400 { + compatible = "qcom,spmi-temp-alarm"; + reg = <0x2400 0x100>; + interrupts = <0x0 0x24 0x0 IRQ_TYPE_EDGE_RISING>; + #thermal-sensor-cells = <0>; + }; + pm8998_gpio: gpios@c000 { compatible = "qcom,pm8998-gpio", "qcom,spmi-gpio"; reg = <0xc000>; -- 2.18.0.399.gad0ab374a1-goog
Re: [PATCH v3] add param that allows bootline control of hardened usercopy
On Sat, Jun 30, 2018 at 1:43 PM, Christoph von Recklinghausen wrote: > The last issue I'm chasing is build failures on ARCH=m68k. The error is > atomic_read and friends needed by the jump label code not being found. > The config has CONFIG_BROKEN_ON_SMP=y, so the jump label calls I added > will only be made #ifndef CONFIG_BROKEN_ON_SMP. Do you think that's > worth a mention in the blurb that's added to > Documentation/admin-guide/kernel-parameters.txt? Uhm, that's weird -- I think the configs on m68k need fixing then? I don't want to have to sprinkle that ifdef in generic code. How are other users of static keys and jump labels dealing with m68k weirdness? -Kees -- Kees Cook Pixel Security
Re: [PATCH 0/2] tools/memory-model: remove ACCESS_ONCE()
On Thu, Jun 28, 2018 at 06:51:11PM +0200, Andrea Parri wrote: > > 1bc179880fba docs: atomic_ops: Describe atomic_set as a write operation > > > > The above patches need at least one additional Acked-by > > or Reviewed-by. If any of you gets a chance, please do > > look them over. > > Glad this came out. ;-) > > No objection to the patch: feel free to add my Reviewed-by: tag. Done, thank you! > (BTW, atomic_set() would be better mapped to WRITE_ONCE()... in fact, to > be fair, some archs do it the __asm__ __volatile__() way). > > I do however have some suggestions concerning "the process": searching > LKML for the patch and the related discussion, I could only find: > > [PATCH] docs: atomic_ops: atomic_set is a write (not read) operation > > and I realize that none of the person Cc:-ed in this thread, except you, > were Cc:-ed in that discussion (in compliance with get_maintainer.pl). > > My suggestions: > > 1) Merge the file touched by that patch into (the recently created): > > Documentation/atomic_t.txt > > (FWIW, queued in my TODO list). Some consolidation of documentation would be good. ;-) Thoughts from others? > 2) Add the entry: > > F: Documentation/atomic_t.txt > > to the "ATOMIC INFRASTRUCTURE" subsystem in the MAINTAINERS file so > that developers can easily find (the intended?) reviewers for their > patch. (Of course, this will need ACK from the ATOMIC people). If the merging will take awhile, it might also be good to put Documentation/core-api/atomic_ops.rst somewhere as well. Thanx, Paul
Re: [PATCH v2] mm: teach dump_page() to correctly output poisoned struct pages
On Mon 02-07-18 14:05:36, Pavel Tatashin wrote: [...] > void __dump_page(struct page *page, const char *reason) > { > + bool page_poisoned = PagePoisoned(page); > + int mapcount; > + > + /* > + * If struct page is poisoned don't access Page*() functions as that > + * leads to recursive loop. Page*() check for poisoned pages, and calls > + * dump_page() when detected. > + */ > + if (page_poisoned) { > + pr_emerg("page:%px is uninitialized and poisoned", page); > + goto hex_only; > + } Thanks for the updated comment. Exactly what I was looking for! -- Michal Hocko SUSE Labs
Re: [v7, 02/10] counter: Documentation: Add Generic Counter sysfs documentation
On 06/21/2018 04:07 PM, William Breathitt Gray wrote: This patch adds standard documentation for the userspace sysfs attributes of the Generic Counter interface. Reviewed-by: Jonathan Cameron Signed-off-by: William Breathitt Gray --- Documentation/ABI/testing/sysfs-bus-counter | 230 MAINTAINERS | 1 + 2 files changed, 231 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-counter diff --git a/Documentation/ABI/testing/sysfs-bus-counter b/Documentation/ABI/testing/sysfs-bus-counter new file mode 100644 index ..0bb0dce67bf8 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-counter @@ -0,0 +1,230 @@ +What: /sys/bus/counter/devices/counterX/countY/count +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Count data of Count Y represented as a string. + +What: /sys/bus/counter/devices/counterX/countY/ceiling +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Count value ceiling for Count Y. This is the upper limit for the + respective counter. + +What: /sys/bus/counter/devices/counterX/countY/floor +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Count value floor for Count Y. This is the lower limit for the + respective counter. + +What: /sys/bus/counter/devices/counterX/countY/count_mode +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Count mode for channel Y. The ceiling and floor values for + Count Y are used by the count mode where required. The following + count modes are available: + + Normal: + Counting is continuous in either direction. + + Range Limit: + An upper or lower limit is set, mimicking limit switches + in the mechanical counterpart. The upper limit is set to + the Count Y ceiling value, while the lower limit is set + to the Count Y floor value. The counter freezes at + count = ceiling when counting up, and at count = floor + when counting down. At either of these limits, the + counting is resumed only when the count direction is + reversed. + + Non-Recycle: + The counter is disabled whenever a counter overflow or + underflow takes place. The counter is re-enabled when a + new count value is loaded to the counter via a preset + operation or direct write. + + Modulo-N: + A count value boundary is set between the Count Y floor + value and the Count Y ceiling value. The counter is + reset to the Cunt Y floor value at count = ceiling when + counting up, while the counter is set to the Count Y + ceiling value at count = floor when counting down; the + counter does not freeze at the boundary points, but + counts continuously throughout. + This might be a bit misleading since the values returned are all lower case, e.g. "rising edge", whereas they are listed here in Title Case. +What: /sys/bus/counter/devices/counterX/countY/count_mode_available +What: /sys/bus/counter/devices/counterX/countY/error_noise_available +What: /sys/bus/counter/devices/counterX/countY/function_available +What: /sys/bus/counter/devices/counterX/countY/signalZ_action_available +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Discrete set of available values for the respective Count Y + configuration are listed in this file. Values are delineated by + newline characters. + +What: /sys/bus/counter/devices/counterX/countY/direction +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Read-only attribute that indicates the count direction of Count + Y. Two count directions are available: forward and backward. + + Some counter devices are able to determine the direction of + their counting. For example, quadrature encoding counters can + determine the direction of movement by evaluating the leading + phase of the respective A and B quadrature encoding signals. + This attribute exposes such count directions. + +What: /sys/bus/counter/devices/counterX/countY/enable +KernelVersion: 4.19 +Contact: linux-...@vger.kernel.org +Description: + Whether channel Y counter is enabled. Valid attribute values are +
Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier
On Mon, Jul 2, 2018 at 3:23 AM Benjamin Herrenschmidt wrote: > > Let me try one last ditch attempt to convince you using maybe a > different perspective: this is how sysfs is intended to work and how > the device model already does everywhere else except the gluedirs. So don't get me wrong. I don't think that patch is _wrong_. It may well be the best thing we can do now. I just think some of the arguments for the patch are bogus. I still think that the auto-cleanup is actually a good thing in general. Not because it simplifies things (which it can do), but because it fundamentally *allows* you to use less locking. And that ends up being my real reason to not like your patch all that much. It depends on (a) an extra reference count (b) on fairly heavy-handed locking (ie the whole "lock on release too, not just on allocation"). and I think both of those are non-optimal. So: > The actual lifetime of the struct device is handled *separately* by > device_get/put which are just wrappers on kobject_get/put. I agree that that is the end result of your patch (and perhaps the buggy intent of the original code). I just don't necessarily agree it's a great thing in general. It basically uses sysfs visibility as an argument for why lifetimes should differ from the refcounted lifetimes. And that may be a practical argument, but I don't think it's a very good argument in general. I think it would arguably be much better to be less serialized, and depend on refcounting more, and make less of a deal about the sysfs visibility. For example, if we *really* add back the exact same device immediately after removing it, and the device was still in use somehow (ie the refcount never went down to zero), maybe we really should not have reused the same name? Or if we do re-use the same name, maybe we should have re-used the device node itself? See what I'm saying? I understand where your patch is coming from, but I am suspicious of the fact that it seems to want to re-use a name (but not the object) that is by definition still in use. Maybe that's the right thing in this case. Considering that we have a real oops and a real problem, and I don't have an alternative patch, I guess the "patches talk, bullshit walks" rule applies. Linus
Re: hrtimer become inaccurate with RT patch
On Mon, Jul 2, 2018 at 2:34 AM, gengdongjiu wrote: > Hi Thomas/Anna/John, > > Recently I found that the hrtimer become inaccurate when there is a RT > process runs on the same cpu core, and the kernel has applied preempt_rt > patch. > The Linux kernel version is v4.1.46, and the preempt_rt patch is > patch-4.1.46-rt52.patch. > I know that in the preempt_rt environment the interrupt handlers no > longer run in interrupt context but in process context, so that RT > process will not be interrupt. But if the hrtimer is also runs in > process context the timer is useless when it's inaccurate. so I want to > consult you whether this is expected behavior? whether is reasonable to move > the timer IRQ > handling to a thread? I've not looked at the PREEMPT_RT code in a long time, but years ago there was a tension in that there is not an easy way track ownership of timers. Thus timers all fired at the same priority of the hrtimer irq thread. This thread could be moved up or down in priority, but the problem was all timers would fire with the same priority. So either the thread priority was so high that low-priority process could generate a bunch of timers which would interrupt higher priority tasks, or the thread priority was lower, so a high priority task could block all timers. There was some handwavy talk of trying to keep per-process timer lists, so the hrtimer irq could still be in irq context but the firing logic it didn't do anything but mark its task as runnable and do the the actual timer firing logic before we eventually run the task (in proper rt priority order), in a fashion similar to signals. But I'm not sure if any attempts were made in that direction. I also think it was an open question if there's any logic in kernel that depend on strict in-order kernel timer processing, so its possible there could be odd inversion issues where high priority timer logic is waiting on /expecting lower priority timers to fire, etc, so its probably an area of research. thanks -john
Re: [PATCH v3 0/8] gnss: add new GNSS subsystem
On Fri 2018-06-29 14:09:14, Johan Hovold wrote: > On Fri, Jun 29, 2018 at 02:05:54PM +0200, Pavel Machek wrote: > > On Fri 2018-06-29 13:46:46, Johan Hovold wrote: > > > On Fri, Jun 29, 2018 at 11:46:07AM +0200, Pavel Machek wrote: > > > > > > > > > > Finally, note that documentation (including kerneldoc) remains to be > > > > > > written, but hopefully this will not hinder review given that the > > > > > > current interfaces are fairly self-describing. > > > > > > > > > > This all looks great. Thanks for doing this work and adding a new > > > > > subsystem for something that has been asked for for many years. > > > > > > > > > > All now merged in my tree, nice job! > > > > > > > > I don't think discussion was finished on this one. > > > > > > > > In particular, we agreed that /dev/gnssrawX would be better device > > > > name, so that we still have place where to put proper abstraction > > > > layer in future. > > > > > > I did not agree with you on that. I said we could consider that name if > > > this was to be changed at all, which I do not think is necessary for > > > the reasons spelled out in this thread. > > > > So, again: there's nothing gnss specific in those patches. It does not > > know about the format of the data passed around. (Best you can claim > > that somehow data flow characteristics are unique to gnss.) And this > > takes namespace needed for real gnss subsystem. Please don't do it. > > This is the real gnss subsystem. Get over it. Congratulations. You have created gnss subsystem that has 0 lines of code that are gnss-specific. This is not real gnss subsystem. This is pipe that passes data, similar to /dev/psaux or mouse on /dev/ttyS0. Sooner or later, real gnss subsystem (with unified interface) will be needed, as it was for input, and this "pipe and gpio" thing should not hog required namespace. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: hrtimer become inaccurate with RT patch
On 2018-07-02 19:19:07 [+0800], gengdongjiu wrote: > Hi Sebastian , Hi gengdongjiu, > > the 4.1 series is no longer supported (neither RT wise nor non-RT, > > https://www.kernel.org/category/releases.html). I suggest to move away. > > If you notice this problem now it is hardly a long running project. > yes, I Know, but we found the latest RT 4.14 series also has the same problem, > so this is common issue. This does not change what I wrote regarding the v4.1 series. Also you could have mention v4.14 instead v4.1 if you really tested on v4.14. > >> process will not be interrupt. But if the hrtimer is also runs in > >> process context the timer is useless when it's inaccurate. so I want to > >> consult you whether this is expected behavior? whether is reasonable to > >> move the timer IRQ > >> handling to a thread? > > > > This depends on your expectations. The timer is defined not to fire > > before the programmed time. So it fires as soon as possible _after_ the > > programmed time. > It is reasonable that the timer is defined not to fire before the programmed > time. > but we found it fires long _after_ the programmed time. For example, we > define it to > fire after 2s, but it will fire after 5s, so it is very later than the > expectations. under normal circumstances I would expect to have a few µs delay due to wakeup of the softirq thread. Not seconds. This is either broken HW or a long running RT thread which blocks the expected execution. > I think the reason may be > that the timer handler thread is preempted by another higher priority thread. > so from for this issue, > the timer handler should be in IRQ context instead of the process context or > increase the timer handler thread priority, right? speculating on what is going on and acting based on speculation is one way to handle situation. You could also enable tracing to see - when does the timer fire - when does the thread wake up - when does is timer's function start / complete and then you know what *really* causes the delay. The hrtimer and sched tracepoints should provide enough information. Based on that you can figure out if it is wise the toggle the irqsave flag or change something else so that the system does not run ~3sec RT secs without a break. Sebastian
[PATCH] fixup! firmware: raspberrypi: Remove VLA usage
Kees - with this fix to your patch, the kernel boots again (otherwise, the FW would try to parse the uninitialized bits of stack and throw errors). If you're good with me squashing this in, I'll do so and send it to -next. Signed-off-by: Eric Anholt --- drivers/firmware/raspberrypi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c index b80f15214b73..a200a2174611 100644 --- a/drivers/firmware/raspberrypi.c +++ b/drivers/firmware/raspberrypi.c @@ -162,7 +162,7 @@ int rpi_firmware_property(struct rpi_firmware *fw, memcpy(data + sizeof(struct rpi_firmware_property_tag_header), tag_data, buf_size); - ret = rpi_firmware_property_list(fw, , sizeof(data)); + ret = rpi_firmware_property_list(fw, , buf_size + sizeof(*header)); memcpy(tag_data, data + sizeof(struct rpi_firmware_property_tag_header), buf_size); -- 2.18.0
Re: [PATCH v2] mm: teach dump_page() to correctly output poisoned struct pages
On Mon, 2 Jul 2018 14:05:36 -0400 Pavel Tatashin wrote: > If struct page is poisoned, and uninitialized access is detected via > PF_POISONED_CHECK(page) dump_page() is called to output the page. But, > the dump_page() itself accesses struct page to determine how to print > it, and therefore gets into a recursive loop. > > For example: > dump_page() > __dump_page() > PageSlab(page) >PF_POISONED_CHECK(page) > VM_BUG_ON_PGFLAGS(PagePoisoned(page), page) > dump_page() recursion loop. > > Fixes: f165b378bbdf ("mm: uninitialized struct page poisoning sanity > checking") > > Signed-off-by: Pavel Tatashin > Acked-by: Michal Hocko Thanks. I added a cc:stable to make sure this gets into 4.17.x.
Re: [PATCH v4] ata: Remove depends on HAS_DMA in case of platform dependency
Hi Tejun, On Mon, Jul 2, 2018 at 6:12 PM Tejun Heo wrote: > On Fri, Jun 22, 2018 at 01:03:07PM +0200, Geert Uytterhoeven wrote: > > Remove dependencies on HAS_DMA where a Kconfig symbol depends on another > > symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST". > > In most cases this other symbol is an architecture or platform specific > > symbol, or PCI. > > > > Generic symbols and drivers without platform dependencies keep their > > dependencies on HAS_DMA, to prevent compiling subsystems or drivers that > > cannot work anyway. > > > > This simplifies the dependencies, and allows to improve compile-testing. > > > > Signed-off-by: Geert Uytterhoeven > > Reviewed-by: Mark Brown > > Acked-by: Robin Murphy > > This looks fine to me but how do you wanna route it? Should I apply > it to for-4.18-fixes or should it go with arch changes in some other > tree? Please queue it in the ata tree. There are no related arch changes. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v3] x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
> So I expect this patch needs a cc:stable, which I'll add. > > The optimiation patch seems less important and I'd like to hold that > off for 4.19-rc1? Hi Andrew, Should I resend the optimization patch [1] once 4.18 is released, or will you include it, and I do not need to do anything? [1] http://lkml.kernel.org/r/20180615155733.1175-1-pasha.tatas...@oracle.com Thank you, Pavel
Re: [RFC PATCH] ata: ahci: Enable DEVSLP by default on x86 modern standby platform
On Mon, Jul 02, 2018 at 12:08:45PM -0700, Srinivas Pandruvada wrote: > From: Srinivas > > One of the requirement for modern x86 system to enter lowest power mode > (SLP_S0) is SATA IP block to be off. This is true even during when > platform is suspended to idle and not only in opportunistic (runtime) > suspend. > > Several of these system don't have traditional ACPI S3, so it is > important that they enter SLP_S0 state, to avoid draining battery even > during suspend even with out of the box Linux installation. > > SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here > user has to either use scsi-host sysfs or tools like powertop to set > the sata-host link_power_management_policy to min_power. > > This change sets by default link power management policy to min_power > for some platforms. To avoid regressions, the following conditions > are used: > - The kernel config is already set to use med_power_with_dipm or deeper > - System is a modern standby system using ACPI low power idle flag > - The platform is not blacklisted for Suspend to Idle and suspend > to idle is used instead of S3 > This combination will make sure that systems are fairly recent and > since getting shipped with modern standby standby, the DEVSLP function > is already validated. > > Signed-off-by: Srinivas Pandruvada Seems sane to me. Hans, what do you think? Thanks. -- tejun
[PATCH v2 2/7] tracing: Split up onmatch action data
From: Tom Zanussi Currently, the onmatch action data binds the onmatch action to data related to synthetic event generation. Since we want to allow the onmatch handler to potentially invoke a different action, and because we expect other handlers to generate synthetic events, we need to separate the data related to these two functions. Also rename the onmatch data to something more descriptive. Signed-off-by: Tom Zanussi --- kernel/trace/trace_events_hist.c | 59 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index ede7a27fa52b..61a3a8ca2f76 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -347,13 +347,14 @@ struct action_data { unsigned intn_params; char*params[SYNTH_FIELDS_MAX]; + unsigned intvar_ref_idx; + struct synth_event *synth_event; + union { struct { - unsigned intvar_ref_idx; - char*match_event; - char*match_event_system; - struct synth_event *synth_event; - } onmatch; + char*event; + char*event_system; + } match_data; struct { char*var_str; @@ -1001,9 +1002,9 @@ static void action_trace(struct hist_trigger_data *hist_data, struct ring_buffer_event *rbe, struct action_data *data, u64 *var_ref_vals) { - struct synth_event *event = data->onmatch.synth_event; + struct synth_event *event = data->synth_event; - trace_synth(event, var_ref_vals, data->onmatch.var_ref_idx); + trace_synth(event, var_ref_vals, data->var_ref_idx); } struct hist_var_data { @@ -1573,8 +1574,8 @@ find_match_var(struct hist_trigger_data *hist_data, char *var_name) struct action_data *data = hist_data->actions[i]; if (data->handler == HANDLER_ONMATCH) { - char *system = data->onmatch.match_event_system; - char *event_name = data->onmatch.match_event; + char *system = data->match_data.event_system; + char *event_name = data->match_data.event; file = find_var_file(tr, system, event_name, var_name); if (!file) @@ -3521,15 +3522,15 @@ static void onmatch_destroy(struct action_data *data) mutex_lock(_event_mutex); - kfree(data->onmatch.match_event); - kfree(data->onmatch.match_event_system); + kfree(data->match_data.event); + kfree(data->match_data.event_system); kfree(data->action_name); for (i = 0; i < data->n_params; i++) kfree(data->params[i]); - if (data->onmatch.synth_event) - data->onmatch.synth_event->ref--; + if (data->synth_event) + data->synth_event->ref--; kfree(data); @@ -3611,8 +3612,8 @@ trace_action_find_var(struct hist_trigger_data *hist_data, hist_field = find_target_event_var(hist_data, system, event, var); if (!hist_field) { if (!system && data->handler == HANDLER_ONMATCH) { - system = data->onmatch.match_event_system; - event = data->onmatch.match_event; + system = data->match_data.event_system; + event = data->match_data.event; } hist_field = find_event_var(hist_data, system, event, var); @@ -3651,8 +3652,8 @@ trace_action_create_field_var(struct hist_trigger_data *hist_data, * event. */ if (!system && data->handler == HANDLER_ONMATCH) { - system = data->onmatch.match_event_system; - event = data->onmatch.match_event; + system = data->match_data.event_system; + event = data->match_data.event; } /* @@ -3764,8 +3765,8 @@ static int trace_action_create(struct hist_trigger_data *hist_data, goto err; } - data->onmatch.synth_event = event; - data->onmatch.var_ref_idx = var_ref_idx; + data->synth_event = event; + data->var_ref_idx = var_ref_idx; out: return ret; err: @@ -3855,14 +3856,14 @@ static struct action_data *onmatch_parse(struct trace_array *tr, char *str) goto free; } - data->onmatch.match_event = kstrdup(match_event, GFP_KERNEL); - if (!data->onmatch.match_event) { + data->match_data.event = kstrdup(match_event, GFP_KERNEL);
Re: [PATCH v2 5/6] mm: track gup pages with page->dma_pinned_* fields
On 07/02/2018 02:53 AM, Jan Kara wrote: > On Sun 01-07-18 17:56:53, john.hubb...@gmail.com wrote: >> From: John Hubbard >> > ... > >> @@ -904,12 +907,24 @@ static inline void get_page(struct page *page) >> */ >> VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page); >> page_ref_inc(page); >> + >> +if (unlikely(PageDmaPinned(page))) >> +__get_page_for_pinned_dma(page); >> } >> >> static inline void put_page(struct page *page) >> { >> page = compound_head(page); >> >> +/* Because the page->dma_pinned_* fields are unioned with >> + * page->lru, there is no way to do classical refcount-style >> + * decrement-and-test-for-zero. Instead, PageDmaPinned(page) must >> + * be checked, in order to safely check if we are allowed to decrement >> + * page->dma_pinned_count at all. >> + */ >> +if (unlikely(PageDmaPinned(page))) >> +__put_page_for_pinned_dma(page); >> + > > These two are just wrong. You cannot make any page reference for > PageDmaPinned() account against a pin count. First, it is just conceptually > wrong as these references need not be long term pins, second, you can > easily race like: > > PinnerRandom process > get_page(page) > pin_page_for_dma() > put_page(page) >-> oops, page gets unpinned too early > I'll drop this approach, without mentioning any of the locking that is hiding in there, since that was probably breaking other rules anyway. :) Thanks for your patience in reviewing this. > So you really have to create counterpart to get_user_pages() - like > put_user_page() or whatever... It is inconvenient to have to modify all GUP > users but I don't see a way around that. OK, there will be a long-ish pause, while I go visit all the gup sites. I count about 88 callers, which is not nearly as crazy as my first casual grep showed, but still quite a chunk, since I have to track down where each one does its put_page call(s). It's definitely worth the effort, though. These pins just plain need some special handling in order to get everything correct. thanks, -- John Hubbard NVIDIA
Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for large mapping
On Mon, 2 Jul 2018 16:05:02 +0200 Michal Hocko wrote: > On Fri 29-06-18 20:15:47, Andrew Morton wrote: > [...] > > Would one of your earlier designs have addressed all usecases? I > > expect the dumb unmap-a-little-bit-at-a-time approach would have? > > It has been already pointed out that this will not work. I said "one of". There were others. > You simply > cannot drop the mmap_sem during unmap because another thread could > change the address space under your feet. So you need some form of > VM_DEAD and handle concurrent and conflicting address space operations. Unclear that this is a problem. If a thread does an unmap of a range of virtual address space, there's no guarantee that upon return some other thread has not already mapped new stuff into that address range. So what's changed?
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers > wrote: > > - On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers > mathieu.desnoy...@efficios.com wrote: > >> - On Jul 2, 2018, at 7:06 PM, Linus Torvalds >> torva...@linux-foundation.org >> wrote: >> >>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers >>> wrote: Unfortunately, that rseq->rseq_cs field needs to be updated by user-space with single-copy atomicity. Therefore, we want 32-bit user-space to initialize the padding with 0, and only update the low bits with single-copy atomicity. >>> >>> Well... It's actually still single-copy atomicity as a 64-bit value. >>> >>> Why? Because it doesn't matter how you write the upper bits. You'll be >>> writing the same value to them (zero) anyway. >>> >>> So who cares if the write ends up being two instructions, because the >>> write to the upper bits doesn't actually *do* anything. >>> >>> Hmm? >> >> Are there any kind of guarantees that a __u64 update on a 32-bit architecture >> won't be torn into something daft like byte-per-byte stores when performed >> from C code ? >> >> I don't worry whether the upper bits get updated or how, but I really care >> about not having store tearing of the low bits update. > > For the records, most updates of those low bits are done in assembly > from critical sections, for which we control exactly how the update is > performed. > > However, there is one helper function in user-space that updates that value > from C through a volatile store, e.g.: > > static inline void rseq_prepare_unload(void) > { >__rseq_abi.rseq_cs = 0; > } How about making the field be: union { __u64 rseq_cs; struct { __u32 rseq_cs_low; __u32 rseq_cs_high; }; }; 32-bit user code that cares about performance can just write to rseq_cs_low because it already knows that rseq_cs_high == 0. The header could even supply a static inline helper write_rseq_cs() that atomically writes a pointer and just does the right thing for 64-bit, for 32-bit BE, and for 32-bit LE. I think the union really is needed because we can’t rely on user code being built with -fno-strict-aliasing. Or the helper could use inline asm. Anyway, the point is that we get optimal code generation (a single instruction write of the correct number of bits) without any compat magic in the kernel. > > Thanks, > > Mathieu > >> >> Thanks, >> >> Mathieu >> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
On Mon, 2 Jul 2018, Mathieu Desnoyers wrote: > > > > Platforms with 32 bit word size only guarantee atomicity of a 32 bit > > write or RMV instruction. > > > > Special instructions may exist on a platform to perform 64 bit atomic > > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee > > atomicity8. > > > > So use the macros that we have to guarantee 64 bit ops and you should be > > fine. See linux/arch/x86/include/asm/atomic64_32.h > > We are talking about user-space here. What we need is a single instruction > atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion > is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64 > on a 32-bit architecture should be considered to provide single-copy atomicity > on the low 32 bits. Right. You would need to make this work for userspace. atomic64_32.h is a good reference as to which instructions provide 64 bit atomicity on 32 bit platforms.
Re: Build regressions/improvements in v4.18-rc3
Helge Deller writes: > On 02.07.2018 16:09, Geert Uytterhoeven wrote: >> On Mon, Jul 2, 2018 at 4:01 PM Geert Uytterhoeven >> wrote: >>> JFYI, when comparing v4.18-rc3[1] to v4.18-rc2[3], the summaries are: >> ... > > Both of the following are simply happening because of old compiler which is > being used: > >>> [Deleted 26903 lines about "warning: ... [-Wpointer-sign]" on >>> parisc-allmodconfig] It's GCC 4.6.3. Are you saying that's not supported anymore? I can update to 8.1.0 if that's more useful. cheers
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
- On Jul 2, 2018, at 8:35 PM, Chris Lameter c...@linux.com wrote: > On Mon, 2 Jul 2018, Mathieu Desnoyers wrote: > >> > >> > Platforms with 32 bit word size only guarantee atomicity of a 32 bit >> > write or RMV instruction. >> > >> > Special instructions may exist on a platform to perform 64 bit atomic >> > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee >> > atomicity8. >> > >> > So use the macros that we have to guarantee 64 bit ops and you should be >> > fine. See linux/arch/x86/include/asm/atomic64_32.h >> >> We are talking about user-space here. What we need is a single instruction >> atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion >> is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64 >> on a 32-bit architecture should be considered to provide single-copy >> atomicity >> on the low 32 bits. > > Right. You would need to make this work for userspace. atomic64_32.h is a > good reference as to which instructions provide 64 bit atomicity on 32 > bit platforms. We only need to update a pointer, so we don't need 64-bit atomicity on 32-bit processes. What we need is to ensure single-copy atomicity of the 32-bit pointer update on the 32-bit process in a field read from the kernel as a __u64. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
[PATCH 2/2] x86/build/vdso: simplify cmd_vdso2c
No reason to use 'define' directive here. Just use the = operator. Signed-off-by: Masahiro Yamada --- arch/x86/entry/vdso/Makefile | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 261802b..b9ed1aa 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -58,9 +58,7 @@ HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(src hostprogs-y+= vdso2c quiet_cmd_vdso2c = VDSO2C $@ -define cmd_vdso2c - $(obj)/vdso2c $< $(<:%.dbg=%) $@ -endef + cmd_vdso2c = $(obj)/vdso2c $< $(<:%.dbg=%) $@ $(obj)/vdso-image-%.c: $(obj)/vdso%.so.dbg $(obj)/vdso%.so $(obj)/vdso2c FORCE $(call if_changed,vdso2c) -- 2.7.4
[PATCH 0/2] x86/build/vdso: a little more Makefile cleanups
At first, I sent the first patch to UML ML, but they did not pick it up. Instead, I was able to get Acked-by from Richard, one of the UML maintainers. https://patchwork.kernel.org/patch/10399787/ I am resending it to x86 ML. Masahiro Yamada (2): x86/build/vdso: remove unused vdso-syms.lds x86/build/vdso: simplify cmd_vdso2c arch/x86/entry/vdso/Makefile | 4 +--- arch/x86/um/vdso/.gitignore | 1 - arch/x86/um/vdso/Makefile| 16 3 files changed, 1 insertion(+), 20 deletions(-) -- 2.7.4
[PATCH 4/6] x86/efi: Free existing memory map before installing new memory map
From: Sai Praneeth efi_memmap_install(), unmaps the existing memory map and installs a new memory map but doesn't free the memory allocated to the existing memory map. Fortunately, the details about the existing memory map (like the physical address, number of entries and type of memory) are stored in efi.memmap. Hence, use them to free the memory. In __efi_enter_virtual_mode(), we don't use efi_memmap_install() to install a new memory map, instead we use efi_memmap_init_late(). Hence, free existing memory map there too before installing a new memory map. Generally, memory for new memory map is allocated using efi_memmap_alloc() but in __efi_enter_virtual_mode() it's done using realloc_pages() [please see efi_map_regions()]. So, it's OK to free this memory using efi_memmap_free() in efi_free_boot_services(). Also, note that the first time efi_free_memmap() is called either from efi_fake_memmap() or efi_arch_mem_reserve() [depending on the boot sequence], we are actually freeing memblock_reserved memory which isn't allocated by efi_memmap_alloc(). So, there are two outliers where we use efi_free_memmap() to free memory allocated through other sources rather than efi_memmap_alloc(). Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- arch/x86/platform/efi/efi.c | 3 +++ arch/x86/platform/efi/quirks.c | 6 ++ drivers/firmware/efi/fake_mem.c | 3 +++ 3 files changed, 12 insertions(+) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index cda54abf25a6..7756426e93b5 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -952,6 +952,9 @@ static void __init __efi_enter_virtual_mode(void) * firmware via SetVirtualAddressMap(). */ efi_memmap_unmap(); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) { pr_err("Failed to remap late EFI memory map\n"); diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 11fa6ac9f0c2..11800f3cbb93 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -292,6 +292,9 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) efi_memmap_insert(, new, ); early_memunmap(new, new_size); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); efi_memmap_install(new_phys, num_entries, alloc_type); } @@ -452,6 +455,9 @@ void __init efi_free_boot_services(void) memunmap(new); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); if (efi_memmap_install(new_phys, num_entries, alloc_type)) { pr_err("Could not install new EFI memmap\n"); return; diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 82dcfa1c340b..a47754efb796 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -90,6 +90,9 @@ void __init efi_fake_memmap(void) /* swap into new EFI memmap */ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type); /* print new EFI memmap */ -- 2.7.4
[PATCH 1/2] x86/build/vdso: remove unused vdso-syms.lds
This file contains symbol values, and was originally linked into vmlinux, but I have no idea what it was actually used for. Since commit 827880ec260b ("x86/um: thin archives build fix"), it is not even linked. Now it is completely orphan, and no problem has been reported. It is a proof that this file was not needed in the first place. Signed-off-by: Masahiro Yamada Acked-by: Richard Weinberger Acked-by: Ingo Molnar --- arch/x86/um/vdso/.gitignore | 1 - arch/x86/um/vdso/Makefile | 16 2 files changed, 17 deletions(-) diff --git a/arch/x86/um/vdso/.gitignore b/arch/x86/um/vdso/.gitignore index 9cac6d0..f8b69d8 100644 --- a/arch/x86/um/vdso/.gitignore +++ b/arch/x86/um/vdso/.gitignore @@ -1,2 +1 @@ -vdso-syms.lds vdso.lds diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile index b2d6967..822ccdb 100644 --- a/arch/x86/um/vdso/Makefile +++ b/arch/x86/um/vdso/Makefile @@ -53,22 +53,6 @@ $(vobjs): KBUILD_CFLAGS += $(CFL) CFLAGS_REMOVE_vdso-note.o = -pg -fprofile-arcs -ftest-coverage CFLAGS_REMOVE_um_vdso.o = -pg -fprofile-arcs -ftest-coverage -targets += vdso-syms.lds -extra-$(VDSO64-y) += vdso-syms.lds - -# -# Match symbols in the DSO that look like VDSO*; produce a file of constants. -# -sed-vdsosym := -e 's/^00*/0/' \ - -e 's/^\([0-9a-fA-F]*\) . \(VDSO[a-zA-Z0-9_]*\)$$/\2 = 0x\1;/p' -quiet_cmd_vdsosym = VDSOSYM $@ -define cmd_vdsosym - $(NM) $< | LC_ALL=C sed -n $(sed-vdsosym) | LC_ALL=C sort > $@ -endef - -$(obj)/%-syms.lds: $(obj)/%.so.dbg FORCE - $(call if_changed,vdsosym) - # # The DSO images are built using a special linker script. # -- 2.7.4
Re: [PATCH v9 0/6] optimize memblock_next_valid_pfn and early_pfn_valid on arm and arm64
On 7/2/2018 7:40 PM, Michal Hocko Wrote: > On Fri 29-06-18 10:29:17, Jia He wrote: >> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns >> where possible") tried to optimize the loop in memmap_init_zone(). But >> there is still some room for improvement. > > It would be great to shortly describe those optimization from high level > POV. > >> >> Patch 1 introduce new config to make codes more generic >> Patch 2 remain the memblock_next_valid_pfn on arm and arm64 >> Patch 3 optimizes the memblock_next_valid_pfn() >> Patch 4~6 optimizes the early_pfn_valid() >> >> As for the performance improvement, after this set, I can see the time >> overhead of memmap_init() is reduced from 27956us to 13537us in my >> armv8a server(QDF2400 with 96G memory, pagesize 64k). > > So this is 13ms saving when booting 96G machine. Is this really worth > the additional code? Are there any other benefits? Sorry, Michal I missed one thing. This 13ms optimization is merely the result of my patch 3~6 Patch 1 is originated by Paul Burton in commit b92df1de5d289. In its description, === James said "I have tested this patch on a virtual model of a Samurai CPU with a sparse memory map. The kernel boot time drops from 109 to 62 seconds. " === -- Cheers, Jia > [...] >> arch/arm/Kconfig | 4 +++ >> arch/arm/mm/init.c| 1 + >> arch/arm64/Kconfig| 4 +++ >> arch/arm64/mm/init.c | 1 + >> include/linux/early_pfn.h | 79 >> +++ >> include/linux/memblock.h | 2 ++ >> include/linux/mmzone.h| 18 ++- >> mm/Kconfig| 3 ++ >> mm/memblock.c | 9 ++ >> mm/page_alloc.c | 5 ++- >> 10 files changed, 124 insertions(+), 2 deletions(-) >> create mode 100644 include/linux/early_pfn.h >
RE: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio struct
Hi, Linus Anson Huang Best Regards! > -Original Message- > From: Linus Walleij [mailto:linus.wall...@linaro.org] > Sent: Monday, July 2, 2018 7:42 PM > To: Anson Huang > Cc: Ulf Hansson ; Masahiro Yamada > ; Adrian Hunter > ; evgr...@chromium.org; Shawn Lin > ; Fabio Estevam ; > linux-mmc ; linux-kernel@vger.kernel.org; > dl-linux-imx > Subject: Re: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio > struct > > On Mon, Jul 2, 2018 at 3:32 AM Anson Huang > wrote: > > > commit bfd694d5e21c ("mmc: core: Add tunable delay before detecting > > card after card is inserted") adds > > "u32 cd_debounce_delay_ms" to the last of mmc_gpio struct and cause > > "char cd_label[0]" NOT work as string pointer of card detect label, > > when "cat /proc/interrupts", the devname for card detect gpio is > > incorrect as below: > > > > 144: 0 gpio-mxc 22 Edge ▒ > > 161: 0 gpio-mxc 7 Edge ▒ > > > > Move the cd_label field down to fix this, and drop the zero from the > > array size to prevent future similar bugs, the result is correct as > > below: > > > > 144: 0 gpio-mxc 22 Edge 2198000.mmc cd > > 161: 0 gpio-mxc 7 Edge 219.mmc cd > > > > Fixes: bfd694d5e21c ("mmc: core: Add tunable delay before detecting > > card after card is inserted") > > Signed-off-by: Anson Huang > > Tested-by: Fabio Estevam > > Curious. > > > --- a/drivers/mmc/core/slot-gpio.c > > +++ b/drivers/mmc/core/slot-gpio.c > > @@ -27,8 +27,8 @@ struct mmc_gpio { > > bool override_cd_active_level; > > irqreturn_t (*cd_gpio_isr)(int irq, void *dev_id); > > char *ro_label; > > - char cd_label[0]; > > u32 cd_debounce_delay_ms; > > + char cd_label[]; > > Not that I am the smartest in the world when it comes to how the C compilers > work this out. > > But isn't that equivalent to: > char *cd_label; > ? I think C compiler will check structure if it has flexible array, the flexible array much be at end of a structure, and its size can be dynamic changed by mallocing a memory for this structure, there are many such case in kernel. If the flexible array is NOT at the end of a struct, error " error: flexible array member not at end of struct" will come out by C compiler. > > Look at this from drivers/mmc/core/slot-gpio.c: > > ctx->ro_label = ctx->cd_label + len; > ctx->cd_debounce_delay_ms = 200; > snprintf(ctx->cd_label, len, "%s cd", dev_name(host->parent)); > snprintf(ctx->ro_label, len, "%s ro", dev_name(host->parent)); > host->slot.handler_priv = ctx; > host->slot.cd_irq = -EINVAL; > > So now that ro_label is char * that points len chars into the struct and that > is all > fine as the ctx is allocated like that: > > struct mmc_gpio *ctx = devm_kzalloc(host->parent, >sizeof(*ctx) + 2 * len, GFP_KERNEL); > > And I've seen this being done to allocate a state with some dynamic strings > after it. > > But I just don't like this, it seems fragile and now it broke. I think the difference of current implementation and using two string pointers are, current implementation can only alloc memory for mmc_gpio structure once, the ro_label's address can be got by cd_label's address. If using two string pointers for ro_label and cd_label, we have to alloc twice for them separately. > > What about this: > > From facb3799f479bfd4dad99c25c9c5d4c77b14c9b0 Mon Sep 17 00:00:00 > 2001 > From: Linus Walleij > Date: Mon, 2 Jul 2018 13:35:01 +0200 > Subject: [PATCH] mmc: slot-gpio: Allocate GPIO labels dynamically > > The use of string pointers in the MMC slot GPIO context is pretty dubious, > allocating some 2*len extra bytes for each label of the ro and wp pins. > > Tidy this up using kasprintf() with dynamic allocation of labels for these > strings. > > Signed-off-by: Linus Walleij > --- > drivers/mmc/core/slot-gpio.c | 15 ++- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c index > ef05e0039378..c417801b366e 100644 > --- a/drivers/mmc/core/slot-gpio.c > +++ b/drivers/mmc/core/slot-gpio.c > @@ -27,7 +27,7 @@ struct mmc_gpio { > bool override_cd_active_level; > irqreturn_t (*cd_gpio_isr)(int irq, void *dev_id); > char *ro_label; > -char cd_label[0]; > +char *cd_label; > u32 cd_debounce_delay_ms; > }; > > @@ -47,13 +47,18 @@ int mmc_gpio_alloc(struct mmc_host *host) { > size_t len = strlen(dev_name(host->parent)) + 4; len is no need any more. > struct mmc_gpio *ctx = devm_kzalloc(host->parent, > -sizeof(*ctx) + 2 * len,GFP_KERNEL); > +sizeof(*ctx), GFP_KERNEL); > > if (ctx) { > -ctx->ro_label = ctx->cd_label + len; > ctx->cd_debounce_delay_ms = 200; > -snprintf(ctx->cd_label, len, "%s cd", dev_name(host->parent)); > -snprintf(ctx->ro_label, len, "%s ro",
Re: [PATCH 0/2] serial: 8250_dw: add fractional divisor support
Hi, On Mon, 2 Jul 2018 14:51:03 +0300 Andy Shevchenko wrote: > On Mon, 2018-07-02 at 13:18 +0300, Andy Shevchenko wrote: > > On Mon, 2018-07-02 at 18:04 +0800, Jisheng Zhang wrote: > > > For Synopsys DesignWare 8250 uart which version >= 4.00a, there's a > > > valid divisor latch fraction register. The fractional divisor width > > > is > > > 4bits ~ 6bits. > > > > > > > There are several serial IPs that have fractional divider built-in. > > None > > is using any specific hooks. Why do you need in your case, esp. taking > > into consideration that we have a custom ->set_termios() callback? > > Okay, I see that in 8250 we have hooks embedded into 8250_port.c which > is not the best solution. > > For example it prevents better splitting Exar code. > So, we would need these hooks, but better to integrate them in the same > way like it's done for the rest of 8250 ones, i.e. > - rename existing to have a "do" word > - create new functions which would be a replacement that choose between > "do" variant and custom one > - not sure if we need to export "do" variants (at least for now) So you mean add the support as following: 1.rename current serial8250_set_divisor as serial8250_do_set_divisor 2.add a new serial8250_set_divisor which will be as simple as static void serial8250_set_divisor(struct uart_port *port, ...) { struct uart_8250_port *up = up_to_u8250p(port); if (up->set_divisor) up->set_divisor(...); else serial8250_do_set_divisor(...); } could you please confirm? Another issue is I'm not sure which struct to add the hook, struct uart_port or struct uart_8250_port? Currently, it seems that only uart_8250_port needs this hook, sure, adding the hook to struct uart_port is fine either. Could you please kindly give some suggestions? Thanks, Jisheng
Re: [PATCH] gcc-plugins: remove unused GCC_PLUGIN_SUBDIR
Hi Kees, 2018-07-03 11:18 GMT+09:00 Kees Cook : > On Mon, Jul 2, 2018 at 5:39 PM, Masahiro Yamada > wrote: >> GCC_PLUGIN_SUBDIR has never been used. If you really need this in >> the future, please re-add it then. >> >> For now, the code is unused. Remove. >> >> 'export HOSTLIBS' is not necessary either. >> >> Signed-off-by: Masahiro Yamada > > Acked-by: Kees Cook > > Is this going via your tree, or should I take it via the gcc-plugins tree? > > Thanks! > > -Kees Could you pick it up to your tree? Thanks. -- Best Regards Masahiro Yamada
Re: [PATCH V2] mmc: core: cd_label must be last entry of mmc_gpio struct
On Mon, Jul 2, 2018 at 11:13 PM, Anson Huang wrote: > I think either way is OK, since flexible array is used in kernel code quite > commonly, > so I prefer to make code change as small as possible, the original patch can > also prevent > similar bug in future. And like below commit Fabio pointed out, it also has > same kind > of fix: a158531f3c92 ("gpio: 74x164: Fix crash during .remove()"). Thanks. I am also fine with this patch or the one from Linus. Maybe Anson's patch could be applied to 4.18-rc as a bug fix and Linus' patch could be applied to 4.19-rc1 as a cleanup/improvement? Ulf, What would you prefer? Thanks
Re: [PATCH] gcc-plugins: remove unused GCC_PLUGIN_SUBDIR
On Mon, Jul 2, 2018 at 7:23 PM, Masahiro Yamada wrote: > Hi Kees, > > 2018-07-03 11:18 GMT+09:00 Kees Cook : >> On Mon, Jul 2, 2018 at 5:39 PM, Masahiro Yamada >> wrote: >>> GCC_PLUGIN_SUBDIR has never been used. If you really need this in >>> the future, please re-add it then. >>> >>> For now, the code is unused. Remove. >>> >>> 'export HOSTLIBS' is not necessary either. >>> >>> Signed-off-by: Masahiro Yamada >> >> Acked-by: Kees Cook >> >> Is this going via your tree, or should I take it via the gcc-plugins tree? >> >> Thanks! >> >> -Kees > > Could you pick it up to your tree? Happily; thanks! -Kees -- Kees Cook Pixel Security
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
- On Jul 2, 2018, at 10:18 PM, Linus Torvalds torva...@linux-foundation.org wrote: > On Mon, Jul 2, 2018 at 7:01 PM Mathieu Desnoyers > wrote: >> >> One thing to consider is how we will implement the load of that pointer >> on the kernel side. > > Use "get_user()". It works for 64-bit objects too, and it will be > atomic in the 32-bit sub-parts on a 32-bit architecture. Is it really ? Last time we had this discussion, not all architectures guaranteed that reading a 64-bit integer would happen in two atomic 32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64() macro as it stands today (rather than using a union). > > Again: there is no point in trying to be atomic in the full 64 bits > (when you're running on 32-bit). The upper bits don't have to "match" > the lower bits. They just have to be zero. So doing it as two loads is > fine - the same way it's perfectly fine to do it as two stores (since > the store to the upper bits will always be zero). I'd be fine with two atomic loads, but I'd rather have a strong confirmation about this, because last time around there were architectures where it was not true as far as I recall. Thanks, Mathieu > > Linus -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
On Mon, Jul 2, 2018 at 7:30 PM Mathieu Desnoyers wrote: > > > Is it really ? Last time we had this discussion, not all architectures > guaranteed that reading a 64-bit integer would happen in two atomic > 32-bit sub-parts. All architectures that matter do. Please don't overdesign this, or try to make a problem out of something that isn't a problem. Sure, maybe some toy architecture does a 8-byte "get_user()" as a "copy_from_user()" one byte at a time, because that's the best way to do unaligned accesses. But nobody will ever care about rseq on such a thing anyway. Let it go. Linus
[lkp-robot] ee410f15b1 BUG: kernel hang in boot stage
Greetings, 0day kernel testing robot got the below dmesg and the first bad commit is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master commit ee410f15b1418f2f4428e79980674c979081bcb7 Author: Thierry Escande AuthorDate: Thu Jun 14 15:28:15 2018 -0700 Commit: Linus Torvalds CommitDate: Fri Jun 15 07:55:25 2018 +0900 lib/test_printf.c: call wait_for_random_bytes() before plain %p tests If the test_printf module is loaded before the crng is initialized, the plain 'p' tests will fail because the printed address will not be hashed and the buffer will contain '(ptrval)' instead. This patch adds a call to wait_for_random_bytes() before plain 'p' tests to make sure the crng is initialized. Link: http://lkml.kernel.org/r/20180604113708.11554-1-thierry.esca...@linaro.org Signed-off-by: Thierry Escande Acked-by: Tobin C. Harding Reviewed-by: Andrew Morton Cc: David Miller Cc: Rasmus Villemoes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds 608dbdfb1f hexagon: drop the unused variable zero_page_mask ee410f15b1 lib/test_printf.c: call wait_for_random_bytes() before plain %p tests 883c9ab9eb Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux e3c7283c19 Add linux-next specific files for 20180629 +---++++---+ | | 608dbdfb1f | ee410f15b1 | 883c9ab9eb | next-20180629 | +---++++---+ | boot_successes| 35 | 0 | 19 | 13 | | boot_failures | 0 | 15 || | | BUG:kernel_hang_in_boot_stage | 0 | 15 || | +---++++---+ [9.488584] - [9.491008] Testing concurrent rhashtable access from 10 threads [ 21.577749] test 3125 add/delete pairs into rhlist [ 21.734553] test 3125 random rhlist add/delete operations [ 21.813107] Started 10 threads, 0 failed, rhltable test returns 0 BUG: kernel hang in boot stage # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD git bisect start 7daf201d7fe8334e2d2364d4e8ed3394ec9af819 v4.17 -- git bisect good a16afaf7928b74c30a4727cdcaa67bd10675a55d # 08:00 G 11 00 0 Merge tag 'for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply git bisect good dc594c39f7a9dcdfd5dbb1a446ac6d06182e2472 # 08:13 G 11 00 0 Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client git bisect bad 81e97f01371f4e1701feeafe484665112cd9ddc2 # 08:33 B 0 1 15 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid git bisect bad 35773c93817c5f2df264d013978e7551056a063a # 08:55 B 0 1 15 0 Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs git bisect bad 8949170cf48e91da7e4e69a59e2842d81d9a5885 # 09:26 B 0 1 15 0 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm git bisect bad becfc5e97cbab00b25a592aabc36838ec7217d1f # 09:49 B 0 10 24 0 Merge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm git bisect good 7a932516f55cdf430c7cce78df2010ff7db6b874 # 10:21 G 11 00 0 Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground git bisect bad b5d903c2d656e9bc54bc76554a477d796a63120d # 10:44 B 0 1 15 0 Merge branch 'akpm' (patches from Andrew) git bisect good 3fb3894b84c2e0f83cb1e4f4e960243742e6b3a6 # 11:06 G 10 00 0 kernel/relay.c: change return type to vm_fault_t git bisect good 14f28f5776927be30717986f86b765d49eec392c # 11:20 G 10 00 0 ipc: use new return type vm_fault_t git bisect good fe6bdfc8e1e131720abbe77a2eb990c94c9024cb # 11:44 G 10 00 0 mm: fix oom_kill event handling git bisect good 608dbdfb1f0299f4500e56d62b0d84c44dcfa3be # 11:56 G 11 00 0 hexagon: drop the unused variable zero_page_mask git bisect bad ee410f15b1418f2f4428e79980674c979081bcb7 # 12:16 B 0 1 15 0 lib/test_printf.c: call wait_for_random_bytes() before plain %p tests # first bad commit: [ee410f15b1418f2f4428e79980674c979081bcb7] lib/test_printf.c: call wait_for_random_bytes() before plain %p tests git bisect good 608dbdfb1f0299f4500e56d62b0d84c44dcfa3be # 12:42 G 30 00 0 hexagon: drop the unused variable zero_page_mask # extra tests with debug options git bisect bad ee410f15b1418f2f4428e79980674c979081bcb7 # 13:00 B 0 11 25 0 lib/test_printf.c: call
Re: [PATCH 0/2] serial: 8250_dw: add fractional divisor support
On Tue, 3 Jul 2018 10:22:57 +0800 Jisheng Zhang wrote: > Hi, > > On Mon, 2 Jul 2018 14:51:03 +0300 Andy Shevchenko wrote: > > > On Mon, 2018-07-02 at 13:18 +0300, Andy Shevchenko wrote: > > > On Mon, 2018-07-02 at 18:04 +0800, Jisheng Zhang wrote: > > > > For Synopsys DesignWare 8250 uart which version >= 4.00a, there's a > > > > valid divisor latch fraction register. The fractional divisor width > > > > is > > > > 4bits ~ 6bits. > > > > > > > > > > There are several serial IPs that have fractional divider built-in. > > > None > > > is using any specific hooks. Why do you need in your case, esp. taking > > > into consideration that we have a custom ->set_termios() callback? > > > > Okay, I see that in 8250 we have hooks embedded into 8250_port.c which > > is not the best solution. > > > > For example it prevents better splitting Exar code. > > So, we would need these hooks, but better to integrate them in the same > > way like it's done for the rest of 8250 ones, i.e. > > - rename existing to have a "do" word > > - create new functions which would be a replacement that choose between > > "do" variant and custom one > > - not sure if we need to export "do" variants (at least for now) > > So you mean add the support as following: > > 1.rename current serial8250_set_divisor as serial8250_do_set_divisor > > 2.add a new serial8250_set_divisor which will be as simple as > > static void serial8250_set_divisor(struct uart_port *port, ...) > { > struct uart_8250_port *up = up_to_u8250p(port); > > if (up->set_divisor) > up->set_divisor(...); > else > serial8250_do_set_divisor(...); > } > > could you please confirm? > > Another issue is I'm not sure which struct to add the hook, > struct uart_port or struct uart_8250_port? Currently, it seems that only > uart_8250_port needs this hook, sure, adding the hook to struct uart_port > is fine either. Could you please kindly give some suggestions? patching struct uart_port seems a bit overhead. After reading the code again, I propose another solution, similar as what dl_write() is used in 8250 core: 1.introduce the hook to struct uart_8250_port as my previous patches do, 2.rename current serial8250_set_divisor() as default_serial_get_divisor() then introduce a new serial8250_set_divisor() as: static inline void serial8250_set_divisor(struct uart_8250_port *up,) { up->set_divisor(); } and point up->set-divisor to default_serial_get_divisor what do you think about this solution? Thanks > > Thanks, > Jisheng
Re: [PATCH V2 02/19] csky: defconfig
On Sun, Jul 1, 2018 at 11:34 AM Guo Ren wrote: > Needs a commit msg. Perhaps some overview of what's in each config. > Signed-off-by: Guo Ren > --- > arch/csky/configs/gx66xx_defconfig | 549 > + > arch/csky/configs/qemu_ck807_defconfig | 541 > 2 files changed, 1090 insertions(+) > create mode 100644 arch/csky/configs/gx66xx_defconfig > create mode 100644 arch/csky/configs/qemu_ck807_defconfig Are these configs mutually exclusive? We try to have one kernel build serve many platforms. So you'd probably want to divide things between the 2 ABIs. It looks like you still have lots of options enabled that I wouldn't expect you to need. Start with something more minimal for what you need to boot and support upstream. For a full config, you can use allmodconfig at least to build test. Rob
Re: Linux 3.18.111
Hello, On 2018년 05월 30일 16:32, Greg KH wrote: > I'm announcing the release of the 3.18.111 kernel. > > All users of the 3.18 kernel series must upgrade. > > The updated 3.18.y git tree can be found at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git > linux-3.18.y > and can be browsed at the normal kernel.org git web browser: > > http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary > > thanks, > > greg k-h > > > do d_instantiate/unlock_new_inode combinations safely Recent my test in 3.18.113 kernel with security smack showed following crash during mkdir on ext4 fs. Unable to handle kernel paging request at virtual address ff98 pgd = ffc012411000 [ff98] *pgd=, *pud= [ cut here ] Kernel BUG at ffc0007d9430 [verbose debug info unavailable] Internal error: Oops - BUG: 9605 [#1] PREEMPT SMP CPU: 0 MPIDR: 8000 PID: 1237 Comm: mkdir Not tainted 3.18.113-00083-g1bfc02f-dirty #29-Tizen task: ffc02cbc2340 ti: ffc02b7fc000 task.ti: ffc02b7fc000 PC is at down_read+0x24/0x54 LR is at down_read+0x24/0x54 [...] Call trace: [] down_read+0x24/0x54 [] ext4_xattr_get+0x74/0x1f4 [] ext4_xattr_security_get+0x28/0x38 [] generic_getxattr+0x4c/0x60 [] smk_fetch.isra.6+0x8c/0xe0 [] smack_d_instantiate+0x194/0x324 [] security_d_instantiate+0x24/0x30 [] d_instantiate_new+0x34/0x94 [] ext4_mkdir+0x284/0x354 [] vfs_mkdir+0xc0/0x150 [] SyS_mkdirat+0x88/0xb8 [] SyS_mkdir+0x18/0x20 Code: aa0003f3 b00017c0 912e1000 97e38943 (c85f7e60) ---[ end trace b1ad797d63dae9c5 ]--- It is because d_instantiate_new() added from above commit calls security_d_instantiate() before calling __d_instantiate() and dentry->d_inode is not yet set and null. In 3.18.113 kernel, inode->i_op_getxattr() of ext4 is still generic_getxattr() and it only has dentry parameter without inode, so it tries to access dentry->d_inode. I did not test with selinux, but selinux also calls inode->i_op_getxattr() from selinux_d_instantiate(), so maybe there is also same issue. Best Regards, - Seung-Woo Kim -- Seung-Woo Kim Samsung Research --
Re: [PATCH V2 01/19] csky: Build infrastructure
On Sun, Jul 1, 2018 at 11:36 AM Guo Ren wrote: > > Signed-off-by: Guo Ren > --- [...] > +config CSKY_BUILTIN_DTB > + bool "Use kernel builtin dtb" > + > +config CSKY_BUILTIN_DTB_NAME > + string "kernel builtin dtb name" > + depends on CSKY_BUILTIN_DTB > +endmenu These options generally exist for backwards compatibility with legacy bootloaders that don't support DT which shouldn't apply here given this is a new arch. If we need this for other reasons, it should not be an architecture specific option. Rob
Re: [PATCH] tick: prefer a lower rating device only if it's CPU local device
Hi Sudeep, On Wed, May 9, 2018 at 9:02 AM Sudeep Holla wrote: > > Checking the equality of cpumask for both new and old tick device doesn't > ensure that it's CPU local device. This will cause issue if a low rating > clockevent tick device is registered first followed by the registration > of higher rating clockevent tick device. > > In such case, clockevents_released list will never get emptied as both > the devices get selected as preferred one and we will loop forever in > clockevents_notify_released. > > Cc: Frederic Weisbecker > Cc: Thomas Gleixner > Signed-off-by: Sudeep Holla I've got a arm32 board (meson8b-odroidc1) that's been failing in kernelCI.org since the merge window (boot log[1]), and I finally got around to bisecting it[2]. Unfortunately, the bisect pointed at a merge commit, but with some trial and error (and a suggestion by Arnd) I was able to test that revering $SUBJECT commit[3], my problem goes away. Another interesting data point is that disabling SMP (either by "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go away, without needing to revert this patch. AFAICT, this platform, is using a single timer as a clocksource ("amlogic,meson6-timer") which is not a per-CPU timer. I ran out of time to keep digging on this issue, and I'm still not sure exactly what's going on, but I wanted to report it in case anyone else has any ideas, and so we can hopefully get it fixed during the -rc cycle. Kevin [1] https://storage.kernelci.org/mainline/master/v4.18-rc2-357-gd3bc0e67f852/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-meson8b-odroidc1.html [2] http://termbin.com/mk07 [3] in mainline as: 1332a9055801 tick: Prefer a lower rating device only if it's CPU local device > --- > kernel/time/tick-common.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > Hi Thomas, > > I am seeing this issue on my Juno devboard, where system wide timers > with rating 300 and 400 are registered in same order and we get stuck in > a loop in clockevents_notify_released. Let me know if this looks sane or > you have any suggestions that I can try out. > > Regards, > Sudeep > > diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c > index 49edc1c4f3e6..78e598334007 100644 > --- a/kernel/time/tick-common.c > +++ b/kernel/time/tick-common.c > @@ -277,7 +277,8 @@ static bool tick_check_preferred(struct > clock_event_device *curdev, > */ > return !curdev || > newdev->rating > curdev->rating || > - !cpumask_equal(curdev->cpumask, newdev->cpumask); > + (!cpumask_equal(curdev->cpumask, newdev->cpumask) && > + !tick_check_percpu(curdev, newdev, smp_processor_id())); > } > > /* > -- > 2.7.4 >
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
On Mon, 2 Jul 2018, Mathieu Desnoyers wrote: > Are there any kind of guarantees that a __u64 update on a 32-bit architecture > won't be torn into something daft like byte-per-byte stores when performed > from C code ? > > I don't worry whether the upper bits get updated or how, but I really care > about not having store tearing of the low bits update. Platforms with 32 bit word size only guarantee atomicity of a 32 bit write or RMV instruction. Special instructions may exist on a platform to perform 64 bit atomic updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee atomicity8. So use the macros that we have to guarantee 64 bit ops and you should be fine. See linux/arch/x86/include/asm/atomic64_32.h
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
- On Jul 2, 2018, at 8:19 PM, Chris Lameter c...@linux.com wrote: > On Mon, 2 Jul 2018, Mathieu Desnoyers wrote: > >> Are there any kind of guarantees that a __u64 update on a 32-bit architecture >> won't be torn into something daft like byte-per-byte stores when performed >> from C code ? >> >> I don't worry whether the upper bits get updated or how, but I really care >> about not having store tearing of the low bits update. > > Platforms with 32 bit word size only guarantee atomicity of a 32 bit > write or RMV instruction. > > Special instructions may exist on a platform to perform 64 bit atomic > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee > atomicity8. > > So use the macros that we have to guarantee 64 bit ops and you should be > fine. See linux/arch/x86/include/asm/atomic64_32.h We are talking about user-space here. What we need is a single instruction atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64 on a 32-bit architecture should be considered to provide single-copy atomicity on the low 32 bits. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
RE: [PATCH] bindings: add clocks optional binding for imx gpio
Hi, Linus Anson Huang Best Regards! > -Original Message- > From: Linus Walleij [mailto:linus.wall...@linaro.org] > Sent: Monday, July 2, 2018 9:46 PM > To: Anson Huang ; Fabio Estevam > > Cc: Rob Herring ; Mark Rutland > ; open list:GPIO SUBSYSTEM > ; open list:OPEN FIRMWARE AND FLATTENED > DEVICE TREE BINDINGS ; > linux-kernel@vger.kernel.org; dl-linux-imx > Subject: Re: [PATCH] bindings: add clocks optional binding for imx gpio > > On Fri, Jun 29, 2018 at 5:34 AM Anson Huang > wrote: > > > Some i.MX SoCs have GPIO clock gate in CCM, accessing GPIO registers > > needs to enable GPIO clock gate first, i.MX GPIO driver will enable > > clock gate if there is clock property in GPIO node of dtb, add > > optional property to i.MX GPIO binding doc. > > > > Signed-off-by: Anson Huang > > Make sense since the gpio-mxc driver already supports this :) > > > +Optional properties: > > +- clocks: the clocks used by gpio bank > > Should the text be "the clock for clocking the GPIO silicon" > I guess that is what it is. And singularis? Yes, it is singularis, I will improve the text. > > Does it hurt to give the clock a name? Like the common "pclk" for peripheral > clock or something similar that other i.MX silicon uses? It is just because GPIO only needs one clock, and the driver does NOT get the clock using clock name, so the GPIO node in dtb also has no clock name specified, if we add a clock name here, dtb also need to be updated? And I saw other i.MX modules which have only one clock, they also have no clock name specified, like I2C on i.MX6QDL, I will send a V2 patch with text improved, thanks. Anson. > > Fabio: can we have your ACK on this too. > > Yours, > Linus Walleij
Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs
- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski l...@amacapital.net wrote: >> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers >> >> wrote: >> >> - On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers >> mathieu.desnoy...@efficios.com wrote: >> >>> - On Jul 2, 2018, at 7:06 PM, Linus Torvalds >>> torva...@linux-foundation.org >>> wrote: >>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers wrote: > > Unfortunately, that rseq->rseq_cs field needs to be updated by user-space > with single-copy atomicity. Therefore, we want 32-bit user-space to > initialize > the padding with 0, and only update the low bits with single-copy > atomicity. Well... It's actually still single-copy atomicity as a 64-bit value. Why? Because it doesn't matter how you write the upper bits. You'll be writing the same value to them (zero) anyway. So who cares if the write ends up being two instructions, because the write to the upper bits doesn't actually *do* anything. Hmm? >>> >>> Are there any kind of guarantees that a __u64 update on a 32-bit >>> architecture >>> won't be torn into something daft like byte-per-byte stores when performed >>> from C code ? >>> >>> I don't worry whether the upper bits get updated or how, but I really care >>> about not having store tearing of the low bits update. >> >> For the records, most updates of those low bits are done in assembly >> from critical sections, for which we control exactly how the update is >> performed. >> >> However, there is one helper function in user-space that updates that value >> from C through a volatile store, e.g.: >> >> static inline void rseq_prepare_unload(void) >> { >>__rseq_abi.rseq_cs = 0; >> } > > How about making the field be: > > union { > __u64 rseq_cs; > struct { > __u32 rseq_cs_low; > __u32 rseq_cs_high; > }; > }; > > 32-bit user code that cares about performance can just write to rseq_cs_low > because it already knows that rseq_cs_high == 0. > > The header could even supply a static inline helper write_rseq_cs() that > atomically writes a pointer and just does the right thing for 64-bit, for > 32-bit BE, and for 32-bit LE. > > I think the union really is needed because we can’t rely on user code being > built with -fno-strict-aliasing. Or the helper could use inline asm. > > Anyway, the point is that we get optimal code generation (a single instruction > write of the correct number of bits) without any compat magic in the kernel. That works for me! Any objection from anyone else for this approach ? Thanks, Mathieu > >> >> Thanks, >> >> Mathieu >> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. > > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH upstream] KASAN: slab-out-of-bounds Read in getname_kernel
On Mon, 2018-07-02 at 14:15 +0200, Dmitry Vyukov wrote: > On Mon, Jul 2, 2018 at 1:55 PM, tomas wrote: > > Yes, thanks. Please use my full name, Tomas Bortoli. > > > Please also include: > > Reported-by: syzbot+60c837b428dc84e83...@syzkaller.appspotmail.com Done. > > from the original bug report. This this help to keep automatic testing > process running. Umm ... should I include this email address on the actual cc when submitting the patch? > > Thanks > > > > > > On 07/02/2018 12:20 PM, Ian Kent wrote: > > > On Mon, 2018-07-02 at 10:31 +0200, tomas wrote: > > > > Hi Ian, > > > > > > > > you are welcome! > > > > > > > > yes your patch is much better. You should just put the "_IOC_NR" macro > > > > around "cmd" in the lines added to "validate_dev_ioctl" to make it work. > > > > > > LOL, yes, that was a dumb mistake. > > > > > > I'll send it to Andrew Morton, after some fairly simple sanity testing, > > > with both our Signed-off-by added. > > > > > > > Tomas > > > > > > > > > > > > On 07/02/2018 03:42 AM, Ian Kent wrote: > > > > > On Mon, 2018-07-02 at 09:10 +0800, Ian Kent wrote: > > > > > > On Mon, 2018-07-02 at 00:04 +0200, tomas wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've looked into this issue found by Syzbot and I made a patch: > > > > > > > > > > > > > > https://syzkaller.appspot.com/bug?id=d03abd8b42847f7f69b1d1d7f9720 > > > > > > > 8ae425 > > > > > > > b116 > > > > > > > 3 > > > > > > > > > > > > Umm ... oops! > > > > > > > > > > > > Thanks for looking into this Tomas. > > > > > > > > > > > > > The autofs subsystem does not check that the "path" parameter is > > > > > > > present > > > > > > > within the "param" struct passed by the userspace in case the > > > > > > > AUTOFS_DEV_IOCTL_OPENMOUNT_CMD command is passed. Indeed, it > > > > > > > assumes a > > > > > > > path is always provided (though a path is not always present, as > > > > > > > per how > > > > > > > the struct is defined: > > > > > > > https://github.com/torvalds/linux/blob/master/include/uapi/linux/a > > > > > > > uto_de > > > > > > > v-io > > > > > > > ct > > > > > > > l.h#L89). > > > > > > > Skipping the check provokes an oob read in "strlen", called by > > > > > > > "getname_kernel", in turn called by the autofs to assess the > > > > > > > length of > > > > > > > the non-existing path. > > > > > > > > > > > > > > To solve it, modify the "validate_dev_ioctl" function to check > > > > > > > also that > > > > > > > a path has been provided if the command is > > > > > > > AUTOFS_DEV_IOCTL_OPENMOUNT_CMD. > > > > > > > > > > > > > > > > > > > > > --- b/fs/autofs/dev-ioctl.c2018-07-01 23:10:16.059728621 > > > > > > > +0200around > > > > > > > +++ a/fs/autofs/dev-ioctl.c2018-07-01 23:10:24.311792133 +0200 > > > > > > > @@ -136,6 +136,9 @@ static int validate_dev_ioctl(int cmd, s > > > > > > > goto out; > > > > > > > } > > > > > > > } > > > > > > > +/* AUTOFS_DEV_IOCTL_OPENMOUNT_CMD without path */ > > > > > > > +else if(_IOC_NR(cmd) == AUTOFS_DEV_IOCTL_OPENMOUNT_CMD) > > > > > > > +return -EINVAL; > > > > > > > > > > > > My preference is to put the comment inside the else but ... > > > > > > > > > > > > There's another question, should the check be done in > > > > > > autofs_dev_ioctl_openmount() in the same way it's checked in other > > > > > > ioctls that need a path, such as in autofs_dev_ioctl_requester() > > > > > > and autofs_dev_ioctl_ismountpoint()? > > > > > > > > > > > > For consistency I'd say it should. > > > > > > > > > > > > > > > > > > > > err = 0;You should just put the "_IOC_NR" directive around > > > > > > > "cmd" in > > > > > > > the lines added to "validate_dev_ioctl" to make it work. > > > > > > > out: > > > > > > > > > > > > > > > > > > > > > Tested and solves the issue on Linus' main git tree. > > > > > > > > > > > > > > > > > > > > > > > > Or perhaps this (not even compile tested) patch would be better? > > > > > > > > > > autofs - fix slab out of bounds read in getname_kernel() > > > > > > > > > > From: Ian Kent > > > > > > > > > > The autofs subsystem does not check that the "path" parameter is > > > > > present for all cases where it is required when it is passed in > > > > > via the "param" struct. > > > > > > > > > > In particular it isn't checked for the AUTOFS_DEV_IOCTL_OPENMOUNT_CMD > > > > > ioctl command. > > > > > > > > > > To solve it, modify validate_dev_ioctl() function to check that a > > > > > path has been provided for ioctl commands that require it. > > > > > --- > > > > > fs/autofs/dev-ioctl.c | 15 +++ > > > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c > > > > > index ea4ca1445ab7..61c63715c3fb 100644 > > > > > --- a/fs/autofs/dev-ioctl.c > > > > > +++ b/fs/autofs/dev-ioctl.c > > > > > @@ -135,6 +135,11 @@ static int validate_dev_ioctl(int cmd, struct > > > > > autofs_dev_ioctl
Re: [PATCH 02/15] arm: dts: ls1021a: Add missing cooling device properties for CPUs
On Fri, May 25, 2018 at 04:01:48PM +0530, Viresh Kumar wrote: > The cooling device properties, like "#cooling-cells" and > "dynamic-power-coefficient", should either be present for all the CPUs > of a cluster or none. If these are present only for a subset of CPUs of > a cluster then things will start falling apart as soon as the CPUs are > brought online in a different order. For example, this will happen > because the operating system looks for such properties in the CPU node > it is trying to bring up, so that it can register a cooling device. > > Add such missing properties. > > Signed-off-by: Viresh Kumar Applied, thanks.
Re: [RFC PATCH 0/4] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory.
On Mon, Jul 02, 2018 at 08:18:55PM +0800, Baoquan He wrote: >Hi Chao, > >On 06/12/18 at 04:10pm, Chao Fan wrote: >> *** Issues need be discussed >> There are several issues I am not quite sure, please help review and >> give suggestions: >> >> 1) In PATCH 1, I copy the structures and functions from ACPI head file, >>so that ACPI head file will never been used here. I am not sure >>whether it's good to include ACPI head file or use the method in >>PATCH 1. If people think we can use ACPI head files directely, I >>will remove the PATCH 1. > >Usaully we try to reuse code even though they are located in different >life space. I think it applies to this SRAT handling. Copying ACPI codes Yes, you are right. But as you said, they are in different life space, so I don't have a better method to dig the SRAT table. If anyone have a good way or suggestion, please tell me. Thanks, Chao Fan >to kernel decompressing looks messy. Is there better way to reuse ACPI >code and read SRAT table, then get what we need in a simple way? > >Thanks >Baoquan >> >> ***Test results: >> - I did a very simple test, and it can get the memory information in >>bios and efi KVM guest machine, and put it by early printk. But no >>more tests, so it's with RFC tag. >> >> Any comments will be welcome. >> >> >> Chao Fan (4): >> x86/boot: Add acpitb.h to help parse acpi tables >> x86/boot: Add acpitb.c to parse acpi tables >> x86/boot/KASLR: Walk srat tables to filter immovable memory >> x86/boot/KASLR: Limit kaslr to choosing the immovable memory >> >> arch/x86/boot/compressed/Makefile | 1 + >> arch/x86/boot/compressed/acpitb.c | 245 ++ >> arch/x86/boot/compressed/acpitb.h | 175 + >> arch/x86/boot/compressed/kaslr.c | 120 +-- >> 4 files changed, 530 insertions(+), 11 deletions(-) >> create mode 100644 arch/x86/boot/compressed/acpitb.c >> create mode 100644 arch/x86/boot/compressed/acpitb.h >> >> -- >> 2.17.0 >> >> >> > >