Re: [PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER
On Wed, Oct 04, 2017 at 04:27:04PM -0700, Kees Cook wrote: > Drop the arguments from the macro and adjust all callers with the > following script: > > perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \ > $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h) > > Signed-off-by: Kees Cook> Acked-by: Geert Uytterhoeven # for m68k parts > --- > arch/arm/mach-ixp4xx/dsmg600-setup.c | 2 +- > arch/arm/mach-ixp4xx/nas100d-setup.c | 2 +- > arch/m68k/amiga/amisound.c| 2 +- > arch/m68k/mac/macboing.c | 2 +- > arch/mips/mti-malta/malta-display.c | 2 +- > arch/parisc/kernel/pdc_cons.c | 2 +- > arch/s390/mm/cmm.c| 2 +- > drivers/atm/idt77105.c| 4 ++-- > drivers/atm/iphase.c | 2 +- > drivers/block/ataflop.c | 8 > drivers/char/dtlk.c | 2 +- > drivers/char/hangcheck-timer.c| 2 +- > drivers/char/nwbutton.c | 2 +- > drivers/char/rtc.c| 2 +- > drivers/input/touchscreen/s3c2410_ts.c| 2 +- > drivers/net/cris/eth_v10.c| 6 +++--- > drivers/net/hamradio/yam.c| 2 +- > drivers/net/wireless/atmel/at76c50x-usb.c | 2 +- > drivers/staging/speakup/main.c| 2 +- > drivers/staging/speakup/synth.c | 2 +- > drivers/tty/cyclades.c| 2 +- > drivers/tty/isicom.c | 2 +- > drivers/tty/moxa.c| 2 +- > drivers/tty/rocket.c | 2 +- > drivers/tty/vt/keyboard.c | 2 +- > drivers/tty/vt/vt.c | 2 +- > drivers/watchdog/alim7101_wdt.c | 2 +- > drivers/watchdog/machzwd.c| 2 +- > drivers/watchdog/mixcomwd.c | 2 +- > drivers/watchdog/sbc60xxwdt.c | 2 +- > drivers/watchdog/sc520_wdt.c | 2 +- > drivers/watchdog/via_wdt.c| 2 +- > drivers/watchdog/w83877f_wdt.c| 2 +- > drivers/xen/grant-table.c | 2 +- > fs/pstore/platform.c | 2 +- > include/linux/timer.h | 4 ++-- > kernel/irq/spurious.c | 2 +- > lib/random32.c| 2 +- > net/atm/mpc.c | 2 +- > net/decnet/dn_route.c | 2 +- > net/ipv6/ip6_flowlabel.c | 2 +- > net/netrom/nr_loopback.c | 2 +- > security/keys/gc.c| 2 +- > sound/oss/midibuf.c | 2 +- > sound/oss/soundcard.c | 2 +- > sound/oss/sys_timer.c | 2 +- > sound/oss/uart6850.c | 2 +- > 47 files changed, 54 insertions(+), 54 deletions(-) Acked-by: Greg Kroah-Hartman
[PATCH] powerpc/tm: P9 disabled suspend mode workaround
Each POWER9 core is made of two super slices. Each super slice can only have one thread at a time in TM suspend mode. The super slice restricts ever entering a state where both threads are in suspend by aborting transactions on tsuspend or exceptions into the kernel. Unfortunately for context switch we need trechkpt which forces suspend mode. If a thread is already in suspend and a second thread needs to be restored that was suspended, the trechkpt must be executed. Currently the trechkpt will hang in this case until the other thread exits suspend. This causes problems for Linux resulting in hang and RCU stall detectors going off. To workaround this, we disable suspend in the core. This is done via a firmware change which stops the hardware ever getting into suspend. The hardware will always rollback a transaction on any tsuspend or entry into the kernel. Unfortunately userspace can construct a sigcontext which enables suspend. Thus userspace can force Linux into a path where trechkpt is executed. This patch blocks this from happening on POWER9 but sanity checking sigcontexts passed in. ptrace doesn't have this problem as only MSR SE and BE can be changed via ptrace. This patch also adds a number of WARN_ON() in case we every enter suspend when we shouldn't. This should catch systems that don't have the firmware change and are running TM. A future firmware change will allow suspend mode on POWER9 but that is going to require additional Linux changes to support. In the interim, this allows TM to continue to (partially) work while stopping userspace from crashing Linux. Signed-off-by: Michael Neuling--- arch/powerpc/include/asm/cputable.h | 1 + arch/powerpc/kernel/cputable.c | 7 +++ arch/powerpc/kernel/process.c | 2 ++ arch/powerpc/kernel/signal_32.c | 4 arch/powerpc/kernel/signal_64.c | 5 + 5 files changed, 19 insertions(+) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index a9bf921f4e..477e00b6a7 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -123,6 +123,7 @@ extern struct cpu_spec *identify_cpu(unsigned long offset, unsigned int pvr); extern void identify_cpu_name(unsigned int pvr); extern void do_feature_fixups(unsigned long value, void *fixup_start, void *fixup_end); +extern bool tm_suspend_supported(void); extern const char *powerpc_base_platform; diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 7608729160..70252843ff 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -2301,6 +2301,13 @@ void __init identify_cpu_name(unsigned int pvr) } } +bool tm_suspend_supported(void) +{ + if (pvr_version_is(PVR_POWER9)) + return false; + return true; +} + #ifdef CONFIG_JUMP_LABEL_FEATURE_CHECKS struct static_key_true cpu_feature_keys[NUM_CPU_FTR_KEYS] = { diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bbf34..5b81673c50 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -903,6 +903,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk) if (!MSR_TM_ACTIVE(thr->regs->msr)) goto out_and_saveregs; + WARN_ON(!tm_suspend_supported()); + TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, " "ccr=%lx, msr=%lx, trap=%lx)\n", tsk->pid, thr->regs->nip, diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 92fb1c8dbb..9eac0131c0 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -519,6 +519,8 @@ static int save_tm_user_regs(struct pt_regs *regs, { unsigned long msr = regs->msr; + WARN_ON(!tm_suspend_supported()); + /* Remove TM bits from thread's MSR. The MSR in the sigcontext * just indicates to userland that we were doing a transaction, but we * don't want to return in transactional state. This also ensures @@ -769,6 +771,8 @@ static long restore_tm_user_regs(struct pt_regs *regs, int i; #endif + if (!tm_suspend_supported()) + return 1; /* * restore general registers but not including MSR or SOFTE. Also * take care of keeping r2 (TLS) intact if not a signal. diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index c83c115858..6d28caf849 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -214,6 +214,8 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc, BUG_ON(!MSR_TM_ACTIVE(regs->msr)); + WARN_ON(!tm_suspend_supported()); + /* Remove TM bits from thread's MSR. The MSR in the sigcontext * just indicates to userland that we were doing a transaction, but we * don't want to return in transactional
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Wed, 4 Oct 2017 14:37:53 -0700 "Paul E. McKenney"wrote: > From: Mathieu Desnoyers > > Provide a new command allowing processes to register their intent to use > the private expedited command. > > This allows PowerPC to skip the full memory barrier in switch_mm(), and > only issue the barrier when scheduling into a task belonging to a > process that has registered to use expedited private. > > Processes are now required to register before using > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. > > Changes since v1: > - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in > powerpc membarrier_arch_sched_in(), given that we want to specifically > check the next thread state. > - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. > - Use task_thread_info() to pass thread_info from task to > *_ti_thread_flag(). > > Changes since v2: > - Move membarrier_arch_sched_in() call to finish_task_switch(). > - Check for NULL t->mm in membarrier_arch_fork(). > - Use membarrier_sched_in() in generic code, which invokes the > arch-specific membarrier_arch_sched_in(). This fixes allnoconfig > build on PowerPC. > - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing > allnoconfig build on PowerPC. > - Build and runtime tested on PowerPC. > > Changes since v3: > - Simply rely on copy_mm() to copy the membarrier_private_expedited mm > field on fork. > - powerpc: test thread flag instead of reading > membarrier_private_expedited in membarrier_arch_fork(). > - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming > from kernel thread, since mmdrop() implies a full barrier. > - Set membarrier_private_expedited to 1 only after arch registration > code, thus eliminating a race where concurrent commands could succeed > when they should fail if issued concurrently with process > registration. > - Use READ_ONCE() for membarrier_private_expedited field access in > membarrier_private_expedited. Matches WRITE_ONCE() performed in > process registration. > > Changes since v4: > - Move powerpc hook from sched_in() to switch_mm(), based on feedback > from Nicholas Piggin. For now, the powerpc approach is okay by me. I plan to test others (e.g., taking runqueue locks) on larger systems, but that can be sent as an incremental patch at a later time. The main thing I would like is for people to review the userspace API. > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index 3a19c253bdb1..4af1b719c65f 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned > int flags) > current->flags = (current->flags & ~PF_MEMALLOC) | flags; > } > > +#ifdef CONFIG_MEMBARRIER > + > +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS > +#include > +#else > +static inline void membarrier_arch_switch_mm(struct mm_struct *prev, > + struct mm_struct *next, struct task_struct *tsk) > +{ > +} This is no longer required in architecture independent code, is it?
Re: [v2,1/2] powerpc/xive: fix IPI reset
On Wed, 2017-10-04 at 09:15:04 UTC, =?utf-8?q?C=C3=A9dric_Le_Goater?= wrote: > When resetting an IPI, hw_ipi should also be set to zero. > > Signed-off-by: Cédric Le GoaterSeries applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/74f1282114acc7d67e25745efe200f cheers
Re: [v3] powerpc: fix compile error on 64K pages on 40x, 44x
On Sun, 2017-10-01 at 14:33:03 UTC, Christian Lamparter wrote: > The mmu context on the 40x, 44x does not define pte_frag > entry. This causes gcc abort the compilation due to: > > setup-common.c: In function âsetup_archâ: > setup-common.c:908: error: âmm_context_tâ has no âpte_fragâ > > This patch fixes the issue by removing the pte_frag > initialization in setup-common.c. > > This is possible, because the compiler will do the > initialization, since the mm_context is a sub struct of > init_mm. init_mm is declared in mm_types.h as external linkage. > according to C99 6.2.4.3: > "An object whose identifier is declared with external linkage > [...] has static storage duration." > > C99 defines in 6.7.8.10 that: " > If an object that has static storage duration is not > initialized explicitly, then: > - if it has pointer type, it is initialized to a null pointer > [...] > " > > Signed-off-by: Christian Lamparter> Reviewed-by: Christophe Leroy Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/070e004912fed099263408bf2ff1bb cheers
Re: powerpc: Fix action argument for cpufeatures-based TLB flush
On Wed, 2017-09-27 at 04:55:51 UTC, Jeremy Kerr wrote: > Commit 41d0c2ecde introduced calls to __flush_tlb_power[89] from the > cpufeatures code, specifying the number of sets to flush. > > However, these functions take an action argument, not a number of sets. > This means we hit the BUG() in __flush_tlb_{206,300} when using > cpufeatures-style configuration. > > This change passes TLB_INVAL_SCOPE_GLOBAL instead. > > Signed-off-by: Jeremy Kerr> CC: Nicholas Piggin > Reviewed-by: Nicholas Piggin Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/3b7af5c0fd9631762d1c4d7b4cee76 cheers
Re: powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled
On Sun, 2017-09-24 at 17:30:43 UTC, Guenter Roeck wrote: > flush_tlb_kernel_range() may call smp_call_function_many() which expects > interrupts to be enabled. This results in a traceback. > > WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-9-g0666f56 #1 > task: cf83 task.stack: cf82e000 > NIP: c00a93c8 LR: c00a9634 CTR: 0001 > REGS: cf82fde0 TRAP: 0700 Not tainted (4.14.0-rc1-9-g0666f56) > MSR: 00021000CR: 2482 XER: > > GPR00: c00a9634 cf82fe90 cf83 c050ad3c c0015a54 0001 0001 > GPR08: 0001 cf82e000 2484 c0003150 > GPR16: 0001 c051 > GPR24: c0015a54 c050ad3c c051823c c050ad3c 0025 > NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc > LR [c00a9634] smp_call_function+0x3c/0x50 > Call Trace: > [cf82fe90] [0010] 0x10 (unreliable) > [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50 > [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38 > [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c > [cf82ff20] [c001484c] free_initmem+0x20/0x4c > [cf82ff30] [c000316c] kernel_init+0x1c/0x108 > [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64 > Instruction dump: > 7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f89 40beffac > 3d20c051 8929ac64 2f89 40beff9c <0fe0> 4b94 7fc3f378 7f64db78 > > Fixes: 3184cc4b6f6a ("powerpc/mm: Fix kernel RAM protection after freeing > ...") > Fixes: e611939fc8ec ("powerpc/mm: Ensure change_page_attr() doesn't ...") > Cc: Christophe Leroy > Signed-off-by: Guenter Roeck > Reviewed-by: Christophe Leroy Applied to powerpc fixes, thanks. https://git.kernel.org/powerpc/c/7c6a4f3b1641195119ddbb531200f4 cheers
Re: [v2] powerpc: configs: Add Skiroot defconfig
On Wed, 2017-10-04 at 04:23:24 UTC, Joel Stanley wrote: > This configuration is used by the OpenPower firmware for it's > Linux-as-bootloader implementation. Also known as the Petitboot kernel, > this configuration broke in 4.12 (CPU_HOTPLUG=n), so add it to the > upstream tree in order to get better coverage. > > Signed-off-by: Joel StanleyApplied to powerpc next, thanks. https://git.kernel.org/powerpc/c/c3dda4b0db9cf5044010e4de88e5e5 cheers
Re: [1/3] powerpc/lib/sstep: Add XER bits introduced in POWER ISA v3.0
On Fri, 2017-09-29 at 05:44:08 UTC, Sandipan Das wrote: > This adds definitions for the OV32 and CA32 bits of XER that > were introduced in POWER ISA v3.0. There are some existing > instructions that currently set the OV and CA bits based on > certain conditions. > > The emulation behaviour of all these instructions needs to > be updated to set these new bits accordingly. > > Signed-off-by: Sandipan Das> Acked-by: Naveen N. Rao Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/924c8feb041c3ef22d46ac2e746816 cheers
Re: [v2, 1/6] powerpc/watchdog: do not panic from locked CPU's IPI handler
On Fri, 2017-09-29 at 03:29:37 UTC, Nicholas Piggin wrote: > The SMP watchdog will detect locked CPUs and IPI them to print a > backtrace and registers. If panic on hard lockup is enabled, do > not panic from this handler, because that can cause recursion into > the IPI layer during the panic. > > The caller already panics in this case. > > Signed-off-by: Nicholas PigginSeries applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/842dc1dbabb5e874550b52d896851e cheers
Re: powerpc/powernv: Use early_radix_enabled in POWER9 tlb flush
On Wed, 2017-09-27 at 05:45:58 UTC, Nicholas Piggin wrote: > This code is used at boot and machine checks, so it should be using > early_radix_enabled() (which is usable any time). > > Signed-off-by: Nicholas PigginApplied to powerpc next, thanks. https://git.kernel.org/powerpc/c/969a86a2855d484a00205a424df1c6 cheers
Re: [1/3] powerpc: oprofile: use setup_timer() helper.
On Fri, 2017-09-22 at 11:34:58 UTC, Allen Pais wrote: > Use setup_timer function instead of initializing timer with the >function and data fields. > > Signed-off-by: Allen PaisSeries applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/83ad1e6a1dc049dd04be4564125a7f cheers
Re: [v2] cxl: Set the valid bit in PE for dedicated mode
On Mon, 2017-09-04 at 08:48:25 UTC, Vaibhav Jain wrote: > Make sure to set the valid-bit in software-state field of the > populated PE. This was earlier missing for dedicated mode AFUs, hence > was causing a PSL freeze when the AFU was activated. > > Signed-off-by: Vaibhav Jain> Acked-by: Frederic Barrat > Acked-by: Andrew Donnellan Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/8512bffd6226fea259f94fd23fc3b6 cheers
Re: [v3,1/2] powerpc/mm: Export flush_all_mm()
On Sun, 2017-09-03 at 18:15:12 UTC, Frederic Barrat wrote: > With the optimizations introduced by commit a46cc7a90fd8 > ("powerpc/mm/radix: Improve TLB/PWC flushes"), flush_tlb_mm() no > longer flushes the page walk cache with radix. This patch introduces > flush_all_mm(), which flushes everything, tlb and pwc, for a given mm. > > Signed-off-by: Frederic Barrat> Reviewed-By: Alistair Popple Series applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/6110236b9bbd177debc045c5fc2922 cheers
[PATCH] powerpc: Default to enabling STRICT_KERNEL_RWX
When available, CONFIG_KERNEL_RWX should be default-enabled. Cc: Benjamin HerrenschmidtCc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Kees Cook --- arch/powerpc/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 809c468edab1..9a549bbfc278 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -178,6 +178,7 @@ config PPC select HAVE_ARCH_TRACEHOOK select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION) select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX + select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT select HAVE_CBPF_JITif !PPC64 select HAVE_CONTEXT_TRACKINGif PPC64 select HAVE_DEBUG_KMEMLEAK -- 2.7.4 -- Kees Cook Pixel Security
Re: [PATCH 0/2] powerpc/xive: fix CPU hot unplug
David Gibsonwrites: > On Tue, Oct 03, 2017 at 08:24:07AM +0200, Cédric Le Goater wrote: >> On 10/03/2017 05:36 AM, David Gibson wrote: >> > On Mon, Oct 02, 2017 at 06:27:20PM +0200, Cédric Le Goater wrote: >> >> On 09/23/2017 10:26 AM, Cédric Le Goater wrote: >> >>> Hi, >> >>> >> >>> Here are a couple of small fixes to support CPU hot unplug. There are >> >>> still some issues to be investigated as, in some occasions, after a >> >>> couple of plug and unplug, the cpu which was removed receives a 'lost' >> >>> interrupt. This showed to be the decrementer under QEMU. >> >> >> >> So this seems to be a QEMU issue only which can be solved by >> >> removing the DEE bit from the LPCR on P9 processor when the CPU >> >> is stopped in rtas. PECE3 bit on P8 processors. >> >> >> >> I think these patches are valuable fixes for 4.14. The first >> >> is trivial and the second touches the common xive part but it >> >> is only called on the pseries platform. >> >> >> >> Could you please take a look ? >> > >> > Sorry, I think I've missed something here. >> > >> > Is there a qemu bug involved in this? Has there been a patch sent >> > that I didn't spot? >> >> >> No, not yet, but I will today probably. something like below to stop >> the decrementer when a CPU is stopped: >> >> --- qemu.git.orig/hw/ppc/spapr_rtas.c >> +++ qemu.git/hw/ppc/spapr_rtas.c >> @@ -174,6 +174,15 @@ static void rtas_start_cpu(PowerPCCPU *c >> kvm_cpu_synchronize_state(cs); >> >> env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME); >> + >> +/* Enable DECR interrupt */ >> +if (env->mmu_model == POWERPC_MMU_3_00) { >> +env->spr[SPR_LPCR] |= LPCR_DEE; >> +} else { >> +/* P7 and P8 both have same bit for DECR */ >> +env->spr[SPR_LPCR] |= LPCR_P8_PECE3; >> +} >> + >> env->nip = start; >> env->gpr[3] = r3; >> cs->halted = 0; >> @@ -210,6 +219,13 @@ static void rtas_stop_self(PowerPCCPU *c >>* no need to bother with specific bits, we just clear it. >>*/ >> env->msr = 0; >> + >> +if (env->mmu_model == POWERPC_MMU_3_00) { >> +env->spr[SPR_LPCR] &= ~LPCR_DEE; >> +} else { >> +/* P7 and P8 both have same bit for DECR */ >> +env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3; >> +} >> } >> >> static inline int sysparm_st(target_ulong addr, target_ulong len, >> >> I haven't yet because I fail to understand why the decrementer is not >> interrupting the dying CPU under xics as it is the case under XIVE. > > Oh.. ok. This sounds very similar to the problem Nikunj hit under TCG > with decrementer interrupts waking up a supposedly dead CPU. He had a > couple of proposed fixes, but we got bogged down trying to work out > why (with TCG at least). Yeah, I wasnt able to get to the exact reason for that. Regards Nikunj
Re: [PATCH 05/13] timer: Remove init_timer_deferrable() in favor of timer_setup()
Hi, On Wed, Oct 04, 2017 at 04:26:59PM -0700, Kees Cook wrote: > This refactors the only users of init_timer_deferrable() to use > the new timer_setup() and from_timer(). Removes definition of > init_timer_deferrable(). [...] > drivers/hsi/clients/ssi_protocol.c | 32 > Acked-by: Sebastian Reichel-- Sebastian signature.asc Description: PGP signature
Re: [PATCH 04/13] timer: Remove init_timer_pinned() in favor of timer_setup()
From: Kees CookDate: Wed, 4 Oct 2017 16:26:58 -0700 > This refactors the only users of init_timer_pinned() to use > the new timer_setup() and from_timer(). Drops the definition of > init_timer_pinned(). > > Cc: Chris Metcalf > Cc: Thomas Gleixner > Cc: net...@vger.kernel.org > Signed-off-by: Kees Cook For networking: Acked-by: David S. Miller
Re: [PATCH 05/13] timer: Remove init_timer_deferrable() in favor of timer_setup()
From: Kees CookDate: Wed, 4 Oct 2017 16:26:59 -0700 > This refactors the only users of init_timer_deferrable() to use > the new timer_setup() and from_timer(). Removes definition of > init_timer_deferrable(). > > Cc: Benjamin Herrenschmidt > Cc: Michael Ellerman > Cc: Sebastian Reichel > Cc: Harish Patil > Cc: Manish Chopra > Cc: Kalle Valo > Cc: linuxppc-dev@lists.ozlabs.org > Cc: net...@vger.kernel.org > Cc: linux-wirel...@vger.kernel.org > Signed-off-by: Kees Cook For networking: Acked-by: David S. Miller
Re: [PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER
From: Kees CookDate: Wed, 4 Oct 2017 16:27:04 -0700 > Drop the arguments from the macro and adjust all callers with the > following script: > > perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \ > $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h) > > Signed-off-by: Kees Cook > Acked-by: Geert Uytterhoeven # for m68k parts For networking: Acked-by: David S. Miller
Re: [PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER
On 10/04/2017 04:27 PM, Kees Cook wrote: Drop the arguments from the macro and adjust all callers with the following script: perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \ $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h) Signed-off-by: Kees CookAcked-by: Geert Uytterhoeven # for m68k parts For watchdog: Acked-by: Guenter Roeck --- arch/arm/mach-ixp4xx/dsmg600-setup.c | 2 +- arch/arm/mach-ixp4xx/nas100d-setup.c | 2 +- arch/m68k/amiga/amisound.c| 2 +- arch/m68k/mac/macboing.c | 2 +- arch/mips/mti-malta/malta-display.c | 2 +- arch/parisc/kernel/pdc_cons.c | 2 +- arch/s390/mm/cmm.c| 2 +- drivers/atm/idt77105.c| 4 ++-- drivers/atm/iphase.c | 2 +- drivers/block/ataflop.c | 8 drivers/char/dtlk.c | 2 +- drivers/char/hangcheck-timer.c| 2 +- drivers/char/nwbutton.c | 2 +- drivers/char/rtc.c| 2 +- drivers/input/touchscreen/s3c2410_ts.c| 2 +- drivers/net/cris/eth_v10.c| 6 +++--- drivers/net/hamradio/yam.c| 2 +- drivers/net/wireless/atmel/at76c50x-usb.c | 2 +- drivers/staging/speakup/main.c| 2 +- drivers/staging/speakup/synth.c | 2 +- drivers/tty/cyclades.c| 2 +- drivers/tty/isicom.c | 2 +- drivers/tty/moxa.c| 2 +- drivers/tty/rocket.c | 2 +- drivers/tty/vt/keyboard.c | 2 +- drivers/tty/vt/vt.c | 2 +- drivers/watchdog/alim7101_wdt.c | 2 +- drivers/watchdog/machzwd.c| 2 +- drivers/watchdog/mixcomwd.c | 2 +- drivers/watchdog/sbc60xxwdt.c | 2 +- drivers/watchdog/sc520_wdt.c | 2 +- drivers/watchdog/via_wdt.c| 2 +- drivers/watchdog/w83877f_wdt.c| 2 +- drivers/xen/grant-table.c | 2 +- fs/pstore/platform.c | 2 +- include/linux/timer.h | 4 ++-- kernel/irq/spurious.c | 2 +- lib/random32.c| 2 +- net/atm/mpc.c | 2 +- net/decnet/dn_route.c | 2 +- net/ipv6/ip6_flowlabel.c | 2 +- net/netrom/nr_loopback.c | 2 +- security/keys/gc.c| 2 +- sound/oss/midibuf.c | 2 +- sound/oss/soundcard.c | 2 +- sound/oss/sys_timer.c | 2 +- sound/oss/uart6850.c | 2 +- 47 files changed, 54 insertions(+), 54 deletions(-) diff --git a/arch/arm/mach-ixp4xx/dsmg600-setup.c b/arch/arm/mach-ixp4xx/dsmg600-setup.c index b3bd0e137f6d..b3689a141ec6 100644 --- a/arch/arm/mach-ixp4xx/dsmg600-setup.c +++ b/arch/arm/mach-ixp4xx/dsmg600-setup.c @@ -174,7 +174,7 @@ static int power_button_countdown; #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */ static void dsmg600_power_handler(unsigned long data); -static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler, 0, 0); +static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler); static void dsmg600_power_handler(unsigned long data) { diff --git a/arch/arm/mach-ixp4xx/nas100d-setup.c b/arch/arm/mach-ixp4xx/nas100d-setup.c index 4e0f762bc651..562d05f9888e 100644 --- a/arch/arm/mach-ixp4xx/nas100d-setup.c +++ b/arch/arm/mach-ixp4xx/nas100d-setup.c @@ -197,7 +197,7 @@ static int power_button_countdown; #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */ static void nas100d_power_handler(unsigned long data); -static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler, 0, 0); +static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler); static void nas100d_power_handler(unsigned long data) { diff --git a/arch/m68k/amiga/amisound.c b/arch/m68k/amiga/amisound.c index 90a60d758f8b..a23f48181fd6 100644 --- a/arch/m68k/amiga/amisound.c +++ b/arch/m68k/amiga/amisound.c @@ -66,7 +66,7 @@ void __init amiga_init_sound(void) } static void nosound( unsigned long ignored ); -static DEFINE_TIMER(sound_timer, nosound, 0, 0); +static DEFINE_TIMER(sound_timer, nosound); void amiga_mksound( unsigned int hz, unsigned int ticks ) { diff --git a/arch/m68k/mac/macboing.c b/arch/m68k/mac/macboing.c index ffaa1f6439ae..9a52aff183d0 100644 --- a/arch/m68k/mac/macboing.c +++ b/arch/m68k/mac/macboing.c @@ -56,7 +56,7 @@ static void ( *mac_special_bell )( unsigned int, unsigned int, unsigned int ); /* * our timer to start/continue/stop the bell */ -static DEFINE_TIMER(mac_sound_timer, mac_nosound, 0, 0); +static DEFINE_TIMER(mac_sound_timer, mac_nosound); /* * Sort of initialize
Re: [PATCH 09/13] timer: Remove users of expire and data arguments to DEFINE_TIMER
On 10/04/2017 04:27 PM, Kees Cook wrote: The expire and data arguments of DEFINE_TIMER are only used in two places and are ignored by the code (malta-display.c only uses mod_timer(), never add_timer(), so the preset expires value is ignored). Set both sets of arguments to zero. Cc: Ralf BaechleCc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Geert Uytterhoeven Cc: linux-m...@linux-mips.org Cc: linux-watch...@vger.kernel.org Signed-off-by: Kees Cook For watchdog: Acked-by: Guenter Roeck --- arch/mips/mti-malta/malta-display.c | 6 +++--- drivers/watchdog/alim7101_wdt.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/mips/mti-malta/malta-display.c b/arch/mips/mti-malta/malta-display.c index d4f807191ecd..ac813158b9b8 100644 --- a/arch/mips/mti-malta/malta-display.c +++ b/arch/mips/mti-malta/malta-display.c @@ -36,10 +36,10 @@ void mips_display_message(const char *str) } } -static void scroll_display_message(unsigned long data); -static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, HZ, 0); +static void scroll_display_message(unsigned long unused); +static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, 0, 0); -static void scroll_display_message(unsigned long data) +static void scroll_display_message(unsigned long unused) { mips_display_message(_string[display_count++]); if (display_count == max_display_count) diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c index 665e0e7dfe1e..3c1f6ac68ea9 100644 --- a/drivers/watchdog/alim7101_wdt.c +++ b/drivers/watchdog/alim7101_wdt.c @@ -71,7 +71,7 @@ MODULE_PARM_DESC(use_gpio, "Use the gpio watchdog (required by old cobalt boards)."); static void wdt_timer_ping(unsigned long); -static DEFINE_TIMER(timer, wdt_timer_ping, 0, 1); +static DEFINE_TIMER(timer, wdt_timer_ping, 0, 0); static unsigned long next_heartbeat; static unsigned long wdt_is_open; static char wdt_expect_close; @@ -87,7 +87,7 @@ MODULE_PARM_DESC(nowayout, *Whack the dog */ -static void wdt_timer_ping(unsigned long data) +static void wdt_timer_ping(unsigned long unused) { /* If we got a heartbeat pulse within the WDT_US_INTERVAL * we agree to ping the WDT
[PATCH] powerpc: Drop lockdep_assert_cpus_held call from arch_update_cpu_topology
It turns out that not all paths calling arch_update_cpu_topology hold cpu_hotplug_lock, but that's ok because those paths aren't supposed to race with any concurrent hotplug events. Callers of arch_update_cpu_topology are expected to know what they are doing when they call the function without holding the lock, so remove the lockdep warning. Here are two reported splats that turned out to be harmless (as far as I know, at least): [ 255.654622] [ cut here ] [ 255.654658] WARNING: CPU: 1 PID: 14 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x54/0x80 [ 255.654661] Modules linked in: bridge stp llc binfmt_misc kvm_hv kvm leds_powernv vmx_crypto powernv_rng rng_core led_class powernv_op_panel autofs4 xfs btrfs lzo_compress raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath mlx4_en raid10 crc32c_vpmsum lpfc be2net crc_t10dif crct10dif_generic crct10dif_common mlx4_core [ 255.654734] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: GW 4.13.0 #1 [ 255.654737] task: c01ff25fa100 task.stack: c01ff2578000 [ 255.654740] NIP: c00f8624 LR: c00f8618 CTR: c01763e0 [ 255.654742] REGS: c01ff257b780 TRAP: 0700 Tainted: GW (4.13.0) [ 255.654744] MSR: 90029033[ 255.654764] CR: 24200422 XER: [ 255.654766] CFAR: c01783d0 SOFTE: 1 GPR00: c00f8618 c01ff257ba00 c1042100 GPR04: 0001 GPR08: 001ffe49 c07fee782f50 GPR12: cfd80500 c012c878 c01ff6209180 GPR16: c0ef2728 GPR20: 0001 GPR24: c1087d50 c0fe0e7d c01448c0 GPR28: 001ffe49 0002 c1088050 [ 255.654842] NIP [c00f8624] lockdep_assert_cpus_held+0x54/0x80 [ 255.654845] LR [c00f8618] lockdep_assert_cpus_held+0x48/0x80 [ 255.654847] Call Trace: [ 255.654851] [c01ff257ba00] [c00f8618] lockdep_assert_cpus_held+0x48/0x80 (unreliable) [ 255.654858] [c01ff257ba20] [c007a848] arch_update_cpu_topology+0x18/0x30 [ 255.654864] [c01ff257ba40] [c016bd0c] partition_sched_domains+0x8c/0x4e0 [ 255.654870] [c01ff257baf0] [c020a914] cpuset_update_active_cpus+0x24/0x60 [ 255.654876] [c01ff257bb10] [c01449bc] sched_cpu_deactivate+0xfc/0x1a0 [ 255.654882] [c01ff257bc20] [c00f5a8c] cpuhp_invoke_callback+0x19c/0xe00 [ 255.654888] [c01ff257bcb0] [c00f6868] cpuhp_down_callbacks+0x78/0xf0 [ 255.654893] [c01ff257bd00] [c00f6c1c] cpuhp_thread_fun+0x1fc/0x210 [ 255.654899] [c01ff257bd50] [c0132f4c] smpboot_thread_fn+0x2fc/0x370 [ 255.654905] [c01ff257bdc0] [c012ca24] kthread+0x1b4/0x1c0 [ 255.654911] [c01ff257be30] [c000bcec] ret_from_kernel_thread+0x5c/0x70 [ 255.654914] Instruction dump: [ 255.654919] 4e800020 6000 6042 7c0802a6 3c62ffeb 3880 38639280 f8010030 [ 255.654934] 4807fd45 6000 2fa3 409e0020 <0fe0> e8010030 38210020 7c0803a6 [ 255.654950] ---[ end trace 06efa323f571f14b ]--- [ 255.894356] [ cut here ] and: [2.862745] [ cut here ] [2.862772] WARNING: CPU: 0 PID: 1 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x5c/0x70 [2.862784] Modules linked in: [2.862819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-00083-gb38923a #1 [2.862835] task: c03ff328 task.stack: c03ff6104000 [2.862847] NIP: c010ac7c LR: c010ac70 CTR: [2.862862] REGS: c03ff61078d0 TRAP: 0700 Not tainted (4.13.0-00083-gb38923a) [2.862874] MSR: 92029033 CR: 2422 XER: [2.863046] CFAR: c0190bbc SOFTE: 1 GPR00: c010ac70 c03ff6107b50 c1150500 GPR04: c0fdbe60 c123c5f8 GPR08: GPR12: 2482 cfd4 c000dc08 GPR16: GPR20: GPR24: c0eb392c c0f2bdb0 GPR28: c1199050 c123c4f8 [2.863597]
[PATCH 12/13] kthread: Convert callback to use from_timer()
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch kthread to use from_timer() and pass the timer pointer explicitly. Cc: Andrew MortonCc: Petr Mladek Cc: Tejun Heo Cc: Thomas Gleixner Cc: Oleg Nesterov Signed-off-by: Kees Cook --- include/linux/kthread.h | 10 +- kernel/kthread.c| 10 -- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 0d622b350d3f..35cbe3b0ce5b 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -75,7 +75,7 @@ extern int tsk_fork_get_node(struct task_struct *tsk); */ struct kthread_work; typedef void (*kthread_work_func_t)(struct kthread_work *work); -void kthread_delayed_work_timer_fn(unsigned long __data); +void kthread_delayed_work_timer_fn(struct timer_list *t); enum { KTW_FREEZABLE = 1 << 0, /* freeze during suspend */ @@ -116,8 +116,8 @@ struct kthread_delayed_work { #define KTHREAD_DELAYED_WORK_INIT(dwork, fn) { \ .work = KTHREAD_WORK_INIT((dwork).work, (fn)), \ - .timer = __TIMER_INITIALIZER(kthread_delayed_work_timer_fn, \ -(unsigned long)&(dwork), \ + .timer = __TIMER_INITIALIZER((TIMER_FUNC_TYPE)kthread_delayed_work_timer_fn,\ +(TIMER_DATA_TYPE)&(dwork.timer), \ TIMER_IRQSAFE),\ } @@ -164,8 +164,8 @@ extern void __kthread_init_worker(struct kthread_worker *worker, do {\ kthread_init_work(&(dwork)->work, (fn));\ __setup_timer(&(dwork)->timer, \ - kthread_delayed_work_timer_fn,\ - (unsigned long)(dwork), \ + (TIMER_FUNC_TYPE)kthread_delayed_work_timer_fn,\ + (TIMER_DATA_TYPE)&(dwork)->timer, \ TIMER_IRQSAFE); \ } while (0) diff --git a/kernel/kthread.c b/kernel/kthread.c index 1c19edf82427..ba3992c8c375 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -798,15 +798,14 @@ EXPORT_SYMBOL_GPL(kthread_queue_work); /** * kthread_delayed_work_timer_fn - callback that queues the associated kthread * delayed work when the timer expires. - * @__data: pointer to the data associated with the timer + * @t: pointer to the expired timer * * The format of the function is defined by struct timer_list. * It should have been called from irqsafe timer with irq already off. */ -void kthread_delayed_work_timer_fn(unsigned long __data) +void kthread_delayed_work_timer_fn(struct timer_list *t) { - struct kthread_delayed_work *dwork = - (struct kthread_delayed_work *)__data; + struct kthread_delayed_work *dwork = from_timer(dwork, t, timer); struct kthread_work *work = >work; struct kthread_worker *worker = work->worker; @@ -837,8 +836,7 @@ void __kthread_queue_delayed_work(struct kthread_worker *worker, struct timer_list *timer = >timer; struct kthread_work *work = >work; - WARN_ON_ONCE(timer->function != kthread_delayed_work_timer_fn || -timer->data != (unsigned long)dwork); + WARN_ON_ONCE(timer->function != (TIMER_FUNC_TYPE)kthread_delayed_work_timer_fn); /* * If @delay is 0, queue @dwork->work immediately. This is for -- 2.7.4
[PATCH 13/13] workqueue: Convert callback to use from_timer()
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch workqueue to use from_timer() and pass the timer pointer explicitly. Cc: Tejun HeoCc: Lai Jiangshan Signed-off-by: Kees Cook --- include/linux/workqueue.h | 15 --- kernel/workqueue.c| 7 +++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index f4960260feaf..f3c47a05fd06 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -17,7 +17,7 @@ struct workqueue_struct; struct work_struct; typedef void (*work_func_t)(struct work_struct *work); -void delayed_work_timer_fn(unsigned long __data); +void delayed_work_timer_fn(struct timer_list *t); /* * The first word is the work queue pointer and the flags rolled into @@ -175,8 +175,8 @@ struct execute_work { #define __DELAYED_WORK_INITIALIZER(n, f, tflags) { \ .work = __WORK_INITIALIZER((n).work, (f)), \ - .timer = __TIMER_INITIALIZER(delayed_work_timer_fn, \ -(unsigned long)&(n), \ + .timer = __TIMER_INITIALIZER((TIMER_FUNC_TYPE)delayed_work_timer_fn,\ +(TIMER_DATA_TYPE)&(n.timer), \ (tflags) | TIMER_IRQSAFE), \ } @@ -241,8 +241,9 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; } #define __INIT_DELAYED_WORK(_work, _func, _tflags) \ do {\ INIT_WORK(&(_work)->work, (_func)); \ - __setup_timer(&(_work)->timer, delayed_work_timer_fn, \ - (unsigned long)(_work), \ + __setup_timer(&(_work)->timer, \ + (TIMER_FUNC_TYPE)delayed_work_timer_fn, \ + (TIMER_DATA_TYPE)&(_work)->timer, \ (_tflags) | TIMER_IRQSAFE); \ } while (0) @@ -250,8 +251,8 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; } do {\ INIT_WORK_ONSTACK(&(_work)->work, (_func)); \ __setup_timer_on_stack(&(_work)->timer, \ - delayed_work_timer_fn, \ - (unsigned long)(_work), \ + (TIMER_FUNC_TYPE)delayed_work_timer_fn,\ + (TIMER_DATA_TYPE)&(_work)->timer,\ (_tflags) | TIMER_IRQSAFE); \ } while (0) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index a5361fc6215d..c77fdf6bf24f 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1492,9 +1492,9 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq, } EXPORT_SYMBOL(queue_work_on); -void delayed_work_timer_fn(unsigned long __data) +void delayed_work_timer_fn(struct timer_list *t) { - struct delayed_work *dwork = (struct delayed_work *)__data; + struct delayed_work *dwork = from_timer(dwork, t, timer); /* should have been called from irqsafe timer with irq already off */ __queue_work(dwork->cpu, dwork->wq, >work); @@ -1508,8 +1508,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, struct work_struct *work = >work; WARN_ON_ONCE(!wq); - WARN_ON_ONCE(timer->function != delayed_work_timer_fn || -timer->data != (unsigned long)dwork); + WARN_ON_ONCE(timer->function != (TIMER_FUNC_TYPE)delayed_work_timer_fn); WARN_ON_ONCE(timer_pending(timer)); WARN_ON_ONCE(!list_empty(>entry)); -- 2.7.4
[PATCH 11/13] timer: Remove expires argument from __TIMER_INITIALIZER()
The expires field is normally initialized during the first mod_timer() call. It was unused by all callers, so remove it from the macro. Signed-off-by: Kees Cook--- include/linux/kthread.h | 2 +- include/linux/timer.h | 5 ++--- include/linux/workqueue.h | 2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 82e197eeac91..0d622b350d3f 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -117,7 +117,7 @@ struct kthread_delayed_work { #define KTHREAD_DELAYED_WORK_INIT(dwork, fn) { \ .work = KTHREAD_WORK_INIT((dwork).work, (fn)), \ .timer = __TIMER_INITIALIZER(kthread_delayed_work_timer_fn, \ -0, (unsigned long)&(dwork),\ +(unsigned long)&(dwork), \ TIMER_IRQSAFE),\ } diff --git a/include/linux/timer.h b/include/linux/timer.h index 91e5a2cc81b5..10685c33e679 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -63,10 +63,9 @@ struct timer_list { #define TIMER_TRACE_FLAGMASK (TIMER_MIGRATING | TIMER_DEFERRABLE | TIMER_PINNED | TIMER_IRQSAFE) -#define __TIMER_INITIALIZER(_function, _expires, _data, _flags) { \ +#define __TIMER_INITIALIZER(_function, _data, _flags) {\ .entry = { .next = TIMER_ENTRY_STATIC },\ .function = (_function),\ - .expires = (_expires), \ .data = (_data),\ .flags = (_flags), \ __TIMER_LOCKDEP_MAP_INITIALIZER(\ @@ -75,7 +74,7 @@ struct timer_list { #define DEFINE_TIMER(_name, _function) \ struct timer_list _name = \ - __TIMER_INITIALIZER(_function, 0, 0, 0) + __TIMER_INITIALIZER(_function, 0, 0) void init_timer_key(struct timer_list *timer, unsigned int flags, const char *name, struct lock_class_key *key); diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 1c49431f3121..f4960260feaf 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -176,7 +176,7 @@ struct execute_work { #define __DELAYED_WORK_INITIALIZER(n, f, tflags) { \ .work = __WORK_INITIALIZER((n).work, (f)), \ .timer = __TIMER_INITIALIZER(delayed_work_timer_fn, \ -0, (unsigned long)&(n),\ +(unsigned long)&(n), \ (tflags) | TIMER_IRQSAFE), \ } -- 2.7.4
[PATCH 09/13] timer: Remove users of expire and data arguments to DEFINE_TIMER
The expire and data arguments of DEFINE_TIMER are only used in two places and are ignored by the code (malta-display.c only uses mod_timer(), never add_timer(), so the preset expires value is ignored). Set both sets of arguments to zero. Cc: Ralf BaechleCc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Geert Uytterhoeven Cc: linux-m...@linux-mips.org Cc: linux-watch...@vger.kernel.org Signed-off-by: Kees Cook --- arch/mips/mti-malta/malta-display.c | 6 +++--- drivers/watchdog/alim7101_wdt.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/mips/mti-malta/malta-display.c b/arch/mips/mti-malta/malta-display.c index d4f807191ecd..ac813158b9b8 100644 --- a/arch/mips/mti-malta/malta-display.c +++ b/arch/mips/mti-malta/malta-display.c @@ -36,10 +36,10 @@ void mips_display_message(const char *str) } } -static void scroll_display_message(unsigned long data); -static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, HZ, 0); +static void scroll_display_message(unsigned long unused); +static DEFINE_TIMER(mips_scroll_timer, scroll_display_message, 0, 0); -static void scroll_display_message(unsigned long data) +static void scroll_display_message(unsigned long unused) { mips_display_message(_string[display_count++]); if (display_count == max_display_count) diff --git a/drivers/watchdog/alim7101_wdt.c b/drivers/watchdog/alim7101_wdt.c index 665e0e7dfe1e..3c1f6ac68ea9 100644 --- a/drivers/watchdog/alim7101_wdt.c +++ b/drivers/watchdog/alim7101_wdt.c @@ -71,7 +71,7 @@ MODULE_PARM_DESC(use_gpio, "Use the gpio watchdog (required by old cobalt boards)."); static void wdt_timer_ping(unsigned long); -static DEFINE_TIMER(timer, wdt_timer_ping, 0, 1); +static DEFINE_TIMER(timer, wdt_timer_ping, 0, 0); static unsigned long next_heartbeat; static unsigned long wdt_is_open; static char wdt_expect_close; @@ -87,7 +87,7 @@ MODULE_PARM_DESC(nowayout, * Whack the dog */ -static void wdt_timer_ping(unsigned long data) +static void wdt_timer_ping(unsigned long unused) { /* If we got a heartbeat pulse within the WDT_US_INTERVAL * we agree to ping the WDT -- 2.7.4
[PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER
Drop the arguments from the macro and adjust all callers with the following script: perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \ $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h) Signed-off-by: Kees CookAcked-by: Geert Uytterhoeven # for m68k parts --- arch/arm/mach-ixp4xx/dsmg600-setup.c | 2 +- arch/arm/mach-ixp4xx/nas100d-setup.c | 2 +- arch/m68k/amiga/amisound.c| 2 +- arch/m68k/mac/macboing.c | 2 +- arch/mips/mti-malta/malta-display.c | 2 +- arch/parisc/kernel/pdc_cons.c | 2 +- arch/s390/mm/cmm.c| 2 +- drivers/atm/idt77105.c| 4 ++-- drivers/atm/iphase.c | 2 +- drivers/block/ataflop.c | 8 drivers/char/dtlk.c | 2 +- drivers/char/hangcheck-timer.c| 2 +- drivers/char/nwbutton.c | 2 +- drivers/char/rtc.c| 2 +- drivers/input/touchscreen/s3c2410_ts.c| 2 +- drivers/net/cris/eth_v10.c| 6 +++--- drivers/net/hamradio/yam.c| 2 +- drivers/net/wireless/atmel/at76c50x-usb.c | 2 +- drivers/staging/speakup/main.c| 2 +- drivers/staging/speakup/synth.c | 2 +- drivers/tty/cyclades.c| 2 +- drivers/tty/isicom.c | 2 +- drivers/tty/moxa.c| 2 +- drivers/tty/rocket.c | 2 +- drivers/tty/vt/keyboard.c | 2 +- drivers/tty/vt/vt.c | 2 +- drivers/watchdog/alim7101_wdt.c | 2 +- drivers/watchdog/machzwd.c| 2 +- drivers/watchdog/mixcomwd.c | 2 +- drivers/watchdog/sbc60xxwdt.c | 2 +- drivers/watchdog/sc520_wdt.c | 2 +- drivers/watchdog/via_wdt.c| 2 +- drivers/watchdog/w83877f_wdt.c| 2 +- drivers/xen/grant-table.c | 2 +- fs/pstore/platform.c | 2 +- include/linux/timer.h | 4 ++-- kernel/irq/spurious.c | 2 +- lib/random32.c| 2 +- net/atm/mpc.c | 2 +- net/decnet/dn_route.c | 2 +- net/ipv6/ip6_flowlabel.c | 2 +- net/netrom/nr_loopback.c | 2 +- security/keys/gc.c| 2 +- sound/oss/midibuf.c | 2 +- sound/oss/soundcard.c | 2 +- sound/oss/sys_timer.c | 2 +- sound/oss/uart6850.c | 2 +- 47 files changed, 54 insertions(+), 54 deletions(-) diff --git a/arch/arm/mach-ixp4xx/dsmg600-setup.c b/arch/arm/mach-ixp4xx/dsmg600-setup.c index b3bd0e137f6d..b3689a141ec6 100644 --- a/arch/arm/mach-ixp4xx/dsmg600-setup.c +++ b/arch/arm/mach-ixp4xx/dsmg600-setup.c @@ -174,7 +174,7 @@ static int power_button_countdown; #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */ static void dsmg600_power_handler(unsigned long data); -static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler, 0, 0); +static DEFINE_TIMER(dsmg600_power_timer, dsmg600_power_handler); static void dsmg600_power_handler(unsigned long data) { diff --git a/arch/arm/mach-ixp4xx/nas100d-setup.c b/arch/arm/mach-ixp4xx/nas100d-setup.c index 4e0f762bc651..562d05f9888e 100644 --- a/arch/arm/mach-ixp4xx/nas100d-setup.c +++ b/arch/arm/mach-ixp4xx/nas100d-setup.c @@ -197,7 +197,7 @@ static int power_button_countdown; #define PBUTTON_HOLDDOWN_COUNT 4 /* 2 secs */ static void nas100d_power_handler(unsigned long data); -static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler, 0, 0); +static DEFINE_TIMER(nas100d_power_timer, nas100d_power_handler); static void nas100d_power_handler(unsigned long data) { diff --git a/arch/m68k/amiga/amisound.c b/arch/m68k/amiga/amisound.c index 90a60d758f8b..a23f48181fd6 100644 --- a/arch/m68k/amiga/amisound.c +++ b/arch/m68k/amiga/amisound.c @@ -66,7 +66,7 @@ void __init amiga_init_sound(void) } static void nosound( unsigned long ignored ); -static DEFINE_TIMER(sound_timer, nosound, 0, 0); +static DEFINE_TIMER(sound_timer, nosound); void amiga_mksound( unsigned int hz, unsigned int ticks ) { diff --git a/arch/m68k/mac/macboing.c b/arch/m68k/mac/macboing.c index ffaa1f6439ae..9a52aff183d0 100644 --- a/arch/m68k/mac/macboing.c +++ b/arch/m68k/mac/macboing.c @@ -56,7 +56,7 @@ static void ( *mac_special_bell )( unsigned int, unsigned int, unsigned int ); /* * our timer to start/continue/stop the bell */ -static DEFINE_TIMER(mac_sound_timer, mac_nosound, 0, 0); +static DEFINE_TIMER(mac_sound_timer, mac_nosound); /* * Sort of initialize the sound chip (called from mac_mksound on the first diff --git a/arch/mips/mti-malta/malta-display.c b/arch/mips/mti-malta/malta-display.c index ac813158b9b8..063de44675ce 100644 ---
[PATCH 07/13] timer: Remove last user of TIMER_INITIALIZER
Drops the last user of TIMER_INITIALIZER and adapts timer.h to use the internal version. Cc: Arnd BergmannCc: Greg Kroah-Hartman Cc: Mark Gross Cc: Thomas Gleixner Signed-off-by: Kees Cook --- drivers/char/tlclk.c | 12 +--- include/linux/timer.h | 2 +- 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/char/tlclk.c b/drivers/char/tlclk.c index 6210bff46341..8eeb4190207d 100644 --- a/drivers/char/tlclk.c +++ b/drivers/char/tlclk.c @@ -184,9 +184,8 @@ static unsigned int telclk_interrupt; static int int_events; /* Event that generate a interrupt */ static int got_event; /* if events processing have been done */ -static void switchover_timeout(unsigned long data); -static struct timer_list switchover_timer = - TIMER_INITIALIZER(switchover_timeout , 0, 0); +static void switchover_timeout(struct timer_list *t); +static struct timer_list switchover_timer; static unsigned long tlclk_timer_data; static struct tlclk_alarms *alarm_events; @@ -805,7 +804,7 @@ static int __init tlclk_init(void) goto out3; } - init_timer(_timer); + timer_setup(_timer, switchover_timeout, 0); ret = misc_register(_miscdev); if (ret < 0) { @@ -855,9 +854,9 @@ static void __exit tlclk_cleanup(void) } -static void switchover_timeout(unsigned long data) +static void switchover_timeout(struct timer_list *unused) { - unsigned long flags = *(unsigned long *) data; + unsigned long flags = tlclk_timer_data; if ((flags & 1)) { if ((inb(TLCLK_REG1) & 0x08) != (flags & 0x08)) @@ -922,7 +921,6 @@ static irqreturn_t tlclk_interrupt(int irq, void *dev_id) /* TIMEOUT in ~10ms */ switchover_timer.expires = jiffies + msecs_to_jiffies(10); tlclk_timer_data = inb(TLCLK_REG1); - switchover_timer.data = (unsigned long) _timer_data; mod_timer(_timer, switchover_timer.expires); } else { got_event = 1; diff --git a/include/linux/timer.h b/include/linux/timer.h index 10cc45ca5803..4f7476e4a727 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -87,7 +87,7 @@ struct timer_list { #define DEFINE_TIMER(_name, _function, _expires, _data)\ struct timer_list _name = \ - TIMER_INITIALIZER(_function, _expires, _data) + __TIMER_INITIALIZER(_function, _expires, _data, 0) void init_timer_key(struct timer_list *timer, unsigned int flags, const char *name, struct lock_class_key *key); -- 2.7.4
[PATCH 06/13] timer: Remove users of TIMER_DEFERRED_INITIALIZER
This removes uses of TIMER_DEFERRED_INITIALIZER and chooses a location to call timer_setup() from before add_timer() or mod_timer() is called. Adjusts callbacks to use from_timer() as needed. Cc: Martin SchwidefskyCc: Heiko Carstens Cc: Tejun Heo Cc: Lai Jiangshan Cc: linux-s...@vger.kernel.org Signed-off-by: Kees Cook --- arch/s390/kernel/lgr.c | 6 +++--- arch/s390/kernel/topology.c | 6 +++--- kernel/workqueue.c | 8 +++- 3 files changed, 9 insertions(+), 11 deletions(-) diff --git a/arch/s390/kernel/lgr.c b/arch/s390/kernel/lgr.c index ae7dff110054..bf9622f0e6b1 100644 --- a/arch/s390/kernel/lgr.c +++ b/arch/s390/kernel/lgr.c @@ -153,14 +153,13 @@ static void lgr_timer_set(void); /* * LGR timer callback */ -static void lgr_timer_fn(unsigned long ignored) +static void lgr_timer_fn(struct timer_list *unused) { lgr_info_log(); lgr_timer_set(); } -static struct timer_list lgr_timer = - TIMER_DEFERRED_INITIALIZER(lgr_timer_fn, 0, 0); +static struct timer_list lgr_timer; /* * Setup next LGR timer @@ -181,6 +180,7 @@ static int __init lgr_init(void) debug_register_view(lgr_dbf, _hex_ascii_view); lgr_info_get(_info_last); debug_event(lgr_dbf, 1, _info_last, sizeof(lgr_info_last)); + timer_setup(_timer, lgr_timer_fn, TIMER_DEFERRABLE); lgr_timer_set(); return 0; } diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c index ed0bdd220e1a..d7ece9888c29 100644 --- a/arch/s390/kernel/topology.c +++ b/arch/s390/kernel/topology.c @@ -320,15 +320,14 @@ static void topology_flush_work(void) flush_work(_work); } -static void topology_timer_fn(unsigned long ignored) +static void topology_timer_fn(struct timer_list *unused) { if (ptf(PTF_CHECK)) topology_schedule_update(); set_topology_timer(); } -static struct timer_list topology_timer = - TIMER_DEFERRED_INITIALIZER(topology_timer_fn, 0, 0); +static struct timer_list topology_timer; static atomic_t topology_poll = ATOMIC_INIT(0); @@ -597,6 +596,7 @@ static struct ctl_table topology_dir_table[] = { static int __init topology_init(void) { + timer_setup(_timer, topology_timer_fn, TIMER_DEFERRABLE); if (MACHINE_HAS_TOPOLOGY) set_topology_timer(); else diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 64d0edf428f8..a5361fc6215d 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -5390,11 +5390,8 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq) { } */ #ifdef CONFIG_WQ_WATCHDOG -static void wq_watchdog_timer_fn(unsigned long data); - static unsigned long wq_watchdog_thresh = 30; -static struct timer_list wq_watchdog_timer = - TIMER_DEFERRED_INITIALIZER(wq_watchdog_timer_fn, 0, 0); +static struct timer_list wq_watchdog_timer; static unsigned long wq_watchdog_touched = INITIAL_JIFFIES; static DEFINE_PER_CPU(unsigned long, wq_watchdog_touched_cpu) = INITIAL_JIFFIES; @@ -5408,7 +5405,7 @@ static void wq_watchdog_reset_touched(void) per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies; } -static void wq_watchdog_timer_fn(unsigned long data) +static void wq_watchdog_timer_fn(struct timer_list *unused) { unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ; bool lockup_detected = false; @@ -5510,6 +5507,7 @@ module_param_cb(watchdog_thresh, _watchdog_thresh_ops, _watchdog_thresh, static void wq_watchdog_init(void) { + timer_setup(_watchdog_timer, wq_watchdog_timer_fn, TIMER_DEFERRABLE); wq_watchdog_set_thresh(wq_watchdog_thresh); } -- 2.7.4
[PATCH 08/13] timer: Remove unused static initializer macros
This removes the now unused TIMER_*INITIALIZER macros: TIMER_INITIALIZER TIMER_PINNED_INITIALIZER TIMER_DEFERRED_INITIALIZER TIMER_PINNED_DEFERRED_INITIALIZER Signed-off-by: Kees Cook--- include/linux/timer.h | 12 1 file changed, 12 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 4f7476e4a727..a33220311361 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -73,18 +73,6 @@ struct timer_list { __FILE__ ":" __stringify(__LINE__)) \ } -#define TIMER_INITIALIZER(_function, _expires, _data) \ - __TIMER_INITIALIZER((_function), (_expires), (_data), 0) - -#define TIMER_PINNED_INITIALIZER(_function, _expires, _data) \ - __TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_PINNED) - -#define TIMER_DEFERRED_INITIALIZER(_function, _expires, _data) \ - __TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE) - -#define TIMER_PINNED_DEFERRED_INITIALIZER(_function, _expires, _data) \ - __TIMER_INITIALIZER((_function), (_expires), (_data), TIMER_DEFERRABLE | TIMER_PINNED) - #define DEFINE_TIMER(_name, _function, _expires, _data)\ struct timer_list _name = \ __TIMER_INITIALIZER(_function, _expires, _data, 0) -- 2.7.4
[PATCH 05/13] timer: Remove init_timer_deferrable() in favor of timer_setup()
This refactors the only users of init_timer_deferrable() to use the new timer_setup() and from_timer(). Removes definition of init_timer_deferrable(). Cc: Benjamin HerrenschmidtCc: Michael Ellerman Cc: Sebastian Reichel Cc: Harish Patil Cc: Manish Chopra Cc: Kalle Valo Cc: linuxppc-dev@lists.ozlabs.org Cc: net...@vger.kernel.org Cc: linux-wirel...@vger.kernel.org Signed-off-by: Kees Cook --- arch/powerpc/mm/numa.c | 12 +-- drivers/hsi/clients/ssi_protocol.c | 32 drivers/net/ethernet/qlogic/qlge/qlge_main.c | 11 -- drivers/net/vxlan.c | 8 +++ drivers/net/wireless/ath/ath6kl/recovery.c | 9 include/linux/timer.h| 2 -- 6 files changed, 34 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index b95c584ce19d..f9b6107d6854 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1453,7 +1453,7 @@ static void topology_schedule_update(void) schedule_work(_work); } -static void topology_timer_fn(unsigned long ignored) +static void topology_timer_fn(struct timer_list *unused) { if (prrn_enabled && cpumask_weight(_associativity_changes_mask)) topology_schedule_update(); @@ -1463,14 +1463,11 @@ static void topology_timer_fn(unsigned long ignored) reset_topology_timer(); } } -static struct timer_list topology_timer = - TIMER_INITIALIZER(topology_timer_fn, 0, 0); +static struct timer_list topology_timer; static void reset_topology_timer(void) { - topology_timer.data = 0; - topology_timer.expires = jiffies + 60 * HZ; - mod_timer(_timer, topology_timer.expires); + mod_timer(_timer, jiffies + 60 * HZ); } #ifdef CONFIG_SMP @@ -1530,7 +1527,8 @@ int start_topology_update(void) prrn_enabled = 0; vphn_enabled = 1; setup_cpu_associativity_change_counters(); - init_timer_deferrable(_timer); + timer_setup(_timer, topology_timer_fn, + TIMER_DEFERRABLE); reset_topology_timer(); } } diff --git a/drivers/hsi/clients/ssi_protocol.c b/drivers/hsi/clients/ssi_protocol.c index 93d28c0ec8bf..67af03d3aeb3 100644 --- a/drivers/hsi/clients/ssi_protocol.c +++ b/drivers/hsi/clients/ssi_protocol.c @@ -464,10 +464,10 @@ static void ssip_error(struct hsi_client *cl) hsi_async_read(cl, msg); } -static void ssip_keep_alive(unsigned long data) +static void ssip_keep_alive(struct timer_list *t) { - struct hsi_client *cl = (struct hsi_client *)data; - struct ssi_protocol *ssi = hsi_client_drvdata(cl); + struct ssi_protocol *ssi = from_timer(ssi, t, keep_alive); + struct hsi_client *cl = ssi->cl; dev_dbg(>device, "Keep alive kick in: m(%d) r(%d) s(%d)\n", ssi->main_state, ssi->recv_state, ssi->send_state); @@ -490,9 +490,19 @@ static void ssip_keep_alive(unsigned long data) spin_unlock(>lock); } -static void ssip_wd(unsigned long data) +static void ssip_rx_wd(struct timer_list *t) +{ + struct ssi_protocol *ssi = from_timer(ssi, t, rx_wd); + struct hsi_client *cl = ssi->cl; + + dev_err(>device, "Watchdog trigerred\n"); + ssip_error(cl); +} + +static void ssip_tx_wd(unsigned long data) { - struct hsi_client *cl = (struct hsi_client *)data; + struct ssi_protocol *ssi = from_timer(ssi, t, tx_wd); + struct hsi_client *cl = ssi->cl; dev_err(>device, "Watchdog trigerred\n"); ssip_error(cl); @@ -1084,15 +1094,9 @@ static int ssi_protocol_probe(struct device *dev) } spin_lock_init(>lock); - init_timer_deferrable(>rx_wd); - init_timer_deferrable(>tx_wd); - init_timer(>keep_alive); - ssi->rx_wd.data = (unsigned long)cl; - ssi->rx_wd.function = ssip_wd; - ssi->tx_wd.data = (unsigned long)cl; - ssi->tx_wd.function = ssip_wd; - ssi->keep_alive.data = (unsigned long)cl; - ssi->keep_alive.function = ssip_keep_alive; + timer_setup(>rx_wd, ssip_rx_wd, TIMER_DEFERRABLE); + timer_setup(>tx_wd, ssip_tx_wd, TIMER_DEFERRABLE); + timer_setup(>keep_alive, ssip_keep_alive, 0); INIT_LIST_HEAD(>txqueue); INIT_LIST_HEAD(>cmdqueue); atomic_set(>tx_usecnt, 0); diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c index 9feec7009443..29fea74bff2e 100644 --- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c +++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c @@ -4725,9 +4725,9 @@ static const struct net_device_ops qlge_netdev_ops =
[PATCH 00/13] timer: Start conversion to timer_setup()
Hi, This is the first of many timer infrastructure cleanups to simplify the timer API[1]. All of these patches are expected to land via the timer tree, so Acks (or corrections) appreciated. These patches refactor various users of timer API that are NOT just using init_timer() or setup_timer() (which is the vast majority of users, and are being converted separately). These changes are focused on the lesser-used init_timer_*(), TIMER_*INITIALIZER(), and DEFINE_TIMER() methods of preparing a timer. Thanks! -Kees [1] https://git.kernel.org/linus/686fef928bba6be13cabe639f154af7d72b63120
[PATCH 03/13] timer: Remove init_timer_on_stack() in favor of timer_setup_on_stack()
Remove uses of init_timer_on_stack() with open-coded function and data assignments that could be expressed using timer_setup_on_stack(). Several were removed from the stack entirely since there was a one-to-one mapping of parent structure to timer, those are switched to using timer_setup() instead. All related callbacks were adjusted to use from_timer(). Cc: "Rafael J. Wysocki"Cc: Pavel Machek Cc: Len Brown Cc: Greg Kroah-Hartman Cc: Stefan Richter Cc: Sudip Mukherjee Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Julian Wiedmann Cc: Ursula Braun Cc: Michael Reed Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: Thomas Gleixner Cc: linux...@vger.kernel.org Cc: linux1394-de...@lists.sourceforge.net Cc: linux-s...@vger.kernel.org Cc: linux-s...@vger.kernel.org Signed-off-by: Kees Cook --- drivers/base/power/main.c | 8 +++- drivers/firewire/core-transaction.c | 10 +- drivers/parport/ieee1284.c | 21 +++-- drivers/s390/char/tape.h| 1 + drivers/s390/char/tape_std.c| 18 ++ drivers/s390/net/lcs.c | 16 ++-- drivers/s390/net/lcs.h | 1 + drivers/scsi/qla1280.c | 14 +- drivers/scsi/qla1280.h | 1 + include/linux/parport.h | 1 + include/linux/timer.h | 2 -- 11 files changed, 36 insertions(+), 57 deletions(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 770b1539a083..ae47b2ec84b4 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -478,9 +478,9 @@ struct dpm_watchdog { * There's not much we can do here to recover so panic() to * capture a crash-dump in pstore. */ -static void dpm_watchdog_handler(unsigned long data) +static void dpm_watchdog_handler(struct timer_list *t) { - struct dpm_watchdog *wd = (void *)data; + struct dpm_watchdog *wd = from_timer(wd, t, timer); dev_emerg(wd->dev, " DPM device timeout \n"); show_stack(wd->tsk, NULL); @@ -500,11 +500,9 @@ static void dpm_watchdog_set(struct dpm_watchdog *wd, struct device *dev) wd->dev = dev; wd->tsk = current; - init_timer_on_stack(timer); + timer_setup_on_stack(timer, dpm_watchdog_handler, 0); /* use same timeout value for both suspend and resume */ timer->expires = jiffies + HZ * CONFIG_DPM_WATCHDOG_TIMEOUT; - timer->function = dpm_watchdog_handler; - timer->data = (unsigned long)wd; add_timer(timer); } diff --git a/drivers/firewire/core-transaction.c b/drivers/firewire/core-transaction.c index d6a09b9cd8cc..4372f9e4b0da 100644 --- a/drivers/firewire/core-transaction.c +++ b/drivers/firewire/core-transaction.c @@ -137,9 +137,9 @@ int fw_cancel_transaction(struct fw_card *card, } EXPORT_SYMBOL(fw_cancel_transaction); -static void split_transaction_timeout_callback(unsigned long data) +static void split_transaction_timeout_callback(struct timer_list *timer) { - struct fw_transaction *t = (struct fw_transaction *)data; + struct fw_transaction *t = from_timer(t, timer, split_timeout_timer); struct fw_card *card = t->card; unsigned long flags; @@ -373,8 +373,8 @@ void fw_send_request(struct fw_card *card, struct fw_transaction *t, int tcode, t->tlabel = tlabel; t->card = card; t->is_split_transaction = false; - setup_timer(>split_timeout_timer, - split_transaction_timeout_callback, (unsigned long)t); + timer_setup(>split_timeout_timer, + split_transaction_timeout_callback, 0); t->callback = callback; t->callback_data = callback_data; @@ -423,7 +423,7 @@ int fw_run_transaction(struct fw_card *card, int tcode, int destination_id, struct transaction_callback_data d; struct fw_transaction t; - init_timer_on_stack(_timeout_timer); + timer_setup_on_stack(_timeout_timer, NULL, 0); init_completion(); d.payload = payload; fw_send_request(card, , tcode, destination_id, generation, speed, diff --git a/drivers/parport/ieee1284.c b/drivers/parport/ieee1284.c index 74cc6dd982d2..2d1a5c737c6e 100644 --- a/drivers/parport/ieee1284.c +++ b/drivers/parport/ieee1284.c @@ -44,10 +44,11 @@ static void parport_ieee1284_wakeup (struct parport *port) up (>physport->ieee1284.irq); } -static struct parport *port_from_cookie[PARPORT_MAX]; -static void timeout_waiting_on_port (unsigned long cookie) +static void timeout_waiting_on_port (struct timer_list *t) { -
[PATCH 04/13] timer: Remove init_timer_pinned() in favor of timer_setup()
This refactors the only users of init_timer_pinned() to use the new timer_setup() and from_timer(). Drops the definition of init_timer_pinned(). Cc: Chris MetcalfCc: Thomas Gleixner Cc: net...@vger.kernel.org Signed-off-by: Kees Cook --- drivers/net/ethernet/tile/tilepro.c | 9 - include/linux/timer.h | 2 -- 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/tile/tilepro.c b/drivers/net/ethernet/tile/tilepro.c index 49ccee4b9aec..56d06282fbde 100644 --- a/drivers/net/ethernet/tile/tilepro.c +++ b/drivers/net/ethernet/tile/tilepro.c @@ -608,9 +608,9 @@ static void tile_net_schedule_egress_timer(struct tile_net_cpu *info) * ISSUE: Maybe instead track number of expected completions, and free * only that many, resetting to zero if "pending" is ever false. */ -static void tile_net_handle_egress_timer(unsigned long arg) +static void tile_net_handle_egress_timer(struct timer_list *t) { - struct tile_net_cpu *info = (struct tile_net_cpu *)arg; + struct tile_net_cpu *info = from_timer(info, t, egress_timer); struct net_device *dev = info->napi.dev; /* The timer is no longer scheduled. */ @@ -1004,9 +1004,8 @@ static void tile_net_register(void *dev_ptr) BUG(); /* Initialize the egress timer. */ - init_timer_pinned(>egress_timer); - info->egress_timer.data = (long)info; - info->egress_timer.function = tile_net_handle_egress_timer; + timer_setup(>egress_timer, tile_net_handle_egress_timer, + TIMER_PINNED); u64_stats_init(>stats.syncp); diff --git a/include/linux/timer.h b/include/linux/timer.h index b10c4bdc6fbd..9da903562ed4 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -128,8 +128,6 @@ static inline void init_timer_on_stack_key(struct timer_list *timer, #define init_timer(timer) \ __init_timer((timer), 0) -#define init_timer_pinned(timer) \ - __init_timer((timer), TIMER_PINNED) #define init_timer_deferrable(timer) \ __init_timer((timer), TIMER_DEFERRABLE) -- 2.7.4
[PATCH 01/13] timer: Convert schedule_timeout() to use from_timer()
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new from_timer() helper and passing the timer pointer explicitly. Since this special timer is on the stack, it needs to have a wrapper structure to carry state once .data is eliminated. Cc: John StultzCc: Thomas Gleixner Cc: Stephen Boyd Signed-off-by: Kees Cook --- include/linux/timer.h | 8 kernel/time/timer.c | 26 +++--- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 6383c528b148..5ef5c9e41a09 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -179,6 +179,14 @@ static inline void timer_setup(struct timer_list *timer, (TIMER_DATA_TYPE)timer, flags); } +static inline void timer_setup_on_stack(struct timer_list *timer, + void (*callback)(struct timer_list *), + unsigned int flags) +{ + __setup_timer_on_stack(timer, (TIMER_FUNC_TYPE)callback, + (TIMER_DATA_TYPE)timer, flags); +} + #define from_timer(var, callback_timer, timer_fieldname) \ container_of(callback_timer, typeof(*var), timer_fieldname) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index f2674a056c26..38613ced2324 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1668,9 +1668,20 @@ void run_local_timers(void) raise_softirq(TIMER_SOFTIRQ); } -static void process_timeout(unsigned long __data) +/* + * Since schedule_timeout()'s timer is defined on the stack, it must store + * the target task on the stack as well. + */ +struct process_timer { + struct timer_list timer; + struct task_struct *task; +}; + +static void process_timeout(struct timer_list *t) { - wake_up_process((struct task_struct *)__data); + struct process_timer *timeout = from_timer(timeout, t, timer); + + wake_up_process(timeout->task); } /** @@ -1704,7 +1715,7 @@ static void process_timeout(unsigned long __data) */ signed long __sched schedule_timeout(signed long timeout) { - struct timer_list timer; + struct process_timer timer; unsigned long expire; switch (timeout) @@ -1738,13 +1749,14 @@ signed long __sched schedule_timeout(signed long timeout) expire = timeout + jiffies; - setup_timer_on_stack(, process_timeout, (unsigned long)current); - __mod_timer(, expire, false); + timer.task = current; + timer_setup_on_stack(, process_timeout, 0); + __mod_timer(, expire, false); schedule(); - del_singleshot_timer_sync(); + del_singleshot_timer_sync(); /* Remove the timer from the object tracker */ - destroy_timer_on_stack(); + destroy_timer_on_stack(); timeout = expire - jiffies; -- 2.7.4
[PATCH 02/13] timer: Remove init_timer_pinned_deferrable() in favor of timer_setup()
This refactors the only user of init_timer_pinned_deferrable() to use the new timer_setup() and from_timer(). Adds a pointer back to the policy, and drops the definition of init_timer_pinned_deferrable(). Cc: "Rafael J. Wysocki"Cc: Viresh Kumar Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Thomas Gleixner Cc: linux...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Kees Cook --- drivers/cpufreq/powernv-cpufreq.c | 13 +++-- include/linux/timer.h | 2 -- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index 3ff5160451b4..b6d7c4c98d0a 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -90,6 +90,7 @@ struct global_pstate_info { int last_gpstate_idx; spinlock_t gpstate_lock; struct timer_list timer; + struct cpufreq_policy *policy; }; static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1]; @@ -625,10 +626,10 @@ static inline void queue_gpstate_timer(struct global_pstate_info *gpstates) * according quadratic equation. Queues a new timer if it is still not equal * to local pstate */ -void gpstate_timer_handler(unsigned long data) +void gpstate_timer_handler(struct timer_list *t) { - struct cpufreq_policy *policy = (struct cpufreq_policy *)data; - struct global_pstate_info *gpstates = policy->driver_data; + struct global_pstate_info *gpstates = from_timer(gpstates, t, timer); + struct cpufreq_policy *policy = gpstates->policy; int gpstate_idx, lpstate_idx; unsigned long val; unsigned int time_diff = jiffies_to_msecs(jiffies) @@ -800,9 +801,9 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy) policy->driver_data = gpstates; /* initialize timer */ - init_timer_pinned_deferrable(>timer); - gpstates->timer.data = (unsigned long)policy; - gpstates->timer.function = gpstate_timer_handler; + gpstates->policy = policy; + timer_setup(>timer, gpstate_timer_handler, + TIMER_PINNED | TIMER_DEFERRABLE); gpstates->timer.expires = jiffies + msecs_to_jiffies(GPSTATE_TIMER_INTERVAL); spin_lock_init(>gpstate_lock); diff --git a/include/linux/timer.h b/include/linux/timer.h index 5ef5c9e41a09..d11e819a86e2 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -132,8 +132,6 @@ static inline void init_timer_on_stack_key(struct timer_list *timer, __init_timer((timer), TIMER_PINNED) #define init_timer_deferrable(timer) \ __init_timer((timer), TIMER_DEFERRABLE) -#define init_timer_pinned_deferrable(timer)\ - __init_timer((timer), TIMER_DEFERRABLE | TIMER_PINNED) #define init_timer_on_stack(timer) \ __init_timer_on_stack((timer), 0) -- 2.7.4
[PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
From: Mathieu DesnoyersProvide a new command allowing processes to register their intent to use the private expedited command. This allows PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. Changes since v1: - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in powerpc membarrier_arch_sched_in(), given that we want to specifically check the next thread state. - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. - Use task_thread_info() to pass thread_info from task to *_ti_thread_flag(). Changes since v2: - Move membarrier_arch_sched_in() call to finish_task_switch(). - Check for NULL t->mm in membarrier_arch_fork(). - Use membarrier_sched_in() in generic code, which invokes the arch-specific membarrier_arch_sched_in(). This fixes allnoconfig build on PowerPC. - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing allnoconfig build on PowerPC. - Build and runtime tested on PowerPC. Changes since v3: - Simply rely on copy_mm() to copy the membarrier_private_expedited mm field on fork. - powerpc: test thread flag instead of reading membarrier_private_expedited in membarrier_arch_fork(). - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming from kernel thread, since mmdrop() implies a full barrier. - Set membarrier_private_expedited to 1 only after arch registration code, thus eliminating a race where concurrent commands could succeed when they should fail if issued concurrently with process registration. - Use READ_ONCE() for membarrier_private_expedited field access in membarrier_private_expedited. Matches WRITE_ONCE() performed in process registration. Changes since v4: - Move powerpc hook from sched_in() to switch_mm(), based on feedback from Nicholas Piggin. Signed-off-by: Mathieu Desnoyers CC: Peter Zijlstra CC: Paul E. McKenney CC: Boqun Feng CC: Andrew Hunter CC: Maged Michael CC: gro...@google.com CC: Avi Kivity CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Dave Watson CC: Alan Stern CC: Will Deacon CC: Andy Lutomirski CC: Ingo Molnar CC: Alexander Viro CC: Nicholas Piggin CC: linuxppc-dev@lists.ozlabs.org CC: linux-a...@vger.kernel.org Signed-off-by: Paul E. McKenney --- MAINTAINERS| 2 ++ arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/membarrier.h | 43 + arch/powerpc/include/asm/thread_info.h | 3 ++ arch/powerpc/kernel/Makefile | 2 ++ arch/powerpc/kernel/membarrier.c | 45 ++ arch/powerpc/mm/mmu_context.c | 7 + fs/exec.c | 1 + include/linux/mm_types.h | 3 ++ include/linux/sched/mm.h | 50 ++ include/uapi/linux/membarrier.h| 23 +++- init/Kconfig | 3 ++ kernel/fork.c | 2 ++ kernel/sched/core.c| 10 --- kernel/sched/membarrier.c | 25 ++--- 15 files changed, 199 insertions(+), 21 deletions(-) create mode 100644 arch/powerpc/include/asm/membarrier.h create mode 100644 arch/powerpc/kernel/membarrier.c diff --git a/MAINTAINERS b/MAINTAINERS index 65b0c88d5ee0..c5296d7f447b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8822,6 +8822,8 @@ L:linux-ker...@vger.kernel.org S: Supported F: kernel/sched/membarrier.c F: include/uapi/linux/membarrier.h +F: arch/powerpc/kernel/membarrier.c +F: arch/powerpc/include/asm/membarrier.h MEMORY MANAGEMENT L: linux...@kvack.org diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 809c468edab1..6f44c5f74f71 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -138,6 +138,7 @@ config PPC select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL + select ARCH_HAS_MEMBARRIER_HOOKS select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE select ARCH_HAS_SG_CHAIN select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h new file mode 100644 index
Possible LMB hot unplug bug in 4.13+ kernels
Hi, I've stumbled in a LMB hot unplug problem when running a guest with 4.13+ kernel using QEMU 2.10. When trying to hot unplug a recently hotplugged LMB this is what I got, using an upstream kernel: --- QEMU cmd line: sudo ./qemu-system-ppc64 -name migrate_qemu -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0xf -device spapr-vscsi,id=scsi0,reg=0x2000 -smp 32,maxcpus=32,sockets=32,cores=1,threads=1 --machine pseries,accel=kvm,kvm-type=HV,usb=off,dump-guest-core=off -m 4G,slots=32,maxmem=32G -drive file=/home/danielhb/vm_imgs/f26.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -nographic Last login: Wed Oct 4 12:28:25 on hvc0 [danielhb@localhost ~]$ grep Mem /proc/meminfo MemTotal: 4161728 kB MemFree: 3204352 kB MemAvailable: 3558336 kB [danielhb@localhost ~]$ QEMU 2.10.50 monitor - type 'help' for more information (qemu) (qemu) object_add memory-backend-ram,id=ram0,size=1G (qemu) device_add pc-dimm,id=dimm0,memdev=ram0 (qemu) [danielhb@localhost ~]$ grep Mem /proc/meminfo MemTotal: 5210304 kB MemFree: 4254656 kB MemAvailable: 4598144 kB [danielhb@localhost ~]$ (qemu) (qemu) device_del dimm0 (qemu) [ 136.058727] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs (qemu) [danielhb@localhost ~]$ grep Mem /proc/meminfo MemTotal: 5210304 kB MemFree: 4253696 kB MemAvailable: 4597184 kB [danielhb@localhost ~]$ - The LMBs are about to be unplugged, them something happens and they ended up being hotplugged back. This isn't reproducible with 4.11 guests. I can reliably reproduce it in 4.13+. Haven't tried 4.12. Changing QEMU snapshots or even the hypervisor kernel/OS didn't affect the result. In an attempt to better understand the issue I did the following changes in upstream kernel: diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 1d48ab424bd9..37550833cdb0 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -433,8 +433,10 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb) unsigned long pfn, block_sz; u64 phys_addr; - if (!(lmb->flags & DRCONF_MEM_ASSIGNED)) + if (!(lmb->flags & DRCONF_MEM_ASSIGNED)) { + pr_err("lmb is not assigned \n"); return false; + } block_sz = memory_block_size_bytes(); scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; @@ -442,8 +444,10 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb) #ifdef CONFIG_FA_DUMP /* Don't hot-remove memory that falls in fadump boot memory area */ - if (is_fadump_boot_memory_area(phys_addr, block_sz)) + if (is_fadump_boot_memory_area(phys_addr, block_sz)) { + pr_err("lmb belongs to fadump boot memory area\n"); return false; + } #endif for (i = 0; i < scns_per_block; i++) { @@ -454,7 +458,9 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb) rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION); phys_addr += MIN_MEMORY_BLOCK_SIZE; } - + if (!rc) { + pr_err("is_mem_section_removable returned false \n"); + } return rc ? true : false; } @@ -465,12 +471,16 @@ static int dlpar_remove_lmb(struct of_drconf_cell *lmb) unsigned long block_sz; int nid, rc; - if (!lmb_is_removable(lmb)) + if (!lmb_is_removable(lmb)) { + pr_err("dlpar_remove_lmb: lmb is not removable! \n"); return -EINVAL; + } rc = dlpar_offline_lmb(lmb); - if (rc) + if (rc) { + pr_err("dlpar_remove_lmb: offline_lmb returned not zero \n"); return rc; + } block_sz = pseries_memory_block_size(); nid = memory_add_physaddr_to_nid(lmb->base_addr); And this is the output: - [danielhb@localhost ~]$ QEMU 2.10.50 monitor - type 'help' for more information (qemu) (qemu) object_add memory-backend-ram,id=ram0,size=1G (qemu) device_add pc-dimm,id=dimm0,memdev=ram0 (qemu) [danielhb@localhost ~]$ grep Mem /proc/meminfo MemTotal: 5210304 kB MemFree: 4254656 kB MemAvailable: 4598144 kB [danielhb@localhost ~]$ (qemu) (qemu) device_del dimm0 (qemu) [ 136.058473] pseries-hotplug-mem: is_mem_section_removable returned false [ 136.058607] pseries-hotplug-mem: dlpar_remove_lmb: lmb is not removable! [ 136.058727] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs (qemu) [danielhb@localhost ~]$ grep Mem /proc/meminfo MemTotal: 5210304 kB MemFree: 4253696 kB MemAvailable: 4597184 kB [danielhb@localhost ~]$ --- It appears that the hot
Re: [bug report] out of bounds read parsing vmode commandline option
Hi Dan, On Wed, Oct 4, 2017 at 2:50 PM, Dan Carpenterwrote: > This bug predates git but it looks like it might be simple to fix if the > right person looked at the code. > > drivers/video/fbdev/controlfb.c:560 control_setup() > error: buffer overflow 'control_mac_modes' 20 <= 21 > > drivers/video/fbdev/controlfb.c >549 static void __init control_setup(char *options) >550 { >551 char *this_opt; >552 >553 if (!options || !*options) >554 return; >555 >556 while ((this_opt = strsep(, ",")) != NULL) { >557 if (!strncmp(this_opt, "vmode:", 6)) { >558 int vmode = simple_strtoul(this_opt+6, NULL, > 0); > ^ > We get vmode from the command line. > >559 if (vmode > 0 && vmode <= VMODE_MAX && > ^ > We check that it's <= 22. > >560 control_mac_modes[vmode - 1].m[1] >= 0) > ^ > But the problem is that control_mac_modes[] only has 20 elements so the > highest valid index is 19. vmode - 1 can be up to 21. Nice catch! The bug was introduced in v2.4.5.6, when 2 new modes were added to macmodes.h, but control_mac_modes[] wasn't updated: https://kernel.opensuse.org/cgit/kernel/diff/include/video/macmodes.h?h=v2.5.2=29f279c764808560eaceb88fef36cbc35c529aad A simple fix is to check against ARRAY_SIZE(control_mac_modes) instead. A better fix is to add the missing entries to control_mac_modes[], cfr. the (gmail-whitespace-damaged) patch below: --- a/drivers/video/fbdev/controlfb.h +++ b/drivers/video/fbdev/controlfb.h @@ -141,5 +141,7 @@ static struct max_cmodes control_mac_modes[] = { {{ 1, 2}}, /* 1152x870, 75Hz */ {{ 0, 1}}, /* 1280x960, 75Hz */ {{ 0, 1}}, /* 1280x1024, 75Hz */ + {{ 1, 2}}, /* 1152x768, 60Hz */ + {{ 0, 1}}, /* 1600x1024, 60Hz */ }; (this array lists the maximum color mode (8, 16, or 32 bpp) for each video mode given RAM restrictions (2 or 4 MiB)). The 1152x768 mode is probably OK. Given the 1600x1024 mode has a lower dotclock (112 MHz) than the supported 1280x960 mode, it's probably OK, too. platinum_reg_init[] and valkyrie_reg_init[] seem to be handled fine. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH] mm: deferred_init_memmap improvements
deferred_init_memmap() is called when struct pages are initialized later in boot by slave CPUs. This patch simplifies and optimizes this function, and also fixes a couple issues (described below). The main change is that now we are iterating through free memblock areas instead of all configured memory. Thus, we do not have to check if the struct page has already been initialized. = In deferred_init_memmap() where all deferred struct pages are initialized we have a check like this: if (page->flags) { VM_BUG_ON(page_zone(page) != zone); goto free_range; } This way we are checking if the current deferred page has already been initialized. It works, because memory for struct pages has been zeroed, and the only way flags are not zero if it went through __init_single_page() before. But, once we change the current behavior and won't zero the memory in memblock allocator, we cannot trust anything inside "struct page"es until they are initialized. This patch fixes this. The deferred_init_memmap() is re-written to loop through only free memory ranges provided by memblock. Note, this first issue is relevant only when the following change is merged: mm: stop zeroing memory during allocation in vmemmap = This patch fixes another existing issue on systems that have holes in zones i.e CONFIG_HOLES_IN_ZONE is defined. In for_each_mem_pfn_range() we have code like this: if (!pfn_valid_within(pfn) goto free_range; Note: 'page' is not set to NULL and is not incremented but 'pfn' advances. Thus means if deferred struct pages are enabled on systems with these kind of holes, linux would get memory corruptions. I have fixed this issue by defining a new macro that performs all the necessary operations when we free the current set of pages. Signed-off-by: Pavel TatashinReviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco --- mm/page_alloc.c | 168 1 file changed, 85 insertions(+), 83 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c841af88836a..dcfd657cfd4e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1410,14 +1410,17 @@ void clear_zone_contiguous(struct zone *zone) } #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __init deferred_free_range(struct page *page, - unsigned long pfn, int nr_pages) +static void __init deferred_free_range(unsigned long pfn, + unsigned long nr_pages) { - int i; + struct page *page; + unsigned long i; - if (!page) + if (!nr_pages) return; + page = pfn_to_page(pfn); + /* Free a large naturally-aligned chunk if possible */ if (nr_pages == pageblock_nr_pages && (pfn & (pageblock_nr_pages - 1)) == 0) { @@ -1443,19 +1446,89 @@ static inline void __init pgdat_init_report_one_done(void) complete(_init_all_done_comp); } +/* + * Helper for deferred_init_range, free the given range, reset the counters, and + * return number of pages freed. + */ +static inline unsigned long __def_free(unsigned long *nr_free, + unsigned long *free_base_pfn, + struct page **page) +{ + unsigned long nr = *nr_free; + + deferred_free_range(*free_base_pfn, nr); + *free_base_pfn = 0; + *nr_free = 0; + *page = NULL; + + return nr; +} + +static unsigned long deferred_init_range(int nid, int zid, unsigned long pfn, +unsigned long end_pfn) +{ + struct mminit_pfnnid_cache nid_init_state = { }; + unsigned long nr_pgmask = pageblock_nr_pages - 1; + unsigned long free_base_pfn = 0; + unsigned long nr_pages = 0; + unsigned long nr_free = 0; + struct page *page = NULL; + + for (; pfn < end_pfn; pfn++) { + /* +* First we check if pfn is valid on architectures where it is +* possible to have holes within pageblock_nr_pages. On systems +* where it is not possible, this function is optimized out. +* +* Then, we check if a current large page is valid by only +* checking the validity of the head pfn. +* +* meminit_pfn_in_nid is checked on systems where pfns can +* interleave within a node: a pfn is between start and end +* of a node, but does not belong to this memory node. +* +* Finally, we minimize pfn page lookups and scheduler checks by +* performing it only once every pageblock_nr_pages. +*/ + if (!pfn_valid_within(pfn)) { + nr_pages
Re: [PATCH 0/2] powerpc/xive: fix CPU hot unplug
On 10/03/2017 08:58 AM, David Gibson wrote: > On Tue, Oct 03, 2017 at 08:24:07AM +0200, Cédric Le Goater wrote: >> On 10/03/2017 05:36 AM, David Gibson wrote: >>> On Mon, Oct 02, 2017 at 06:27:20PM +0200, Cédric Le Goater wrote: On 09/23/2017 10:26 AM, Cédric Le Goater wrote: > Hi, > > Here are a couple of small fixes to support CPU hot unplug. There are > still some issues to be investigated as, in some occasions, after a > couple of plug and unplug, the cpu which was removed receives a 'lost' > interrupt. This showed to be the decrementer under QEMU. So this seems to be a QEMU issue only which can be solved by removing the DEE bit from the LPCR on P9 processor when the CPU is stopped in rtas. PECE3 bit on P8 processors. I think these patches are valuable fixes for 4.14. The first is trivial and the second touches the common xive part but it is only called on the pseries platform. Could you please take a look ? >>> >>> Sorry, I think I've missed something here. >>> >>> Is there a qemu bug involved in this? Has there been a patch sent >>> that I didn't spot? >> >> >> No, not yet, but I will today probably. something like below to stop >> the decrementer when a CPU is stopped: >> >> --- qemu.git.orig/hw/ppc/spapr_rtas.c >> +++ qemu.git/hw/ppc/spapr_rtas.c >> @@ -174,6 +174,15 @@ static void rtas_start_cpu(PowerPCCPU *c >> kvm_cpu_synchronize_state(cs); >> >> env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME); >> + >> +/* Enable DECR interrupt */ >> +if (env->mmu_model == POWERPC_MMU_3_00) { >> +env->spr[SPR_LPCR] |= LPCR_DEE; >> +} else { >> +/* P7 and P8 both have same bit for DECR */ >> +env->spr[SPR_LPCR] |= LPCR_P8_PECE3; >> +} >> + >> env->nip = start; >> env->gpr[3] = r3; >> cs->halted = 0; >> @@ -210,6 +219,13 @@ static void rtas_stop_self(PowerPCCPU *c >>* no need to bother with specific bits, we just clear it. >>*/ >> env->msr = 0; >> + >> +if (env->mmu_model == POWERPC_MMU_3_00) { >> +env->spr[SPR_LPCR] &= ~LPCR_DEE; >> +} else { >> +/* P7 and P8 both have same bit for DECR */ >> +env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3; >> +} >> } >> >> static inline int sysparm_st(target_ulong addr, target_ulong len, >> >> I haven't yet because I fail to understand why the decrementer is not >> interrupting the dying CPU under xics as it is the case under XIVE. > > Oh.. ok. This sounds very similar to the problem Nikunj hit under TCG > with decrementer interrupts waking up a supposedly dead CPU. He had a > couple of proposed fixes, but we got bogged down trying to work out > why (with TCG at least) it only seemed to bite after a system_reset, > and not on initial boot up. yes. It would be nice to fix the reset under TCG though. May be this is related. >> Also I am not sure this hack is of any use : >> >> /* >> * While stopping a CPU, the guest calls H_CPPR which >> * effectively disables interrupts on XICS level. >> * However decrementer interrupts in TCG can still >> * wake the CPU up so here we disable interrupts in MSR >> * as well. >> * As rtas_start_cpu() resets the whole MSR anyway, there is >> * no need to bother with specific bits, we just clear it. >> */ >> env->msr = 0; > > Ok.. why do you think this isn't of use? I'm pretty sure this is > necessary for the TCG case, since MSR is checked in cpu_has_work(), > which could otherwise wake up the "dead" cpu. well, no, when the CPU is stopped with the 'stop-self' RTAS call, one of the CPU states is switched to 1 (cs->halted=1). In cpu_has_work(), this is a branch in which we don't check the MSR, only pending hardware interrupts are checked with their LPCR enablement bit. So if the DECR timer fires after 'stop-self' is called (cs->halted=1) and before it is really stopped (cs->stop=1), the nearly-dead CPU will have some work to do and the guest will crash. This case happens very frequently when the P9 XIVE exploitation mode is activated but it does not without, when using the XICS mode. In XICS mode, the DECR is occasionally fired but after cs->stop=1, so no work is to be done. The patch above fixes the problem but I don't understand why this works with XICS. My feeling is that there is a race somewhere and env->msr = 0; is just a useless workaround, in this case at least. C. > >> and the different CPU states are confusing. Nikunj already to a look >> at this when trying to fix the TCG reboot. Anyway, the QEMU patch >> should (re)start a thread. This is not the place to discuss. >> >> Thanks, >> >> C. >> >> >
[PATCH v2] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
With commit 425595a7fc20 ("livepatch: reuse module loader code to write relocations") livepatch uses module loader to write relocations of livepatch symbols, instead of managing them by arch-dependent klp_write_module_reloc() function. livepatch module managed relocation entries are written to sections marked with SHF_RELA_LIVEPATCH flag and livepatch symbols within the section are marked with SHN_LIVEPATCH symbol section index. When the livepatching module is loaded, the livepatch symbols are resolved before calling apply_relocate_add() to apply the relocations. R_PPC64_REL24 relocation type resolves to a function address, those may be local to the livepatch module or available in kernel/other modules. For every such non-local function, apply_relocate_add() constructs a stub (a.k.a trampoline) to branch to a function. Stub code is responsible to save toc onto the stack, before calling the function via the global entry point. A NOP instruction is expected after every non local function branch, i.e. after the REL24 relocation. Which in-turn is replaced by toc restore instruction by apply_relocate_add(). Functions those were local to livepatched function previously, may have become global now or they might be out of range with current TOC base. During module load, apply_relocate_add() fails for such global functions, as it expect's a nop after a branch. Which does't exist for a non-local function accessed via local entry point. For example, consider the following livepatch relocations (the example is from livepatch module generated by kpatch tool): Relocation section '.klp.rela.vmlinux..text.meminfo_proc_show' at offset 0x84530 contains 44 entries: Offset Info Type Symbol's Value Symbol's Name + Addend ...... R_PPC64_REL24 0x0 .klp.sym.vmlinux.si_swapinfo,0 + 0 ...... R_PPC64_REL24 0x0 .klp.sym.vmlinux.total_swapcache_pages,0 + 0 ...... R_PPC64_REL24 0x0 .klp.sym.vmlinux.show_val_kb,1 + 0 [...] 1. .klp.sym.vmlinux.si_swapinfo and .klp.sym.vmlinux.total_swapcache_pages are not available within the livepatch module TOC range. 2. .klp.sym.vmlinux.show_val_kb livepatch symbol was previously local but now global w.r.t module call fs/proc/meminfo.c::meminfo_proc_show() While the livepatch module is loaded the livepatch symbols mentioned in case 1 will fail with an error: module_64: kpatch_meminfo: REL24 -1152921504751525976 out of range! and livepatch symbols mentioned in case 2 with fail with an error: module_64: kpatch_meminfo: Expect noop after relocate, got 3d22 Both the failures with REL24 livepatch symbols relocation, can be resolved by constructing a new livepatch stub. The newly setup klp_stub mimics the functionality of entry_64.S::livepatch_handler introduced by commit 85baa095497f ("powerpc/livepatch: Add live patching support on ppc64le"). Which introduces a "livepatch stack" growing upwards from the base of the regular stack and is used to store/restore TOC/LR values, other than the stub setup and branch. The additional instructions sequences to handle klp_stub increases the stub size and current ppc64_stub_insn[] is not sufficient to hold them. This patch also introduces new ppc64le_klp_stub_entry[], along with the helpers to find/allocate livepatch stub. Signed-off-by: Kamalesh BabulalCc: Balbir Singh Cc: Naveen N. Rao Cc: Josh Poimboeuf Cc: Jessica Yu Cc: Ananth N Mavinakayanahalli Cc: Aravinda Prasad Cc: linuxppc-dev@lists.ozlabs.org Cc: live-patch...@vger.kernel.org --- This patch applies on top of livepatch_handler fix posted at https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-September/163824.html v2: - Changed klp_stub construction to re-use livepatch_handler and additional patch code required for klp_stub, instead of duplicating it. - Minor comments and commit body edits. arch/powerpc/include/asm/module.h | 4 + arch/powerpc/kernel/module_64.c| 135 - arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 31 ++ 3 files changed, 167 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index 6c0132c7212f..de22c4c7aebc 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -44,6 +44,10 @@ struct mod_arch_specific { unsigned long toc; unsigned long tramp; #endif +#ifdef CONFIG_LIVEPATCH + /* Count of kernel livepatch relocations */ + unsigned long klp_relocs; +#endif #else /* powerpc64 */ /* Indices of PLT sections within module. */ diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 0b0f89685b67..5be955e59162 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -140,6 +140,24 @@ static u32
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
On 10/04/2017 10:04 AM, Michal Hocko wrote: On Wed 04-10-17 09:28:55, Pasha Tatashin wrote: I am not really familiar with the trim_low_memory_range code path. I am not even sure we have to care about it because nobody should be walking pfns outside of any zone. According to commit comments first 4K belongs to BIOS, so I think the memory exists but BIOS may or may not report it to Linux. So, reserve it to make sure we never touch it. Yes and that memory should be outside of any zones, no? I am not totally sure, I think some x86 expert could help us here. But, in either case this issue can be fixed separately from the rest of the series. I am worried that this patch adds a code which is not really used and it will just stay that way for ever because nobody will dare to change it as it is too obscure and not explained very well. I could explain mine code better. Perhaps add more comments, and explain when it can be removed? More explanation would be definitely helpful trim_low_memory_range is a good example of this. Why do we even reserve this range from the memory block allocator? The memory shouldn't be backed by any real memory and thus not in the allocator in the first place, no? Since it is not enforced in memblock that everything in reserved list must be part of memory list, we can have it, and we need to make sure kernel does not panic. Otherwise, it is very hard to detect such bugs. So, should we report such a memblock reservation API (ab)use to the log? Are you actually sure that trim_low_memory_range is doing a sane and really needed thing? In other words do we have a zone which contains this no-memory backed pfns? And, this patch reports it already: + pr_info("Reserved but unavailable: %lld pages", pgcnt); I could add a comment above this print call, explain that such memory is probably bogus and must be studied/fixed. Also, add that this code can be removed once memblock is changed to allow reserve only memory that is backed by physical memory i.e. in "memory" list. Pasha
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
On Wed 04-10-17 09:28:55, Pasha Tatashin wrote: > > > I am not really familiar with the trim_low_memory_range code path. I am > > not even sure we have to care about it because nobody should be walking > > pfns outside of any zone. > > According to commit comments first 4K belongs to BIOS, so I think the memory > exists but BIOS may or may not report it to Linux. So, reserve it to make > sure we never touch it. Yes and that memory should be outside of any zones, no? > > I am worried that this patch adds a code which > > is not really used and it will just stay that way for ever because > > nobody will dare to change it as it is too obscure and not explained > > very well. > > I could explain mine code better. Perhaps add more comments, and explain > when it can be removed? More explanation would be definitely helpful > > trim_low_memory_range is a good example of this. Why do we > > even reserve this range from the memory block allocator? The memory > > shouldn't be backed by any real memory and thus not in the allocator in > > the first place, no? > > > > Since it is not enforced in memblock that everything in reserved list must > be part of memory list, we can have it, and we need to make sure kernel does > not panic. Otherwise, it is very hard to detect such bugs. So, should we report such a memblock reservation API (ab)use to the log? Are you actually sure that trim_low_memory_range is doing a sane and really needed thing? In other words do we have a zone which contains this no-memory backed pfns? -- Michal Hocko SUSE Labs
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
I am not really familiar with the trim_low_memory_range code path. I am not even sure we have to care about it because nobody should be walking pfns outside of any zone. According to commit comments first 4K belongs to BIOS, so I think the memory exists but BIOS may or may not report it to Linux. So, reserve it to make sure we never touch it. I am worried that this patch adds a code which is not really used and it will just stay that way for ever because nobody will dare to change it as it is too obscure and not explained very well. I could explain mine code better. Perhaps add more comments, and explain when it can be removed? trim_low_memory_range is a good example of this. Why do we even reserve this range from the memory block allocator? The memory shouldn't be backed by any real memory and thus not in the allocator in the first place, no? Since it is not enforced in memblock that everything in reserved list must be part of memory list, we can have it, and we need to make sure kernel does not panic. Otherwise, it is very hard to detect such bugs.
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
On Wed 04-10-17 08:40:11, Pasha Tatashin wrote: > > > > Could you be more specific where is such a memory reserved? > > > > > > > > > > I know of one example: trim_low_memory_range() unconditionally reserves > > > from > > > pfn 0, but e820__memblock_setup() might provide the exiting memory from > > > pfn > > > 1 (i.e. KVM). > > > > Then just initialize struct pages for that mapping rigth there where a > > special API is used. > > > > > But, there could be more based on this comment from linux/page-flags.h: > > > > > > 19 * PG_reserved is set for special pages, which can never be swapped > > > out. > > > Some > > > 20 * of them might not even exist (eg empty_bad_page)... > > > > I have no idea wht empty_bad_page is but a quick grep shows that this is > > never used. I might be wrong here but if somebody is reserving a memory > > in a special way then we should handle the initialization right there. > > E.g. create an API for special memblock reservations. > > > > Hi Michal, > > The reservations happen before struct pages are allocated and mapped. So, it > is not always possible to do it at call sites. OK, I didn't realize that. > Previously, I have solved this problem like this: > > https://patchwork.kernel.org/patch/9886163 > > But, I was not too happy with that approach, so I replaced it with the > current approach as it is more generic, and solves similar issues if they > happen in other places. Also, the comment in page-flags got me scared that > there are probably other places perhaps on other architectures that can have > the similar issue. I believe the comment is just stale. I have looked into empty_bad_page and it is just a relict. I plan to post a patch soon. > In addition, I did not like my solution, I was simply shrinking the low > reservation from: > [0 - reserve_low) to [min_pfn - reserve_low), but if min_pfn > reserve_low > can we skip low reservation entirely? I was not sure. > > The current approach notifies us if there are such pages, and we can > fix/remove them in the future without crashing kernel in the meantime. I am not really familiar with the trim_low_memory_range code path. I am not even sure we have to care about it because nobody should be walking pfns outside of any zone. I am worried that this patch adds a code which is not really used and it will just stay that way for ever because nobody will dare to change it as it is too obscure and not explained very well. trim_low_memory_range is a good example of this. Why do we even reserve this range from the memory block allocator? The memory shouldn't be backed by any real memory and thus not in the allocator in the first place, no? -- Michal Hocko SUSE Labs
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
Could you be more specific where is such a memory reserved? I know of one example: trim_low_memory_range() unconditionally reserves from pfn 0, but e820__memblock_setup() might provide the exiting memory from pfn 1 (i.e. KVM). Then just initialize struct pages for that mapping rigth there where a special API is used. But, there could be more based on this comment from linux/page-flags.h: 19 * PG_reserved is set for special pages, which can never be swapped out. Some 20 * of them might not even exist (eg empty_bad_page)... I have no idea wht empty_bad_page is but a quick grep shows that this is never used. I might be wrong here but if somebody is reserving a memory in a special way then we should handle the initialization right there. E.g. create an API for special memblock reservations. Hi Michal, The reservations happen before struct pages are allocated and mapped. So, it is not always possible to do it at call sites. Previously, I have solved this problem like this: https://patchwork.kernel.org/patch/9886163 But, I was not too happy with that approach, so I replaced it with the current approach as it is more generic, and solves similar issues if they happen in other places. Also, the comment in page-flags got me scared that there are probably other places perhaps on other architectures that can have the similar issue. In addition, I did not like my solution, I was simply shrinking the low reservation from: [0 - reserve_low) to [min_pfn - reserve_low), but if min_pfn > reserve_low can we skip low reservation entirely? I was not sure. The current approach notifies us if there are such pages, and we can fix/remove them in the future without crashing kernel in the meantime. Pasha
Re: [PATCH v2 4/5] powerpc: pseries: only store the device node basename in full_name
Rob Herringwrites: > On Tue, Oct 3, 2017 at 4:26 AM, Michael Ellerman wrote: ... >> >> The obvious fix is just to allocate it separately as before, eg ~=: > > Yes, I'll go back to doing 2 allocs like v1, but using kstrdup as was > also pointed out. Yeah that would be better still I guess. cheers
Re: [PATCH v9 06/12] mm: zero struct pages during initialization
On 10/04/2017 04:45 AM, Michal Hocko wrote: On Tue 03-10-17 11:22:35, Pasha Tatashin wrote: On 10/03/2017 09:08 AM, Michal Hocko wrote: On Wed 20-09-17 16:17:08, Pavel Tatashin wrote: Add struct page zeroing as a part of initialization of other fields in __init_single_page(). This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes): BASEFIX sparse_init 11.244671836s 0.007199623s zone_sizes_init 4.879775891s 8.355182299s -- Total 16.124447727s 8.362381922s Hmm, this is confusing. This assumes that sparse_init doesn't zero pages anymore, right? So these number depend on the last patch in the series? Correct, without the last patch sparse_init time won't change. THen this is just misleading. OK, I will re-arrange patches the way you suggested earlier. Pasha
Re: [PATCHv3 7/7] checkpatch: add pF/pf deprecation warning
On Sat 2017-09-30 11:53:19, Sergey Senozhatsky wrote: > We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart > enough to handle function pointer dereference on platforms where such > dereference is required. > > checkpatch warning example: > > WARNING: Deprecated vsprintf pointer extension '%pF' - use %pS instead > > Signed-off-by: Sergey Senozhatsky> Signed-off-by: Joe Perches > Cc: Andy Whitcroft > Tested-by: Helge Deller # parisc64 > Tested-by: Santosh Sivaraj # powerpc64 > Acked-by: Michael Ellerman # powerpc64 > Tested-by: Tony Luck # ia64 Reviewed-by: Petr Mladek Best Regards, Petr
Re: [PATCHv3 6/7] symbol lookup: use new kernel and module dereference functions
On Sat 2017-09-30 11:53:18, Sergey Senozhatsky wrote: > Call appropriate function descriptor dereference ARCH callbacks: > - dereference_kernel_function_descriptor() if the pointer is a > kernel symbol; > > - dereference_module_function_descriptor() if the pointer is a > module symbol. > > This patch also removes dereference_function_descriptor() from > '%pF/%pf' vsprintf handler, because it has the same behavior with > '%pS/%ps' now. The description is pretty criptic. It should explain why the dereference was moved from vsprintf to the symbol lookup and if it is safe. Note that kallsyms_lookup() and module_address_lookup() is used in many other situations. Also I would not be afraid to repeat description of the big picture from the 2nd patch. > Signed-off-by: Sergey Senozhatsky> Tested-by: Helge Deller # parisc64 > Tested-by: Santosh Sivaraj # powerpc64 > Acked-by: Michael Ellerman # powerpc64 > Tested-by: Tony Luck # ia64 > --- > Documentation/printk-formats.txt | 20 ++-- > kernel/kallsyms.c| 1 + > kernel/module.c | 1 + > lib/vsprintf.c | 5 + > 4 files changed, 13 insertions(+), 14 deletions(-) > > diff --git a/Documentation/printk-formats.txt > b/Documentation/printk-formats.txt > index 361789df51ec..3adbc4fdd482 100644 > --- a/Documentation/printk-formats.txt > +++ b/Documentation/printk-formats.txt > @@ -50,26 +50,28 @@ Symbols/Function Pointers > > :: > > + %pS versatile_init+0x0/0x110 > + %ps versatile_init > %pF versatile_init+0x0/0x110 > %pf versatile_init > - %pS versatile_init+0x0/0x110 > %pSRversatile_init+0x9/0x110 > (with __builtin_extract_return_addr() translation) > - %ps versatile_init > %pB prev_fn_of_versatile_init+0x88/0x88 > > -The ``F`` and ``f`` specifiers are for printing function pointers, > -for example, f->func, They have the same result as > -``S`` and ``s`` specifiers. But they do an extra conversion on > -ia64, ppc64 and parisc64 architectures where the function pointers > -are actually function descriptors. > - > The ``S`` and ``s`` specifiers can be used for printing symbols > from direct addresses, for example, __builtin_return_address(0), > (void *)regs->ip. They result in the symbol name with (``S``) or > without (``s``) offsets. If KALLSYMS are disabled then the symbol > address is printed instead. This paragraph makes the feeling that ``S`` is still only for direct adresses. We should update it as well. > +Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``) > +and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and > +parisc64 function pointers are indirect and, in fact, are function > +descriptors, which require additional dereferencing before we can lookup > +the symbol. As of now, ``S`` and ``s`` perform dereferencing on those > +platforms (when needed), so ``F`` and ``f`` exist for compatibility > +reasons only. > + > The ``B`` specifier results in the symbol name with offsets and should be > used when printing stack backtraces. The specifier takes into > consideration the effect of compiler optimisations which may occur > @@ -77,8 +79,6 @@ when tail-call``s are used and marked with the noreturn GCC > attribute. > > Examples:: > > - printk("Going to call: %pF\n", gettimeofday); > - printk("Going to call: %pF\n", p->func); > printk("%s: called from %pS\n", __func__, (void *)_RET_IP_); > printk("%s: called from %pS\n", __func__, > (void *)__builtin_return_address(0)); We should either replace %pF with %pS or remove all examples. It is strange to keep only half of them. > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c > index 127e7cfafa55..e2fc09ea9509 100644 > --- a/kernel/kallsyms.c > +++ b/kernel/kallsyms.c > @@ -322,6 +322,7 @@ const char *kallsyms_lookup(unsigned long addr, > if (is_ksym_addr(addr)) { is_ksym_addr() ignores the special .opd elf sections if CONFIG_KALLSYMS_ALL is disabled. We should dereference before this call. > unsigned long pos; > > + addr = dereference_kernel_function_descriptor(addr); > pos = get_symbol_pos(addr, symbolsize, offset); I still wonder if doing the dereference in the widely used kallsyms might cause any regression. One possible problem is that this function returns "offset". One might expect that it is offset against "addr" but it is not if the dereference happens here. Also get_symbol_pos() is called in several other helpers but the dereference is done only here. It would be confusing if for example kallsyms_lookup_size_offset() and kallsyms_lookup() give different result. I would feel much more comfortable if we keep the derefenrece only in vsprintf. In each case, we
Re: [PATCH v2] powerpc: Blacklist GCC 5.4 6.1 and 6.2
Michal Suchánekwrites: > On Mon, 13 Feb 2017 19:49:16 -0600 > Segher Boessenkool wrote: > >> On Tue, Feb 14, 2017 at 11:25:43AM +1100, Cyril Bur wrote: >> > > > At the time of writing GCC 5.4 is the most recent and is >> > > > affected. GCC 6.3 contains the backported fix, has been tested >> > > > and appears safe to use. >> > > >> > > 6.3 is (of course) the newer release; 5.4 is a maintenance >> > > release of a compiler that is a year older. >> > >> > Yes. I think the point I was trying to make is that since they >> > backported the fix to 5.x and 6.x then I expect that 5.5 will have >> > the fix but since it doesn't exist yet, I can't be sure. I'll add >> > something to that effect. >> >> The patch has been backported to the GCC 5 branch; it will be part of >> any future GCC 5 release (5.5 and later, if any later will happen; 5.5 >> will). >> >> Don't be so unsure about these things, we aren't *that* >> incompetent ;-) > > Nonetheless the message is factually correct. > >> >> > > Please mention the GCC PR # somewhere in the code, too? >> > >> > Sure. > > It seems this has never happened? >From memory we discovered that there were patched versions of the compilers in distros, which still reported those version numbers. ie. the patch would break working compilers. So Cyril was going to write a test that actually built some code and checked the generated asm. But I suspect he never got around to it. cheers
Re: [PATCH 2/2] powerpc/strict_rwx: fixup region splitting
Balbir Singhwrites: > We were aggressive with splitting regions and missed the case > when _stext and __init_begin completely overlap addr and addr+mapping. > > This patch fixes that case and allows us to keep the largest possible > mapping through the overlap. The patch also simplies the check > into a function Please do the fix in one patch, which we can backport if necessary, and then move it into a function in a separate patch. cheers
Re: [PATCHv3 4/7] powerpc64: Add .opd based function descriptor dereference
Petr Mladekwrites: > On Sat 2017-09-30 11:53:16, Sergey Senozhatsky wrote: >> diff --git a/arch/powerpc/kernel/module_64.c >> b/arch/powerpc/kernel/module_64.c >> index 0b0f89685b67..94caec045a90 100644 >> --- a/arch/powerpc/kernel/module_64.c >> +++ b/arch/powerpc/kernel/module_64.c >> @@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, >> return 0; >> } >> >> +#ifdef PPC64_ELF_ABI_v1 >> +unsigned long dereference_module_function_descriptor(struct module *mod, >> + unsigned long addr) >> +{ >> +if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd) >> +return addr; >> + >> +return dereference_function_descriptor(addr); >> +} >> +#endif /* PPC64_ELF_ABI_v1 */ > > I would personally move this up in the source file. It is related to > the definition of func_desc() and other functions that are > also PPC_ELF_ABI-specific. Yeah that would be neater. There's already a PPC64_ELF_ABI_v2 block, you could put this in the else case of that. But we can do that later if you're not respinning otherwise. cheers
Re: [PATCHv3 5/7] parisc64: Add .opd based function descriptor dereference
On Sat 2017-09-30 11:53:17, Sergey Senozhatsky wrote: > We are moving towards separate kernel and module function descriptor > dereference callbacks. This patch enables it for parisc64. > > For pointers that belong to the kernel > - Added __start_opd and __end_opd pointers, to track the kernel >.opd section address range; > > - Added dereference_kernel_function_descriptor(). Now we >will dereference only function pointers that are within >[__start_opd, __end_opd]; > > For pointers that belong to a module > - Added dereference_module_function_descriptor() to handle module >function descriptor dereference. Now we will dereference only >pointers that are within [module->opd.start, module->opd.end]. > diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c > index f1a76935a314..28f89b3dcc11 100644 > --- a/arch/parisc/kernel/module.c > +++ b/arch/parisc/kernel/module.c > @@ -954,3 +955,19 @@ void module_arch_cleanup(struct module *mod) > { > deregister_unwind_table(mod); > } > + > +#ifdef CONFIG_64BIT > +unsigned long dereference_module_function_descriptor(struct module *mod, > + unsigned long addr) > +{ > + unsigned long start_opd = (Elf64_Addr)mod->core_layout.base + > +mod->arch.fdesc_offset; > + unsigned long end_opd = start_opd + > + mod->arch.fdesc_count * sizeof(Elf64_Fdesc); I know that this is used in rather slow paths. But it still might make sense to have these section borders pre-computed and stored in struct mod_arch_specific. I mean to do similar thing that we do on powerpc. Well, we could do this in a followup patch if parisc people wanted it. > + if (addr < start_opd || addr >= end_opd) > + return addr; > + > + return dereference_function_descriptor(addr); > +} > +#endif Otherwise the patch looks fine to me. Reviewed-by: Petr MladekBest Regards, Petr
[PATCH v2 2/2] powerpc/xive: clear XIVE internal structures when a CPU is removed
Commit eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") introduced support for the XIVE exploitation mode of the P9 interrupt controller on the pseries platform. At that time, support for CPU removal was not complete on PowerVM and CPU hot unplug remained untested. It appears that some cleanups of the XIVE internal structures are required before releasing the CPU, without which the kernel crashes in a rtas call doing the CPU isolation. These changes fix the crash by deconfiguring the IPI interrupt source and clearing the event queues of the CPU when it is removed. Signed-off-by: Cédric Le Goater--- arch/powerpc/sysdev/xive/common.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c index 1c087ed7427f..b9440ac7a3c1 100644 --- a/arch/powerpc/sysdev/xive/common.c +++ b/arch/powerpc/sysdev/xive/common.c @@ -1398,6 +1398,14 @@ void xive_teardown_cpu(void) if (xive_ops->teardown_cpu) xive_ops->teardown_cpu(cpu, xc); + +#ifdef CONFIG_SMP + /* Get rid of IPI */ + xive_cleanup_cpu_ipi(cpu, xc); +#endif + + /* Disable and free the queues */ + xive_cleanup_cpu_queues(cpu, xc); } void xive_kexec_teardown_cpu(int secondary) -- 2.13.5
[PATCH v2 1/2] powerpc/xive: fix IPI reset
When resetting an IPI, hw_ipi should also be set to zero. Signed-off-by: Cédric Le Goater--- arch/powerpc/sysdev/xive/spapr.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c index f24a70bc6855..d9c4c9366049 100644 --- a/arch/powerpc/sysdev/xive/spapr.c +++ b/arch/powerpc/sysdev/xive/spapr.c @@ -431,7 +431,11 @@ static int xive_spapr_get_ipi(unsigned int cpu, struct xive_cpu *xc) static void xive_spapr_put_ipi(unsigned int cpu, struct xive_cpu *xc) { + if (!xc->hw_ipi) + return; + xive_irq_bitmap_free(xc->hw_ipi); + xc->hw_ipi = 0; } #endif /* CONFIG_SMP */ -- 2.13.5
[PATCH v2 0/2] powerpc/xive: fix CPU hot unplug
Hi, Recent commit eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller") introduced support for the XIVE exploitation mode of the P9 interrupt controller on the pseries platform. At the time the patchset was sent, support for CPU removal was not complete on PowerVM and only CPU hotplug was tested. To also support CPU hot unplug, some cleanups of the XIVE internal structures are needed. These two patches include the missing bits without which the kernel crashes in a rtas call when a CPU is released. There are still some corner cases to address when stressing the lpar with plug/unplug loops. These are not necessarily Linux issues. Under QEMU, it showed to be the decrementer waking the dying CPU. Under phyp, it is much more rare and investigation is under progress. Tested under a phyp and a XIVE QEMU model for pseries. These changes should only impact the pseries platform. Thanks, C. Changes since v2 : - clarified commit log. Cédric Le Goater (2): powerpc/xive: fix IPI reset powerpc/xive: fix cpu removal arch/powerpc/sysdev/xive/common.c | 8 arch/powerpc/sysdev/xive/spapr.c | 4 2 files changed, 12 insertions(+) -- 2.13.5
Re: [PATCHv3 4/7] powerpc64: Add .opd based function descriptor dereference
On Sat 2017-09-30 11:53:16, Sergey Senozhatsky wrote: > We are moving towards separate kernel and module function descriptor > dereference callbacks. This patch enables it for powerpc64. > > For pointers that belong to the kernel > - Added __start_opd and __end_opd pointers, to track the kernel >.opd section address range; > > - Added dereference_kernel_function_descriptor(). Now we >will dereference only function pointers that are within >[__start_opd, __end_opd]; > > For pointers that belong to a module > - Added dereference_module_function_descriptor() to handle module >function descriptor dereference. Now we will dereference only >pointers that are within [module->opd.start, module->opd.end]. > > Signed-off-by: Sergey Senozhatsky> Tested-by: Helge Deller # parisc64 > Tested-by: Santosh Sivaraj # powerpc64 > Acked-by: Michael Ellerman # powerpc64 > Tested-by: Tony Luck # ia64 > --- > arch/powerpc/include/asm/module.h | 3 +++ > arch/powerpc/include/asm/sections.h | 11 +++ > arch/powerpc/kernel/module_64.c | 16 > arch/powerpc/kernel/vmlinux.lds.S | 2 ++ > 4 files changed, 32 insertions(+) > > diff --git a/arch/powerpc/include/asm/module.h > b/arch/powerpc/include/asm/module.h > index 6c0132c7212f..7e28442827f1 100644 > --- a/arch/powerpc/include/asm/module.h > +++ b/arch/powerpc/include/asm/module.h > @@ -45,6 +45,9 @@ struct mod_arch_specific { > unsigned long tramp; > #endif > > + /* For module function descriptor dereference */ > + unsigned long start_opd; > + unsigned long end_opd; > #else /* powerpc64 */ > /* Indices of PLT sections within module. */ > unsigned int core_plt_section; > diff --git a/arch/powerpc/include/asm/sections.h > b/arch/powerpc/include/asm/sections.h > index 67379b8945e8..6b4ee0d1645f 100644 > --- a/arch/powerpc/include/asm/sections.h > +++ b/arch/powerpc/include/asm/sections.h > @@ -75,6 +75,17 @@ static inline unsigned long > dereference_function_descriptor(unsigned long ptr) > ptr = (unsigned long)p; > return ptr; > } > + > +#undef dereference_kernel_function_descriptor > +static inline unsigned long > +dereference_kernel_function_descriptor(unsigned long addr) > +{ > + if (addr < (unsigned long)__start_opd || > + addr >= (unsigned long)__end_opd) > + return addr; > + > + return dereference_function_descriptor(addr); > +} > #endif /* PPC64_ELF_ABI_v1 */ > > #endif > diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c > index 0b0f89685b67..94caec045a90 100644 > --- a/arch/powerpc/kernel/module_64.c > +++ b/arch/powerpc/kernel/module_64.c > @@ -344,6 +344,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr, > else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0) > dedotify_versions((void *)hdr + sechdrs[i].sh_offset, > sechdrs[i].sh_size); > + else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) { > + me->arch.start_opd = sechdrs[i].sh_addr; > + me->arch.end_opd = sechdrs[i].sh_addr + > +sechdrs[i].sh_size; > + } > > /* We don't handle .init for the moment: rename to _init */ > while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init"))) > @@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, > return 0; > } > > +#ifdef PPC64_ELF_ABI_v1 > +unsigned long dereference_module_function_descriptor(struct module *mod, > + unsigned long addr) > +{ > + if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd) > + return addr; > + > + return dereference_function_descriptor(addr); > +} > +#endif /* PPC64_ELF_ABI_v1 */ I would personally move this up in the source file. It is related to the definition of func_desc() and other functions that are also PPC_ELF_ABI-specific. Otherwise, it looks good to me. Reviewed-by: Petr Mladek Best Regards, Petr
Re: [PATCH 04/11] ia64: make dma_cache_sync a no-op
On Wed, Oct 04, 2017 at 08:59:24AM +, David Laight wrote: > Are you sure about this one? Yes. And if you are not you haven't read either of the cover letter, the DMA-API documentation, or the reply from Thomas to your mail yesterday.
Re: [PATCHv3 3/7] ia64: Add .opd based function descriptor dereference
On Sat 2017-09-30 11:53:15, Sergey Senozhatsky wrote: > We are moving towards separate kernel and module function descriptor > dereference callbacks. This patch enables it for IA64. > > For pointers that belong to the kernel > - Added __start_opd and __end_opd pointers, to track the kernel >.opd section address range; > > - Added dereference_kernel_function_descriptor(). Now we >will dereference only function pointers that are within >[__start_opd, __end_opd]; > > For pointers that belong to a module > - Added dereference_module_function_descriptor() to handle module >function descriptor dereference. Now we will dereference only >pointers that are within [module->opd.start, module->opd.end]. > > Signed-off-by: Sergey Senozhatsky> Tested-by: Helge Deller # parisc64 > Tested-by: Santosh Sivaraj # powerpc64 > Acked-by: Michael Ellerman # powerpc64 > Tested-by: Tony Luck # ia64 Looks good to me. Reviewed-by: Petr Mladek Best Regards, Petr
Re: [PATCH 1/2] powerpc/strict_rwx: quirks for STRICT_RWX patches
Hi Balbir, Mainly I think we need to add a check in mark_initmem_nx() too don't we? Or should we put it somewhere common to both? But seeing as I'm replying here are some more comments. > Subject: [PATCH 1/2] powerpc/strict_rwx: quirks for STRICT_RWX patches This misses three key things, 1) it's a bug fix 2) it's for Power9 DD1, and 3) it only applies to radix. So how about: powerpc/64s: Fix crashes on Power9 DD1 with radix MMU and STRICT_RWX Balbir Singhwrites: > This patch disables STRICT_RWX for power9 DD1 machines > where due to some limitations with the way we do tlb > updates, we clear the TLB entry of the text that's doing > the update to kernel text and that does lead to a > crash. This mostly describes the patch, but I would prefer more detail, eg: When using the radix MMU on Power9 DD1, to work around a hardware problem, radix__pte_update() is required to do a two stage update of the PTE. First we write a zero value into the PTE, then we flush the TLB, and then we write the new PTE value. In the normal case that works OK, but it does not work if we're updating the PTE that maps the code we're executing, because the mapping is removed by the TLB flush and we can no longer execute from it. Unfortunately the STRICT_RWX code needs to do exactly that. The exact symptoms when we hit this case vary, sometimes we print an oops and then get stuck after that, but I've also seen a machine just get stuck continually page faulting with no oops printed. The variance is presumably due to the exact layout of the text and the page size used for the mappings. In all cases we are unable to boot to a shell. There are possible solutions such as creating a second mapping of the TLB flush code, executing from that, and then jumping back to the original. However we don't want to add that level of complexity for a DD1 work around. So just detect that we're running on Power9 DD1 and refrain from changing the permissions, effectively disabling STRICT_RWX on Power9 DD1. > diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c > index 39c252b..c2a2b46 100644 > --- a/arch/powerpc/mm/pgtable-radix.c > +++ b/arch/powerpc/mm/pgtable-radix.c > @@ -169,6 +169,18 @@ void radix__mark_rodata_ro(void) > { > unsigned long start, end; > > + /* > + * mark_rodata_ro() will mark itself as !writable at some point ^ . > + * due to workaround in radix__pte_update(), we'll end up with ^ ^ D the DD1 > + * an invalid pte and the system will crash quite severly. > + * The alternatives are quite cumbersome and leaving out > + * the page containing the flush code is not very nice. Just drop the last sentence, it doesn't have enough detail to be useful and we chose not to implement those alternatives, so better not to confuse the reader by talking about them. > + */ > + if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { > + pr_warn("Warning: Unable to mark rodata read only on P9 DD1\n"); > + return; > + } Don't we also need to do the check in radix__mark_initmem_nx() ? cheers
Re: [PATCHv3 2/7] sections: split dereference_function_descriptor()
On Sat 2017-09-30 11:53:14, Sergey Senozhatsky wrote: > There are two format specifiers to print out a pointer in symbolic > format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two > mean exactly the same thing, but some architectures (ia64, ppc64, > parisc64) use an indirect pointer for C function pointers, where > the function pointer points to a function descriptor (which in > turn contains the actual pointer to the code). The '%pF/%pf, when > used appropriately, automatically does the appropriate function > descriptor dereference on such architectures. > > The "when used appropriately" part is tricky. Basically this is > a subtle ABI detail, specific to some platforms, that made it to > the API level and people can be unaware of it and miss the whole > "we need to dereference the function" business out. [1] proves > that point (note that it fixes only '%pF' and '%pS', there might > be '%pf' and '%ps' cases as well). > > It appears that we can handle everything within the affected > arches and make '%pS/%ps' smart enough to retire '%pF/%pf'. > Function descriptors live in .opd elf section and all affected > arches (ia64, ppc64, parisc64) handle it properly for kernel > and modules. So we, technically, can decide if the dereference > is needed by simply looking at the pointer: if it belongs to > .opd section then we need to dereference it. Great catch. I really like this approach! > diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h > index e5da44eddd2f..387f22c41e0d 100644 > --- a/include/asm-generic/sections.h > +++ b/include/asm-generic/sections.h > @@ -29,6 +29,7 @@ > * __ctors_start, __ctors_end > * __irqentry_text_start, __irqentry_text_end > * __softirqentry_text_start, __softirqentry_text_end > + * __start_opd, __end_opd > */ > extern char _text[], _stext[], _etext[]; > extern char _data[], _sdata[], _edata[]; > @@ -47,12 +48,15 @@ extern char __softirqentry_text_start[], > __softirqentry_text_end[]; > /* Start and end of .ctors section - used for constructor calls. */ > extern char __ctors_start[], __ctors_end[]; > > +/* Start and end of .opd section - used for function descriptors. */ > +extern char __start_opd[], __end_opd[]; > + > extern __visible const void __nosave_begin, __nosave_end; > > -/* function descriptor handling (if any). Override > - * in asm/sections.h */ > +/* Function descriptor handling (if any). Override in asm/sections.h */ > #ifndef dereference_function_descriptor > #define dereference_function_descriptor(p) (p) > +#define dereference_kernel_function_descriptor(p) (p) > #endif > > /* random extra sections (if any). Override > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h > index 4d0cb9bba93e..172904e9cded 100644 > --- a/include/linux/moduleloader.h > +++ b/include/linux/moduleloader.h > @@ -85,6 +85,10 @@ void module_arch_cleanup(struct module *mod); > /* Any cleanup before freeing mod->module_init */ > void module_arch_freeing_init(struct module *mod); > > +/* Dereference module function descriptor */ > +unsigned long dereference_module_function_descriptor(struct module *mod, > + unsigned long addr); > + The function is used when the module is already loaded. IMHO, include/linux/module.h would be a better place. One advantage would be that we could use the same trick as in include/asm-generic/sections.h. I mean: #define dereference_module_function_descriptor(mod, addr) (addr) and redefine it in the three affected arch//include/asm/module.h headers. Then it might be completely optimized out on all architectures. Anyway, we need to get ack from Jessica for this change. Best Regards, Petr
RE: [PATCH 04/11] ia64: make dma_cache_sync a no-op
From: Christoph Hellwig > Sent: 03 October 2017 11:43 > > ia64 does not implement DMA_ATTR_NON_CONSISTENT allocations, so it doesn't > make any sense to do any work in dma_cache_sync given that it must be a > no-op when dma_alloc_attrs returns coherent memory. > > Signed-off-by: Christoph Hellwig> --- > arch/ia64/include/asm/dma-mapping.h | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/arch/ia64/include/asm/dma-mapping.h > b/arch/ia64/include/asm/dma-mapping.h > index 3ce5ab4339f3..99dfc1aa9d3c 100644 > --- a/arch/ia64/include/asm/dma-mapping.h > +++ b/arch/ia64/include/asm/dma-mapping.h > @@ -48,11 +48,6 @@ static inline void > dma_cache_sync (struct device *dev, void *vaddr, size_t size, > enum dma_data_direction dir) > { > - /* > - * IA-64 is cache-coherent, so this is mostly a no-op. However, we do > need to > - * ensure that dma_cache_sync() enforces order, hence the mb(). > - */ > - mb(); > } Are you sure about this one? It looks as though you are doing a mechanical change for all architectures. Some of them are probably stranger than you realise. Even with cache coherent memory any cpu 'store/write buffer' may not be snooped by dma reads. Something needs to flush the store buffer between the last cpu write to the dma buffer and the write (probably a device register) that tells the device it can read the memory. My guess from the comment is that dma_cache_synch() is expected to include that barrier - and it might not be anywhere else. David
Re: [PATCH v9 08/12] mm: zero reserved and unavailable struct pages
On Tue 03-10-17 11:29:16, Pasha Tatashin wrote: > On 10/03/2017 09:18 AM, Michal Hocko wrote: > > On Wed 20-09-17 16:17:10, Pavel Tatashin wrote: > > > Some memory is reserved but unavailable: not present in memblock.memory > > > (because not backed by physical pages), but present in memblock.reserved. > > > Such memory has backing struct pages, but they are not initialized by > > > going > > > through __init_single_page(). > > > > Could you be more specific where is such a memory reserved? > > > > I know of one example: trim_low_memory_range() unconditionally reserves from > pfn 0, but e820__memblock_setup() might provide the exiting memory from pfn > 1 (i.e. KVM). Then just initialize struct pages for that mapping rigth there where a special API is used. > But, there could be more based on this comment from linux/page-flags.h: > > 19 * PG_reserved is set for special pages, which can never be swapped out. > Some > 20 * of them might not even exist (eg empty_bad_page)... I have no idea wht empty_bad_page is but a quick grep shows that this is never used. I might be wrong here but if somebody is reserving a memory in a special way then we should handle the initialization right there. E.g. create an API for special memblock reservations. -- Michal Hocko SUSE Labs
Re: [PATCH v9 03/12] mm: deferred_init_memmap improvements
On Tue 03-10-17 12:01:08, Pasha Tatashin wrote: > Hi Michal, > > Are you OK, if I replace DEFERRED_FREE() macro with a function like this: > > /* > * Helper for deferred_init_range, free the given range, and reset the > * counters > */ > static inline unsigned long __def_free(unsigned long *nr_free, >unsigned long *free_base_pfn, >struct page **page) > { > unsigned long nr = *nr_free; > > deferred_free_range(*free_base_pfn, nr); > *free_base_pfn = 0; > *nr_free = 0; > *page = NULL; > > return nr; > } > > Since it is inline, and we operate with non-volatile counters, compiler will > be smart enough to remove all the unnecessary de-references. As a plus, we > won't be adding any new branches, and the code is still going to stay > compact. OK. It is a bit clunky but we are holding too much state there. I haven't checked whether that can be simplified but this can be always done later. -- Michal Hocko SUSE Labs
Re: [PATCH v9 06/12] mm: zero struct pages during initialization
On Tue 03-10-17 11:22:35, Pasha Tatashin wrote: > > > On 10/03/2017 09:08 AM, Michal Hocko wrote: > > On Wed 20-09-17 16:17:08, Pavel Tatashin wrote: > > > Add struct page zeroing as a part of initialization of other fields in > > > __init_single_page(). > > > > > > This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895 > > > v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes): > > > > > > BASEFIX > > > sparse_init 11.244671836s 0.007199623s > > > zone_sizes_init 4.879775891s 8.355182299s > > >-- > > > Total 16.124447727s 8.362381922s > > > > Hmm, this is confusing. This assumes that sparse_init doesn't zero pages > > anymore, right? So these number depend on the last patch in the series? > > Correct, without the last patch sparse_init time won't change. THen this is just misleading. -- Michal Hocko SUSE Labs
Re: [PATCH v9 12/12] mm: stop zeroing memory during allocation in vmemmap
On Tue 03-10-17 16:26:51, Pasha Tatashin wrote: > Hi Michal, > > I decided not to merge these two patches, because in addition to sparc > optimization move, we have this dependancies: optimizations can and should go on top of the core patch. > mm: zero reserved and unavailable struct pages > > must be before > > mm: stop zeroing memory during allocation in vmemmap. > > Otherwise, we can end-up with struct pages that are not zeroed properly. Right and you can deal with it easily. Just introduce the mm_zero_struct_page earlier along with its user in "stop zeroing ..." I think that moving the zeroying in one go is more reasonable than adding it to __init_single_page with misleading numbers and later dropping the zeroying from the memmap path. -- Michal Hocko SUSE Labs
Re: [PATCH 07/11] powerpc: make dma_cache_sync a no-op
Le 03/10/2017 à 13:43, Christoph Hellwig a écrit : On Tue, Oct 03, 2017 at 01:24:57PM +0200, Christophe LEROY wrote: powerpc does not implement DMA_ATTR_NON_CONSISTENT allocations, so it doesn't make any sense to do any work in dma_cache_sync given that it must be a no-op when dma_alloc_attrs returns coherent memory. What about arch/powerpc/mm/dma-noncoherent.c ? Powerpc 8xx doesn't have coherent memory. It doesn't implement the DMA_ATTR_NON_CONSISTENT interface either, so if it really doesn't have a way to provide dma coherent allocation (although the code in __dma_alloc_coherent suggests it does provide dma coherent allocations) I have no idea how it could ever have worked. Yes indeed it provides coherent memory by allocation non cached memory. And drivers aiming at using non coherent memory do it by using kmalloc() with GFP_DMA then dma_map_single(). Then they use dma_sync_single_for_xxx(), which calls __dma_sync() on the 8xx and is a nop on other powerpcs. Christophe
Re: [PATCHv3 1/7] switch dereference_function_descriptor() to `unsigned long'
On Sat 2017-09-30 11:53:13, Sergey Senozhatsky wrote: > Convert dereference_function_descriptor() to accept and return > `unsigned long'. There will be two new ARCH function for kernel > and module function pointer dereference, which will work with > `unsigned long', so the patch unifies interfaces. > > Besides, dereference_function_descriptor() mostly work with > `unsigned long': > > Convert dereference_function_descriptor() users tree-wide. I am not sure if this is a real win. If I count correctly, the net result is 6 additional casts in this patch. Many casts are still needed also in the other patches. > diff --git a/arch/ia64/include/asm/sections.h > b/arch/ia64/include/asm/sections.h > index 2ab2003698ef..de6bfa1ef8fb 100644 > --- a/arch/ia64/include/asm/sections.h > +++ b/arch/ia64/include/asm/sections.h > @@ -27,13 +27,13 @@ extern char __start_unwind[], __end_unwind[]; > extern char __start_ivt_text[], __end_ivt_text[]; > > #undef dereference_function_descriptor > -static inline void *dereference_function_descriptor(void *ptr) > +static inline unsigned long dereference_function_descriptor(unsigned long > ptr) I would also expect that a function called "dereference*" would work with a pointer. The parameter is called ptr but it is unsigned long. > { > - struct fdesc *desc = ptr; > + struct fdesc *desc = (struct fdesc *)ptr; > void *p; > > if (!probe_kernel_address(>ip, p)) > - ptr = p; > + ptr = (unsigned long)p; > return ptr; > } > > diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c > index 1ca9a2b4239f..06e1b79e2946 100644 > --- a/arch/parisc/mm/init.c > +++ b/arch/parisc/mm/init.c > @@ -389,10 +389,10 @@ static void __init setup_bootmem(void) > static int __init parisc_text_address(unsigned long vaddr) > { > static unsigned long head_ptr __initdata; > + unsigned long addr = (unsigned long)_kernel_start; > > if (!head_ptr) > - head_ptr = PAGE_MASK & (unsigned long) > - dereference_function_descriptor(_kernel_start); > + head_ptr = PAGE_MASK & dereference_function_descriptor(addr); IMHO, this is harder to read than the original. You need to search the definition of "addr" and check its manipulation to understand the meaning. > > return core_kernel_text(vaddr) || vaddr == head_ptr; > } To make it clear. All these comments are not a big deal and I do not want to invalidate all the acked-by and tested-by just because of them. But please, consider removing this change if we need to do another iteration of this patchset. IMHO, there is no real win and it would just pollute the git history. Best Regards, Petr
Re: [PATCH] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
On Tuesday 03 October 2017 03:00 PM, Naveen N . Rao wrote: Hi Naveen, [snip] > > A few small nits focusing on just the trampoline... > >> diff --git a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S >> b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S >> index c98e90b..708a96d 100644 >> --- a/arch/powerpc/kernel/trace/ftrace_64_mprofile.S >> +++ b/arch/powerpc/kernel/trace/ftrace_64_mprofile.S >> @@ -249,6 +249,83 @@ livepatch_handler: >> >> /* Return to original caller of live patched function */ >> blr >> + >> +/* >> + * This livepatch stub code, called from livepatch module to jump into >> + * kernel or other modules. It replicates the livepatch_handler code, >> + * with an expection of jumping to the trampoline instead of patched >> + * function. >> + */ >> +.global klp_stub_insn >> +klp_stub_insn: >> +CURRENT_THREAD_INFO(r12, r1) >> + >> +/* Allocate 3 x 8 bytes */ >> +ld r11, TI_livepatch_sp(r12) >> +addir11, r11, 24 >> +std r11, TI_livepatch_sp(r12) >> + >> +/* Save toc & real LR on livepatch stack */ >> +std r2, -24(r11) >> +mflrr12 >> +std r12, -16(r11) >> + >> +/* Store stack end marker */ >> +lis r12, STACK_END_MAGIC@h >> +ori r12, r12, STACK_END_MAGIC@l >> +std r12, -8(r11) > > Seeing as this is the same as livepatch_handler() except for this part > in the middle, does it make sense to reuse livepatch_handler() with > appropriate labels added for your use? You could patch in the below 5 > instructions using the macros from ppc-opcode.h... Thanks for the review. The current upstream livepatch_handler code is a bit different. I have posted a bug fix to https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-September/163824.html which alters the livepatch_handler to have similar code. It's a good idea to re-use the livepatch_handler code. I will re-spin v2 with based on top of bug fix posted earlier. > >> + >> +/* >> + * Stub memory is allocated dynamically, during the module load. >> + * Load TOC relative address into r11. module_64.c::klp_stub_for_addr() >> + * identifies the available free stub slot and loads the address into >> + * r11 with two instructions. >> + * >> + * addis r11, r2, stub_address@ha >> + * addi r11, r11, stub_address@l >> + */ >> +.global klp_stub_entry >> +klp_stub_entry: >> +addis r11, r2, 0 >> +addir11, r11, 0 >> + >> +/* Load the r12 with called function address from entry->funcdata */ >> +ld r12, 128(r11) >> + >> +/* Move r12 into ctr for global entry and branch there */ >> +mtctr r12 >> +bctrl >> + >> +/* >> + * Now we are returning to the patched function. We are free to >> + * use r11, r12 and we can use r2 until we restore it. >> + */ >> +CURRENT_THREAD_INFO(r12, r1) >> + >> +ld r11, TI_livepatch_sp(r12) >> + >> +/* Check stack marker hasn't been trashed */ >> +lis r2, STACK_END_MAGIC@h >> +ori r2, r2, STACK_END_MAGIC@l >> +ld r12, -8(r11) >> +2: tdner12, r2 >> +EMIT_BUG_ENTRY 2b, __FILE__, __LINE__ - 1, 0 > > If you plan to keep this trampoline separate from livepatch_handler(), > note that the above bug entry is not required since you copy only the > text of this trampoline elsewhere and you won't have an associated bug > entry for that new stub address. > Agree, klp_stub_entry trampoline will go away in v2. -- cheers, Kamalesh.
Re: [PATCH v3 00/20] Speculative page faults
On 25/09/2017 18:27, Alexei Starovoitov wrote: > On Mon, Sep 18, 2017 at 12:15 AM, Laurent Dufour >wrote: >> Despite the unprovable lockdep warning raised by Sergey, I didn't get any >> feedback on this series. >> >> Is there a chance to get it moved upstream ? > > what is the status ? > We're eagerly looking forward for this set to land, > since we have several use cases for tracing that > will build on top of this set as discussed at Plumbers. Hi Alexei, Based on Plumber's note [1], it sounds that the use case is tied to the BPF tracing where a call tp find_vma() call will be made on a process's context to fetch user space's symbols. Am I right ? Is the find_vma() call made in the context of the process owning the mm struct ? Thanks, Laurent. [1] https://etherpad.openstack.org/p/LPC2017_Tracing)
[PATCH] powerpc/perf: Fix for core/nest imc call trace on cpuhotplug
Nest/core pmu units are enabled only when it is used. A reference count is maintained for the events which uses the nest/core pmu units. Currently in *_imc_counters_release function a WARN() is used for notification of any underflow of ref count. The case where event ref count hit a negative value is, when perf session is started, followed by offlining of all cpus in a given core. i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the ref->count to zero, if the current cpu which is about to offline is the last cpu in a given core and make an OPAL call to disable the engine in that core. And on perf session termination, perf->destroy (core_imc_counters_release) will first decrement the ref->count for this core and based on the ref->count value an opal call is made to disable the core-imc engine. Now, since cpuhotplug path already clears the ref->count for core and disabled the engine, perf->destroy() decrementing again at event termination make it negative which in turn fires the WARN_ON. The same happens for nest units. Add a check to see if the reference count is alreday zero, before decrementing the count, so that the ref count will not hit a negative value. Signed-off-by: Anju T Sudhakar--- arch/powerpc/perf/imc-pmu.c | 28 1 file changed, 28 insertions(+) diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c index 9ccac86f3463..e3a1f65933b5 100644 --- a/arch/powerpc/perf/imc-pmu.c +++ b/arch/powerpc/perf/imc-pmu.c @@ -399,6 +399,20 @@ static void nest_imc_counters_release(struct perf_event *event) /* Take the mutex lock for this node and then decrement the reference count */ mutex_lock(>lock); + if (ref->refc == 0) { + /* +* The scenario where this is true is, when perf session is +* started, followed by offlining of all cpus in a given node. +* +* In the cpuhotplug offline path, ppc_nest_imc_cpu_offline() +* function set the ref->count to zero, if the cpu which is +* about to offline is the last cpu in a given node and make +* an OPAL call to disable the engine in that node. +* +*/ + mutex_unlock(>lock); + return; + } ref->refc--; if (ref->refc == 0) { rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST, @@ -646,6 +660,20 @@ static void core_imc_counters_release(struct perf_event *event) return; mutex_lock(>lock); + if (ref->refc == 0) { + /* +* The scenario where this is true is, when perf session is +* started, followed by offlining of all cpus in a given core. +* +* In the cpuhotplug offline path, ppc_core_imc_cpu_offline() +* function set the ref->count to zero, if the cpu which is +* about to offline is the last cpu in a given core and make +* an OPAL call to disable the engine in that core. +* +*/ + mutex_unlock(>lock); + return; + } ref->refc--; if (ref->refc == 0) { rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE, -- 2.14.1