[ANNOUNCE] 3.6.4-rt10
Dear RT Folks,

I'm pleased to announce the 3.6.4-rt10 release. This is just an update to 3.6.4 with no RT-related changes.

The RT patch against 3.6.4 can be found here:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patch-3.6.4-rt10.patch.xz

The split quilt queue is available at:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patches-3.6.4-rt10.tar.xz

Enjoy,

	tglx
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[GIT pull] Futex fix for 3.7
Linus,

please pull the latest core-urgent-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus

Single fix for a long-standing futex race when taking over a futex whose owner died. You can end up with two owners, which violates quite a few rules.

Thanks,

	tglx

--
Thomas Gleixner (1):
      futex: Handle futex_pi OWNER_DIED take over correctly


 kernel/futex.c | 41 ++---
 1 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 3717e7b..20ef219 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -716,7 +716,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 				struct futex_pi_state **ps,
 				struct task_struct *task, int set_waiters)
 {
-	int lock_taken, ret, ownerdied = 0;
+	int lock_taken, ret, force_take = 0;
 	u32 uval, newval, curval, vpid = task_pid_vnr(task);
 
 retry:
@@ -755,17 +755,15 @@ retry:
 	newval = curval | FUTEX_WAITERS;
 
 	/*
-	 * There are two cases, where a futex might have no owner (the
-	 * owner TID is 0): OWNER_DIED. We take over the futex in this
-	 * case. We also do an unconditional take over, when the owner
-	 * of the futex died.
-	 *
-	 * This is safe as we are protected by the hash bucket lock !
+	 * Should we force take the futex? See below.
 	 */
-	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
-		/* Keep the OWNER_DIED bit */
+	if (unlikely(force_take)) {
+		/*
+		 * Keep the OWNER_DIED and the WAITERS bit and set the
+		 * new TID value.
+		 */
 		newval = (curval & ~FUTEX_TID_MASK) | vpid;
-		ownerdied = 0;
+		force_take = 0;
 		lock_taken = 1;
 	}
 
@@ -775,7 +773,7 @@ retry:
 		goto retry;
 
 	/*
-	 * We took the lock due to owner died take over.
+	 * We took the lock due to forced take over.
 	 */
 	if (unlikely(lock_taken))
 		return 1;
@@ -790,20 +788,25 @@ retry:
 		switch (ret) {
 		case -ESRCH:
 			/*
-			 * No owner found for this futex. Check if the
-			 * OWNER_DIED bit is set to figure out whether
-			 * this is a robust futex or not.
+			 * We failed to find an owner for this
+			 * futex. So we have no pi_state to block
+			 * on. This can happen in two cases:
+			 *
+			 * 1) The owner died
+			 * 2) A stale FUTEX_WAITERS bit
+			 *
+			 * Re-read the futex value.
 			 */
 			if (get_futex_value_locked(&curval, uaddr))
 				return -EFAULT;
 
 			/*
-			 * We simply start over in case of a robust
-			 * futex. The code above will take the futex
-			 * and return happy.
+			 * If the owner died or we have a stale
+			 * WAITERS bit the owner TID in the user space
+			 * futex is 0.
 			 */
-			if (curval & FUTEX_OWNER_DIED) {
-				ownerdied = 1;
+			if (!(curval & FUTEX_TID_MASK)) {
+				force_take = 1;
 				goto retry;
 			}
 		default:
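A simplified userspace model of the decision this fix changes, assuming the PI futex word layout from include/uapi/linux/futex.h (FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK). After the owner lookup fails with -ESRCH and the value is re-read, the old code forced a take-over whenever OWNER_DIED was set, even if another waiter had already stored its TID (a take-over keeps the OWNER_DIED bit); the new code forces it only when the TID field is 0, which also covers a stale FUTEX_WAITERS bit. The helper names are hypothetical, and the model ignores the cmpxchg retry loop and the hash bucket lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* PI futex word layout (values as in include/uapi/linux/futex.h). */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Pre-fix decision in the -ESRCH path (hypothetical helper): force the
 * take-over if OWNER_DIED is set, whether or not a TID is stored. */
static bool old_esrch_forces_take(uint32_t curval)
{
	return (curval & FUTEX_OWNER_DIED) != 0;
}

/* Post-fix decision (hypothetical helper): force the take-over only if
 * no TID is stored, i.e. dead owner or stale FUTEX_WAITERS bit. */
static bool new_esrch_forces_take(uint32_t curval)
{
	return (curval & FUTEX_TID_MASK) == 0;
}

/* Value written on a forced take-over: keep OWNER_DIED and WAITERS,
 * replace the TID field with the caller's vpid. */
static uint32_t forced_take_value(uint32_t curval, uint32_t vpid)
{
	return (curval & ~FUTEX_TID_MASK) | vpid;
}
```

The first case in the checks below is the "two owners" scenario: a word that already carries a new owner's TID together with the inherited OWNER_DIED bit.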
Re: [REGRESSION] 3.7-rc3+git hard lockup on CPU after inserting/removing USB stick
On Mon, 12 Nov 2012, Martin Steigerwald wrote:
On Sunday, 11 November 2012, Liu, Chuansheng wrote:

The first bad commit is:

commit 73d4066055e0e2830533041f4b91df8e6e5976ff
Author: Chuansheng Liu <chuansheng@intel.com>
Date:   Tue Sep 11 16:00:30 2012 +0800

    USB/host: Cleanup unneccessary irq disable code

    Because the IRQF_DISABLED as the flag is now a NOOP and has been
    deprecated and in hardirq context the interrupt is disabled.

    so in usb/host code:
    Removing the usage of flag IRQF_DISABLED;
    Removing the calling local_irq save/restore actions in irq
    handler usb_hcd_irq();

    Signed-off-by: liu chuansheng <chuansheng@intel.com>
    Acked-by: Alan Stern <st...@rowland.harvard.edu>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

But: This only happens with the threadirqs option!

When I remove threadirqs from the kernel command line and reboot with this last bisect kernel, USB sticks work. That may explain why nobody else has seen this.

So I will try a 3.7-rc4 now, but without threadirqs enabled.

Thanks for pointing it out. The USB HCD irq handler is designed to execute in hard irq context with irqs disabled. When threadirqs is on the command line, it is executed in thread context with local irqs enabled, which causes this hard lockup.

No. The problem is caused by the commit above. USB with threaded interrupt handlers worked perfectly fine in the past.

--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2349,7 +2349,7 @@ static int usb_hcd_request_irqs(struct usb_hcd *hcd,
 	if (hcd->driver->irq) {
 		snprintf(hcd->irq_descr, sizeof(hcd->irq_descr), "%s:usb%d",
 				hcd->driver->description, hcd->self.busnum);
-		retval = request_irq(irqnum, usb_hcd_irq, irqflags,
+		retval = request_irq(irqnum, usb_hcd_irq, irqflags|IRQF_NO_THREAD,
 				hcd->irq_descr, hcd);

NAK. This is exactly the wrong thing to do. We want to be able to run that code in a handler thread.

So you removed the local_irq_save/restore() in the driver code, and with forced threaded irqs this breaks.
Setting IRQF_NO_THREAD now just works around the breakage introduced by the above commit. There is no hard requirement to run USB interrupts in hard interrupt context.

I'd rather see the above commit reverted, followed by a proper analysis of why removing local_irq_save/restore() breaks forced threaded interrupt handlers.

Thanks,

	tglx
Re: [RFC PATCH v1 02/31] ARC: irqflags
On Wed, 7 Nov 2012, Vineet Gupta wrote:
+/*
+ * Inline ASM macros to read/write AUX Regs
+ * Essentially invocation of lr/sr insns from C
+ */
+
+#if 1

Leftover ???

+#define read_aux_reg(reg)	__builtin_arc_lr(reg)
+
+/* gcc builtin sr needs reg param to be long immediate */
+#define write_aux_reg(reg_immed, val)		\
+		__builtin_arc_sr((unsigned int)val, reg_immed)
+
+#else

+/*
+ * Conditionally Enable IRQs

Unconditionally methinks

The following two functions are related to irq chips I guess. So why would you want them here ?

+static inline void arch_mask_irq(unsigned int irq)
+{
+	unsigned int ienb;
+
+	ienb = read_aux_reg(AUX_IENABLE);
+	ienb &= ~(1 << irq);
+	write_aux_reg(AUX_IENABLE, ienb);
+}
+
+static inline void arch_unmask_irq(unsigned int irq)
+{
+	unsigned int ienb;
+
+	ienb = read_aux_reg(AUX_IENABLE);
+	ienb |= (1 << irq);
+	write_aux_reg(AUX_IENABLE, ienb);
+}

The only user is the interrupt controller code, right?

diff --git a/arch/arc/kernel/irq.c b/arch/arc/kernel/irq.c
new file mode 100644
index 000..16fcbe8
--- /dev/null
+++ b/arch/arc/kernel/irq.c
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2011-12 Synopsys, Inc. (www.synopsys.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <asm/irqflags.h>
+#include <asm/arcregs.h>
+
+void arch_local_irq_enable(void)
+{
+
+	unsigned long flags;
+	flags = arch_local_save_flags();
+	flags |= (STATUS_E1_MASK | STATUS_E2_MASK);
+
+	/*
+	 * If called from hard ISR (between irq_enter and irq_exit)
+	 * don't allow Level 1. In Soft ISR we allow further Level 1s
+	 */
+
+	if (in_irq())
+		flags &= ~(STATUS_E1_MASK | STATUS_E2_MASK);

Hmm. This looks weird and the comment is not very helpful.

So using my crystal ball: you want to enforce that nothing enables interrupts while a hard interrupt handler is running, right?
Is there a chip limitation which you have to enforce here? If yes, then please explain it.

Btw, all hard interrupt handlers in Linux run with interrupts disabled and they are not supposed to reenable interrupts, which is true for almost all drivers except for a few archaic IDE drivers. In fact you might even WARN about it at least once, so the offending code gets fixed.

Also the code flow is backwards. What about:

	unsigned long flags;

	if (in_irq())
		return;

	flags =

+	arch_local_irq_restore(flags);
+}

Thanks,

	tglx
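The restructured flow suggested above, modeled in userspace: return early when called from hard interrupt context, otherwise read the flags, set the enable bits and write them back. The STATUS_E1_MASK/STATUS_E2_MASK values, the simulated status register and the in-hardirq flag are placeholders for illustration, not the ARC definitions from the patch.

```c
#include <stdbool.h>

/* Placeholder bit positions, for illustration only. */
#define STATUS_E1_MASK (1UL << 1)
#define STATUS_E2_MASK (1UL << 2)

/* Simulated CPU state: a status register and an "in hard irq" flag. */
static unsigned long sim_status;
static bool sim_in_hardirq;

static unsigned long sim_save_flags(void) { return sim_status; }
static void sim_irq_restore(unsigned long flags) { sim_status = flags; }

/* Suggested flow: bail out early in hard irq context, so the common
 * path is a plain read-modify-write of the enable bits. */
static void sim_local_irq_enable(void)
{
	unsigned long flags;

	if (sim_in_hardirq)
		return;

	flags = sim_save_flags();
	flags |= STATUS_E1_MASK | STATUS_E2_MASK;
	sim_irq_restore(flags);
}
```

With the early return, the hard irq case never touches the status register at all, instead of computing a value and then masking the bits back out.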
Re: [RFC PATCH v1 12/31] ARC: Interrupt Handling
On Wed, 7 Nov 2012, Vineet Gupta wrote:
+void __init init_IRQ(void)
+{
+	const int irq = TIMER0_IRQ;
+
+	/*
+	 * Each CPU needs to register irq of it's private TIMER0.
+	 * The APIs request_percpu_irq()/enable_percpu_irq() will not be
+	 * functional, if we don't prep the generic IRQ sub-system with
+	 * the following:
+	 * -Ensure that devid passed to request_percpu_irq() is indeed per cpu
+	 * -disable NOAUTOEN, w/o which the device handler never gets called

What sets NOAUTOEN in the first place? The core code definitely does not.

+	 */
+	irq_set_percpu_devid(irq);
+	irq_modify_status(irq, IRQ_NOAUTOEN, 0);

Aside from that, we have irq_clear_status_flags() for this.

+	plat_init_IRQ();
+}

+int __init get_hw_config_num_irq(void)

How is that function used ?

+{
+	uint32_t val = read_aux_reg(ARC_REG_VECBASE_BCR);
+
+	switch (val & 0x03) {
+	case 0:
+		return 16;
+	case 1:
+		return 32;
+	case 2:
+		return 8;
+	default:
+		return 0;
+	}
+
+	return 0;
+}

Thanks,

	tglx
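The switch in get_hw_config_num_irq() decodes the two low bits of the BCR value; since every case returns, the trailing return 0; is unreachable. A table lookup expresses the same mapping without it. This is a userspace sketch: num_irqs_from_bcr() is a hypothetical name, and the raw register value is passed in instead of being read with read_aux_reg().

```c
#include <stdint.h>

/* Same mapping as the switch in the quoted get_hw_config_num_irq():
 * 0 -> 16, 1 -> 32, 2 -> 8, 3 -> 0. Only the two low bits are used. */
static int num_irqs_from_bcr(uint32_t val)
{
	static const int num_irqs[4] = { 16, 32, 8, 0 };

	return num_irqs[val & 0x03];
}
```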
Re: [RFC PATCH v1 15/31] ARC: Process/scheduling/clock/Timers/Delay Management
On Wed, 7 Nov 2012, Vineet Gupta wrote:
+void cpu_idle(void)
+{
+	/* Since we SLEEP in idle loop, TIF_POLLING_NRFLAG can't be set */
+
+	/* endless idle loop with no priority at all */
+	while (1) {
+		tick_nohz_idle_enter();
+
+		while (!need_resched())
+			arch_idle();
+
+		tick_nohz_idle_exit();
+
+		preempt_enable_no_resched();
+		schedule();
+		preempt_disable();

schedule_preempt_disabled() please

+	}

diff --git a/arch/arc/kernel/time.c b/arch/arc/kernel/time.c

+static void arc_periodic_timer_setup(unsigned int limit)
+{
+	/* setup start and end markers */
+	write_aux_reg(ARC_REG_TIMER0_LIMIT, limit);
+	write_aux_reg(ARC_REG_TIMER0_CNT, 0);	/* start from 0 */
+
+	/* IE: Interrupt on count = limit,
+	 * NH: Count cycles only when CPU running (NOT Halted)
+	 */
+	write_aux_reg(ARC_REG_TIMER0_CTRL, TIMER_CTRL_IE | TIMER_CTRL_NH);
+}
+
+/*
+ * Acknowledge the interrupt & enable/disable the interrupt
+ */
+static void arc_periodic_timer_ack(unsigned int irq_reenable)
+{
+	/* 1. Ack the interrupt by writing to CTRL reg.
+	 *    Any write will cause intr to be ack, however it has to be one of
+	 *    writable bits (NH: Count when not halted)
+	 * 2. If required by caller, re-arm timer to Interrupt at the end of
+	 *    next cycle.
+	 *
+	 * Small optimisation:
+	 * Normal code would have been
+	 *  if (irq_reenable) CTRL_REG = (IE | NH); else CTRL_REG = NH;
+	 * However since IE is BIT0 we can fold the branch
+	 */
+	write_aux_reg(ARC_REG_TIMER0_CTRL, irq_reenable | TIMER_CTRL_NH);
+}

+/** Clock Event Device */
+
+static int arc_clkevent_set_next_event(unsigned long delta,
+				       struct clock_event_device *dev)
+{
+	arc_periodic_timer_setup(delta);

This is confusing. Is arc_periodic_timer_setup() setting up a periodic timer or a oneshot timer? It looks like you use it for both, and the differentiation happens in arc_periodic_timer_ack(). So I assume the timer only knows about periodic mode, but you trick it into oneshot with the ack function, right ?
So it's just me being confused about the function names, but that could do with some explanatory comments.

+	return 0;
+}
+
+static void arc_clkevent_set_mode(enum clock_event_mode mode,
+				  struct clock_event_device *dev)
+{
+	pr_info("Device [%s] clockevent mode now [%d]\n", dev->name, mode);

Please remove the debug leftover.

+	switch (mode) {
+	case CLOCK_EVT_MODE_PERIODIC:
+		arc_periodic_timer_setup(CONFIG_ARC_PLAT_CLK / HZ);
+		break;
+	case CLOCK_EVT_MODE_ONESHOT:
+		break;
+	default:
+		break;
+	}
+
+	return;
+}
+
+static DEFINE_PER_CPU(struct clock_event_device, arc_clockevent_device) = {
+	.name		= "ARC Timer0",
+	.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC,
+	.mode		= CLOCK_EVT_MODE_UNUSED,
+	.rating		= 300,
+	.irq		= TIMER0_IRQ,	/* hardwired, no need for resources */
+	.set_next_event = arc_clkevent_set_next_event,
+	.set_mode	= arc_clkevent_set_mode,
+};
+
+irqreturn_t timer_irq_handler(int irq, void *dev_id)

static please

+static int arc_finished_booting;
+
+/*
+ * Scheduler clock - returns current time in nanosec units.
+ * It's return value must NOT wrap around.
+ *
+ * Although the return value is nanosec units based, what's more important
+ * is whats the source of this value. The orig jiffies based computation
+ * was only as granular as jiffies itself (10ms on ARC).
+ * We need something that is more granular, so use the same mechanism as
+ * gettimeofday(), which uses ARC Timer T1 wrapped as a clocksource.
+ * Unfortunately the first call to sched_clock( ) is way before that subsys
+ * is initialiased, thus use the jiffies based value in the interim.
+ */
+unsigned long long sched_clock(void)
+{
+	if (!arc_finished_booting) {
+		return (unsigned long long)(jiffies - INITIAL_JIFFIES)
+			* (NSEC_PER_SEC / HZ);
+	} else {
+		struct timespec ts;
+		getrawmonotonic(&ts);

This can livelock. sched_clock() is used by the tracer. So assume you are function tracing and you trace a function called from within the timekeeping seqcount write-locked region.
You spin forever in getrawmonotonic(). Not what you want, right ?

Thanks,

	tglx
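Why the getrawmonotonic() call can livelock, reduced to a single-threaded seqcount model: the writer keeps the sequence count odd for the duration of the update, and a reader retries as long as it sees an odd or changed count. If the reader runs from inside the write section, as a traced function would via sched_clock(), the writer cannot finish, so the count stays odd. The real retry loop has no bound; the max_tries parameter is an addition so the model terminates. All names here are hypothetical.

```c
#include <stdbool.h>

/* Minimal seqcount model: the writer makes seq odd while it updates. */
struct seq_model {
	unsigned int seq;
	long long value;
};

static void write_begin(struct seq_model *s) { s->seq++; }
static void write_end(struct seq_model *s)   { s->seq++; }

/* Reader retry loop, bounded by max_tries so the model terminates.
 * An unbounded loop would spin forever while the writer is stuck. */
static bool read_bounded(const struct seq_model *s, long long *out, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		unsigned int start = s->seq;

		if (start & 1)
			continue;		/* writer active: retry */
		long long v = s->value;
		if (s->seq == start) {		/* no update in between */
			*out = v;
			return true;
		}
	}
	return false;
}
```

A reader called while the write section is open never sees an even count, which is exactly the situation a tracer hook inside the timekeeping update creates.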
[ANNOUNCE] 3.6.6-rt17
Dear RT Folks,

I'm pleased to announce the 3.6.6-rt17 release. 3.6.6-rt16 was an unannounced update release to 3.6.6.

Changes since 3.6.6-rt16:

   * Finally make the NOHZ softirq pending detection work with the new softirq scheme.

   * Remove the WARN_ON from __raise_softirq_irqoff(). I got the information I wanted for now.

The delta patch against 3.6.6-rt16 is appended below and can be found here:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/incr/patch-3.6.6-rt16-rt17.patch.xz

The RT patch against 3.6.6 can be found here:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patch-3.6.6-rt17.patch.xz

The split quilt queue is available at:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patches-3.6.6-rt17.tar.xz

Enjoy,

	tglx

---
Index: linux-stable/kernel/softirq.c
===================================================================
--- linux-stable.orig/kernel/softirq.c
+++ linux-stable/kernel/softirq.c
@@ -100,20 +100,15 @@ void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
 	struct softirq_runner *sr = &__get_cpu_var(softirq_runners);
-	u32 warnpending, pending = local_softirq_pending();
+	u32 warnpending = local_softirq_pending();
+	int i;
 
 	if (rate_limit >= 10)
 		return;
 
-	warnpending = pending;
-
-	while (pending) {
-		struct task_struct *tsk;
-		int i = __ffs(pending);
-
-		pending &= ~(1 << i);
+	for (i = 0; i < NR_SOFTIRQS; i++) {
+		struct task_struct *tsk = sr->runner[i];
 
-		tsk = sr->runner[i];
 		/*
 		 * The wakeup code in rtmutex.c wakes up the task
 		 * _before_ it sets pi_blocked_on to NULL under
@@ -638,7 +633,7 @@ static void do_raise_softirq_irqoff(unsi
 void __raise_softirq_irqoff(unsigned int nr)
 {
 	do_raise_softirq_irqoff(nr);
-	if (WARN_ON_ONCE(!in_irq() && !current->softirq_nestcnt))
+	if (!in_irq() && !current->softirq_nestcnt)
 		wakeup_softirqd();
 }
 
Index: linux-stable/localversion-rt
===================================================================
--- linux-stable.orig/localversion-rt
+++ linux-stable/localversion-rt
@@ -1 +1 @@
--rt16
+-rt17
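The softirq.c hunk replaces a walk over the set bits of the pending mask with a walk over every per-softirq runner slot. A userspace sketch of the two iteration patterns, with __builtin_ctz() (GCC/Clang) standing in for __ffs() and NR_SOFTIRQS assumed to be 10 as in 3.6-era kernels; the function names are hypothetical and the loop bodies only record which indices are visited.

```c
#include <stdint.h>

#define NR_SOFTIRQS 10	/* assumption: softirq count in 3.6-era kernels */

/* Old pattern: visit only the indices whose bit is set in 'pending'. */
static int visit_pending_only(uint32_t pending, int visited[NR_SOFTIRQS])
{
	int n = 0;

	pending &= (1u << NR_SOFTIRQS) - 1;	/* ignore out-of-range bits */
	while (pending) {
		int i = __builtin_ctz(pending);	/* lowest set bit, like __ffs() */

		pending &= ~(1u << i);
		visited[n++] = i;
	}
	return n;
}

/* New pattern: walk every runner slot, independent of the pending mask. */
static int visit_all(int visited[NR_SOFTIRQS])
{
	int n = 0;

	for (int i = 0; i < NR_SOFTIRQS; i++)
		visited[n++] = i;
	return n;
}
```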
Re: [STABLE REQUEST] add: e1000: fix lockdep splat in shutdown handler
On Thu, 11 Oct 2012, Steven Rostedt wrote:

commit 3a3847e007aae732d64d8fd1374126393e9879a3
Author: Jesse Brandeburg <jesse.brandeb...@intel.com>
Date:   Wed Jan 4 20:23:33 2012 +

    e1000: fix lockdep splat in shutdown handler

As I discussed with Jesse on IRC, there is another possible deadlock lurking in the e1000 code.

static void e1000_reinit_safe(struct e1000_adapter *adapter)
{
	while (test_and_set_bit(__E1000_RESETTING, &adapter->flags))
		msleep(1);
	mutex_lock(&adapter->mutex);
	e1000_down(adapter);

e1000_down() waits on the various work tasks to shut down, but those work functions might be blocked on the adapter mutex.

I have no idea how I managed to trigger that one, but it's real. The task dump I got out of the machine shows stuff waiting on each other forever. I can't give you a recipe to reproduce.

Looking at the code, this is not very surprising. It takes quite a coincidence: e1000_reinit_safe() has to be invoked, and the delayed work timer has to bring the work in right after e1000_reinit_safe() took the adapter mutex.

Thanks,

	tglx
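The deadlock described above is a two-node cycle in the wait-for graph: e1000_reinit_safe() holds adapter->mutex while e1000_down() waits for the work, and the work function waits for adapter->mutex. A minimal sketch of detecting such a cycle, assuming each task waits on at most one other task; the task enum and has_deadlock() are hypothetical and model only these two parties.

```c
#include <stdbool.h>

/* Tasks in the scenario: the reinit path and the work function. */
enum task { REINIT, WORK, NTASKS };

/* waits_on[t] is the task that t is blocked on, or -1 if t can run.
 * Returns true if following the edges from some task leads back to it. */
static bool has_deadlock(const int waits_on[NTASKS])
{
	for (int start = 0; start < NTASKS; start++) {
		int t = start;

		/* At most NTASKS hops are needed to close any cycle. */
		for (int steps = 0; steps <= NTASKS && t != -1; steps++) {
			t = waits_on[t];
			if (t == start)
				return true;
		}
	}
	return false;
}
```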
Re: [RFC PATCH] posix timers: allocate timer id per task
On Mon, 15 Oct 2012, Stanislav Kinsbursky wrote:

This patch is required by the CRIU project (www.criu.org). To migrate processes with posix timers we have to make sure that we can restore posix timers with the proper id. Currently this is not true, because timer ids are allocated globally. So this is a precursor patch, and its purpose is to make posix timer ids allocated per task.

You can't allocate them per task. posix timers are process wide.

What's the reason why you did not make the posix timer ids per name space instead of going down to the per process level ?

The patch replaces the global idr with a global hash table for posix timers and makes timer ids unique not globally, but per task. The next free timer id is an integer stored in the signal struct (posix_timer_id). If the free timer id reaches a negative value on timer creation, it will be dropped to zero and -EAGAIN will be returned to the user.

So you want to allow 2^31 posix timers created for a single process?

+static struct k_itimer *__posix_timers_find(struct hlist_head *head, struct signal_struct *sig, timer_t id)
+{
+	struct hlist_node *node;
+	struct k_itimer *timer;
+
+	hlist_for_each_entry(timer, node, head, t_hash) {
+		if ((timer->it_signal == sig) && (timer->it_id == id))
+			return timer;
+	}
+	return NULL;
+}
+
+static struct k_itimer *posix_timer_find(timer_t id, unsigned long *flags)
+{
+	struct k_itimer *timer;
+	struct signal_struct *sig = current->signal;
+	struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
+
+	spin_lock_irqsave(&hash_lock, *flags);

This is not going to fly. You just reintroduced a massive scalability problem. See commit 8af08871

Thanks,

	tglx
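The id counter as the RFC describes it, modeled in userspace: ids come from a counter in the (process-wide) signal struct, and once the counter has gone negative it is reset to 0 and creation fails with -EAGAIN, so a single process can be handed up to 2^31 distinct ids. sig_model and alloc_timer_id() are hypothetical; the sketch covers only the counter, not the hash table or its locking. ISO C leaves signed overflow undefined, so the wraparound is written out explicitly.

```c
#include <errno.h>
#include <limits.h>

/* Stand-in for the posix_timer_id field the RFC adds to signal_struct. */
struct sig_model {
	int posix_timer_id;	/* next free timer id */
};

/* Hand out the next id; once the counter is negative, reset it to 0
 * and fail with -EAGAIN, as the RFC's description says. */
static int alloc_timer_id(struct sig_model *sig)
{
	int id = sig->posix_timer_id;

	if (id < 0) {
		sig->posix_timer_id = 0;
		return -EAGAIN;
	}
	/* Explicit wraparound to a negative value, without signed overflow. */
	sig->posix_timer_id = (id == INT_MAX) ? INT_MIN : id + 1;
	return id;
}
```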
[ANNOUNCE] 3.6.1-rt2
Dear RT Folks,

I'm pleased to announce the 3.6.1-rt2 release.

Changes since 3.6.1-rt1:

   * Picked up Paul's git-friendly quilt queue

   * Compile fix for !RT_FULL (Paul Gortmaker)

   * Crypto init order fix

   * Tiny RCU fix which affects UP and is a long-standing bug affecting 3.2 and 3.4-rt as well.

The delta patch against 3.6.1-rt1 is appended below and can be found here:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/incr/patch-3.6.1-rt1-rt2.patch.xz

The RT patch against 3.6.1 can be found here:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patch-3.6.1-rt2.patch.xz

The split quilt queue is available at:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.6/patches-3.6.1-rt2.tar.xz

Enjoy,

	tglx
Re: [ANNOUNCE] 3.6.1-rt2
On Tue, 16 Oct 2012, Javier Sanz wrote:

Hello,

Testing, and FYI

$ uname -a
Linux darkstar 3.6.1-rt2 #1 SMP PREEMPT RT Tue Oct 16 22:47:06 CEST 2012 i686 i686 i386 GNU/Linux

shows all the time ...

[ 30.543233] fuse init (API version 7.20)
[ 33.262077] Crap, ksoftirqd/0 looping forever in softirq
[ 33.344865] Crap, ksoftirqd/2 looping forever in softirq
[ 33.401736] Crap, ksoftirqd/0 looping forever in softirq
[ 33.409743] Crap, ksoftirqd/0 looping forever in softirq
[ 33.421658] Crap, ksoftirqd/0 looping forever in softirq
[ 33.428628] Crap, ksoftirqd/0 looping forever in softirq
[ 33.496468] Crap, ksoftirqd/2 looping forever in softirq

Grrr. Forgot to remove that printk. Will do in the next spin.
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
Works as well. What's the difference between this and the real thing ?

The real thing also calls device_power_down(PMSG_FREEZE), which is a counterpart of sysdev_shutdown(), more or less, and I think that's what goes belly up.

You can use the patch below (on top of -rc6-mm1), which just disables the image creation (that should be irrelevant anyway), and see what happens.

In the meantime I figured out what's happening. The ordering in hibernation_snapshot() is wrong. It does:

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	disable_nonboot_cpus();
	swsusp_suspend();
	enable_nonboot_cpus();
	platform_finish(platform_mode);
	device_resume();
	resume_console();

We disable everything in device_suspend(), including timekeeping, so any code which depends on working timekeeping and timer functionality (which is suspended in timekeeping_suspend() as well) is busted.

enable_nonboot_cpus() definitely relies on working timekeeping and timers, depending on the code path. It's just a surprise that this did not blow up earlier (also before clock events).

I changed the ordering of the above to:

	disable_nonboot_cpus();
	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	swsusp_suspend();
	platform_finish(platform_mode);
	device_resume();
	resume_console();
	enable_nonboot_cpus();

and, unsurprisingly, the "my VAIO needs help from the keyboard" problem went away immediately. See patch below. (On top of rc7-hrt1; -mm1 does not work at all on my VAIO due to some not yet identified wreckage.)

I did not yet look into the suspend to RAM code, but I guess that there is an equivalent problem.

But I have no idea why this affects Andrew's jinxed VAIO (UP machine), though I suspect that we have more timekeeping/timer-dependent code somewhere waiting to bite us.
Also I still need to debug why the HIBERNATION_TEST code path (which has a msleep(5000) in it) does not fail, but I'll postpone this until tomorrow morning. I'm dead tired after hunting this Heisenbug, which changes with every other printk added to the code.

I'm going to add some really noisy messages for everything which accesses timekeeping / timers _after_ those systems have been shut down. We really need to fix this once and for all _before_ 2.6.23 final, even if it requires a -rc8.

Thanks,

	tglx

--- a/kernel/power/disk.c	2007-09-11 09:25:24.0 +0200
+++ b/kernel/power/disk.c	2007-09-20 22:47:30.0 +0200
@@ -130,10 +130,14 @@ int hibernation_snapshot(int platform_mo
 {
 	int error;
 
+	error = disable_nonboot_cpus();
+	if (error)
+		goto resume_cpus;
+
 	/* Free memory before shutting down devices. */
 	error = swsusp_shrink_memory();
 	if (error)
-		return error;
+		goto resume_cpus;
 
 	suspend_console();
 	error = device_suspend(PMSG_FREEZE);
@@ -144,23 +148,22 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Resume_devices;
 
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_mode != HIBERNATION_TEST) {
-			in_suspend = 1;
-			error = swsusp_suspend();
-			/* Control returns here after successful restore */
-		} else {
-			printk("swsusp debug: Waiting for 5 seconds.\n");
-			mdelay(5000);
-		}
+	if (hibernation_mode != HIBERNATION_TEST) {
+		in_suspend = 1;
+		error = swsusp_suspend();
+		/* Control returns here after successful restore */
+	} else {
+		printk("swsusp debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
 	}
-	enable_nonboot_cpus();
+
  Resume_devices:
 	platform_finish(platform_mode);
 	device_resume();
  Resume_console:
 	resume_console();
+resume_cpus:
+	enable_nonboot_cpus();
 	return error;
 }
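The control flow of the patched hibernation_snapshot(), modeled in userspace with every kernel call replaced by a logging stub, to show that enable_nonboot_cpus() now runs on every exit path, including early failures. step() and snapshot_model() are hypothetical; the lines elided between the two hunks are assumed to jump to Resume_console when device_suspend() fails, and the HIBERNATION_TEST branch is left out.

```c
#include <string.h>

/* Call log used to observe the ordering; step names are stand-ins. */
static const char *call_log[16];
static int ncalls;
static const char *fail_at;	/* name of the step that should fail, or NULL */

/* Record the call and report failure if it is the injected one. */
static int step(const char *name)
{
	call_log[ncalls++] = name;
	return (fail_at && strcmp(fail_at, name) == 0) ? -1 : 0;
}

/* Mirrors the patched control flow, labels included. */
static int snapshot_model(void)
{
	int error;

	error = step("disable_nonboot_cpus");
	if (error)
		goto resume_cpus;
	error = step("swsusp_shrink_memory");
	if (error)
		goto resume_cpus;
	step("suspend_console");
	error = step("device_suspend");
	if (error)
		goto resume_console;	/* assumed: elided between the hunks */
	error = step("platform_prepare");
	if (error)
		goto resume_devices;
	error = step("swsusp_suspend");
resume_devices:
	step("platform_finish");
	step("device_resume");
resume_console:
	step("resume_console");
resume_cpus:
	step("enable_nonboot_cpus");
	return error;
}
```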
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote:
We disable everything in device_suspend()

No, we don't. sysdevs are _not_ suspended in device_suspend(). They are suspended in device_power_down(), which is called _after_ disable_nonboot_cpus() (from swsusp_suspend()).

including timekeeping,

No, the timekeeping is suspended in device_power_down() (or at least it should be).

Damn, you are right. Reading through 30 different logs confused me.

enable_nonboot_cpus();

Actually, we can't do this here, because of ACPI and some interrupt handling related problems. Unfortunately, platform_finish() needs to go _after_ enable_nonboot_cpus() and device_resume() needs to go after platform_finish(). Analogously, disable_nonboot_cpus() has to go after platform_prepare(). Otherwise, some systems will break.

Well, I don't buy this one. The system would break in the same way when I take CPU#1 offline before I initiate the suspend.

and, unsurprisingly, the "my VAIO needs help from the keyboard" problem went away immediately. See patch below. (On top of rc7-hrt1; -mm1 does not work at all on my VAIO due to some not yet identified wreckage.)

Hm, I really don't know why it helps, but that's not because of the timekeeping suspend, IMO.

It is related. We rely on some subtle thing which is not up when we resume the non-boot CPU.

I did not yet look into the suspend to RAM code, but I guess that there is an equivalent problem.

Yes, the code ordering is the same, but it's not totally wrong, IMHO.

But I have no idea why this affects Andrew's jinxed VAIO (UP machine), though I suspect that we have more timekeeping/timer-dependent code somewhere waiting to bite us.

That's possible.

Also I still need to debug why the HIBERNATION_TEST code path (which has a msleep(5000) in it) does not fail,

See above. :-)

Yes. It makes sense.
When I change the TEST code path to:

-		printk("swsusp debug: Waiting for 5 seconds.\n");
-		msleep(5000);
+		printk("swsusp debug: before swsusp_suspend\n");
+		error = swsusp_suspend();

then I get the same effect as from real hibernation. And we actually shut down timekeeping somewhere in that code path.

ACPI: PCI interrupt for device :00:1b.0 disabled
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16

-> works fine

This is with my patch applied. Without it I get:

CPU1 is down
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
Enabling non-boot CPUs

-- Waits forever until a key is pressed

Thanks,

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 23:54 +0200, Rafael J. Wysocki wrote:
Hmm. This is close to the ordering we have in STR too. I have some dim memory of there being some ACPI reason why it had to be done that way.

Yes. We're executing _INI from the CPU initialization code and that shouldn't be done after _WAK, which is called from platform_finish().

If I tear down CPU#1 right before I tell the kernel to hibernate, then the box must explode in the same way. It does not. On none of 4 tested laptops. Of course only the jinxed VAIO exposes the "please press a key" problem.

I need to follow the swsusp_suspend() code path down to figure out why this breaks the box.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Linus,

On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
And I think that's a damn reasonable thing to agree on: timers (and anything else that CPU shutdown/bringup could *possibly* care about) should be considered core enough that they had better be on the suspend_late/resume_early list.

Thomas, Rafael, can you verify that at least STR is ok in this respect?

-ETOOTIRED led me to a wrong conclusion, but it is still a valuable hint that this change makes things work again.

I need to go down into the details of the swsusp_suspend() code path to figure out what the root cause is.

Sorry for the noise, but I'm zooming in.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 19:35 -0400, Len Brown wrote:
(Btw, the above commit message points to just my response with a testing patch to the real email: the actual explanation of the INSANE ordering is from Len Brown in

https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html

and there Len claims that we *must* wake up CPU's early).

..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651

However, it seems that bugzilla entry may just be bogus. It talks about "it appears that some firmware in the future may depend on that sequence for correction operation".

Len, Shaohua, what are the real issues here?

Intel's reference BIOS for Core Duo performs some re-initialization in _WAK that will get blown away if INIT follows _WAK. IIRC, it is related to re-initializing the thermal sensors.

I opened bug 5651 when the BIOS team informed me of this issue.

Yes, bringing a processor offline and then online again w/o an intervening suspend or reset would not evaluate _WAK, and thus may still run into the issue.

If this is true, then we should disable the sys//cpu/online entry right away.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Fri, 2007-09-21 at 14:51 +1000, Paul Mackerras wrote: Linus Torvalds writes: It would indeed be nice if we could just take CPU's down early (while everything is working), and run the whole suspend code with just one CPU, rather than having to worry about the ordering between CPU and device takedown. That is certainly what we want to do on powerpc. I would have expected that we do it exactly this way, and it took me by surprise that we do not. tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael, On Fri, 2007-09-21 at 00:30 +0200, Rafael J. Wysocki wrote: -ETOOTIRED led me to a wrong conclusion, but still it is a valuable hint that this change is making things work again. Yes, it is. I need to go down into the details of the swsusp_suspend() code path to figure out what the root cause is. If you need any help from me with that, please let me know. I'm zooming in. It seems that the ACPI idle code plays tricks with us. After debugging the swsusp_suspend() code path I figured out that we end up in C2 or deeper power states while we run the suspend code. The same happens when we come back on resume. It looks like we disable stuff in the ACPI BIOS, which makes the C2 and deeper power states misbehave. I hacked the idle loop arch code to use halt() right before we call device_suspend() and switch back to the ACPI idle code right after device_resume(). This solves the problem as well. Len, any opinion on this one? tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael, On Fri, 2007-09-21 at 16:20 +0200, Rafael J. Wysocki wrote: If you need any help from me with that, please let me know. I'm zooming in. It seems that the ACPI idle code plays tricks with us. After debugging the swsusp_suspend() code path I figured out that we end up in C2 or deeper power states while we run the suspend code. The same happens when we come back on resume. It looks like we disable stuff in the ACPI BIOS, which makes the C2 and deeper power states misbehave. Hm, can you please run the test I've suggested in another branch of the thread, ie.

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

without your debugging code in disk.c? This makes the hibernation code omit the major ACPI hooks, so if it works, we'll know that these hooks are responsible for the problem. Yes, this works fine. We still go into C3, but this no longer seems to brick the box. I hacked the idle loop arch code to use halt() right before we call device_suspend() and switch back to the ACPI idle code right after device_resume(). This solves the problem as well. Well, that seems less intrusive than changing the code ordering right before the major kernel release, but I think we should do our best to understand what _exactly_ is happening here. I found another subtle thinko in the clock events code while I was heading down the swsusp_suspend code path. I'm waiting for confirmation that it does not brick some endangered boxen, though. Even with this change in the clock events code, my VAIO goes into C2 or C3 and causes the box to wait for a helping keystroke. The correct solution would be for the ACPI code to ignore the lower C-states during suspend / resume. I simply rmmod'ed the processor module before suspend and the problem is solved as well. The cpuidle patches make this problem more prominent due to the possibly more direct switch into lower power states when we wait for a long time on something.
I think we really should not fiddle with the various CPU states during the critical parts of suspend / resume. Let's keep it simple. We have the same policy during boot, and I think the critical parts of suspend / resume have similar constraints. tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael, On Fri, 2007-09-21 at 21:20 +0200, Rafael J. Wysocki wrote: On Friday, 21 September 2007 18:27, Thomas Gleixner wrote: I simply rmmod'ed the processor module before suspend and the problem is solved as well. The cpuidle patches make this problem more prominent due to the possibly more direct switch into lower power states when we wait for a long time on something. So, perhaps we can add .suspend()/.resume() routines to the processor driver and use them to disable/enable the cpuidle functionality during a suspend/resume? http://tglx.de/private/tglx/p.diff Untested yet, but I'm on the way to do that :) tglx
Re: clockevents: fix resume logic
On Mon, 2007-09-17 at 18:37 +, Pavel Machek wrote: That's a bit tricky because hitting the keyboard is what unsticks things. And the video is black after resume-from-RAM (has always been thus) and we Ok, can we try to fix the video issue for you? That should make the development easier... I assume you tried s2ram from suspend.sf.net, and no combination of switches helped? I have the same issue. Blank screen after suspend to RAM. Hibernate works. Do you have a debug patch or something? tglx
Re: RFC: A revised timerfd API
On Sat, 2007-09-22 at 18:07 +0200, Michael Kerrisk wrote: Hello Bernd, Please don't trim the CC list when replying! I nearly did not see your reply, and others will have missed it also. Yup. On 9/22/07, Bernd Eckenfels [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED] you wrote: 1. This design stretches the POSIX timers API in strange ways. Maybe it is possible to reimplement the POSIX API in usermode using the kernel's FD implementation? Yikes. It's a clever idea... Without thinking on it too long, I'm not sure whether or not there might be some details which would make this difficult. You'd need to be quite masochistic to start such a project. The POSIX timer API consists mostly of corner cases and I doubt that you'd get them even halfway under control in a pure user space implementation. It would also carry a rather huge performance penalty: you need at least two user space context switches to resolve even the simplest cases. (and drop the posix support from kernel) However we couldn't drop POSIX support from the kernel, because that would break the ABI. True. So there is no point in reinventing the wheel. tglx
Re: RFC: A revised timerfd API
Michael, On Sat, 2007-09-22 at 15:12 +0200, Michael Kerrisk wrote: Davide, Andrew, Linus, et al. At the start of this thread (http://thread.gmane.org/gmane.linux.kernel/581115 ), I proposed 4 alternatives to Davide's original timerfd API. Based on the feedback in that thread (and one or two earlier comments): Let's dismiss option (a), since it is an unlovely multiplexing interface. Option (b) seems viable. The most notable concern was from Thomas Gleixner, that we might end up duplicating code from the POSIX timers API within the timerfd API -- some eventual refactoring might mitigate this problem. It should be possible to use the timerfd syscalls as wrappers for the posix timer implementation and to add the discussed SIGEV_TIMERFD only internally in the kernel, to tell the posix timer code about the new delivery mechanism. tglx
[PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
From: Benedikt Spranger [EMAIL PROTECTED]

eth_start_xmit() can race against a disconnect interrupt in the gadget device driver, which nukes all pending requests. Right now we access the pending request list unconditionally and dereference the request list head itself in such a case, which results in an Oops.

Check whether the list is empty before actually dereferencing dev->tx_reqs.next. Also add a comment for the second list_empty check further down to avoid confusion.

Long standing bug. Patch should be applied to stable as well.

Signed-off-by: Benedikt Spranger [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]

diff --git a/drivers/usb/gadget/ether.c b/drivers/usb/gadget/ether.c
index 593e235..f2a7bd5 100644
--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,21 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
 	}
 
 	spin_lock_irqsave(&dev->req_lock, flags);
+	/*
+	 * dev->tx_reqs may be empty. We raced against a disconnect
+	 * interrupt in the gadget device driver, which nuked all
+	 * pending requests.
+	 */
+	if (list_empty(&dev->tx_reqs)) {
+		netif_stop_queue(net);
+		spin_unlock_irqrestore(&dev->req_lock, flags);
+		return 1;
+	}
+
 	req = container_of (dev->tx_reqs.next, struct usb_request, list);
 	list_del (&req->list);
+
+	/* last request in list: stop queue */
 	if (list_empty (&dev->tx_reqs))
 		netif_stop_queue (net);
 	spin_unlock_irqrestore(&dev->req_lock, flags);
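The fix boils down to testing the list for emptiness under the same lock the disconnect path takes, before touching its first entry. A minimal userspace sketch of that pattern, assuming a pthread mutex in place of dev->req_lock and an array-based stack in place of the tx_reqs list (struct txdev and the txdev_* helpers are hypothetical, not driver code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4	/* hypothetical pool size */

/* Userspace stand-in for the gadget device: free[] plays dev->tx_reqs,
 * lock plays dev->req_lock, queue_stopped plays netif_stop_queue(). */
struct txdev {
	pthread_mutex_t lock;
	int free[NSLOTS];	/* stack of free request slot ids */
	int nfree;		/* number of valid entries in free[] */
	bool queue_stopped;
};

static void txdev_init(struct txdev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	for (int i = 0; i < NSLOTS; i++)
		d->free[i] = i;
	d->nfree = NSLOTS;
	d->queue_stopped = false;
}

/* Models the disconnect interrupt: every request is torn down. */
static void txdev_disconnect(struct txdev *d)
{
	pthread_mutex_lock(&d->lock);
	d->nfree = 0;
	d->queue_stopped = true;
	pthread_mutex_unlock(&d->lock);
}

/* Models eth_start_xmit(): returns the slot used, or -1 if the list was
 * emptied after the caller decided to transmit.  Without the check,
 * free[--nfree] would read free[-1], the userspace analogue of
 * dereferencing the list head itself. */
static int txdev_xmit(struct txdev *d)
{
	int slot;

	pthread_mutex_lock(&d->lock);
	if (d->nfree == 0) {
		pthread_mutex_unlock(&d->lock);
		return -1;
	}
	slot = d->free[--d->nfree];
	assert(slot >= 0 && slot < NSLOTS);
	if (d->nfree == 0)
		d->queue_stopped = true;	/* last entry taken: stop queue */
	pthread_mutex_unlock(&d->lock);
	return slot;
}
```

The -1 return stands in for the `return 1` in the patch, which tells the network stack that the packet was not consumed.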
Re: [PATCH] [4/50] x86: add cpu codenames for Kconfig.cpu
On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
From: Oliver Pinter [EMAIL PROTECTED]

add cpu core name for arch/i386/Kconfig.cpu:Pentium 4 sections help
add Pentium D for arch/i386/Kconfig.cpu
add Pentium D for arch/x86_64/Kconfig

Signed-off-by: Oliver Pinter [EMAIL PROTECTED]
Signed-off-by: Andi Kleen [EMAIL PROTECTED]
Acked-by: Sam Ravnborg [EMAIL PROTECTED]
Cc: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---
 arch/i386/Kconfig.cpu |   34 +++++++++++++++++++++++++++++++---
 arch/x86_64/Kconfig   |    6 +++---
 2 files changed, 34 insertions(+), 6 deletions(-)

Index: linux/arch/i386/Kconfig.cpu
===================================================================
--- linux.orig/arch/i386/Kconfig.cpu
+++ linux/arch/i386/Kconfig.cpu
@@ -115,11 +115,39 @@ config MPENTIUM4
 	bool "Pentium-4/Celeron(P4-based)/Pentium-4 M/older Xeon"
 	help
 	  Select this for Intel Pentium 4 chips.  This includes the
-	  Pentium 4, P4-based Celeron and Xeon, and Pentium-4 M
-	  (not Pentium M) chips.  This option enables compile flags
-	  optimized for the chip, uses the correct cache shift, and
+	  Pentium 4, Pentium D, P4-based Celeron and Xeon, and
+	  Pentium-4 M (not Pentium M) chips.  This option enables compile
+	  flags optimized for the chip, uses the correct cache shift, and
 	  applies any applicable Pentium III optimizations.
+	  CPUIDs: F[0-6][1-A] (in /proc/cpuinfo show = cpu family : 15 )
+
+	  Select this for:
+	  Pentiums (Pentium 4, Pentium D, Celeron, Celeron D) corename:
+	  -Willamette
+	  -Northwood
+	  -Mobile Pentium 4
+	  -Mobile Pentium 4 M
+	  -Extreme Edition (Gallatin)
+	  -Prescott
+	  -Prescott 2M
+	  -Cedar Mill
+	  -Presler
+	  -Smithfiled
+	  Xeons (Intel Xeon, Xeon MP, Xeon LV, Xeon MV) corename:
+	  -Foster
+	  -Prestonia
+	  -Gallatin
+	  -Nocona
+	  -Irwindale
+	  -Cranford
+	  -Potomac
+	  -Paxville
+	  -Dempsey
+
+	  more info: http://balusc.xs4all.nl/srv/har-cpu.html

This will never be up to date. Also the URL above is redirected to an empty bye/bye page. Put this on one of the kernel-related wikis, if you think it might be useful at all.

99% of the users do not even know which CPU they have in their system.

	tglx
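The `F[0-6][1-A]` pattern in the proposed help text appears to describe the family/model/stepping fields of the CPUID signature, with family 0xF showing up as "cpu family : 15" in /proc/cpuinfo. A sketch of how those three fields are derived from CPUID leaf 1 EAX, using the commonly documented extended-family/extended-model rules (struct cpu_sig and decode_cpuid_signature() are illustrative names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Fields as /proc/cpuinfo reports them: "cpu family", "model", "stepping". */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

static struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;

	s.stepping = eax & 0xf;

	/* The extended family (bits 20-27) only counts for base family 0xF. */
	s.family = base_family;
	if (base_family == 0xf)
		s.family += (eax >> 20) & 0xff;

	/* The extended model (bits 16-19) only counts for families 6 and 0xF. */
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model |= ((eax >> 16) & 0xf) << 4;

	return s;
}
```

For a P4-class signature such as 0x00000F41 this yields family 15, model 4, stepping 1, which matches the `F[0-6][1-A]` shape.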
Re: [PATCH] [9/50] i386: validate against ACPI motherboard resources
On Sat, 2007-09-22 at 10:28 -0600, Robert Hancock wrote: Yinghai Lu wrote: No! MMCONFIG will not work with acpi=off any more. I don't think this is unreasonable. The ACPI MCFG table is how we are supposed to learn about the area in the first place. If we can't get the table location via an approved mechanism, and can't validate it doesn't overlap with another memory reservation or something, I really don't think we should be using it. We all know how correct ACPI tables are. Specifications are nice; reality tells a different story. I don't think it's much of an issue anyway - the chances that somebody will want to run without ACPI on a system with MCFG are pretty low given that you'll end up losing a bunch of functionality (not least of which is multi-cores). acpi=off is a frequently used debug switch and it _is_ quite useful. Taking away debug functionality is not a good idea. tglx
Re: [PATCH] [19/50] Experimental: detect if SVM is disabled by BIOS
On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
Also allow to set svm lock.

Please use two separate patches. The detection and cpuinfo display is not related to setting the svm lock.

TBD double check, documentation, i386 support

Yes, documentation would be useful. See below.

Signed-off-by: Andi Kleen [EMAIL PROTECTED]
---
 arch/x86_64/kernel/setup.c    |   25 +++++++++++++++++++++++--
 include/asm-i386/cpufeature.h |    1 +
 include/asm-i386/msr-index.h  |    3 +++
 3 files changed, 27 insertions(+), 2 deletions(-)

Index: linux/arch/x86_64/kernel/setup.c
===================================================================
--- linux.orig/arch/x86_64/kernel/setup.c
+++ linux/arch/x86_64/kernel/setup.c
@@ -565,7 +565,7 @@ static void __cpuinit early_init_amd(str
 
 static void __cpuinit init_amd(struct cpuinfo_x86 *c)
 {
-	unsigned level;
+	unsigned level, flags, dummy;
 
 #ifdef CONFIG_SMP
 	unsigned long value;
@@ -634,7 +634,28 @@ static void __cpuinit init_amd(struct cp
 	/* Family 10 doesn't support C states in MWAIT so don't use it */
 	if (c->x86 == 0x10 && !force_mwait)
 		clear_bit(X86_FEATURE_MWAIT, &c->x86_capability);
+
+	if (c->x86 >= 0xf && c->x86 <= 0x11 &&
+	    !rdmsr_safe(MSR_VM_CR, &flags, &dummy) &&
+	    (flags & 0x18))
+		set_bit(X86_FEATURE_VIRT_DISABLED, &c->x86_capability);

Why the check for 0x18? And please, can we use understandable constants for this?

bit 3 (SVM_LOCK) controls only the writeability of bit 4 (SVME_DISABLE), which controls whether SVM is allowed to be enabled or not.

bit 3  bit 4
  0      0    SVM can be enabled in EFER, SVME_DISABLE is writeable
  1      0    SVM can be enabled in EFER, SVME_DISABLE is not writeable
  0      1    SVM can not be enabled in EFER, SVME_DISABLE is writeable
  1      1    SVM can not be enabled in EFER, SVME_DISABLE is not writeable

So SVM is disabled when bit 4 is set.

+}
+
+static int enable_svm_lock(char *s)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+	    boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x11) {
+		unsigned a,b;
+		if (rdmsr_safe(MSR_VM_CR, &a, &b))
+			return 0;
+		a |= (1 << 3);	/* set SVM lock */

SVM_LOCK is read only according to the data sheet. You can set bit 4 (SVME_DISABLE) to prevent KVM or whatever else from using that feature.

	tglx
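The table above translates directly into two bit tests. A small sketch, assuming the hypothetical constant names VM_CR_SVM_LOCK and VM_CR_SVME_DISABLE for bits 3 and 4 of VM_CR, shows why `flags & 0x18` misreports a CPU that has only the lock bit set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two VM_CR bits discussed above. */
#define VM_CR_SVM_LOCK		(1u << 3)	/* locks writeability of bit 4 */
#define VM_CR_SVME_DISABLE	(1u << 4)	/* SVM cannot be enabled in EFER */

/* Per the table, only bit 4 decides whether SVM is usable. */
static bool svm_disabled(uint32_t vm_cr)
{
	return (vm_cr & VM_CR_SVME_DISABLE) != 0;
}

/* Bit 3 only decides whether software may still change bit 4. */
static bool svme_disable_writeable(uint32_t vm_cr)
{
	return (vm_cr & VM_CR_SVM_LOCK) == 0;
}

/* The test from the quoted patch: true if either bit is set. */
static bool svm_disabled_0x18(uint32_t vm_cr)
{
	return (vm_cr & 0x18) != 0;
}
```

For VM_CR = 0x08 (row "1 0" of the table) svm_disabled() returns false while the 0x18 test returns true, so such a CPU would wrongly be flagged as having virtualization disabled.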
Re: [PATCH] [20/50] x86_64: Fix some broken white space in arch/x86_64/mm/init.c
On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote: No functional changes Signed-off-by: Andi Kleen [EMAIL PROTECTED] Can we please fix _ALL_ white space and coding style issues in this file while we are at it? Updated patch below. tglx

diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 458893b..346c962 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -70,10 +70,11 @@ void show_mem(void)
 
 	printk(KERN_INFO "Mem-info:\n");
 	show_free_areas();
-	printk(KERN_INFO "Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10));
+	printk(KERN_INFO "Free swap: %6ldkB\n",
+		nr_swap_pages<<(PAGE_SHIFT-10));
 
 	for_each_online_pgdat(pgdat) {
-               for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+		for (i = 0; i < pgdat->node_spanned_pages; ++i) {
 			/* this loop can take a while with 256 GB and 4k pages
 			   so update the NMI watchdog */
 			if (unlikely(i % MAX_ORDER_NR_PAGES == 0)) {
@@ -89,7 +90,7 @@ void show_mem(void)
 				cached++;
 			else if (page_count(page))
 				shared += page_count(page) - 1;
-               }
+		}
 	}
 	printk(KERN_INFO "%lu pages of RAM\n", total);
 	printk(KERN_INFO "%lu reserved pages\n",reserved);
@@ -100,21 +101,22 @@ void show_mem(void)
 int after_bootmem;
 
 static __init void *spp_getpage(void)
-{ 
+{
 	void *ptr;
 	if (after_bootmem)
-		ptr = (void *) get_zeroed_page(GFP_ATOMIC); 
+		ptr = (void *) get_zeroed_page(GFP_ATOMIC);
 	else
 		ptr = alloc_bootmem_pages(PAGE_SIZE);
 	if (!ptr || ((unsigned long)ptr & ~PAGE_MASK))
-		panic("set_pte_phys: cannot allocate page data %s\n", after_bootmem?"after bootmem":"");
+		panic("set_pte_phys: cannot allocate page data %s\n",
+			after_bootmem?"after bootmem":"");
 
 	Dprintk("spp_getpage %p\n", ptr);
 	return ptr;
-} 
+}
 
 static __init void set_pte_phys(unsigned long vaddr,
-			 unsigned long phys, pgprot_t prot)
+				unsigned long phys, pgprot_t prot)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -130,10 +132,11 @@ static __init void set_pte_phys(unsigned long vaddr,
 	}
 	pud = pud_offset(pgd, vaddr);
 	if (pud_none(*pud)) {
-		pmd = (pmd_t *) spp_getpage(); 
+		pmd = (pmd_t *) spp_getpage();
 		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE | _PAGE_USER));
 		if (pmd != pmd_offset(pud, 0)) {
-			printk("PAGETABLE BUG #01! %p <-> %p\n", pmd, pmd_offset(pud,0));
+			printk("PAGETABLE BUG #01! %p <-> %p\n", pmd,
+				pmd_offset(pud,0));
 			return;
 		}
 	}
@@ -162,7 +165,7 @@ static __init void set_pte_phys(unsigned long vaddr,
 }
 
 /* NOTE: this is meant to be run only at boot */
-void __init 
+void __init
 __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
 {
 	unsigned long address = __fix_to_virt(idx);
@@ -177,7 +180,7 @@ __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
 unsigned long __meminitdata table_start, table_end;
 
 static __meminit void *alloc_low_page(unsigned long *phys)
-{ 
+{
 	unsigned long pfn = table_end++;
 	void *adr;
 
@@ -187,8 +190,8 @@ static __meminit void *alloc_low_page(unsigned long *phys)
 		return adr;
 	}
 
-	if (pfn >= end_pfn) 
-		panic("alloc_low_page: ran out of memory"); 
+	if (pfn >= end_pfn)
+		panic("alloc_low_page: ran out of memory");
 
 	adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE);
 	memset(adr, 0, PAGE_SIZE);
@@ -197,13 +200,13 @@ static __meminit void *alloc_low_page(unsigned long *phys)
 }
 
 static __meminit void unmap_low_page(void *adr)
-{ 
+{
 
 	if (after_bootmem)
 		return;
 
 	early_iounmap(adr, PAGE_SIZE);
-} 
+}
 
 /* Must run before zap_low_mappings */
 __meminit void *early_ioremap(unsigned long addr, unsigned long size)
@@ -224,7 +227,8 @@ __meminit void *early_ioremap(unsigned long addr, unsigned long size)
 		vaddr += addr & ~PMD_MASK;
 		addr &= PMD_MASK;
 		for (i = 0; i < pmds; i++, addr += PMD_SIZE)
-			set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
+			set_pmd(pmd + i,
+				__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
 		__flush_tlb();
 		return (void *)vaddr;
 	next:
@@ -284,8 +288,9 @@ phys_pmd_update(pud_t *pud, unsigned long address, unsigned long end)
 	__flush_tlb_all();
 }
 
-static void __meminit phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end)
-{
Re: [PATCH] [31/50] x86_64: honor notify_die() returning NOTIFY_STOP
On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
-	notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
+	if (notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV) == NOTIFY_STOP)

80 chars please.

	tglx
Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
On Sat, 2007-09-22 at 12:18 -0700, David Brownell wrote: I think you misread my comment. Those requests are **NOT** pending!! So this update has a *MORE* incorrect description of the issue. That's just the freelist ... it's a fairly conventional model whereby there's a pool of free request slots which can be issued. When the pool empties, the TX queue shuts down until one of the requests which is pending in the hardware completes, and makes a slot free. The problem you're addressing is that there's a small window where a disconnect IRQ can shut down the TX queue (and empty that freelist) after upper layers in the network stack started a transmission on an active (pre-disconnect) TX queue. That problem is *NOT* related to any pending requests at all!! Sorry, I misunderstood your comment. Can you please add the correct comment yourself before we play some more rounds of ping pong? tglx
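David's description of tx_reqs as a freelist implies a simple flow-control invariant: the TX queue is stopped when the last free slot is taken and restarted when a completion returns a slot. A single-threaded sketch of that invariant, with locking omitted (struct txpool and the pool_* helpers are hypothetical, not the driver's code):

```c
#include <assert.h>
#include <stdbool.h>

/* tx_reqs modelled as a counter of free request slots, not pending ones. */
struct txpool {
	int nfree;	/* free request slots (the freelist) */
	int in_hw;	/* requests currently owned by the hardware */
	bool stopped;	/* TX queue state as seen by the network stack */
};

/* Start a transmission: take a free slot; stop the queue on the last one. */
static bool pool_xmit(struct txpool *p)
{
	if (p->stopped || p->nfree == 0)
		return false;		/* the stack should not be sending now */
	p->nfree--;
	p->in_hw++;
	if (p->nfree == 0)
		p->stopped = true;	/* pool empty: stop the TX queue */
	return true;
}

/* Hardware completed a request: the slot goes back on the freelist and
 * the queue is restarted. */
static void pool_complete(struct txpool *p)
{
	assert(p->in_hw > 0);
	p->in_hw--;
	p->nfree++;
	p->stopped = false;
}
```

The disconnect race discussed in this thread breaks exactly this invariant: the freelist is emptied from interrupt context after the stack has already decided to transmit on a running queue.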
Re: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
On Sat, 2007-09-22 at 00:32 +0200, Andi Kleen wrote:
From: Akinobu Mita [EMAIL PROTECTED]

Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.

Cc: H. Peter Anvin [EMAIL PROTECTED]
Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
Signed-off-by: Andi Kleen [EMAIL PROTECTED]
Cc: Gautham R Shenoy [EMAIL PROTECTED]
Cc: Oleg Nesterov [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---
 arch/i386/kernel/cpuid.c |   32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

Index: linux/arch/i386/kernel/cpuid.c
===================================================================
--- linux.orig/arch/i386/kernel/cpuid.c
+++ linux/arch/i386/kernel/cpuid.c
@@ -136,15 +136,18 @@ static const struct file_operations cpui
 	.open = cpuid_open,
 };
 
-static int __cpuinit cpuid_device_create(int i)
+static int cpuid_device_create(int cpu)

__cpuinit please

Thanks,
	tglx
Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
On Sat, 2007-09-22 at 13:14 -0700, David Brownell wrote:
How's this? Note that the queue should already have been stopped, so I removed what should be an extra call (as well as fixing the comments).

Yeah, stopping the queue should not be necessary.

- Dave

From: Thomas Gleixner [EMAIL PROTECTED]

Please change to:

From: Benedikt Spranger [EMAIL PROTECTED]

He did all the grunt work of figuring out what's going wrong. I was just the messenger.

This patch fixes a longstanding race in the Ethernet gadget driver, which can cause an oops on device disconnect. The fix is just to make the TX path check whether its freelist is empty. That check is otherwise not necessary, since the queue is always stopped when that list empties (and restarted when request completion puts an entry back on that freelist).

Sigh. I need to take a real deep look inside that code to understand why tx_reqs is not a request list but a freelist. Very intuitive naming :)

The race window starts when the network code decides to transmit a packet, and ends when hard_start_xmit() grabs the freelist lock. If disconnect() is called inside that window, it shuts down the TX queue and breaks the otherwise-solid assumption that packets are never sent when the TX queue is stopped.

Please add our signed-offs as well:

Signed-off-by: Benedikt Spranger [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
Signed-off-by: David Brownell [EMAIL PROTECTED]

Thanks,
	tglx

--- a/drivers/usb/gadget/ether.c
+++ b/drivers/usb/gadget/ether.c
@@ -1989,8 +1989,20 @@ static int eth_start_xmit (struct sk_buff *skb, struct net_device *net)
 	}
 
 	spin_lock_irqsave(&dev->req_lock, flags);
+	/*
+	 * the freelist can be empty if an interrupt triggered disconnect()
+	 * and reconfigured the gadget (shutting down this queue) after the
+	 * network stack decided to xmit but before we got the spinlock.
+	 */
+	if (list_empty(&dev->tx_reqs)) {
+		spin_unlock_irqrestore(&dev->req_lock, flags);
+		return 1;
+	}
+
 	req = container_of (dev->tx_reqs.next, struct usb_request, list);
 	list_del (&req->list);
+
+	/* temporarily stop TX queue when the freelist empties */
 	if (list_empty (&dev->tx_reqs))
 		netif_stop_queue (net);
 	spin_unlock_irqrestore(&dev->req_lock, flags);
Re: RFC: A revised timerfd API
On Sat, 2007-09-22 at 14:07 -0700, Davide Libenzi wrote: On Sat, 22 Sep 2007, Michael Kerrisk wrote: So I'm inclined to implement option (b), unless someone has strong objections. Davide, could I persuade you to help? I guess I better do, otherwise you'll continue to stress me ;)

int timerfd_create(int clockid);
int timerfd_settime(int ufd, int flags, const struct itimerspec *utmr, struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);

Patch below. Builds, not tested yet (you need to remove the broken status from CONFIG_TIMERFD in case you want to test - and plug the new syscall to arch/xxx). May that work for you? Thomas-san, hrtimer_try_to_cancel() does not touch ->expires and I assume it'll never do, granted? Davide-san, I have no intention to change that, but remember there is this file Documentation/stable_api_nonsense.txt :) tglx
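For readers following the API discussion, the three calls Davide lists map onto the timerfd interface that was eventually merged and wrapped by glibc in <sys/timerfd.h>. Note that the merged timerfd_create() takes an additional flags argument, so the sketch below is not literally the prototype quoted above; the helper name timerfd_oneshot_ms() is made up for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arms a one-shot timer `ms` milliseconds ahead, blocks in read() until it
 * fires and returns the expiration count, or -1 on error. */
static long timerfd_oneshot_ms(long ms)
{
	struct itimerspec its = {
		.it_interval = { 0, 0 },	/* no reload: one-shot */
		.it_value = { ms / 1000, (ms % 1000) * 1000000L },
	};
	struct itimerspec cur;
	uint64_t expirations;
	int fd;

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0)
		return -1;

	/* settime arms the timer; gettime reads back the time remaining. */
	if (timerfd_settime(fd, 0, &its, NULL) < 0 ||
	    timerfd_gettime(fd, &cur) < 0 ||
	    read(fd, &expirations, sizeof(expirations)) !=
	    (ssize_t)sizeof(expirations)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long)expirations;
}
```

Because the timer is delivered through a file descriptor, it can be waited on with poll()/select() alongside other fds, which is the point of the timerfd design compared with signal-based POSIX timers.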
[patch 0/2] suspend/resume regression fixes
Sorry, it took me quite a while to realize the real root cause of the VAIO (and probably many other machines) suspend/resume regressions, which were unearthed by the dyntick / clockevents patches.

We disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more widespread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume. I spent some quality time figuring out a set of debug mechanisms which did not influence the problem. So it is quite likely that a lot of machines are affected by this, but depending on the configuration and interrupt scenarios the problem just does not show up.

My final enlightenment came when I removed the ACPI processor module, which controls the lower idle C-states, right before suspend; this worked fine all the time even without all the workaround hacks.

I really hope that these two patches finally put an end to the jinxed VAIO heisenbug series, which started when we removed the periodic tick with the clockevents/dyntick patches.

Venki, can you please add the analogous fix to the cpuidle patch set?

Thanks,
	tglx
[patch 1/2] ACPI: disable lower idle C-states across suspend/resume
device_suspend() calls ACPI suspend functions, which seem to have undesired side effects on lower idle C-states. It took me some time to realize that especially the VAIO BIOSes (both Andrew's jinxed UP and my elfstruck SMP one) show this effect. I'm quite sure that other bug reports against suspend/resume about turning the system into a brick have the same root cause.

After fishing in the dark for quite some time, I realized that removing the ACPI processor module before suspend (this removes the lower C-state functionality) made the problem disappear. Interestingly enough, the probability of having a bricked box is influenced by various factors (interrupts, size of the RAM image, ...). Even adding a bunch of printks in the wrong places made the problem go away. The previous periodic tick implementation simply papered over the problem, which explains why the dyntick / clockevents changes made this more prominent.

We avoid complex functionality during the boot process and we have to do the same during suspend/resume. It is a similar scenario and equally fragile.

Add suspend / resume functions to the ACPI processor code and disable the lower idle C-states across suspend/resume. Fall back to the default idle implementation (halt) instead.

Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
Tested-by: Andrew Morton [EMAIL PROTECTED]
Cc: Len Brown [EMAIL PROTECTED]
Cc: Venkatesh Pallipadi [EMAIL PROTECTED]
Cc: Rafael J. Wysocki [EMAIL PROTECTED]

---
 drivers/acpi/processor_core.c |    2 ++
 drivers/acpi/processor_idle.c |   19 ++++++++++++++++++-
 include/acpi/processor.h      |    2 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/acpi/processor_core.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_core.c	2007-09-23 00:01:00.0 +0200
+++ linux-2.6/drivers/acpi/processor_core.c	2007-09-23 00:01:00.0 +0200
@@ -102,6 +102,8 @@ static struct acpi_driver acpi_processor
 		.add = acpi_processor_add,
 		.remove = acpi_processor_remove,
 		.start = acpi_processor_start,
+		.suspend = acpi_processor_suspend,
+		.resume = acpi_processor_resume,
 		},
 };
 
Index: linux-2.6/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.orig/drivers/acpi/processor_idle.c	2007-09-23 00:01:00.0 +0200
+++ linux-2.6/drivers/acpi/processor_idle.c	2007-09-23 00:01:00.0 +0200
@@ -325,6 +325,23 @@ static void acpi_state_timer_broadcast(s
 
 #endif
 
+/*
+ * Suspend / resume control
+ */
+static int acpi_idle_suspend;
+
+int acpi_processor_suspend(struct acpi_device * device, pm_message_t state)
+{
+	acpi_idle_suspend = 1;
+	return 0;
+}
+
+int acpi_processor_resume(struct acpi_device * device)
+{
+	acpi_idle_suspend = 0;
+	return 0;
+}
+
 static void acpi_processor_idle(void)
 {
 	struct acpi_processor *pr = NULL;
@@ -355,7 +372,7 @@ static void acpi_processor_idle(void)
 	}
 
 	cx = pr->power.state;
-	if (!cx) {
+	if (!cx || acpi_idle_suspend) {
 		if (pm_idle_save)
 			pm_idle_save();
 		else
Index: linux-2.6/include/acpi/processor.h
===================================================================
--- linux-2.6.orig/include/acpi/processor.h	2007-09-23 00:01:00.0 +0200
+++ linux-2.6/include/acpi/processor.h	2007-09-23 00:01:00.0 +0200
@@ -320,6 +320,8 @@ int acpi_processor_power_init(struct acp
 int acpi_processor_cst_has_changed(struct acpi_processor *pr);
 int acpi_processor_power_exit(struct acpi_processor *pr,
 			      struct acpi_device *device);
+int acpi_processor_suspend(struct acpi_device * device, pm_message_t state);
+int acpi_processor_resume(struct acpi_device * device);
 
 /* in processor_thermal.c */
 int acpi_processor_get_limit_info(struct acpi_processor *pr);
[patch 2/2] clockevents: remove the suspend/resume workaround^Wthinko
In a desperate attempt to fix the suspend/resume problem on Andrew's VAIO I added a workaround which enforced the broadcast of the oneshot timer on resume. This actually resolved the problem on the VAIO, but it was just a stupid workaround which did not tackle the root cause: the assignment of lower idle C-states in the ACPI processor_idle code. The cpuidle patches, which utilize the dynamic tick feature and go faster into deeper C-states, exposed the problem again. The correct solution is the previous patch, which prevents lower C-states across suspend/resume. Remove the enforcement code, including the conditional broadcast timer arming, which helped to paper over the real problem for quite some time. The oneshot broadcast flag for the CPU which runs the resume code can never be set at the time this code is executed. It only gets set when the CPU is entering a lower idle C-state. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] Tested-by: Andrew Morton [EMAIL PROTECTED] Cc: Len Brown [EMAIL PROTECTED] Cc: Venkatesh Pallipadi [EMAIL PROTECTED] Cc: Rafael J. Wysocki [EMAIL PROTECTED] --- kernel/time/tick-broadcast.c | 17 + 1 file changed, 1 insertion(+), 16 deletions(-) Index: linux-2.6/kernel/time/tick-broadcast.c === --- linux-2.6.orig/kernel/time/tick-broadcast.c 2007-09-23 00:00:59.0 +0200 +++ linux-2.6/kernel/time/tick-broadcast.c 2007-09-23 00:01:00.0 +0200 @@ -382,23 +382,8 @@ static int tick_broadcast_set_event(ktim int tick_resume_broadcast_oneshot(struct clock_event_device *bc) { - int cpu = smp_processor_id(); - - /* -* If the CPU is marked for broadcast, enforce oneshot -* broadcast mode. The jinxed VAIO does not resume otherwise. -* No idea why it ends up in a lower C State during resume -* without notifying the clock events layer.
-*/ - if (cpu_isset(cpu, tick_broadcast_mask)) - cpu_set(cpu, tick_broadcast_oneshot_mask); - clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT); - - if(!cpus_empty(tick_broadcast_oneshot_mask)) - tick_broadcast_set_event(ktime_get(), 1); - - return cpu_isset(cpu, tick_broadcast_oneshot_mask); + return 0; } /*
Re: [patch 0/2] suspend/resume regression fixes
Linus, On Sat, 2007-09-22 at 15:59 -0700, Linus Torvalds wrote: My final enlightenment was when I removed the ACPI processor module, which controls the lower idle C-states, right before resume; this worked fine all the time even without all the workaround hacks. I really hope that these two patches finally put an end to the jinxed VAIO heisenbug series, which started when we removed the periodic tick with the clockevents/dyntick patches. Ok, so the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the "*why* does this happen?" question. Yeah, I gave up at the point where I was no longer able to dig deeper :) I realize that the answer is easily "because ACPI screwed up", but I'm wondering if there's something we do to trigger that screw-up. Fair enough. In particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved. The reason I say this is that I have a suspicion on what triggers it. I suspect that the problem is that we do pm_ops->prepare(); disable_nonboot_cpus() suspend_enter(); enable_nonboot_cpus() pm_finish() and here the big thing to notice is that pm_ops->prepare() call, which sets the wakeup vector etc etc. So maybe the real problem here is that once we've done the ->prepare() call and ACPI has set up various stuff, we MUST NOT do any calls to any ACPI routines to set low-power states, because the stupid firmware isn't expecting it. That's what I suspect and deduced from the various experiments, including a "force the CPU into a lower C-state" one, which triggered the problem in a fully reproducible way. Note that in the "force a lower C-state" case I verified that the PIT was activated to avoid the "local APIC stops in C3" issue. But I never got a PIT interrupt.
Either the box was completely stuck or I was able to recover by hitting a key, which is also one of the unexplained phenomena. Now, if this is the cause, then I think your patch should indeed fix it, since you get called by the early-suspend code (which happens *before* the ->prepare() call), but at the same time, I wonder if maybe it would be slightly more correct to instead of using the suspend/resume callbacks, simply do this in the acpi_pm_prepare() stage, since that is likely the thing that triggers it? Yeah, that's probably the correct point, but I leave this to the ACPI wizards. But hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code! That needs some explanation from the folks who can actually look beyond the ACPI/BIOS internals. tglx
Re: [PATCH] usb-gadget-ether: Prevent oops caused by error interrupt race -V2 (comments update)
On Sat, 2007-09-22 at 13:53 -0700, David Brownell wrote: Sigh. I need a real deep look inside that code to understand why tx_reqs is not a requestlist but a freelist. Very intuitive naming :) It *is* a list of requests: free ones -- the only kind this level of driver is allowed to remember! ;) Yeah, I had to go back and read the driver again before I understood just what problem this patch was trying to fix. Which is why I wanted to make sure the mismatch between comments and contents was resolved. Fair enough. Thanks for sanitizing the comments. tglx
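Here is a minimal sketch of the "list of free requests" idea David describes. The driver keeps only idle, preallocated requests on the list, takes one off to start a transfer, and puts it back when the transfer completes. `struct request` and the helper names are hypothetical. The real gadget driver uses `struct usb_request` on a `list_head` and needs locking, because completions run in interrupt context. This sketch omits that locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request; the real driver uses struct usb_request. */
struct request {
    struct request *next;   /* link while the request sits on the free list */
    int id;
};

struct req_freelist {
    struct request *head;
    unsigned int count;     /* number of idle requests available */
};

/* Return a request to the pool once its completion has run. */
static void req_put(struct req_freelist *fl, struct request *req)
{
    req->next = fl->head;
    fl->head = req;
    fl->count++;
}

/* Take an idle request for a new transfer; NULL means all are in flight. */
static struct request *req_get(struct req_freelist *fl)
{
    struct request *req = fl->head;

    if (req) {
        fl->head = req->next;
        req->next = NULL;
        fl->count--;
    }
    return req;
}
```

Requests that are currently queued to hardware are simply not on the list. That is why a freelist is the only kind of list this layer needs to remember.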
Re: [PATCH] [35/50] i386: Do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
On Sun, 2007-09-23 at 10:52 +0900, Akinobu Mita wrote: arch/i386/kernel/cpuid.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) Index: linux/arch/i386/kernel/cpuid.c === --- linux.orig/arch/i386/kernel/cpuid.c +++ linux/arch/i386/kernel/cpuid.c @@ -136,15 +136,18 @@ static const struct file_operations cpui .open = cpuid_open, }; -static int __cpuinit cpuid_device_create(int i) +static int cpuid_device_create(int cpu) __cpuinit please Yes. This eliminates an earlier patch in this series. ([22/50] i386: Misc cpuinit annotation) No, it's even worse: #22 is applied before #35. #35 reverts the __cpuinit annotation of #22 with its modifications of cpuid_device_create(). tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote: Hi Thomas, Unfortunately, my observation that the patch series: http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 worked with 2.6.23-rc4 was wrong. It _sometimes_ works, but usually doesn't boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with the above patch series applied. I've also tried: http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2 http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch with the same result. The problematic patch is x86_64-convert-to-clockevents.patch. Since the boot fails very early, before any messages reach the (VGA) console, I have no idea what to do next, except for digging in the code. Ok, let's track it down. Is there any difference when you add: nohz=off highres=off noapictimer or any combination of the above to the kernel command line? tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote: Since the boot fails very early, before any messages reach the (VGA) console, I have no idea what to do next, except for digging in the code. Ok, let's track it down. Is there any difference when you add: nohz=off highres=off noapictimer or any combination of the above to the kernel command line? First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset (.config for 2.6.23-rc6-mm1 is attached). Second, noacpitimer added to the command line makes all of the kernels, up to and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible). That's valuable information. Can you please provide a boot log of one of those with an additional apic=verbose on the command line? Thanks, tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote: Second, noacpitimer added to the command line makes all of the kernels, up to and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible). That's valuable information. Can you please provide a boot log of one of those with an additional apic=verbose on the command line? Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset: http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some -mm-specific noise in it. Please let me know if you want it, though. Hmm: Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2 ^^^ noacpitimer is not a valid commandline option. I asked for: noapictimer So I really wonder why noacpitimer on the kernel command line makes any difference. I'm confused. tglx
Re: [patch 1/3] new timerfd API - new timerfd API
Davide, On Sun, 2007-09-23 at 15:49 -0700, Davide Libenzi wrote: This is the new timerfd API as it is implemented by the following patch: --- fs/compat.c | 32 ++- fs/timerfd.c | 199 ++- include/linux/compat.h | 7 + include/linux/syscalls.h | 7 + 4 files changed, 168 insertions(+), 77 deletions(-) Index: linux-2.6.mod/fs/timerfd.c === --- linux-2.6.mod.orig/fs/timerfd.c 2007-09-23 15:18:09.0 -0700 +++ linux-2.6.mod/fs/timerfd.c 2007-09-23 15:25:55.0 -0700 @@ -23,15 +23,17 @@ struct timerfd_ctx { struct hrtimer tmr; + int clockid; ktime_t tintv; wait_queue_head_t wqh; int expired; + u64 ticks; }; Can you please restructure the struct in a way which does not result in padding by the compiler? struct timerfd_ctx { struct hrtimer tmr; ktime_t tintv; wait_queue_head_t wqh; u64 ticks; int expired; int clockid; }; + ticks += (u64) hrtimer_forward(&ctx->tmr, hrtimer_cb_get_time(&ctx->tmr), You need to use ctx->tmr.base->get_time() here, otherwise you might read a stale time value (in case CONFIG_HIGH_RES_TIMERS is off). - ctx->tintv); + ctx->tintv) - 1; hrtimer_restart(&ctx->tmr); +asmlinkage long sys_timerfd_create(int clockid) { - int error; + int error, ufd; struct timerfd_ctx *ctx; struct file *file; struct inode *inode; - struct itimerspec ktmr; - - if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) - return -EFAULT; if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME) return -EINVAL; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + init_waitqueue_head(&ctx->wqh); + ctx->clockid = clockid; + hrtimer_init(&ctx->tmr, clockid, HRTIMER_MODE_ABS); + + error = anon_inode_getfd(&ufd, &inode, &file, "[timerfd]", + &timerfd_fops, ctx); + if (error) + goto err_kfree_ctx; + + return ufd; + +err_kfree_ctx: + kfree(ctx); + return error; You really can avoid the goto here.
tglx
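The padding tglx objects to can be reproduced in userspace. This sketch uses 8-byte placeholder members for `struct hrtimer` and `wait_queue_head_t`, which is an assumption: the real types are larger, but they are also 8-byte aligned on x86_64. The two `int` members are either interleaved, as in the posted patch, or grouped at the end, as suggested. On an LP64 ABI such as x86_64, the interleaved layout needs 4 bytes of padding after each `int` and comes to 48 bytes. The grouped layout packs into 40.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Member order as posted: each int is followed by an 8-byte-aligned member. */
struct ctx_interleaved {
    uint64_t tmr;      /* placeholder for struct hrtimer */
    int clockid;       /* 4 bytes, then 4 bytes of padding on LP64 */
    int64_t tintv;     /* ktime_t is a 64-bit value */
    uint64_t wqh;      /* placeholder for wait_queue_head_t */
    int expired;       /* 4 bytes, then 4 bytes of padding on LP64 */
    uint64_t ticks;
};

/* Suggested order: 8-byte members first, the two ints share one 8-byte slot. */
struct ctx_grouped {
    uint64_t tmr;
    int64_t tintv;
    uint64_t wqh;
    uint64_t ticks;
    int expired;
    int clockid;
};

/* Bytes of actual data in either layout. */
#define PAYLOAD_BYTES (4 * sizeof(uint64_t) + 2 * sizeof(int))

/* Padding the compiler had to insert for a struct of the given size. */
static size_t padding_bytes(size_t total)
{
    return total - PAYLOAD_BYTES;
}
```

Running `pahole` on a built kernel shows the same holes for the real struct, with the real member sizes.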
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote: http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some -mm-specific noise in it. Please let me know if you want it, though. Hmm: Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2 ^^^ noacpitimer is not a valid commandline option. I asked for: noapictimer I'm blind, sorry. So I really wonder why noacpitimer on the kernel command line makes any difference. I'm confused. \metoo Well, it was probably read as noacpi. :-) Hmm, ACPI is in the log all over the place. Fortunately, noapictimer helps as well, dmesg attached (I have the one from 2.6.23-rc6-mm1 ready, too). Ok, at which point does the box stop when you omit noa*? Does earlyprintk give you any useful info? tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote: So I really wonder why noacpitimer on the kernel command line makes any difference. I'm confused. \metoo Well, it was probably read as noacpi. :-) Hmm, ACPI is in the log all over the place. Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes acpi_disable_pci() to be executed, which according to Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or for PCI scanning" (it works like this on x86_64 too, although the doc says it's x86_32-specific). Hrm. The local apic timer calibration does not use anything related to interrupts, but if we use the local APIC timer we switch off the PIT. Can you boot Linus' latest (w/o hrt patches) and add apicmaintimer to the kernel command line please? And yes, it matches noacpiwhatever on the command line with noacpi. Sigh. Urgh. tglx
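The "noacpiwhatever matches noacpi" effect tglx describes is what any prefix comparison does. The sketch below is not the kernel's early x86_64 option parser. It only contrasts a prefix match with a whole-token match on the command line Rafael posted. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix match: any token that merely starts with opt is accepted,
 * so "noacpitimer" is taken as "noacpi". */
static bool match_prefix(const char *cmdline, const char *opt)
{
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        if (p == cmdline || p[-1] == ' ')
            return true;
        p++;
    }
    return false;
}

/* Whole-token match: opt must start a token and end at a space,
 * an '=' (for options with values) or the end of the string. */
static bool match_token(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool at_start = (p == cmdline || p[-1] == ' ');
        char next = p[len];
        bool at_end = (next == '\0' || next == ' ' || next == '=');

        if (at_start && at_end)
            return true;
        p++;
    }
    return false;
}
```

With the posted command line, the prefix check reports `noacpi` as present, while the token check does not.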
Re: [patch 1/3] new timerfd API - new timerfd API
On Mon, 2007-09-24 at 08:42 -0700, Davide Libenzi wrote: + ticks += (u64) hrtimer_forward(&ctx->tmr, hrtimer_cb_get_time(&ctx->tmr), You need to use ctx->tmr.base->get_time() here, otherwise you might read a stale time value (in case CONFIG_HIGH_RES_TIMERS is off). Is it the particular position of hrtimer_cb_get_time() in the code that would break here? Because that function was added by your patch ;) Did something change later? For non high res systems we speed up the access to now by storing the current time when we start to process the hrtimer softirq callbacks. hrtimer_cb_get_time(timer) reads timer->base->now. For high resolution systems hrtimer_cb_get_time() resolves to timer->base->get_time(). In the timerfd case we are not in softirq context and we read at any given later time. Also on SMP the base->now variable might be changed by the softirq running on the other CPU. tglx
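Here is a minimal model of the staleness tglx explains. Without high resolution timers, the callback-time accessor returns a snapshot taken when the softirq started. A reader that runs later, like the timerfd read path, sees that old value, while `get_time()` reads the clock again. `struct clock_base`, the fake clock and the helper names are simplified, hypothetical stand-ins, not the kernel's hrtimer structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock source: a nanosecond counter advanced by hand. */
static int64_t fake_clock_ns;

static int64_t read_fake_clock(void)
{
    return fake_clock_ns;
}

/* Simplified stand-in for an hrtimer clock base (not the kernel struct). */
struct clock_base {
    int64_t now;                 /* snapshot taken at softirq start */
    int64_t (*get_time)(void);   /* live read of the underlying clock */
};

/* Low-resolution path: snapshot the clock once per softirq run. */
static void softirq_begin(struct clock_base *base)
{
    base->now = base->get_time();
}

/* Models hrtimer_cb_get_time() without CONFIG_HIGH_RES_TIMERS: it
 * returns the snapshot, which is only current inside the softirq. */
static int64_t cb_get_time(const struct clock_base *base)
{
    return base->now;
}
```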
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote: Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes acpi_disable_pci() to be executed, which according to Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or for PCI scanning" (it works like this on x86_64 too, although the doc says it's x86_32-specific). Hrm. The local apic timer calibration does not use anything related to interrupts, but if we use the local APIC timer we switch off the PIT. Can you boot Linus' latest (w/o hrt patches) and add apicmaintimer to the kernel command line please? Works, dmesg attached. /me scratches head We know that - disabling local apic timers works - local apic timers (which turn off the PIT) work when noacpiFSCKEDPARSING is given on the kernel command line. I have no clue what difference noacpiFSCKEDPARSING might make. The boot log is not giving any hint at all. acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1. What happens if you set acpi=noirq instead? tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote: /me scratches head Retested. We know that - disabling local apic timers works This works reproducibly across the board. Ok - local apic timers (which turn off the PIT) work when noacpiFSCKEDPARSING This stopped working, although it evidently worked yesterday (wtf?). There seems to be a history effect in the box, to make things more interesting. Did you connect this box to Andrew's VAIO during KS? I think the only solid data point so far is that noapictimer makes the box boot. Ok. Can you add nmi_watchdog=1 to the command line please? This runs through the calibration of the APIC, but registers it as a dummy clock source (the PIT must run to make the watchdog work). If it boots, please provide the output of /proc/timer_list Thanks, tglx
Re: 2.6.23-rc7-mm1
On Mon, 2007-09-24 at 12:34 -0700, Andrew Morton wrote: It prints 'System halted' twice and blinks the keyboard LEDs, but does not switch off. On all other kernel versions I only see one keyboard blink before the power goes out. ok... I compared its dmesg to vanilla-rc7 and -rc4-mm1, but except that rc-4 assigns different IRQs I can't see any differences except the normal variation in BogoMips etc. Can you check whether 2.6.23-rc7 + http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch works for you? hm, dunno. The only substantial patch which touches arch/x86_64/kernel/process.c (which is where cpu_idle lives) is x86_64-prep-idle-loop-for-dynticks.patch. The problem is, 2.6.23-rc6-mm1's git-acpi patch had all the new cpuidle code in it. Len dropped all that code over the weekend (which is when I picked this copy of his tree), so 2.6.23-rc7-mm1 doesn't have the cpuidle code. Len will be reapplying the cpuidle patches today(ish) so next -mm _will_ have the cpuidle code. So what we have in rc7-mm1 is this transient no-cpuidle state. It could be that the x86_64 dynticks code (which was developed and previously tested in conjunction with the cpuidle patches) has some dependency on cpuidle. It should not. cpuidle makes use of dynticks, not the other way round. tglx
Re: [patch 1/4] new timerfd API v2 - introduce a new hrtimer_forward_now() function
On Mon, 2007-09-24 at 13:22 -0700, Davide Libenzi wrote: I think that advancing the timer against the timer's current now can be a pretty common usage, so, w/out exposing hrtimer's internals, we add a new hrtimer_forward_now() function. Signed-off-by: Davide Libenzi [EMAIL PROTECTED] Reviewed-and-Acked-by: Thomas Gleixner [EMAIL PROTECTED] - Davide --- include/linux/hrtimer.h | 7 +++ 1 file changed, 7 insertions(+) Index: linux-2.6.mod/include/linux/hrtimer.h === --- linux-2.6.mod.orig/include/linux/hrtimer.h 2007-09-24 12:27:20.0 -0700 +++ linux-2.6.mod/include/linux/hrtimer.h 2007-09-24 12:29:39.0 -0700 @@ -298,6 +298,13 @@ extern unsigned long hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval); +/* Forward a hrtimer so it expires after the hrtimer's current now */ +static inline unsigned long hrtimer_forward_now(struct hrtimer *timer, + ktime_t interval) +{ + return hrtimer_forward(timer, timer->base->get_time(), interval); +} + /* Precise sleep: */ extern long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
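Below is a simplified model of what hrtimer_forward() computes. It makes two assumptions of its own: `ktime_t` is replaced by signed 64-bit nanoseconds, and the kernel's clamp of the interval to the clock resolution is left out. The return value counts how many intervals were added to move the expiry past `now`. hrtimer_forward_now() changes only where `now` comes from: a fresh `timer->base->get_time()` read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of hrtimer_forward(): push *expires forward by whole
 * multiples of interval until it lies after now, and return how many
 * intervals were added (the overrun count). All values in nanoseconds.
 */
static uint64_t forward(int64_t *expires, int64_t now, int64_t interval)
{
    int64_t delta = now - *expires;
    uint64_t orun = 1;

    assert(interval > 0);
    if (delta < 0)
        return 0;                      /* not expired yet: nothing to do */

    if (delta >= interval) {
        /* Jump over all fully elapsed intervals in one step. */
        orun = (uint64_t)(delta / interval);
        *expires += interval * (int64_t)orun;
        if (*expires > now)
            return orun;
        orun++;                        /* landed exactly on now: one more */
    }
    *expires += interval;
    return orun;
}
```

For example, a timer that expired at 100 ns with a 10 ns period, forwarded at 135 ns, reports 4 overruns (100, 110, 120, 130) and is re-armed for 140 ns. The `- 1` in the timerfd read path is consistent with the expiry callback having already counted the first of those ticks.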
Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22
On Mon, 2007-09-24 at 23:08 +0200, Helge Hafting wrote: The two kernels mentioned hang occasionally. Typically when I compile something and pass the time by surfing the web. A few minutes and then I notice that the mouse (and everything else in X) stops. kbd LEDs do not react to numlock/capslock. The only thing that still works is sysrq+B. So far this has happened while running X, so no messages. I have gone back to 2.6.22rc4, which seems to work. This is a single Opteron, although on a dual-slot board. Can you switch to a serial console, so we can get some information out of that box? Sysrq-B is working, so we can get info from other sysrq functions as well. tglx
Re: 2.6.23-rc7-mm1
On Tue, 2007-09-25 at 09:32 +0200, Torsten Kaiser wrote: On 9/24/07, Thomas Gleixner [EMAIL PROTECTED] wrote: Can you check whether 2.6.23-rc7 + http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch works for you? Yes, powers off normally. Ok, so it's probably some merge artifact in -mm. We'll get this sorted out once Len has his new tree available. tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote: Hello Thomas, Rafael We know that - disabling local apic timers works As I can see from the log, you are booting on a computer with a dual-core AMD processor. Do you have the C1E feature enabled? The i386 kernel disables the lapic on dual-core AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). The x86_64 kernel does not have this patch yet (it's required for tickless kernels only). Well, it is required for non-tickless mode as well. As a result, if you run an x86_64 kernel with the hrt patch on such a computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I'll have a look into this. tglx
Re: Why do so many machines need noapic?
Chuck, On Thu, 2007-09-13 at 12:38 -0400, Chuck Ebbert wrote: On 09/10/2007 03:44 PM, Andi Kleen wrote: Yes, it has an hpet. And I tried every combination of options I could think of. But, even stranger, x86_64 works (only i386 fails.) x86-64 has quite different time code (at least until the dyntick patches currently in mm) Obvious thing would be to diff the boot messages and see if anything jumps out (e.g. in interrupt routing). Or check with mm and if x86-64 is broken there too then it's likely the new time code. I reported too soon that x86_64 works. It does not work, it just takes a bit longer before it freezes. There are message threads all over the place discussing this problem with the HP Pavilion tx 1000, and it seems the best workaround is to use the nolapic option instead of noapic. Using that, it is totally stable _and_ there are no spurious interrupts that would otherwise break USB. Interrupt setup is a bit strange, though: can you please send me 32 and 64 bit boot logs of mainline and fedora kernels ? tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael, On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote: On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote: Hello Thomas, Rafael We know that - disabling local apic timers works As I can see from the log, you are booting on a computer with a dual-core AMD processor. Do you have the C1E feature enabled? The i386 kernel disables the lapic on dual-core AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). The x86_64 kernel does not have this patch yet (it's required for tickless kernels only). Well, it is required for non-tickless mode as well. As a result, if you run an x86_64 kernel with the hrt patch on such a computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I'll have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts Thanks, tglx
Re: 2.6.23-rc8-mm1, -rc7-mm1 kill audio on HP nx6325
On Tue, 2007-09-25 at 14:08 +0200, Rafael J. Wysocki wrote: Hi, This patch from Andi: x86_64-mm-cpa-einval.patch makes the hda_intel audio driver stop working on my HP nx6325. The following line appears in dmesg (from 2.6.23-rc7-mm1): ALSA /home/rafael/src/mm/linux-2.6.23-rc7-mm1/sound/pci/hda/hda_intel.c:1755: hda-intel: ioremap error and the driver doesn't work afterwards. Still, I'm not sure if the patch above is wrong or rather it exposes a problem in the driver. The patch is correct. Instead of returning success in the case of a failure of lookup_address(), it now returns -EINVAL, which in turn makes the ioremap fail. OTOH, the driver ioremap call looks straightforward. Can you apply the patch below and provide the resulting debug output please? Thanks, tglx Index: linux-2.6.23-rc8-mm/arch/x86_64/mm/pageattr.c === --- linux-2.6.23-rc8-mm.orig/arch/x86_64/mm/pageattr.c 2007-09-25 14:05:41.0 +0200 +++ linux-2.6.23-rc8-mm/arch/x86_64/mm/pageattr.c 2007-09-25 14:09:35.0 +0200 @@ -156,8 +156,10 @@ __change_page_attr(unsigned long address pgprot_t ref_prot2; kpte = lookup_address(address); - if (!kpte) + if (!kpte) { + printk("lookup failed for %lu\n", address); return -EINVAL; + } kpte_page = virt_to_page(((unsigned long)kpte) & PAGE_MASK); BUG_ON(PageCompound(kpte_page)); Index: linux-2.6.23-rc8-mm/sound/pci/hda/hda_intel.c === --- linux-2.6.23-rc8-mm.orig/sound/pci/hda/hda_intel.c 2007-09-25 14:05:43.0 +0200 +++ linux-2.6.23-rc8-mm/sound/pci/hda/hda_intel.c 2007-09-25 14:09:28.0 +0200 @@ -1752,7 +1752,8 @@ static int __devinit azx_create(struct s chip->addr = pci_resource_start(pci, 0); chip->remap_addr = ioremap_nocache(chip->addr, pci_resource_len(pci,0)); if (chip->remap_addr == NULL) { - snd_printk(KERN_ERR SFX "ioremap error\n"); + snd_printk(KERN_ERR SFX "ioremap error: %lu %lu\n", + chip->addr, pci_resource_len(pci, 0)); err = -ENXIO; goto errout; }
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote: As I can see from the log, you are booting on a computer with a dual-core AMD processor. Do you have the C1E feature enabled? That's possible, how to check? The i386 kernel disables the lapic on dual-core AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). The x86_64 kernel does not have this patch yet (it's required for tickless kernels only). Well, it is required for non-tickless mode as well. As a result, if you run an x86_64 kernel with the hrt patch on such a computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I'll have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0: 1159492 0 local-APIC-edge timer LOC: 0 1158220 Local interrupts 0: 1161996 0 local-APIC-edge timer LOC: 0 1160723 Local interrupts Hmm. That's strange. It looks like the local apic timer is not used, but x86_64 definitely lacks the above check. Can you please remove/disable the acpi processor module and recheck? tglx
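The two /proc/interrupts snapshots above can be turned into rates. Take the IRQ 0 counts (1159492 and 1161996) and the LOC counts (1158220 and 1160723), reading the LOC line as a CPU0 count of 0 and a CPU1 count. Treat the interval as the nominal 10 seconds; the two `cat` runs add a little time, so the figures are approximate. Both counters then advance by roughly 2,500, about 250 interrupts per second each. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average interrupt rate between two counter samples taken secs apart. */
static double irq_rate(uint64_t before, uint64_t after, double secs)
{
    assert(after >= before && secs > 0.0);
    return (double)(after - before) / secs;
}
```

Here irq_rate(1159492, 1161996, 10.0) gives 250.4 for IRQ 0, and irq_rate(1158220, 1160723, 10.0) gives 250.3 for LOC.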
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote: There seems to be a history effect in the box, to make things more interesting. Did you connect this box to Andrew's VAIO during KS? No, but it's famous for being interestingly broken nevertheless. :) I think the only solid data point so far is that noapictimer makes the box boot. Ok. Can you add nmi_watchdog=1 to the command line please? This runs through the calibration of the APIC, but registers it as a dummy clock source (the PIT must run to make the watchdog work). If it boots, please provide the output of /proc/timer_list No, it doesn't. I start to get desperate. Below is a patch which moves the apic timer disable check after the calibration routine. Can you please apply it on top of -hrt and add noapictimer to the command line? Does it boot? tglx Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c === --- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c 2007-09-24 20:30:00.0 +0200 +++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c 2007-09-25 15:05:32.0 +0200 @@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock( void __init setup_boot_APIC_clock (void) { +#if 0 /* * The local apic timer can be disabled via the kernel commandline. * Register the lapic timer as a dummy clock event source on SMP @@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void) setup_APIC_timer(); return; } - +#endif printk(KERN_INFO "Using local APIC timer interrupts.\n"); calibrate_APIC_clock(); @@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void) * PIT/HPET going. Otherwise register lapic as a dummy * device.
*/ - if (nmi_watchdog != NMI_IO_APIC) + if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC) lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY; +#if 0 else printk(KERN_WARNING "APIC timer registered as dummy, due to nmi_watchdog=1!\n"); +#endif setup_APIC_timer(); }
Re: 2.6.23-rc8-mm1, -rc7-mm1 kill audio on HP nx6325
On Tue, 2007-09-25 at 15:20 +0200, Rafael J. Wysocki wrote: The patch is correct. Instead of returning Success in the case of a failure of lookup_address, it now returns -EINVAL, which in turn makes the ioremap fail. OTOH, the driver ioremap call looks straightforward. Can you apply the patch below and provide the resulting debug output please ? lookup failed for 18446604438082158592 [--snipped some USB messages--] ALSA /home/rafael/src/mm/linux-2.6.23-rc8-mm1/sound/pci/hda/hda_intel.c:1756: hda-intel: ioremap error: 2349334528 16384 Stupid me, hex formatting would have been easier to read :) Lookup failed for 0xFFFF81008C080000 ioremap: 0x8C080000 length 16384 It seems that this patch only reveals some other wreckage. The code is called as part of ioremap, where it adjusts the caching attributes of the mapping, which was set up right before change_page_attr_address() is called. tglx
Re: 2.6.23-rc8-mm1, -rc7-mm1 kill audio on HP nx6325
On Tue, 2007-09-25 at 16:29 +0200, Rafael J. Wysocki wrote: lookup failed for 18446604438082158592 [--snipped some USB messages--] ALSA /home/rafael/src/mm/linux-2.6.23-rc8-mm1/sound/pci/hda/hda_intel.c:1756: hda-intel: ioremap error: 2349334528 16384 Stupid me, hex formatting would have been easier to read :) Lookup failed for 0xFFFF81008C080000 ioremap: 0x8C080000 length 16384 It seems that this patch only reveals some other wreckage. The code is called as part of ioremap, where it adjusts the caching attributes of the mapping, which was set up right before change_page_attr_address() is called. Hm, it looks like the first address is a kernel one and the second one is physical, so they apparently match, which means that the lookup shouldn't fail, if I understand this correctly. Yes, the lookup address is virtual and it should be the one which was mapped right before the call to change_page_attr_address(). I'm looking into that right now. tglx
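The two decimal values in the debug output are easier to compare in hex. The following is a minimal, hypothetical check of the claim that they "match". The helper names are made up. The direct-mapping base 0xffff810000000000 is an assumption: it is taken to be the x86_64 __PAGE_OFFSET of the 2.6.23 era and is not stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 direct-mapping base (__PAGE_OFFSET) of the 2.6.23 era. */
#define DIRECT_MAP_BASE 0xffff810000000000ULL

/* Hypothetical helper: physical address -> direct-mapped kernel virtual address. */
static uint64_t phys_to_direct_virt(uint64_t phys)
{
	return DIRECT_MAP_BASE + phys;
}

/* Hypothetical helper: the inverse translation. */
static uint64_t direct_virt_to_phys(uint64_t virt)
{
	return virt - DIRECT_MAP_BASE;
}
```

Under that assumption, phys_to_direct_virt(2349334528) equals 18446604438082158592. In hex, physical 0x8c080000 corresponds to virtual 0xffff81008c080000. The failed lookup was therefore for the direct-mapped alias of the ioremap() physical address.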
[PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
commit 3556ddfa9284a86a59a9b78fe5894430f6ab4eef titled [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64 X2) with C1E enabled: When both cores go into idle at the same time, then the system switches into C1E state, which is basically the same as C3. This stops the local apic timer. This was debugged right after the dyntick merge on i386 and despite the patch title it fixes only the 32 bit path. x86_64 is still missing this fix. It seems that mainline is not really affected by this issue, as the PIT is running and keeps jiffies incrementing, but that's just waiting for trouble. -mm suffers from this problem due to the x86_64 high resolution timer patches. This is a quick and dirty port of the i386 code to x86_64. I spent quite some time with Rafael to debug the -mm / hrt wreckage until someone pointed us to this. I really had forgotten that we debugged this half a year ago already. Sigh, is it just me or is there something yelling arch/x86 into my ear? Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c index af838f6..32054bf 100644 --- a/arch/x86_64/kernel/setup.c +++ b/arch/x86_64/kernel/setup.c @@ -546,6 +546,37 @@ static void __init amd_detect_cmp(struct cpuinfo_x86 *c) #endif } +#define ENABLE_C1E_MASK 0x18000000 +#define CPUID_PROCESSOR_SIGNATURE 1 +#define CPUID_XFAM 0x0ff00000 +#define CPUID_XFAM_K8 0x00000000 +#define CPUID_XFAM_10H 0x00100000 +#define CPUID_XFAM_11H 0x00200000 +#define CPUID_XMOD 0x000f0000 +#define CPUID_XMOD_REV_F 0x00040000 + +/* AMD systems with C1E don't have a working lAPIC timer. Check for that.
*/ +static __cpuinit int amd_apic_timer_broken(void) +{ + u32 lo, hi; + u32 eax = cpuid_eax(CPUID_PROCESSOR_SIGNATURE); + switch (eax & CPUID_XFAM) { + case CPUID_XFAM_K8: + if ((eax & CPUID_XMOD) < CPUID_XMOD_REV_F) + break; + case CPUID_XFAM_10H: + case CPUID_XFAM_11H: + rdmsr(MSR_K8_ENABLE_C1E, lo, hi); + if (lo & ENABLE_C1E_MASK) + return 1; + break; + default: + /* err on the side of caution */ + return 1; + } + return 0; +} + static void __cpuinit init_amd(struct cpuinfo_x86 *c) { unsigned level; @@ -617,6 +648,9 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) /* Family 10 doesn't support C states in MWAIT so don't use it */ if (c->x86 == 0x10 && !force_mwait) clear_bit(X86_FEATURE_MWAIT, &c->x86_capability); + + if (amd_apic_timer_broken()) + disable_apic_timer = 1; } static void __cpuinit detect_ht(struct cpuinfo_x86 *c) diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h index 85125ef..e458020 100644 --- a/include/asm-x86_64/apic.h +++ b/include/asm-x86_64/apic.h @@ -20,6 +20,7 @@ extern int apic_verbosity; extern int apic_runs_main_timer; extern int ioapic_force; extern int apic_mapped; +extern int disable_apic_timer; /* * Define the default level of output to be very little
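The family/model test in amd_apic_timer_broken() is easier to follow as a pure function. What follows is a hypothetical user-space model, not kernel code. The CPUID signature and the low 32 bits of MSR_K8_ENABLE_C1E are passed in as arguments instead of being read from hardware. The mask values assume the usual layout of CPUID function 1 EAX: extended family in bits 27:20, extended model in bits 19:16. They also assume the C1E enable bits are bits 27 and 28 of the MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed masks: CPUID fn 1 EAX fields and the C1E enable bits of the MSR. */
#define CPUID_XFAM        0x0ff00000u	/* extended family, bits 27:20 */
#define CPUID_XFAM_K8     0x00000000u
#define CPUID_XFAM_10H    0x00100000u
#define CPUID_XFAM_11H    0x00200000u
#define CPUID_XMOD        0x000f0000u	/* extended model, bits 19:16 */
#define CPUID_XMOD_REV_F  0x00040000u
#define ENABLE_C1E_MASK   0x18000000u	/* bits 27 and 28 */

/* Hypothetical model of amd_apic_timer_broken() with the inputs passed in. */
static int apic_timer_broken_model(uint32_t eax, uint32_t c1e_msr_lo)
{
	switch (eax & CPUID_XFAM) {
	case CPUID_XFAM_K8:
		/* K8 parts older than revision F are treated as unaffected. */
		if ((eax & CPUID_XMOD) < CPUID_XMOD_REV_F)
			break;
		/* fall through */
	case CPUID_XFAM_10H:
	case CPUID_XFAM_11H:
		/* Broken only if a C1E enable bit is set in the MSR. */
		if (c1e_msr_lo & ENABLE_C1E_MASK)
			return 1;
		break;
	default:
		/* Unknown family: err on the side of caution. */
		return 1;
	}
	return 0;
}
```

For example, a rev F K8 signature such as 0x00040f20 reports a broken timer only when one of the C1E bits is set. A pre-rev-F K8 signature never does, and an unrecognized extended family is always treated as broken.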
Re: [PATCH] UML - time build fix
On Tue, 2007-09-25 at 13:37 -0400, Jeff Dike wrote: Put back an implementation of timeval_to_ns in arch/um/os-Linux/time.c. tglx pointed out in his review of tickless support that there was a perfectly good implementation of it in linux/time.h. The problem is that this is userspace code which can't pull in kernel headers and there doesn't seem to be a libc version. Oops. Did not notice. Can't we move it into some header file which is accessible from everywhere ? tglx So, I'm copying the version from linux/time.h rather than resurrecting my version. This causes some declaration changes as it now returns a signed value rather than an unsigned value. Signed-off-by: Jeff Dike [EMAIL PROTECTED] --- arch/um/include/os.h|4 ++-- arch/um/os-Linux/time.c | 22 +++--- 2 files changed, 21 insertions(+), 5 deletions(-) Index: linux-2.6.22/arch/um/include/os.h === --- linux-2.6.22.orig/arch/um/include/os.h2007-09-25 09:26:42.0 -0400 +++ linux-2.6.22/arch/um/include/os.h 2007-09-25 09:28:42.0 -0400 @@ -252,9 +252,9 @@ extern void os_dump_core(void); extern void idle_sleep(unsigned long long nsecs); extern int set_interval(void); extern int timer_one_shot(int ticks); -extern unsigned long long disable_timer(void); +extern long long disable_timer(void); extern void uml_idle_timer(void); -extern unsigned long long os_nsecs(void); +extern long long os_nsecs(void); /* skas/mem.c */ extern long run_syscall_stub(struct mm_id * mm_idp, Index: linux-2.6.22/arch/um/os-Linux/time.c === --- linux-2.6.22.orig/arch/um/os-Linux/time.c 2007-09-25 09:26:42.0 -0400 +++ linux-2.6.22/arch/um/os-Linux/time.c 2007-09-25 09:28:42.0 -0400 @@ -39,7 +39,23 @@ int timer_one_shot(int ticks) return 0; } -unsigned long long disable_timer(void) +/** + * timeval_to_ns - Convert timeval to nanoseconds + * @ts: pointer to the timeval variable to be converted + * + * Returns the scalar nanosecond representation of the timeval + * parameter. 
+ * + * Ripped from linux/time.h because it's a kernel header, and thus + * unusable from here. + */ +static inline long long timeval_to_ns(const struct timeval *tv) +{ + return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) + + tv->tv_usec * UM_NSEC_PER_USEC; +} + +long long disable_timer(void) { struct itimerval time = ((struct itimerval) { { 0, 0 }, { 0, 0 } }); @@ -47,10 +63,10 @@ unsigned long long disable_timer(void) printk(UM_KERN_ERR "disable_timer - setitimer failed, errno = %d\n", errno); - return tv_to_nsec(&time.it_value); + return timeval_to_ns(&time.it_value); } -unsigned long long os_nsecs(void) +long long os_nsecs(void) { struct timeval tv;
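The conversion itself can be checked in isolation against the C library's struct timeval. This is a minimal sketch. UM_NSEC_PER_SEC and UM_NSEC_PER_USEC are defined locally with their assumed values, because the UML header that provides them is not shown in the patch.

```c
#include <assert.h>
#include <sys/time.h>

/* Assumed values of the UML constants used by the patch. */
#define UM_NSEC_PER_SEC  1000000000LL
#define UM_NSEC_PER_USEC 1000LL

/* Same arithmetic as the patch: widen tv_sec, then scale both fields to ns. */
static inline long long timeval_to_ns(const struct timeval *tv)
{
	return ((long long) tv->tv_sec * UM_NSEC_PER_SEC) +
		tv->tv_usec * UM_NSEC_PER_USEC;
}
```

Widening tv_sec explicitly, as the kernel version does, computes the product in 64 bits even where time_t is 32 bits wide. The signed return type matches the declaration changes made to disable_timer() and os_nsecs().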
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael, On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote: On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote: On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote: [--snip--] I start to get desperate. Below is a patch, which moves the apic timer disable check after the calibration routine. Can you please apply on top of -hrt and add noapictimer to the command line ? Does it boot ? 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots with noapictimer and doesn't boot without it. That was expected. I explicitly asked to add noapictimer to the kernel command line. Ok, so we ruled out the apic timer calibration routine. I did not expect that this would be the culprit, but with a dark screen as the only debug info, I need to resort to small steps. Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1 after booting with noapictimer ? I'm a bit confused by your earlier confirmation, that mainline w/o the -hrt patches boots fine, when you add apicmaintimer to the kernel command line. apicmaintimer stops the PIT like we do in -hrt and we just use the local APIC timer for everything. Can you please retest and confirm that this is correct ? Is the 32 bit kernel working on that box ? Thanks for your patience. tglx PS: I just sent out the disable APIC timer for AMD C1E boxen patch. We debugged this half a year ago on a nx6325, but I completely forgot about that. The explanation from AMD was sensible, but your "apicmaintimer works" statement is contradictory.
Re: [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
On Tue, 2007-09-25 at 22:55 +0200, Rafael J. Wysocki wrote: I have reworked the patch a bit so that it applies on top of 2.6.23-rc8-mm1 and compiles (my version is attached). With this patch applied, the kernel boots correctly on the nx6325. I know. It's basically enforced noapictimer. But this still does not explain why your nasty box booted current mainline with apicmaintimer on the kernel command line. tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael, On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote: I'm a bit confused by your earlier confirmation, that mainline w/o the -hrt patches boots fine, when you add apicmaintimer to the kernel command line. apicmaintimer stops the PIT like we do in -hrt and we just use the local APIC timer for everything. Can you please retest and confirm that this is correct ? No, it's not. The mainline _usually_ doesn't boot with apicmaintimer. It seems to me that _sometimes_ the CPU just doesn't enter this C1E state and then everything goes fine ... I'm relieved. I really started to go nuts on these contradicting patterns. Your box seems to be worse than the VAIO, it has some random surprise generator built in :) Is the 32 bit kernel working on that box ? Can't tell, I have only 64-bit userland here. Should be fine. The check has been there since late 2.6.21-rc. I really could kick my own ass that I did not remember the nx6325 wreckage in the 2.6.21-rc time frame. Sigh, way too much broken hardware out there to keep track of it. Thanks for your patience. Well, I'm only making sure that future kernels will run on my box. ;-) Nothing wrong with that. Thanks again for your help, tglx
Re: 2.6.23-rc8-mm1: somewhat broken forced HPET on ICH5
Alexey, On Wed, 2007-09-26 at 00:50 +0400, Alexey Dobriyan wrote: ich-force-hpet-ich5-quirk-to-force-detect-enable.patch is causing the following on Etch boot: [initscripts as usual] Setting system clock: [nothing happens for several seconds] select to /dev/rtc to wait for clock tick timed out [initscripts as usual] Then clock is skewed for 3 hours (GMT/MSK difference). Can you please check, whether http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch has the same problem ? It contains the hpet force enable patches as well, but lacks the other crap^Wfeatures of -mm :) Thanks, tglx
Re: [PATCH] UML - time build fix
Jeff, On Tue, 2007-09-25 at 17:56 -0400, Jeff Dike wrote: On Tue, Sep 25, 2007 at 09:54:15PM +0200, Thomas Gleixner wrote: On Tue, 2007-09-25 at 13:37 -0400, Jeff Dike wrote: Put back an implementation of timeval_to_ns in arch/um/os-Linux/time.c. tglx pointed out in his review of tickless support that there was a perfectly good implementation of it in linux/time.h. The problem is that this is userspace code which can't pull in kernel headers and there doesn't seem to be a libc version. Oops. Did not notice. It's a UML peculiarity... Can't we move it into some header file which is accessible from everywhere ? Not in the generic kernel. UML has some generally includable headers of its own, but that doesn't really help. The one thing that would help is a libc timeval_to_ns. Fair enough. tglx
Re: 2.6.23-rc8-mm1: somewhat broken forced HPET on ICH5
On Wed, 2007-09-26 at 13:14 +0400, Alexey Dobriyan wrote: On Tue, Sep 25, 2007 at 11:45:17PM +0200, Thomas Gleixner wrote: On Wed, 2007-09-26 at 00:50 +0400, Alexey Dobriyan wrote: ich-force-hpet-ich5-quirk-to-force-detect-enable.patch is causing the following on Etch boot: [initscripts as usual] Setting system clock: [nothing happens for several seconds] select to /dev/rtc to wait for clock tick timed out [initscripts as usual] Then clock is skewed for 3 hours (GMT/MSK difference). Can you please check, whether http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch has the same problem ? It contains the hpet force enable patches as well, but lacks the other crap^Wfeatures of -mm :) Yes, exactly same delay and clock skew. Ok, stupid me. Did not look at your config snippet right away. Can you please enable CONFIG_HPET_EMULATE_RTC ? tglx
Re: [discuss] 2.6.23-rc8-mm1, -rc7-mm1 kill audio on HP nx6325
On Wed, 2007-09-26 at 08:32 +0100, Jan Beulich wrote: ioremap_nocache() does __ioremap(..., _PAGE_PCD);, then __ioremap() does ioremap_page_range(..., _PAGE_PCD | other_stuff) That's one. __ioremap() then does ioremap_change_attr(..., _PAGE_PCD);. That's two. So I _think_ we're setting _PAGE_PCD twice on those pte's? Unclear. The implementation is rather different from i386, too. I dunno why __change_page_attr() failed though. Perhaps this, in change_page_attr_addr(): if (!kernel_map || pte_present(pfn_pte(0, prot))) { should be Definitely not, and this code has been that way for a while. I rather suspect this change - if (!kpte) return 0; + if (!kpte) + return -EINVAL; to be the reason for the failure (and I had already sent a comment to this respect to Andi upon his review request). This change exposes the problem. The question is why we do not have a page table entry for the address, which was mapped right before that. tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote: There still are some oddities. First, with the x86-64: Disable local APIC timer use on AMD systems with C1E patch and my collection of suspend patches applied, the box doesn't boot (the suspend patches don't even touch the boot code, so they should be irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is applied in addition. Is this expected? No. That's odd. It is nothing else than adding noapictimer to the kernel command line. Next, on 2.6.23-rc8 with the patches from: http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/ plus the x86-64: Disable local APIC timer use on AMD systems with C1E patch and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't work correctly. Although the box hibernates and restores, there is a temporary hang during the resume hardware sequence, after which the lock led starts to blink (and remains in this state) and something like this appears in dmesg: Extended CMOS year: 2000 Enabling non-boot CPUs ... SMP alternatives: switching to SMP code Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 512K (64 bytes/line) Unable to handle kernel paging request at 806c64d4 RIP: [802104cb] identify_cpu+0x2ac/0x5a1 Hmm. That's really early in the CPU bring up. The only change in this area is the C1E patch. Can you decode the exact source line where it is failing ? tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
Rafael, On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote: First, with the x86-64: Disable local APIC timer use on AMD systems with C1E patch and my collection of suspend patches applied, the box doesn't boot (the suspend patches don't even touch the boot code, so they should be irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is applied in addition. Is this expected? No. That's odd. It is nothing else than adding noapictimer to the kernel command line. Seems to be reproducible, though. I'll investigate further. So far, the results are the following: 1) current Linus' tree doesn't boot with any command line (regression) [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0 x86-64: Disable local APIC timer use on AMD systems with C1E It's not necessary for 2.6.23 and actually kills the box that it's supposed to fix. ] 2) 2.6.23-rc8 w/ the x86-64: Disable local APIC timer use on AMD systems with C1E patch applied behaves like the current -git 3) 2.6.23-rc8 w/o this patch doesn't boot with either noapictimer _or_ OK, this explains 2) and 3). I just looked into the code and the logic vs. noapictimer on SMP is completely broken. On i386 the noapictimer option not only disables the local APIC timer, it also registers the CPUs for broadcasting via IPI on SMP systems. The x86_64 code uses the broadcast only when the local apic timer is active, i.e. noapictimer is not on the command line. This defeats the whole purpose of noapictimer. It should be there to make boxen work where the local APIC timer actually has a hardware problem, e.g. the nx6325. The current implementation of x86_64 only fixes the ACPI c-states related problem where the APIC timer stops in C3(2), nothing else. On nx6325 and other AMD X2 equipped systems which have the C1E enabled we run into the following: PIT keeps jiffies (and the system) running, but the local APIC timer interrupts can get out of sync due to this C1E effect.
I don't think this is a critical problem, but it is wrong nevertheless. I think it's safe to revert the C1E patch and postpone the fix to the clock events conversion. apicmaintimer on your box is not going to work. See the C1E patch. apicmaintimer switches off PIT and then waits forever for the local APIC timer interrupts. 4) 2.6.22 behaves like 2.6.23-rc8 No surprise. 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with noapictimer 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the x86-64: Disable local APIC timer use on AMD systems with C1E patch boots without any extra command line options That's consistent behaviour. Tested for a couple of times with each kernel, the results seem to be reproducible 100% of the time. Thanks for going through this debug marathon. tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wed, 2007-09-26 at 15:22 -0700, Linus Torvalds wrote: On Wed, 26 Sep 2007, Thomas Gleixner wrote: 1) current Linus' tree doesn't boot with any command line (regression) [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0 Reverted. OK, this explains 2) and 3). I just looked into the code and the logic vs. noapictimer on SMP is completely broken. ..and thanks for the explanation. Thanks for finding it so quickly guys. Sounds like this will be fixed properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt patch too) It's even worse than I thought on the first check: noapictimer on the command line of an SMP box prevents _ONLY_ the boot CPU apic timer from being used. But the secondary CPU still unconditionally sets up the APIC timer and uses the never-calibrated variable calibration_result, which is of course 0, to set up the APIC timer. Wreckage guaranteed. tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote: Tested for a couple of times with each kernel, the results seem to be reproducible 100% of the time. Thanks for going through this debug marathon. No big deal. I'm glad that you've found what's up. Well, we still have the CPU hotplug during suspend w/ the hrt patch problem to debug ... ;-) Yeah. Knowing the actual line of code where it breaks might be helpful. tglx
Re: [PATCH] Compile handle_percpu_irq even for uniprocessor kernels
On Thu, 2007-09-27 at 12:24 +0100, Ralf Baechle wrote: Compiling handle_percpu_irq only on SMP generates an artificial special case so a typical use like: set_irq_chip_and_handler(irq, &some_irq_type, handle_percpu_irq); needs to be conditionally compiled only on SMP systems as well and an alternative UP construct is usually needed - for no good reason. Signed-off-by: Ralf Baechle [EMAIL PROTECTED] Makes sense. Acked-by: Thomas Gleixner [EMAIL PROTECTED] --- This fixes uniprocessor configurations for some MIPS SMP systems. diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index f1a73f0..9b5dff6 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -503,7 +503,6 @@ out_unlock: spin_unlock(&desc->lock); } -#ifdef CONFIG_SMP /** * handle_percpu_IRQ - Per CPU local irq handler * @irq: the interrupt number @@ -529,8 +528,6 @@ handle_percpu_irq(unsigned int irq, struct irq_desc *desc) desc->chip->eoi(irq); } -#endif /* CONFIG_SMP */ - void __set_irq_handler(unsigned int irq, irq_flow_handler_t handle, int is_chained, const char *name)
Re: 2.6.23-rc8-mm2: problems on HP nx6325
On Thu, 2007-09-27 at 17:59 +0200, Rafael J. Wysocki wrote: 2) CPU hotplug is busted (onlining of CPU1 kills the kernel), probably due to the same issue that I'm having with the -hrt version of 2.6.23-rc8 (we're debugging it right now) This one is fixed by the following patch: --- From: Rafael J. Wysocki [EMAIL PROTECTED] Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by a reference to disable_apic_timer (labeled as __initdata) from the CPU initialization code. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] Doh, I knew I blew it. Good catch, thanks, tglx --- arch/x86_64/kernel/apic.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c === --- linux-2.6.23-rc8-mm2.orig/arch/x86_64/kernel/apic.c +++ linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c @@ -42,7 +42,7 @@ int apic_verbosity; static int apic_calibrate_pmtmr __initdata; -int disable_apic_timer __initdata; +int disable_apic_timer __cpuinitdata; /* Local APIC timer works in C2? */ int local_apic_timer_c2_ok;
Re: NO_HZ hangs up AMD MK-36
On Thu, 2007-09-27 at 23:28 +0300, Dmitry Tyschenko wrote: I have laptop Asus X50M. Using old Debian Etch from February. Kernel from 2.6.21 doesn't boot, hangs up just in 10 seconds - 1 minute after GRUB screen. I have tried different versions of gcc (4.1.1, 4.1.2, 4.2.1) to build 2.6.22.8 kernel, but no results. But if I disable NO_HZ option 2.6.21 is working fine for me. We have fixed a bunch of bugs in this area. Can you please try the latest mainline kernel to see whether the problem still persists ? Thanks, tglx
Re: NO_HZ hangs up AMD MK-36
On Fri, 2007-09-28 at 00:01 +0300, Dmitry Tyschenko wrote: Sorry, I am newbie in linux. Hope you was talking about: /boot/vmlinuz-2.6.22-1-k7 root=/dev/sda5 ro nohz=off Yes. But it doesn't help for Debian's 2.6.22-1 (I don't have another prebuilt) still same problems. Can you please add: nolapic_timer instead ? Thanks, tglx
[PATCH] clockevents: fix bogus next_event reset for oneshot broadcast devices
In periodic broadcast mode the next_event member of the broadcast device structure is set to KTIME_MAX in the interrupt handler. This is wrong, as we calculate the next periodic interrupt with this variable. Remove it. Noticed by Ralf. MIPS is the first user of this mode, so this does not affect existing users. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] Acked-and-tested-by: Ralf Baechle [EMAIL PROTECTED] --- diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 0962e05..acf15b4 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -176,8 +176,6 @@ static void tick_do_periodic_broadcast(void) */ static void tick_handle_periodic_broadcast(struct clock_event_device *dev) { - dev->next_event.tv64 = KTIME_MAX; - tick_do_periodic_broadcast(); /*
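Here is why the reset breaks the handler. When a broadcast device without a periodic mode is driven periodically, the next expiry is computed by adding the tick period to dev->next_event. The model below is a hypothetical simplification, not the kernel code:

- it uses plain int64_t nanoseconds instead of ktime_t,
- it uses a local KTIME_MAX stand-in,
- it saturates instead of letting the addition overflow.

It shows the handler losing its time base once next_event has been forced to KTIME_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's KTIME_MAX. */
#define KTIME_MAX_NS INT64_MAX

/* Hypothetical, reduced broadcast device: only the field that matters here. */
struct bc_dev {
	int64_t next_event;	/* absolute expiry time, in ns */
};

/*
 * Model of the periodic broadcast handler on a device that must be
 * reprogrammed for every tick: the next expiry is derived from the
 * previous one. Returns the newly programmed expiry.
 */
static int64_t periodic_broadcast_model(struct bc_dev *dev, int64_t period,
					int buggy_reset)
{
	if (buggy_reset)
		dev->next_event = KTIME_MAX_NS;	/* the line the patch removes */

	/* (the tick would be broadcast to the other CPUs here) */

	if (dev->next_event > KTIME_MAX_NS - period)
		return KTIME_MAX_NS;		/* no representable next expiry */

	dev->next_event += period;
	return dev->next_event;
}
```

With buggy_reset set, the first call already returns KTIME_MAX_NS, so no further periodic expiry can be derived. Without it, the device advances by one period per call.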
Re: 2.6.23-rc6-mm1 powerpc - kgdb is broken
On Fri, 2007-09-28 at 16:07 +0530, Kamalesh Babulal wrote: The kgdb is also broken with 2.6.23-rc8-mm2 on the powerpc . The below patch disables the kgdb from getting compiled over powerpc platform. Signed-off-by : Kamalesh Babulal [EMAIL PROTECTED] --- --- linux-2.6.23-rc8/lib/Kconfig.kgdb 2007-09-28 06:33:37.0 +0530 +++ linux-2.6.23-rc8/lib/~Kconfig.kgdb 2007-09-28 23:48:33.0 +0530 @@ -14,7 +14,7 @@ config KGDB bool "KGDB: kernel debugging with remote gdb" select WANT_EXTRA_DEBUG_INFORMATION select KGDB_ARCH_HAS_SHADOW_INFO if X86_64 - depends on DEBUG_KERNEL && (ARM || X86 || MIPS || (SUPERH && !SUPERH64) || IA64 || PPC) + depends on DEBUG_KERNEL && (ARM || X86 || MIPS || (SUPERH && !SUPERH64) || IA64 || !PPC) This enables the KGDB config for _ALL_ platforms except powerpc. Just remove PPC completely. tglx
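Kconfig evaluates a "depends on" line as a boolean expression over the selected symbols. The sketch below models it with C booleans. It is hypothetical: it ignores tristate "m" values, and the struct and function names are made up. It makes the objection concrete. With "|| !PPC", any architecture outside the list satisfies the dependency. Simply dropping PPC restricts KGDB to the listed architectures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags: which of the relevant Kconfig symbols are set to y. */
struct arch_cfg {
	bool debug_kernel, arm, x86, mips, superh, superh64, ia64, ppc;
};

/* The dependency as proposed in the patch (with "|| !PPC"). */
static bool kgdb_allowed_proposed(const struct arch_cfg *c)
{
	return c->debug_kernel &&
	       (c->arm || c->x86 || c->mips || (c->superh && !c->superh64) ||
		c->ia64 || !c->ppc);
}

/* The suggested fix: drop PPC from the list entirely. */
static bool kgdb_allowed_fixed(const struct arch_cfg *c)
{
	return c->debug_kernel &&
	       (c->arm || c->x86 || c->mips || (c->superh && !c->superh64) ||
		c->ia64);
}
```

For an architecture where none of the listed symbols is set, kgdb_allowed_proposed() returns true and kgdb_allowed_fixed() returns false. On powerpc both return false.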
Re: [PATCH] spin_lock_unlocked cleanups
On Fri, 2007-09-28 at 09:56 +0100, Andy Whitcroft wrote: I think we're ready to wire checkpatch up to an email robot which monitors the mailing lists and sends people nastygrams. I bet that'll be popular ;) We should wire it up to git-commit as well. A lot of that comes in via git subsystems. The problem with git-commit is whose repo to add the hook to. I did attempt to do this by picking up each of linus' main releases and then using the git blame engine to attribute each failure to a particular commit. The plan then would be to send a nasty-gram to the committer about violations there-in. I'll try and find some time to get this bit polished and at least emailing me. The question is whether we can convince the git developers to integrate it. When a commit happens and checkpatch.pl is in scripts/, then run the patch through it before doing the actual commit. tglx
Re: [PATCH] disable non-boot CPUs before poweroff
On Fri, 2007-09-28 at 09:52 -0400, Mark Lord wrote: We need to disable all CPUs other than the boot CPU (usually 0) before attempting to power off modern SMP machines. This seems to fix the hang-on-poweroff issue that one of my SMP boxes exhibits. More testing required. Signed-off-by: Mark Lord [EMAIL PROTECTED] Fixes my new toybox as well. Thanks for tracking it down before I had to dig in. Acked-by: Thomas Gleixner [EMAIL PROTECTED] --- --- linux/kernel/sys.c.orig 2007-09-13 09:49:11.0 -0400 +++ linux/kernel/sys.c 2007-09-28 09:48:54.0 -0400 @@ -32,6 +32,7 @@ #include <linux/getcpu.h> #include <linux/task_io_accounting_ops.h> #include <linux/seccomp.h> +#include <linux/cpu.h> #include <linux/compat.h> #include <linux/syscalls.h> @@ -879,6 +880,7 @@ if (pm_power_off_prepare) pm_power_off_prepare(); sysdev_shutdown(); + disable_nonboot_cpus(); printk(KERN_EMERG "Power down.\n"); machine_power_off(); }
Re: [PATCH] disable non-boot CPUs before poweroff
On Fri, 2007-09-28 at 17:05 +0200, Rafael J. Wysocki wrote: if (pm_power_off_prepare) pm_power_off_prepare(); sysdev_shutdown(); + disable_nonboot_cpus(); Before sysdev_shutdown(), please. sysdev_shutdown() may touch things that belong to CPU0. Damn, you're right. Missed that. tglx
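Rafael's objection is purely about ordering. A minimal user-space sketch of the corrected sequence, assuming hypothetical `_stub` functions and a call log that only record the order in which the real kernel functions would run (the printk() is left out):

```c
#include <string.h>

/* Hypothetical call log so the ordering can be checked in user space. */
static const char *call_log[8];
static int call_count;

static void record(const char *name)
{
	if (call_count < (int)(sizeof(call_log) / sizeof(call_log[0])))
		call_log[call_count++] = name;
}

/* Stubs standing in for the kernel functions discussed in the thread. */
static void pm_power_off_prepare_stub(void) { record("pm_power_off_prepare"); }
static void disable_nonboot_cpus_stub(void) { record("disable_nonboot_cpus"); }
static void sysdev_shutdown_stub(void) { record("sysdev_shutdown"); }
static void machine_power_off_stub(void) { record("machine_power_off"); }

/* Order requested in review: take the non-boot CPUs down before
 * sysdev_shutdown(), which may touch state that belongs to CPU0. */
void kernel_power_off_sketch(void)
{
	call_count = 0;
	pm_power_off_prepare_stub();	/* only if the hook is set, in the kernel */
	disable_nonboot_cpus_stub();
	sysdev_shutdown_stub();
	machine_power_off_stub();
}

/* Position of name in the call log, or -1 if it was never called. */
int call_position(const char *name)
{
	for (int i = 0; i < call_count; i++)
		if (strcmp(call_log[i], name) == 0)
			return i;
	return -1;
}
```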
Re: [REGRESSION from 2.6.23-rc8]
On Fri, 2007-09-28 at 11:07 -0400, Chuck Ebbert wrote: On 09/26/2007 06:35 PM, Thomas Gleixner wrote: It's even worse than I thought on the first check: noapictimer on the command line of an SMP box prevents _ONLY_ the boot CPU apic timer from being used. But the secondary CPU is still unconditionally setting up the APIC timer and uses the non calibrated variable calibration_result, which is of course 0, to setup the APIC timer. Wreckage guaranteed. Is this why I get 1000 spurious interrupts/second on IRQ7 when booting x86_64 with noapic? No, that's a different problem. The wreckage is a stuck local APIC timer interrupt. tglx
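A guard of roughly this shape would keep a secondary CPU from programming its APIC timer from an uncalibrated value. This is only a sketch: the enum, function and parameter names are hypothetical, and the actual fix for the noapictimer handling may look different.

```c
#include <stdbool.h>

/* What a secondary CPU should use for its tick (hypothetical names). */
enum timer_choice {
	TIMER_LOCAL_APIC,	/* program the calibrated local APIC timer */
	TIMER_FALLBACK		/* leave it off, rely on a global/broadcast tick */
};

/* apic_timer_disabled: "noapictimer" was given on the command line.
 * calibration_result:  count calibrated on the boot CPU; still 0 if
 *                      calibration was skipped. */
enum timer_choice secondary_timer_choice(bool apic_timer_disabled,
					 unsigned long calibration_result)
{
	/* Never program the APIC timer with an uncalibrated value of 0. */
	if (apic_timer_disabled || calibration_result == 0)
		return TIMER_FALLBACK;
	return TIMER_LOCAL_APIC;
}
```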
Re: [PATCH] spin_lock_unlocked cleanups
On Fri, 2007-09-28 at 01:26 -0700, Andrew Morton wrote: On Fri, 28 Sep 2007 10:17:30 +0200 Thomas Gleixner [EMAIL PROTECTED] wrote: can we please add this to checkpatch.pl? -spinlock_t bpci_lock = SPIN_LOCK_UNLOCKED; +DEFINE_SPINLOCK(bpci_lock); That check is already in checkpatch. Problem is that hardly anyone runs the thing. Sigh, I forgot that perl is write-only. :) I think we're ready to wire checkpatch up to an email robot which monitors the mailing lists and sends people nastygrams. I bet that'll be popular ;) We should wire it up to git-commit as well. A lot of that comes in via git subsystems. (I'd love it if it could detect wordwrapped and tab-expanded patches, too. You wouldn't _believe_...) I know ... tglx
Re: [PATCH] spin_lock_unlocked cleanups
On Thu, 2007-09-27 at 23:36 +0200, roel wrote: Replace some SPIN_LOCK_UNLOCKED with DEFINE_SPINLOCK Signed-off-by: Roel Kluin [EMAIL PROTECTED] Acked-by: Thomas Gleixner [EMAIL PROTECTED] Andy, Randy, can we please add this to checkpatch.pl? -spinlock_t bpci_lock = SPIN_LOCK_UNLOCKED; +DEFINE_SPINLOCK(bpci_lock); This code was introduced in June 2007, almost two years after the first big DEFINE_SPINLOCK cleanup. Sigh. Thanks, tglx
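checkpatch.pl is written in Perl; the C function below is only a hypothetical sketch of the kind of per-line check being requested. It flags added diff lines that use SPIN_LOCK_UNLOCKED as a whole token, and the token-boundary test keeps it from matching longer identifiers such as __SPIN_LOCK_UNLOCKED.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_ident_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* True if an added diff line ("+...") uses the bare SPIN_LOCK_UNLOCKED
 * initializer, i.e. it should be converted to DEFINE_SPINLOCK(). */
bool flags_spin_lock_unlocked(const char *line)
{
	static const char token[] = "SPIN_LOCK_UNLOCKED";
	const size_t len = sizeof(token) - 1;

	if (line[0] != '+')	/* only check lines the patch adds */
		return false;
	for (const char *p = strstr(line, token); p; p = strstr(p + 1, token)) {
		bool starts = (p == line) || !is_ident_char(p[-1]);
		bool ends = !is_ident_char(p[len]);

		if (starts && ends)
			return true;
	}
	return false;
}
```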
Re: [patch 0/2] suspend/resume regression fixes
On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote: Linus Torvalds wrote: On Sat, 22 Sep 2007, Thomas Gleixner wrote: My final enlightenment was when I removed the ACPI processor module, which controls the lower idle C-states, right before resume; this worked fine all the time even without all the workaround hacks. I really hope that these two patches finally put an end to the jinxed VAIO heisenbug series, which started when we removed the periodic tick with the clockevents/dyntick patches. Ok, so the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the "*why* does this happen?" question. On a closely related note: I just now submitted a patch to fix SMP-poweroff, by having it do disable_nonboot_cpus before doing poweroff. Which has led me to thinking.. ..are similar precautions perhaps necessary for *all* ACPI BIOS calls? Because one never knows what the other CPUs are doing at the same time, and what the side effects may be on the ACPI BIOS functions. And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls only ever happen from CPU#0 (or the boot CPU)? Or do we do that already? The ACPI calls are serialized in the kernel, AFAICT. But the fragile situations (suspend, resume, shutdown, reboot) are probably those where some BIOS implementations expect that certain things are not called or not active. tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Sun, 30 Sep 2007, Andi Kleen wrote: OK, this explains 2) and 3). I just looked into the code and the logic vs. noapictimer on SMP is completely broken. noapictimer really doesn't make any sense on non SMP imho with the old timer architecture. That is why I never bothered to implement it. It's purely a UP hack. It does not matter whether it makes sense to you or not. It is a command line option which bricks systems. There is neither an explanation in Documentation/kernel-parameters.txt nor a check in the code which disables it completely. It makes a lot of sense even with the existing architecture. Troubleshooting a box where the local APIC timer does not work correctly is not a UP-only requirement. Yes, it is a hack, a _bad_ hack. ..and thanks for the explanation. Thanks for finding it so quickly guys. Sounds like this will be fixed properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt patch too) There is nothing really to fix currently. Clockevents changes behaviour majorly (always using APIC timers without irq 0 backups[1]) and that causes problems that need new workarounds and new fixes (surprise surprise!) That merge would probably fix a few more such "Thomas doesn't understand the code" bugs I guess because he hacks much more on i386 than x86-64; but if the overall result will be really better is a totally different question. I understand the code quite well. I'm just surprised from time to time by interesting hacks in the so clean x86_64 tree. [1] Or let's call it "I trust all my time to the CPU and no more southbridge" aka "put all eggs in one basket". Given the trends in CPU power saving that is a quite dangerous strategy. No, it's not dangerous. We spent quite some time making the clock events layer flexible enough to handle the current problems, and the design allows adding more infrastructure when necessary.
Any new (mis)features of upcoming CPUs need to be addressed with or without clock events, and they need to be handled carefully, not by random hacks. tglx
Re: [PATCH] robust futex thread exit race
On Sun, 30 Sep 2007, Ingo Molnar wrote: * Martin Schwidefsky [EMAIL PROTECTED] wrote: Hi Ingo, I finally found the bug that causes tst-robust8 from the glibc to fail on s390x. Turned out to be a common code problem with the processing of the robust futex list. The patch below fixes the bug for me. good catch! A quick preliminary review of your patch indicates it's fine - and it might be v2.6.23 material. Acked-by: Ingo Molnar [EMAIL PROTECTED] Acked-by: Thomas Gleixner [EMAIL PROTECTED] Calling handle_futex_death in exit_robust_list for the different robust mutexes of a thread basically frees the mutex. Another thread might grab the lock immediately, which updates the next pointer of the mutex. fetch_robust_entry over the next pointer might therefore branch into the robust mutex list of a different thread. This can cause two problems: 1) some mutexes held by the dead thread are not getting freed and 2) some mutexes held by a different thread are freed. The next pointer needs to be read before calling handle_futex_death. nasty race... Ulrich, Thomas, do you concur? Yes. Where do they sell those brown paperbags again? tglx
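The race Martin describes can be reproduced with an ordinary linked list. A simplified user-space sketch, assuming a hypothetical `struct robust_entry` and a release callback that plays both handle_futex_death() and the thread that immediately takes the lock over; the real kernel walk in exit_robust_list() works on user-space pointers and futex words instead.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry on a robust futex list. */
struct robust_entry {
	struct robust_entry *next;
	int owner;			/* TID of the thread holding the lock */
};

/* Models handle_futex_death() plus a waiter that takes the lock at once. */
typedef void (*release_fn)(struct robust_entry *e);

/* The other thread's list; a stolen entry is linked in front of it. */
static struct robust_entry other_tail = { NULL, 2 };

void release_and_steal(struct robust_entry *e)
{
	e->owner = 2;			/* thread 2 grabs the lock immediately... */
	e->next = &other_tail;		/* ...and links it into its own list */
}

/* Fixed ordering: fetch ->next before the entry is released. */
int exit_robust_list_sketch(struct robust_entry *head, int dead_tid,
			    release_fn release)
{
	int released = 0;

	for (struct robust_entry *e = head, *next; e; e = next) {
		next = e->next;		/* read before release() can change it */
		if (e->owner == dead_tid) {
			release(e);
			released++;
		}
	}
	return released;
}

/* Buggy ordering: ->next is read after the entry was handed over. */
int exit_robust_list_buggy(struct robust_entry *head, int dead_tid,
			   release_fn release)
{
	int released = 0;

	for (struct robust_entry *e = head; e; e = e->next) {
		if (e->owner == dead_tid) {
			release(e);
			released++;
		}
	}
	return released;
}
```

With the buggy ordering the walk follows the rewritten next pointer into the other thread's list, so the remaining entries of the dead thread are never released, which is problem 1) above.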
Re: x86-64 sporadic hang in 2.6.23rc7 and 2.6.22
On Sat, 29 Sep 2007, Helge Hafting wrote: Thomas Gleixner wrote: I have gone back to 2.6.22rc4, which seems to work. This is a single Opteron, although on a dual-slot board. Can you switch to serial console, so we can get some information out of that box? Sysrq-B is working, so we can get info from other sysrq functions as well. I didn't need the serial - it crashes during console work too. I think a make clean was in progress at the time. There must be work going on in order to crash. This time 2.6.22rc4 died on me with a general protection fault I got two reports, the first one scrolled partially off screen but the whole trace was there: That's why I asked for a serial console. That way we can get all the information from the reports, including the register dumps. Then I got: spinlock lockup on cpu #0, kswapd 0/212 That's probably caused by the previous one. tglx
Re: Fwd: x86_64 and AMD with C1E
On Mon, 1 Oct 2007, Mikhail Kshevetskiy wrote: No, it boots and works normally. The only thing that bothers me is the additional 260 timer interrupts per second. Here is a short summary: c1e enabled: -- power consumption about 23 watts -- only the C1 power state is enabled -- there are about 260 timer interrupts per second tested with x86_64 (2.6.22, 2.6.23-rc8, 2.6.23-rc8-hrt1), i386 (2.6.21, 2.6.22, 2.6.23-rc5-hrt1) c1e disabled: -- power consumption about 27 watts -- no power states are enabled (including C1) -- there are no additional 260 timer interrupts per second tested with 2.6.23-rc6-hrt1/x86_64. I want to reduce the power consumption of my notebook. I see two possibilities: -- remove the 260 additional timer interrupts (c1e enabled case) There is work in progress on a patch which allows the HPET timers to be used as per-CPU timers. This should solve the problem. Be patient. Thanks, tglx
Re: Fwd: x86_64 and AMD with C1E
On Mon, 1 Oct 2007, Andi Kleen wrote: There is work in progress on a patch, which allows to utilize the hpet timers as per cpu timers. This should solve the problem. Be patient. Given that e.g. ICH8 only has 3 HPET timers that seems doubtful except for the special case of single-socket non hyper threaded dual core. You'll probably do a lot of broadcasting and IPI'ing still. Also you'll likely make user space unhappy which often requires at least one free HPET timer for /dev/rtc. Ok I suppose that could be replaced with a hrtimer. Yes, we can replace rtc with an hrtimer. Also, HPET can operate in non-legacy IRQ mode, so the legacy RTC is still available. So if the number of HPET channels is greater than or equal to the number of possible CPUs, it's perfectly fine and does not need any IPI at all. Thanks, tglx
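The channel-count argument reduces to simple arithmetic. A hypothetical helper, assuming each usable HPET channel becomes the per-CPU timer of one CPU and every CPU left over still depends on broadcast (IPIs); the `reserved` parameter models Andi's concern about a channel kept for /dev/rtc and can be 0 when the HPET runs in non-legacy IRQ mode.

```c
/* Hypothetical helper: CPUs that still need broadcast when each usable
 * HPET channel serves as one CPU's local timer. "reserved" counts
 * channels kept for other users (e.g. /dev/rtc emulation). */
unsigned int cpus_needing_broadcast(unsigned int hpet_channels,
				    unsigned int reserved,
				    unsigned int possible_cpus)
{
	unsigned int usable = hpet_channels > reserved ?
			      hpet_channels - reserved : 0;

	return possible_cpus > usable ? possible_cpus - usable : 0;
}
```

With the three channels Andi mentions for ICH8, a plain dual core needs no broadcast at all, while four logical CPUs leave one CPU relying on broadcast.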
Re: Fwd: x86_64 and AMD with C1E
On Mon, 1 Oct 2007, Andi Kleen wrote: So if the number of hpet channels is greater/equal to the number of possible CPUs it's perfectly fine and does not need IPI at all. That is only a stop gap then. I don't see this being generally true in the future. e.g. Intel announced SMT will be soon back so even a standard dual core would exceed it with current southbridges. Sigh. We have to deal with current hardware and the problems of exactly that hardware. We have the possibility to solve problems, and witchcrafting about what might happen next is not a good reason not to do so. Also I'm not sure but I suspect non Intel HPETs have less than three timers. Certainly they generally miss the 64bitness. Two timers are enough, and 64 bit is nice to have but not a requirement. tglx
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I'll fix this properly. tglx
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote: On Mon, 1 Oct 2007, Andi Kleen wrote: On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I fix this proper. I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we set up IRQ0 with IRQ_NOBALANCING. tglx
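The cpu != 0 approach can be sketched as a per-CPU progress check. The struct and function below are hypothetical and simplified (the real nmi_watchdog_tick() also requires several consecutive stalls before it reports a lockup); IRQ 0 is credited only to CPU 0 on the assumption that IRQ_NOBALANCING keeps it there.

```c
#include <stdbool.h>

/* Hypothetical per-CPU snapshot used by the watchdog check. */
struct wd_cpu {
	unsigned int apic_timer_irqs;	/* local APIC timer interrupts seen */
	unsigned int irq0_count;	/* IRQ 0 (PIT/HPET) interrupts seen */
	unsigned int last_sum;		/* sum recorded at the previous NMI */
	bool touched;			/* watchdog was explicitly touched */
};

/* Returns true if the CPU made no interrupt progress since the last NMI.
 * IRQ 0 only counts for CPU 0: it tells nothing about other CPUs. */
bool nmi_cpu_looks_stuck(struct wd_cpu *c, int cpu)
{
	unsigned int sum = c->apic_timer_irqs;
	bool stuck;

	if (cpu == 0)
		sum += c->irq0_count;

	stuck = !c->touched && c->last_sum == sum;
	c->last_sum = sum;
	c->touched = false;
	return stuck;
}
```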
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Arjan van de Ven wrote: I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that. IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in the NMI watchdog for CPU == 0 would no longer work. Fix below. Post .23 material. I'll work out a separate one for the x86_64 clock events series. tglx [PATCH] i386: Fix nmi watchdog per cpu timer irq accounting The clock events patches changed the interrupt distribution and the local apic timer interrupt accounting for the broadcast case. The per cpu clock events handler of the cpu which runs the broadcast interrupt is executed directly in the broadcast irq context. This does not invoke the low level arch code, which does the local apic timer irq accounting. The workaround for false positives in the nmi watchdog was to add the irq0 interrupts (broadcast device) to the local apic timer interrupts. This falsifies the results for the CPUs which are not handling the broadcast interrupt, i.e. stuck CPUs might not be detected, as noticed by Andi Kleen. It would be possible to move the clockevents handler invocation of the CPU which runs the broadcast interrupt into the tick device broadcast function, but this would require handing the per cpu device to this function and performing the direct operation in the clock device specific architecture code. Right now this is only i386 and x86_64, but MIPS is about to use the broadcast mode as well.
Introduce a weak function tick_broadcast_account(), which allows x86 to adjust the local apic timer interrupt counter in the case when the cpu local timer handler has been invoked. This keeps the cpu local handler decision and invocation in the common code and allows x86 to handle the nmi watchdog accounting correctly. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c index 3d67ae1..180dde8 100644 --- a/arch/i386/kernel/apic.c +++ b/arch/i386/kernel/apic.c @@ -283,6 +283,16 @@ static void lapic_timer_broadcast(cpumask_t mask) } /* + * Called from the broadcasting code to keep the local apic timer irq + * accounting straight for the nmi watchdog. Is called with interrupts + * disabled. + */ +void tick_broadcast_account(int cpu) +{ + per_cpu(irq_stat, cpu).apic_timer_irqs++; +} + +/* * Setup the local APIC timer for this CPU. Copy the initilized values * of the boot CPU and register the clock event in the framework. */ diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c index c7227e2..03cdcaf 100644 --- a/arch/i386/kernel/nmi.c +++ b/arch/i386/kernel/nmi.c @@ -349,11 +349,7 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) cpu_clear(cpu, backtrace_mask); } - /* -* Take the local apic timer and PIT/HPET into account. 
We don't -* know which one is active, when we have highres/dyntick on -*/ - sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0]; + sum = per_cpu(irq_stat, cpu).apic_timer_irqs; /* if the none of the timers isn't firing, this cpu isn't doing much */ if (!touched && last_irq_sums[cpu] == sum) { diff --git a/include/linux/tick.h b/include/linux/tick.h index 9a7252e..99b3021 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -73,6 +73,7 @@ static inline void tick_cancel_sched_timer(int cpu) { } # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST extern struct tick_device *tick_get_broadcast_device(void); extern cpumask_t *tick_get_broadcast_mask(void); +extern void tick_broadcast_account(int cpu); # ifdef CONFIG_TICK_ONESHOT extern cpumask_t *tick_get_broadcast_oneshot_mask(void); diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 0962e05..43d0085 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -123,6 +123,16 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu) } /* + * Weak function for cpu local interrupt accounting. Used by x86 to + * keep the lapic accounting correct for nmi_watchdog. + * + * Must be called with interrupts disabled. + */ +void __attribute__((weak)) tick_broadcast_account(int cpu) +{ +} + +/* * Broadcast the event to the cpus, which are set in the mask */ int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td->evtdev->event_handler(td->evtdev
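The accounting problem the patch addresses can be modelled in user space. A sketch with hypothetical counters and stand-in functions: the CPU running the broadcast interrupt calls its handler directly and needs the explicit hook, while the other CPUs receive an IPI that goes through the normal local APIC timer path and is accounted there. Demonstrating the weak default would need a second translation unit, so the x86 override is just a plain static function here.

```c
#define SKETCH_NR_CPUS 4

/* Hypothetical per-CPU counters; apic_timer_irqs stands in for
 * irq_stat.apic_timer_irqs, which the NMI watchdog reads. */
unsigned int apic_timer_irqs[SKETCH_NR_CPUS];
unsigned int handler_calls[SKETCH_NR_CPUS];

/* Stand-in for td->evtdev->event_handler(td->evtdev). */
static void event_handler(int cpu)
{
	handler_calls[cpu]++;
}

/* Models the x86 override of the weak tick_broadcast_account() hook. */
static void tick_broadcast_account(int cpu)
{
	apic_timer_irqs[cpu]++;
}

/* Path taken by an IPI: the arch entry code accounts the local APIC
 * timer interrupt, then the handler runs. */
static void local_apic_timer_interrupt(int cpu)
{
	apic_timer_irqs[cpu]++;
	event_handler(cpu);
}

/* mask: bit n set means CPU n needs its tick; this_cpu runs the
 * broadcast interrupt and calls its own handler directly, bypassing
 * the arch entry code, so it must be accounted via the hook. */
void tick_do_broadcast_sketch(unsigned int mask, int this_cpu)
{
	if (mask & (1u << this_cpu)) {
		mask &= ~(1u << this_cpu);
		event_handler(this_cpu);
		tick_broadcast_account(this_cpu);
	}
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			local_apic_timer_interrupt(cpu);	/* models the IPI */
}
```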
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would no longer work. That cannot happen right now because cpu_disable() on both i386/x86-64 reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both do that already. I was wrong earlier in being concerned about this. int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td->evtdev->event_handler(td->evtdev); + tick_broadcast_account(cpu); That would not handle the case with a single CPU running only irq 0 but not broadcasting I think. Hmm. The only situation where this can happen is when you add nolapic_timer to the command line on a single CPU system. We do not register the lapic dummy clock event device then. I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog is the correct fix. Yup, I completely missed the fact that we reject CPU#0 unplugging, so your fix indeed seems to be more correct and simpler. OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 restriction. Not sure whether it's worth the trouble. tglx
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that. Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble. Agreed. I just got an x86_64-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea? tglx
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: Agreed. I just got an x86_64-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea? Hmm, in lowestpriority mode it would be possible that the APIC changes the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even if that happens you should have them all on 1. Maybe the chipset is just ignoring the IO-APIC configuration in this case? Is it always the same chipset? Is it seen on i386 too? The problem is really that if this happens it's more than the NMI watchdog that is broken. If you don't run an additional APIC timer interrupt on CPU #0 it's possible that CPU #0 won't schedule at all. The only workaround for chipsets ignoring IRQ affinity would be to keep track of which CPU irq 0 happens on and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. The clock events code already handles this. The broadcast interrupt can come in on any CPU. It's just the NMI watchdog which would be affected by that. tglx
Re: x86 patches was Re: -mm merge plans for 2.6.24
On Tue, 2 Oct 2007, Andi Kleen wrote: On Tue, Oct 02, 2007 at 09:37:03AM +0200, Ingo Molnar wrote: * Andrew Morton [EMAIL PROTECTED] wrote: On 02 Oct 2007 08:18:17 +0200 Andi Kleen [EMAIL PROTECTED] wrote: The clockevents patches are not included in this; but given the recent trouble i'm not 100% sure they are even ready yet. i'm curious, which recent trouble do you refer to? (The NMI watchdog bug [which is off by default] was fixed quickly. The C1E bug was found and fixed quickly. Anything else i missed?) C1e and now the misrouted irq 0s Thomas reported. Also i'm a little worried about the missing C1e check; it looks like it needs a re-review to make sure no other infrastructure was missing. I had completely forgotten about the C1E problem, which we debugged half a year ago on 32bit. I went through the other pitfalls we had on 32bit carefully again, and they are all covered on 64bit too. C1E was the only one I missed. The irq0 problem is not a real one. The clock events code has no "irq0 bound to cpuX" assumption at all. The only affected part is the nmi_watchdog, and I have a fix ready to handle this even for the irq#0-not-on-cpu#0 case. tglx
Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..
On Mon, 1 Oct 2007, Linus Torvalds wrote: This is also a good time to warn about the fact that we're doing the x86 merge very soon (as in the next day or two) after 2.6.23 is out, so if you have pending patches for the next series that touch arch/i386 or x86-64, you should get in touch with Thomas Gleixner and Ingo Molnar, who are the keepers of the merge scripts, and will help you prepare.. Doing it as early as possible in the 2.6.24-rc4 series (basically I'll do it first thing) will mean that we'll have the maximum amount of time to sort out any issues, and the thing is, Thomas and Ingo already have a tree ready to go, so people can check their work against that, and don't need to think that they have to do any fixups after it hits *my* tree. It would be much better if everybody was just ready for it, and not taken by surprise. In other words, people who know they may be affected and would want to prepare can look at (for example) git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86 and generally get ready for the switch-over. I have uploaded an update of the arch/x86 tree based on -rc9 to git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86 For convenience there is a patch fixup script which helps you convert pending patches against this tree. http://userweb.kernel.org/~tglx/x86/x86-fixup-patches.py It's generated from the merge script and fixes the namespace of patches. There will still be some rejects which cannot be fixed up automatically, but this should be rare. I did a test with Andrew's -mm series and only ~10 arch/x86 related patches had rejects, out of 230+ patches, so the 100%-painless conversion ratio is better than 95%. Those patches with rejects were trivial to fix. Usage: x86-fixup-patches.py <sourcepatch> <destpatch> (source and dest can be the same).
A helper script to convert complete quilt series is here: http://userweb.kernel.org/~tglx/x86/fixupseries.sh If there is anything we can help with during the transition, please do not hesitate to ask. Thanks, Thomas, Ingo
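Conceptually, the fixup script rewrites file names in patch headers. A hypothetical C sketch of that idea follows. The rename table holds two illustrative entries only and is not the table generated from the merge script. Only "--- a/" and "+++ b/" header lines are handled.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative old -> new path pairs; NOT the generated table. */
struct path_rename {
	const char *old_path;
	const char *new_path;
};

static const struct path_rename renames[] = {
	{ "arch/i386/kernel/apic.c",   "arch/x86/kernel/apic_32.c" },
	{ "arch/x86_64/kernel/apic.c", "arch/x86/kernel/apic_64.c" },
};

static const char *const header_prefixes[] = { "--- a/", "+++ b/" };

/* Rewrite one diff header line into out (size outsz).
 * Returns 1 if a rename was applied, 0 if the line was copied unchanged. */
int fixup_header_line(const char *line, char *out, size_t outsz)
{
	for (size_t p = 0; p < sizeof(header_prefixes) / sizeof(header_prefixes[0]); p++) {
		size_t plen = strlen(header_prefixes[p]);
		const char *path;

		if (strncmp(line, header_prefixes[p], plen) != 0)
			continue;
		path = line + plen;
		for (size_t i = 0; i < sizeof(renames) / sizeof(renames[0]); i++) {
			size_t olen = strlen(renames[i].old_path);
			char end;

			if (strncmp(path, renames[i].old_path, olen) != 0)
				continue;
			/* Whole-path match only: the name must end here. */
			end = path[olen];
			if (end == '\0' || end == '\t' || end == ' ') {
				snprintf(out, outsz, "%s%s%s", header_prefixes[p],
					 renames[i].new_path, path + olen);
				return 1;
			}
		}
	}
	snprintf(out, outsz, "%s", line);
	return 0;
}
```

The whole-path check keeps a file such as apic.c.orig from being renamed by accident. Anything the table does not cover is passed through unchanged, which matches the expectation that a few patches will still need manual fixing.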