Re: 2.6.25-rc2: wpa_supplicant BUGs kernel in rwlock recursion
On Feb 16, 2008 6:14 PM, Alessandro Suardi <[EMAIL PROTECTED]> wrote:
> Feb 16 16:51:49 sandman kernel: BUG: rwlock recursion on CPU#0,

Same thing here, bisected it to:

commit 45b503548210fe6f23e92b856421c2a3f05fd034
Author: Laszlo Attila Toth <panther at balabit.hu>
Date:   Tue Feb 12 22:42:09 2008 -0800

    [RTNETLINK]: Send a single notification on device state changes.

The revert applies cleanly and fixes the problem.
Rafael has more details in http://lkml.org/lkml/2008/2/15/542.

-- 
Guillaume
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
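Bisecting, as done above, is a binary search over the commit range: each test halves the set of suspects, so roughly log2(N) kernel builds isolate the culprit. The Python sketch below models only that search. It assumes a linear history where commits[0] is known good, commits[-1] is known bad and every commit after the first bad one stays bad (git bisect itself walks a DAG); first_bad_commit() is a hypothetical helper, not part of git.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear list of commits for the first bad one.

    Model assumptions: commits[0] is good, commits[-1] is bad, and the
    predicate is monotonic (once bad, every later commit is bad too).
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] ok, commits[bad] broken
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g. "build, boot, start wpa_supplicant"
            bad = mid
        else:
            good = mid
    return commits[bad]
```

With 1000 candidate commits the loop performs at most 10 tests before the interval shrinks to a single good/bad pair.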
Re: Help debugging filesystem activity?
On Feb 11, 2008 2:17 PM, rzryyvzy <[EMAIL PROTECTED]> wrote:
> $ cat /proc/fs/vfs/reading_files
>
> $ cat /proc/fs/vfs/writing_files

You can try:

# echo 1 > /proc/sys/vm/block_dump
# dmesg

HTH.

-- 
Guillaume
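With block_dump enabled, the kernel logs every block request, so the raw dmesg output grows quickly. A minimal Python sketch for tallying it per process, assuming the 2.6-era message shape comm(pid): READ|WRITE block N on dev (check it against what your kernel actually prints). summarize_block_dump() is a hypothetical helper; other block_dump messages, such as "dirtied inode" lines, are simply skipped.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed shape of a block_dump line, e.g. "bash(4242): READ block 1234 on sda1".
# A leading "[   12.345678] " dmesg timestamp is tolerated because we use re.search().
BLOCK_DUMP_RE = re.compile(
    r"(?P<comm>\S+)\((?P<pid>\d+)\): (?P<op>READ|WRITE) block (?P<block>\d+) on (?P<dev>\S+)"
)


def summarize_block_dump(lines: Iterable[str]) -> Counter:
    """Count READ/WRITE block requests per (command, pid, operation).

    Lines that do not match the assumed format are ignored.
    """
    counts: Counter = Counter()
    for line in lines:
        m = BLOCK_DUMP_RE.search(line)
        if m:
            counts[(m.group("comm"), int(m.group("pid")), m.group("op"))] += 1
    return counts
```

Feed it the dmesg output, e.g. summarize_block_dump(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()), then call most_common() on the result; remember to echo 0 back into /proc/sys/vm/block_dump afterwards.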
Re: [patch 019/233] proc: fix the threaded /proc/self
On Feb 8, 2008 1:18 PM, <[EMAIL PROTECTED]> wrote:
> Long ago when the CLONE_THREAD support first went in someone thought it
> would be wise to point /proc/self at /proc/<tgid> instead of /proc/<pid>.

The last message in this conversation is:
http://lkml.org/lkml/2007/12/1/172

So I thought we would end up with a new file, in order to make the change
discoverable.

-- 
Guillaume
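The tgid/pid distinction is easy to observe from userspace by reading the links from a non-main thread. A Python sketch, assuming Linux procfs; /proc/thread-self only exists on Linux 3.17 and later, so the hypothetical readlink_or_none() helper returns None whenever a link is missing.

```python
import os
import threading
from typing import Optional, Tuple


def readlink_or_none(path: str) -> Optional[str]:
    """Return the symlink target, or None if procfs or the link is unavailable."""
    try:
        return os.readlink(path)
    except OSError:
        return None


def observe_from_thread() -> Tuple[Optional[str], Optional[str], int, int]:
    """From a non-main thread, read /proc/self and /proc/thread-self.

    Returns (self_target, thread_self_target, tgid, tid).
    """
    result = {}

    def worker() -> None:
        result["self"] = readlink_or_none("/proc/self")
        result["thread_self"] = readlink_or_none("/proc/thread-self")
        result["tgid"] = os.getpid()
        result["tid"] = threading.get_native_id()  # kernel thread id, Python 3.8+

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["self"], result["thread_self"], result["tgid"], result["tid"]
```

On current kernels /proc/self resolves to the tgid even from the secondary thread, while /proc/thread-self resolves to "<tgid>/task/<tid>".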
Re: [PATCH] proc: return -EPERM when preventing read of /proc/*/maps
On Jan 4, 2008 4:19 PM, Al Viro <[EMAIL PROTECTED]> wrote:
> Umm... Actually, m_next() and m_stop() both appear to be too convoluted.
>
> * m_next() never gets v == NULL
> * the only reason why we do that mmput et.al. both from ->next() and
> ->stop() is that we try to avoid having priv->mm; why bother?
> * why the _hell_ is proc_maps_private defined in include/linux/proc_fs.h,
> of all places?
> * while we are at it, why is it in any header at all? Having that sucker
> in task_mmu.c and task_nommu.c would be more than enough (and we'd avoid
> that ifdef in definition, while we are at it).
>
> How about this:

Hi Al,

Any update on this patch? As you completely rewrote it, I thought you would
take care of pushing it forward.

Thanks.

-- 
Guillaume
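Al's second point is easier to see with a toy model of the seq_file callbacks. This is a Python sketch, not kernel code: ToyMM, ToyMapsPrivate and read_all() are hypothetical names, and it encodes one reading of the suggestion, namely keep the mm in the private struct, take the reference in ->start() and drop it only in ->stop(), which seq_file invokes after every ->start().

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyMM:
    """Hypothetical stand-in for struct mm_struct: a VMA list plus a user count."""
    vmas: List[str]
    users: int = 1

    def get(self) -> "ToyMM":  # models get_task_mm()
        self.users += 1
        return self

    def put(self) -> None:  # models mmput()
        self.users -= 1


@dataclass
class ToyMapsPrivate:
    """Hypothetical stand-in for struct proc_maps_private."""
    task_mm: Optional[ToyMM]
    mm: Optional[ToyMM] = None  # reference held from m_start() until m_stop()


def m_start(priv: ToyMapsPrivate, pos: int) -> Optional[str]:
    if priv.task_mm is None:
        return None
    priv.mm = priv.task_mm.get()
    return priv.mm.vmas[pos] if pos < len(priv.mm.vmas) else None


def m_next(priv: ToyMapsPrivate, v: str, pos: int) -> Tuple[Optional[str], int]:
    # Only ever called with the non-None element from m_start()/m_next().
    pos += 1
    nxt = priv.mm.vmas[pos] if pos < len(priv.mm.vmas) else None
    return nxt, pos


def m_stop(priv: ToyMapsPrivate, v: Optional[str]) -> None:
    # Always runs after m_start(), so this is the single place the reference is dropped.
    if priv.mm is not None:
        priv.mm.put()
        priv.mm = None


def read_all(priv: ToyMapsPrivate) -> List[str]:
    """Drive the callbacks roughly the way one seq_read() pass would."""
    out, pos = [], 0
    v = m_start(priv, pos)
    while v is not None:
        out.append(v)  # stands in for ->show()
        v, pos = m_next(priv, v, pos)
    m_stop(priv, v)
    return out
```

Because the reference lives in priv.mm for the whole pass, m_next() never has to release anything, and the count is balanced even when m_start() finds no VMA.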
Re: [PATCH] x86: remove unused code in set_cyc2ns_scale()
On 1/31/08, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> hm, this is not a pure elimination of dead code, this will change
> behavior. For example we won't call sched_clock_idle_sleep_event() on
> !cpu_khz now. Hm?

Oops, indeed I overlooked that. OTOH, I can't see how it can happen (on
32-bit at least), and even if it happens it should not have any effect.
But I'll keep this check to avoid making this case illegal.

Thanks for the review.

-- 
Guillaume
Re: High wake up latencies with FAIR_USER_SCHED
On 1/31/08, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> Does something like this help?

I made it compile by open-coding the undefined macros instead of
refactoring the whole file, but it didn't affect wakeup latencies.

Thanks.

-- 
Guillaume
Re: Hang in work_resched
On 1/31/08, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> works for me :-( (x86_64 rawhide userspace)

i386, !SMP, Fedora 8 here.

> Could you send your .config?

Here we go:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24
# Thu Jan 31 12:33:36 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
# CONFIG_GENERIC_LOCKBREAK is not set
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
CONFIG_ARCH_SUPPORTS_OPROFILE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-gc"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_NS=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_LSF is not set
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_CLASSIC_RCU=y
# CONFIG_PREEMPT_RCU is not set

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_VSMP is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
Re: Hang in work_resched
On Jan 31, 2008 9:55 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> Does this patch from thomas fix it as well?

Unfortunately not. For information, reverting just the first part of the
offending commit (sl->timer.cb_mode) fixed the problem, while reverting
only the second part (if (!hrtimer_active(&t->timer))) had no effect.

Also, I found a trivially reproducible testcase: sleep 0. It hangs in
nanosleep({0, 0}).

-- 
Guillaume
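Since the hang reproduces with a plain sleep 0, a regression check is easy to script. A minimal Python sketch, assuming a sleep binary on PATH; sleep_zero_hangs() is a hypothetical helper, the 5-second timeout is an arbitrary choice, and subprocess.run() kills the child when the timeout expires.

```python
import shutil
import subprocess


def sleep_zero_hangs(timeout_s: float = 5.0) -> bool:
    """Run `sleep 0` and report whether it failed to exit within timeout_s.

    On an affected kernel the child would block in nanosleep({0, 0});
    on a healthy one it exits immediately.
    """
    sleep_bin = shutil.which("sleep")
    if sleep_bin is None:
        raise FileNotFoundError("no 'sleep' binary on PATH")
    try:
        subprocess.run([sleep_bin, "0"], check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True  # the child was killed after timing out
    return False
```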
Re: Hang in work_resched
On Jan 29, 2008 11:30 PM, Guillaume Chazarain <[EMAIL PROTECTED]> wrote:
> ===
> gnome-termina S 0027 0 2201 1
>        f6711fb0 00200082 cb330d62 0027 f664105c 0b1e cb331880 0027
>        f660d780 009e3840 080ab7d8 080ab298 f6711000 c0103e7e 009e3840 000e0002
>        0002 080ab7d8 080ab298 bfb41be8 080ab7d8 007b c010007b
> Call Trace:
>  [<c0103e7e>] work_resched+0x5/0x16
> ===
>
> This corresponds to the cli instruction:
> c0103e7e:       fa                      cli

I bisected it, and the resulting commit is appended. Reverting this commit
applies cleanly on today's git (dd430ca20c40ecccd6954a7efd13d4398f507728)
and makes the hang go away :-)

commit 37bb6cb4097e29ffee970065b74499cbf10603a3
Author: Peter Zijlstra <[EMAIL PROTECTED]>
Date:   Fri Jan 25 21:08:32 2008 +0100

    hrtimer: unlock hrtimer_wakeup

    hrtimer_wakeup creates a base->lock rq->lock lock dependancy. Avoid this
    by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold
    base->lock.

    This fully untangles hrtimer locks from the scheduler locks, and allows
    hrtimer usage in the scheduler proper.

    Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 061ae28..bd5d6b5 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1293,7 +1293,7 @@ void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, struct task_struct *task)
 	sl->timer.function = hrtimer_wakeup;
 	sl->task = task;
 #ifdef CONFIG_HIGH_RES_TIMERS
-	sl->timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_RESTART;
+	sl->timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_SOFTIRQ;
 #endif
 }
 
@@ -1304,6 +1304,8 @@ static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mod
 	do {
 		set_current_state(TASK_INTERRUPTIBLE);
 		hrtimer_start(&t->timer, t->timer.expires, mode);
+		if (!hrtimer_active(&t->timer))
+			t->task = NULL;
 
 		if (likely(t->task))
 			schedule();

-- 
Guillaume
[PATCH] x86: remove unused code in set_cyc2ns_scale()
This should be folded into:

4f95bd6e2b21a8c724357463f8341502d47aba13 x86: scale cyc_2_nsec according to CPU frequency

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---

 arch/x86/kernel/tsc_32.c |   14 +-
 arch/x86/kernel/tsc_64.c |   14 +-
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 43517e3..e05e221 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -83,20 +83,16 @@ DEFINE_PER_CPU(unsigned long, cyc2ns);
 
 static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	unsigned long flags, prev_scale, *scale;
-	unsigned long long tsc_now, ns_now;
+	unsigned long flags, *scale;
+
+	if (!cpu_khz)
+		return;
 
 	local_irq_save(flags);
 	sched_clock_idle_sleep_event();
 
 	scale = &per_cpu(cyc2ns, cpu);
-
-	rdtscll(tsc_now);
-	ns_now = __cycles_2_ns(tsc_now);
-
-	prev_scale = *scale;
-	if (cpu_khz)
-		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
 
 	/*
 	 * Start smoothly with the new frequency:
diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index 947554d..e0e9d4f 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -44,20 +44,16 @@ DEFINE_PER_CPU(unsigned long, cyc2ns);
 
 static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	unsigned long flags, prev_scale, *scale;
-	unsigned long long tsc_now, ns_now;
+	unsigned long flags, *scale;
+
+	if (!cpu_khz)
+		return;
 
 	local_irq_save(flags);
 	sched_clock_idle_sleep_event();
 
 	scale = &per_cpu(cyc2ns, cpu);
-
-	rdtscll(tsc_now);
-	ns_now = __cycles_2_ns(tsc_now);
-
-	prev_scale = *scale;
-	if (cpu_khz)
-		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
 
 	sched_clock_idle_wakeup_event(0);
 	local_irq_restore(flags);
-- 
1.5.3.7
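The arithmetic this patch keeps is a fixed-point conversion: cpu_khz cycles elapse per millisecond, so one cycle lasts NSEC_PER_MSEC / cpu_khz nanoseconds, pre-shifted left by CYC2NS_SCALE_FACTOR to keep precision in integer math. A Python sketch of that conversion only (no per-CPU state, no IRQ or sched_clock handling), assuming CYC2NS_SCALE_FACTOR is 10 as in the x86 TSC code of that era; returning None for cpu_khz == 0 mirrors the early return the patch adds.

```python
from typing import Optional

NSEC_PER_MSEC = 1_000_000
CYC2NS_SCALE_FACTOR = 10  # assumed: 2^10 fixed-point shift


def cyc2ns_scale(cpu_khz: int) -> Optional[int]:
    """Fixed-point ns-per-cycle factor, as computed in set_cyc2ns_scale().

    Returns None for cpu_khz == 0 (the caller keeps its previous scale).
    """
    if not cpu_khz:
        return None
    return (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) // cpu_khz


def cycles_2_ns(cycles: int, scale: int) -> int:
    """Convert a TSC delta to nanoseconds: cycles * scale / 2^CYC2NS_SCALE_FACTOR."""
    return (cycles * scale) >> CYC2NS_SCALE_FACTOR
```

For a 1.6 GHz Pentium M (cpu_khz = 1600000) the scale is 640, so 1,600,000 cycles convert to exactly 1,000,000 ns; for frequencies that do not divide 1,024,000,000 evenly the integer division truncates the scale slightly.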
Re: High wake up latencies with FAIR_USER_SCHED
On Jan 29, 2008 6:47 AM, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> IMHO this is expected results and if someone really needs to cut down
> this latency, they can reduce sysctl_sched_latency (which will be bad
> from perf standpoint, as we will cause more cache thrashing with that).

Thank you very much for the detailed explanation, Srivatsa; that made a lot
of sense. Unfortunately, it means I'll disable FAIR_USER_SCHED, as I
initially thought these latencies were caused by my local patches that
give each group a load proportional to the max load of its elements.
Anyway, I don't absolutely need a fair user scheduler on my laptop, but low
latencies in the default configuration are nice to have.

I just thought about something to restore low latencies with
FAIR_GROUP_SCHED, but it's possibly utter nonsense, so bear with me ;-)
The idea would be to turn the trees upside down. The scheduler would only
see tasks (on the leaves), so it could apply its interactivity magic, but
the hierarchical groups would be used to compute dynamic loads for each
task according to their position in the tree:

- now:
  - we schedule each level of the tree starting from the root

- with my proposition:
  - we schedule tasks like with !FAIR_GROUP_SCHED, but calc_delta_fair()
    would traverse the tree starting from the leaves to compute the dynamic
    load.

Thanks.

-- 
Guillaume
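One possible reading of the leaf-to-root idea: a task's dynamic load is the product, at every level on its path to the root, of the fraction of its parent's weight that the entity holds. The Python sketch below is only an illustration under that reading; Entity and effective_share() are hypothetical names, all entities are assumed runnable, and 1024 is an assumed nice-0 weight. It does not reflect how the 2.6.24 group scheduler is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """A task (leaf) or a group (inner node); names and weights are illustrative."""
    name: str
    weight: int = 1024  # assumed nice-0 load weight
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)
    children: List["Entity"] = field(default_factory=list, repr=False, compare=False)

    def add(self, child: "Entity") -> "Entity":
        child.parent = self
        self.children.append(child)
        return child


def effective_share(task: Entity) -> float:
    """Walk from a leaf up to the root, multiplying at each level the
    fraction of the parent's weight this entity holds."""
    share = 1.0
    node = task
    while node.parent is not None:
        total = sum(sibling.weight for sibling in node.parent.children)
        share *= node.weight / total
        node = node.parent
    return share
```

With uid 1 running the measuring task plus one spinner and uid 2 running a single spinner, the two uid 1 tasks each get 0.25 and the uid 2 spinner gets 0.5. Under the proposal, calc_delta_fair() could scale each task's virtual runtime by such a share while the scheduler picks among leaves as with !FAIR_GROUP_SCHED.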
Re: High wake up latencies with FAIR_USER_SCHED
Unfortunately, it does not seem to be completely fixed. With this script:

#!/usr/bin/python
import os
import time

SLEEP_TIME = 0.1
SAMPLES = 5
PRINT_DELAY = 0.5

def print_wakeup_latency():
    times = []
    last_print = 0
    while True:
        start = time.time()
        time.sleep(SLEEP_TIME)
        end = time.time()
        # Oversleep beyond the requested SLEEP_TIME, newest sample first
        times.insert(0, end - start - SLEEP_TIME)
        del times[SAMPLES:]
        if end > last_print + PRINT_DELAY:
            # Print the median of the last SAMPLES measurements
            copy = times[:]
            copy.sort()
            print('%f ms' % (copy[len(copy) // 2] * 1000))
            last_print = end

if os.fork() == 0:
    if os.fork() == 0:
        # Grandchild: CPU hog running as uid 1
        os.setuid(1)
        while True:
            pass
    else:
        # Child: CPU hog running as uid 2
        os.setuid(2)
        while True:
            pass
else:
    # Parent: measures wakeup latency as uid 1
    os.setuid(1)
    print_wakeup_latency()

I get seemingly unpredictable latencies (with or without the patch applied):

# ./sched.py
14.810944 ms
19.829893 ms
1.968050 ms
8.021021 ms
-0.017977 ms
4.926109 ms
11.958027 ms
5.995893 ms
1.992130 ms
0.007057 ms
0.217819 ms
-0.004864 ms
5.907202 ms
6.547832 ms
-0.012970 ms
0.209951 ms
-0.002003 ms
4.989052 ms

Without FAIR_USER_SCHED, latencies are consistently in the noise.
Also, I forgot to mention that I'm on a single CPU.

Thanks for the help.

-- 
Guillaume
Re: High wake up latencies with FAIR_USER_SCHED
Hi Srivatsa,

On Jan 28, 2008 3:31 AM, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> Given that sysctl_sched_wakeup_granularity is set to 10ms by default,
> this doesn't sound abnormal.

Indeed, by lowering sched_wakeup_granularity I get much better
latencies, but lowering sched_latency seems to be more effective.

> NEW_FAIR_SLEEPERS feature gives credit for sleeping only to tasks and
> not group-level entities. With the patch attached, I could see that wakeup
> latencies with FAIR_USER_SCHED are restored to the same level as
> !FAIR_USER_SCHED.

Thanks for the patch, it works perfectly.

> However I am not sure whether that is the way to go. We want to let one group of
> tasks running as much as possible until the fairness/wakeup-latency threshold is
> exceeded. If someone does want better wakeup latencies between groups too, they
> can always tune sysctl_sched_wakeup_granularity.

Having an inconsistency here between FAIR_USER_SCHED and
!FAIR_USER_SCHED sounds strange, but Ingo took the patch, so I'm
happy :-)

Thanks for the replies.

--
Guillaume
High wake up latencies with FAIR_USER_SCHED
Hi,

I noticed some strangely high wake up latencies with FAIR_USER_SCHED
using this script:

#!/usr/bin/python

import os
import time

SLEEP_TIME = 0.1
SAMPLES = 100
PRINT_DELAY = 0.5

def print_wakeup_latency():
    times = []
    last_print = 0
    while True:
        start = time.time()
        time.sleep(SLEEP_TIME)
        end = time.time()
        times.insert(0, end - start - SLEEP_TIME)
        del times[SAMPLES:]
        if end > last_print + PRINT_DELAY:
            copy = times[:]
            copy.sort()
            print '%f ms' % (copy[len(copy)/2] * 1000)
            last_print = end

if os.fork() == 0:
    os.setuid(1)
    for i in xrange(2):
        if os.fork() == 0:
            while True:
                pass
else:
    os.setuid(2) # <-- here
    print_wakeup_latency()

We have two busy loops with UID=1, and UID=2 maintains the running
median of its wake up latency. I get these latencies:

# ./sched.py
4.300022 ms
4.801178 ms
4.604006 ms
4.606867 ms
4.604006 ms
4.606867 ms
4.604006 ms
4.606867 ms
4.606867 ms
4.676008 ms
4.604006 ms
4.604006 ms
4.606867 ms

Disabling FAIR_USER_SCHED restores wake up latencies in the noise:

# ./sched.py
-0.156975 ms
-0.067091 ms
-0.022984 ms
-0.022984 ms
-0.022030 ms
-0.022030 ms
-0.022030 ms
-0.021076 ms
-0.015831 ms
-0.015831 ms
-0.016069 ms
-0.015831 ms

Strangely enough, another way to restore normal latencies is to change
setuid(2) to setuid(1), that is, putting the latency measurement in the
same group as the two busy loops.

Thanks in advance for any help.

--
Guillaume
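For readers who prefer C, the measurement half of the script (sleep, measure the overshoot, take the median of a window) can be sketched as below. This is an analogue, not the script itself: the fork()/setuid() part that sets up the competing busy loops is left out, CLOCK_MONOTONIC replaces time.time(), and the helper names are hypothetical. Like the script, the median picks index n / 2 of the sorted copy.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define SAMPLES 100 /* window size, as in the first script */

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of n samples: element n / 2 of a sorted copy. */
static double window_median(const double *samples, size_t n)
{
	double copy[SAMPLES];

	assert(n > 0 && n <= SAMPLES);
	memcpy(copy, samples, n * sizeof(copy[0]));
	qsort(copy, n, sizeof(copy[0]), cmp_double);
	return copy[n / 2];
}

/* Sleep for sleep_ns and return how much longer it took, in ms. */
static double wakeup_latency_ms(long sleep_ns)
{
	struct timespec req = {
		.tv_sec = sleep_ns / 1000000000L,
		.tv_nsec = sleep_ns % 1000000000L,
	};
	struct timespec start, end;
	double elapsed_ms;

	clock_gettime(CLOCK_MONOTONIC, &start);
	nanosleep(&req, NULL); /* an early EINTR return is not retried */
	clock_gettime(CLOCK_MONOTONIC, &end);

	elapsed_ms = (end.tv_sec - start.tv_sec) * 1e3 +
		     (end.tv_nsec - start.tv_nsec) / 1e6;
	return elapsed_ms - sleep_ns / 1e6;
}
```

A caller would keep the most recent SAMPLES results of wakeup_latency_ms(100000000L) in an array and print window_median() every half second, mirroring SLEEP_TIME and PRINT_DELAY.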
Re: Dropping some patches from sched-devel
On Jan 25, 2008 5:58 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> sure, done.

Thanks.

> what method are you using of determining quality?

I was talking about code quality: adding a dependency on jiffies does
not seem like a good idea. But also, about the clock quality, I was
focusing on getting rid of underflows and overflows, so I relaxed the
checks. But I realized all these underflows are definitely needed. I
mean, the conversion from TSC to sched_clock always rounds down, so
over time it lags a bit.

> Could you perhaps try
> to automate it? (even better would be some self-test within the kernel
> that detects badness)

I find the overflow/underflow/warp checks you added in the first place
to be sufficient. Not sure we want to add more tests to differentiate
between normal and abnormal drifts.

Thanks for your prompt reply.

--
Guillaume
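The "always rounds down, so over time it lags" point can be illustrated with a small user-space model. It assumes CYC2NS_SCALE_FACTOR is 10, as in tsc_32.c, and compares a reference conversion (truncated once) against the fixed-point one, whose scale factor is itself truncated; the function names are hypothetical and the products can overflow for very large cycle counts.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define CYC2NS_SCALE_FACTOR 10 /* assumption: 2^10, as in tsc_32.c */

/* Reference conversion, truncated only once at the end. */
static uint64_t exact_ns(uint64_t cyc, uint64_t cpu_khz)
{
	return cyc * NSEC_PER_MSEC / cpu_khz;
}

/* Fixed-point conversion: the scale factor is truncated before use. */
static uint64_t fixed_ns(uint64_t cyc, uint64_t cpu_khz)
{
	uint64_t scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;

	return (cyc * scale) >> CYC2NS_SCALE_FACTOR;
}

/* How far the fixed-point clock trails the reference after cyc cycles. */
static uint64_t lag_ns(uint64_t cyc, uint64_t cpu_khz)
{
	return exact_ns(cyc, cpu_khz) - fixed_ns(cyc, cpu_khz);
}
```

In this model, at 3 GHz the factor is 341 instead of about 341.33, so the fixed-point clock falls about 0.98 ms behind per second of TSC time and the gap keeps growing; at 1 GHz the factor is exact and there is no lag.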
Dropping some patches from sched-devel
Hi Ingo,

Can I talk you into dropping these patches of mine from sched-devel (or
not sending them to Linus):

da0f9440cdcb1edd5424de91f326de83de3fe5f9
    sched: make sure jiffies is up to date before calling __update_rq_clock()
6eb300ad38fef6db4efe177067a65aaa771596da
    sched: fix rq->clock overflows detection with CONFIG_NO_HZ

They are not of good enough quality, and I'm working on a better
approach.

Thanks.

--
Guillaume
Re: CONFIG_NO_HZ breaks blktrace timestamps
Guillaume Chazarain <[EMAIL PROTECTED]> wrote:
> FYI, I'm currently trying to track down where rq->clock started to
> overflow with nohz=off, and it seems to be before 2.6.23, so my patches
> are not at fault ;-) Or maybe I am dreaming and it was always
> overflowing. Investigating ...

And the winner is:

commit 529c77261bccd9d37f110f58b0753d95beaa9fa2
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Fri Aug 10 23:05:11 2007 +0200

    sched: improve rq-clock overflow logic

    improve the rq-clock overflow logic: limit the absolute rq->clock
    delta since the last scheduler tick, instead of limiting the delta
    itself.

    tested by Arjan van de Ven - whole laptop was misbehaving due to
    an incorrectly calibrated cpu_khz confusing sched_clock().

    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

diff --git a/kernel/sched.c b/kernel/sched.c
index b0afd8d..6247e4a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -263,6 +263,7 @@ struct rq {
 
 	unsigned int clock_warps, clock_overflows;
 	unsigned int clock_unstable_events;
+	u64 tick_timestamp;
 
 	atomic_t nr_iowait;
 
@@ -341,8 +342,11 @@ static void __update_rq_clock(struct rq *rq)
 		/*
 		 * Catch too large forward jumps too:
 		 */
-		if (unlikely(delta > 2*TICK_NSEC)) {
-			clock++;
+		if (unlikely(clock + delta > rq->tick_timestamp + TICK_NSEC)) {
+			if (clock < rq->tick_timestamp + TICK_NSEC)
+				clock = rq->tick_timestamp + TICK_NSEC;
+			else
+				clock++;
 			rq->clock_overflows++;
 		} else {
 			if (unlikely(delta > rq->clock_max_delta))
@@ -3308,9 +3312,16 @@ void scheduler_tick(void)
 	int cpu = smp_processor_id();
 	struct rq *rq = cpu_rq(cpu);
 	struct task_struct *curr = rq->curr;
+	u64 next_tick = rq->tick_timestamp + TICK_NSEC;
 
 	spin_lock(&rq->lock);
 	__update_rq_clock(rq);
+	/*
+	 * Let rq->clock advance by at least TICK_NSEC:
+	 */
+	if (unlikely(rq->clock < next_tick))
+		rq->clock = next_tick;
+	rq->tick_timestamp = rq->clock;
 	update_cpu_load(rq);
 	if (curr != rq->idle) /* FIXME: needed? */
 		curr->sched_class->task_tick(rq, curr);

Seems like I originally was not the only one seeing 2 jiffies jumps ;-)

I'll adapt my patches.

--
Guillaume
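The clamping introduced by that commit can be exercised in isolation with the user-space model below. It assumes HZ=1000 (TICK_NSEC = 1,000,000 ns), copies only the forward-jump branch of __update_rq_clock() and the new lines in scheduler_tick(), and omits locking, the backward-warp path and the clock_max_delta bookkeeping; the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL /* assumption: HZ=1000 */

struct model_rq {
	uint64_t clock;               /* rq->clock */
	uint64_t tick_timestamp;      /* rq->clock at the previous tick */
	unsigned int clock_overflows;
};

/* Forward-jump branch of __update_rq_clock() after commit 529c7726. */
static void model_update_rq_clock(struct model_rq *rq, uint64_t delta)
{
	uint64_t clock = rq->clock;
	uint64_t limit = rq->tick_timestamp + TICK_NSEC;

	if (clock + delta > limit) {
		/* Clamp to one tick past the last tick, or creep by 1 ns. */
		if (clock < limit)
			clock = limit;
		else
			clock++;
		rq->clock_overflows++;
	} else {
		clock += delta;
	}
	rq->clock = clock;
}

/* scheduler_tick(): rq->clock advances by at least TICK_NSEC per tick. */
static void model_scheduler_tick(struct model_rq *rq, uint64_t delta)
{
	uint64_t next_tick = rq->tick_timestamp + TICK_NSEC;

	model_update_rq_clock(rq, delta);
	if (rq->clock < next_tick)
		rq->clock = next_tick;
	rq->tick_timestamp = rq->clock;
}
```

In this model a 2-jiffy gap between ticks is counted as an overflow and only 1 ms of the 2 ms elapsed reaches rq->clock, while a tick that arrives early is pushed forward to a full TICK_NSEC.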
Re: CONFIG_NO_HZ breaks blktrace timestamps
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> ok. I have applied all but this one

Hmm, I couldn't find them in mingo/linux-2.6-sched-devel.git.

> i think it's much simpler to do what i have below. Could you try it on
> your box? Or if it is using ACPI idle - in that case the callbacks
> should already be there and there should be no need for further fixups.
>
> Subject: x86: idle wakeup event in the HLT loop

I use ACPI, so this patch has no effect.

FYI, I'm currently trying to track down where rq->clock started to
overflow with nohz=off, and it seems to be before 2.6.23, so my patches
are not at fault ;-) Or maybe I am dreaming and it was always
overflowing. Investigating ...

--
Guillaume
Re: [patch] block: fix blktrace timestamps
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> Correction: it was not a high res time source, it was "the scheduler's
> per-cpu, non-exported, non-coherent, warps-and-jumps-like-hell high-res
> timesource that was intentionally called the _sched_ clock" ;-)

I think the warts of cpu_clock() are fixable, except maybe the lack of
synchronization on SMP, which is harder.

--
Guillaume
Re: CONFIG_NO_HZ breaks blktrace timestamps
David Dillow <[EMAIL PROTECTED]> wrote:
> Patched kernel, nohz=off:
> .clock_underflows : 213887

A little warning about these patches: they are WIP, that's why I did
not send them earlier. They regress nohz=off.

A bit of context: these patches aim at making sure cpu_clock() on my
laptop (cpufreq enabled) never overflows/underflows/warps with
CONFIG_NO_HZ enabled. With these patches, I have a few hundred
overflows and underflows during early bootup, and then nothing :-)

Ingo Molnar <[EMAIL PROTECTED]> wrote:
> they are from the scheduler git tree (except the first debug patch), but
> queued up for v2.6.25 at the moment.

You are talking about "x86: scale cyc_2_nsec according to CPU frequency"
here, but I don't think it is at stake here, as David has:

> CONFIG_CPU_FREQ is not set

Let me review my patches myself to give a bit of context:

> sched: monitor clock underflows in /proc/sched_debug

This one I'd like to have in .25, just for convenience.

> x86: scale cyc_2_nsec according to CPU frequency

You already know that one ;-)

> sched: fix rq->clock warps on frequency changes

This is a bugfix for .25 once the previous patch is applied. I don't
think it helps David, but it could help blktrace users with cpufreq
enabled.

> sched: Fix rq->clock overflows detection with CONFIG_NO_HZ

I think this one is the most important for David, but unfortunately it
has some problems.

> +static inline u64 max_skipped_ticks(struct rq *rq)
> +{
> +	return nohz_on(cpu_of(rq)) ? jiffies - rq->last_tick_seen + 2 : 1;
> +}

Here, I initially wrote rq->last_tick_seen + 1, but experiments showed
that +2 was needed, as I really saw deltas of 2 milliseconds. These
patches have two objectives:

- taking into account that jiffies are not always incremented by 1,
  thanks to nohz;
- as the tick is stopped and restarted, it may not tick at the exact
  expected moment, so allow a window of 1 jiffy. If the tick occurs
  during the right jiffy, we know the TSC is more precise than the
  tick, so don't correct the clock.

And the problem is that I seem to need a window of 2 jiffies, so I need
some help.

> sched: make sure jiffies is up to date before calling __update_rq_clock()

This one is needed too, but I'm less confident in its validity.

> scheduler_tick() is not called every jiffies

This one is a bit ugly and seems to break nohz=off.

> -	if (unlikely(rq->clock < next_tick)) {
> +	if (unlikely(rq->clock < next_tick - nohz_on(cpu) * TICK_NSEC)) {

No, I'm not proud of this :-(

Thanks.

--
Guillaume
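To see what the +2 in max_skipped_ticks() buys, here is a user-space sketch. Only the expression is taken from the quoted patch; how the window is compared against a clock delta (model_is_overflow()) is an assumption for illustration, as are HZ=1000 and the model_* names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL /* assumption: HZ=1000 */

/* The quoted max_skipped_ticks() expression, with its inputs made explicit. */
static uint64_t model_max_skipped_ticks(bool nohz_on, unsigned long jiffies,
					unsigned long last_tick_seen)
{
	return nohz_on ? jiffies - last_tick_seen + 2 : 1;
}

/* Hypothetical use: flag a forward jump only beyond the allowed window. */
static bool model_is_overflow(uint64_t delta_ns, bool nohz_on,
			      unsigned long jiffies, unsigned long last_tick_seen)
{
	return delta_ns > model_max_skipped_ticks(nohz_on, jiffies,
						  last_tick_seen) * TICK_NSEC;
}
```

Under these assumptions, a 2 ms delta right after the tick restarts (jiffies == last_tick_seen) stays inside the window with +2 but would be flagged with +1, which matches the 2 ms deltas described above; with nohz off the window is a single tick.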
Re: CONFIG_NO_HZ breaks blktrace timestamps
David Dillow <[EMAIL PROTECTED]> wrote:
> At the moment, I'm not sure how to track this farther, or how to fix it
> properly. Any advice would be appreciated.

Just out of curiosity, could you try the appended cumulative patch and
report .clock_warps, .clock_overflows and .clock_underflows as you did.

Thanks.

commit 20fa02359d971bdb820d238184fabd42d8018e4f
Author: Guillaume Chazarain <[EMAIL PROTECTED]>
Date:   Thu Jan 10 23:36:43 2008 +0100

    sched: monitor clock underflows in /proc/sched_debug

    We monitor clock overflows, let's also monitor clock underflows.

    Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>

diff --git a/kernel/sched.c b/kernel/sched.c
index 37cf07a..cab9756 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -317,7 +317,7 @@ struct rq {
 	u64 clock, prev_clock_raw;
 	s64 clock_max_delta;
 
-	unsigned int clock_warps, clock_overflows;
+	unsigned int clock_warps, clock_overflows, clock_underflows;
 	u64 idle_clock;
 	unsigned int clock_deep_idle_events;
 	u64 tick_timestamp;
@@ -3485,8 +3485,10 @@ void scheduler_tick(void)
 	/*
 	 * Let rq->clock advance by at least TICK_NSEC:
 	 */
-	if (unlikely(rq->clock < next_tick))
+	if (unlikely(rq->clock < next_tick)) {
 		rq->clock = next_tick;
+		rq->clock_underflows++;
+	}
 	rq->tick_timestamp = rq->clock;
 	update_cpu_load(rq);
 	if (curr != rq->idle) /* FIXME: needed? */
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 80fbbfc..9e5de09 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -179,6 +179,7 @@ static void print_cpu(struct seq_file *m, int cpu)
 	PN(prev_clock_raw);
 	P(clock_warps);
 	P(clock_overflows);
+	P(clock_underflows);
 	P(clock_deep_idle_events);
 	PN(clock_max_delta);
 	P(cpu_load[0]);

commit c146421cae64bb626714dc951fa39b55d2f819c1
Author: Guillaume Chazarain <[EMAIL PROTECTED]>
Date:   Wed Jan 2 14:10:17 2008 +0100

    commit 60c6397ce4e8c9fd7feaeaef4167ace71c3949c8

    x86: scale cyc_2_nsec according to CPU frequency

    scale the sched_clock() cyc_2_nsec scaling factor according to
    CPU frequency changes.

    [ [EMAIL PROTECTED]: simplified it and fixed it for SMP. ]

    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..00bb4c1 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			[EMAIL PROTECTED] "math is hard, lets go shopping!"
  */
 
-unsigned long cyc2ns_scale __read_mostly;
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_block *nb, unsigned long val, void *data)
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_reliable(void) { }
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 				(unsigned long)cpu_khz / 1000,
 				(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors
Re: CONFIG_NO_HZ breaks blktrace timestamps
David Dillow <[EMAIL PROTECTED]> wrote:
> At the moment, I'm not sure how to track this farther, or how to fix it
> properly. Any advice would be appreciated.

Just out of curiosity, could you try the appended cumulative patch and
report .clock_warps, .clock_overflows and .clock_underflows as you did?

Thanks.

commit 20fa02359d971bdb820d238184fabd42d8018e4f
Author: Guillaume Chazarain <[EMAIL PROTECTED]>
Date:   Thu Jan 10 23:36:43 2008 +0100

    sched: monitor clock underflows in /proc/sched_debug

    We monitor clock overflows, let's also monitor clock underflows.

    Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>

diff --git a/kernel/sched.c b/kernel/sched.c
index 37cf07a..cab9756 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -317,7 +317,7 @@ struct rq {
 	u64 clock, prev_clock_raw;
 	s64 clock_max_delta;
-	unsigned int clock_warps, clock_overflows;
+	unsigned int clock_warps, clock_overflows, clock_underflows;
 	u64 idle_clock;
 	unsigned int clock_deep_idle_events;
 	u64 tick_timestamp;
@@ -3485,8 +3485,10 @@ void scheduler_tick(void)
 	/*
 	 * Let rq->clock advance by at least TICK_NSEC:
 	 */
-	if (unlikely(rq->clock < next_tick))
+	if (unlikely(rq->clock < next_tick)) {
 		rq->clock = next_tick;
+		rq->clock_underflows++;
+	}
 	rq->tick_timestamp = rq->clock;
 	update_cpu_load(rq);
 	if (curr != rq->idle) /* FIXME: needed? */
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 80fbbfc..9e5de09 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -179,6 +179,7 @@ static void print_cpu(struct seq_file *m, int cpu)
 	PN(prev_clock_raw);
 	P(clock_warps);
 	P(clock_overflows);
+	P(clock_underflows);
 	P(clock_deep_idle_events);
 	PN(clock_max_delta);
 	P(cpu_load[0]);

commit c146421cae64bb626714dc951fa39b55d2f819c1
Author: Guillaume Chazarain <[EMAIL PROTECTED]>
Date:   Wed Jan 2 14:10:17 2008 +0100

    commit 60c6397ce4e8c9fd7feaeaef4167ace71c3949c8

    x86: scale cyc_2_nsec according to CPU frequency

    scale the sched_clock() cyc_2_nsec scaling factor according to
    CPU frequency changes.

    [ [EMAIL PROTECTED]: simplified it and fixed it for SMP. ]

    Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
index 9ebc0da..00bb4c1 100644
--- a/arch/x86/kernel/tsc_32.c
+++ b/arch/x86/kernel/tsc_32.c
@@ -5,6 +5,7 @@
 #include <linux/jiffies.h>
 #include <linux/init.h>
 #include <linux/dmi.h>
+#include <linux/percpu.h>
 
 #include <asm/delay.h>
 #include <asm/tsc.h>
@@ -80,13 +81,31 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
  *
  *			[EMAIL PROTECTED] "math is hard, lets go shopping!"
  */
-unsigned long cyc2ns_scale __read_mostly;
 
-#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */
+DEFINE_PER_CPU(unsigned long, cyc2ns);
 
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
+static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 {
-	cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz;
+	unsigned long flags, prev_scale, *scale;
+	unsigned long long tsc_now, ns_now;
+
+	local_irq_save(flags);
+	sched_clock_idle_sleep_event();
+
+	scale = &per_cpu(cyc2ns, cpu);
+
+	rdtscll(tsc_now);
+	ns_now = __cycles_2_ns(tsc_now);
+
+	prev_scale = *scale;
+	if (cpu_khz)
+		*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
+
+	/*
+	 * Start smoothly with the new frequency:
+	 */
+	sched_clock_idle_wakeup_event(0);
+	local_irq_restore(flags);
 }
 
 /*
@@ -239,7 +258,9 @@ time_cpufreq_notifier(struct notifier_block *nb, unsigned long val, void *data)
 						ref_freq, freq->new);
 			if (!(freq->flags & CPUFREQ_CONST_LOOPS)) {
 				tsc_khz = cpu_khz;
-				set_cyc2ns_scale(cpu_khz);
+				preempt_disable();
+				set_cyc2ns_scale(cpu_khz, smp_processor_id());
+				preempt_enable();
 				/*
 				 * TSC based sched_clock turns
 				 * to junk w/ cpufreq
@@ -367,6 +388,8 @@ static inline void check_geode_tsc_reliable(void) { }
 
 void __init tsc_init(void)
 {
+	int cpu;
+
 	if (!cpu_has_tsc || tsc_disable)
 		goto out_no_tsc;
 
@@ -380,7 +403,15 @@ void __init tsc_init(void)
 		(unsigned long)cpu_khz / 1000,
 		(unsigned long)cpu_khz % 1000);
 
-	set_cyc2ns_scale(cpu_khz);
+	/*
+	 * Secondary CPUs do not run through tsc_init(), so set up
+	 * all the scale factors
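The scaling that set_cyc2ns_scale() maintains is fixed-point arithmetic: the per-CPU factor stores nanoseconds per cycle multiplied by 2^CYC2NS_SCALE_FACTOR, so each conversion costs one multiply and one shift. The sketch below is a user-space illustration only, assuming CYC2NS_SCALE_FACTOR is 10 as in the removed #define; the helpers cyc2ns_scale_for() and cycles_to_ns() are hypothetical names, not kernel functions. It also shows why a stale factor matters: cycles counted at a new frequency but converted with the old factor give the wrong number of nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10          /* 2^10, as in the patch */
#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical helper: ns per cycle, scaled by 2^10.
 * cpu_khz is cycles per millisecond, hence NSEC_PER_MSEC. */
static unsigned long cyc2ns_scale_for(unsigned long cpu_khz)
{
	return (unsigned long)((NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz);
}

/* Hypothetical helper: ns = cycles * scale / 2^10 (no division per call). */
static uint64_t cycles_to_ns(uint64_t cycles, unsigned long scale)
{
	return (cycles * scale) >> CYC2NS_SCALE_FACTOR;
}
```

For example, a 2 GHz CPU (cpu_khz = 2000000) gets a factor of 512, and 2,000,000,000 cycles convert to 1,000,000,000 ns. If that CPU drops to 1 GHz while the factor stays at 512, one real second of cycles is reported as only 500,000,000 ns, which is the kind of sched_clock() drift that rescaling on cpufreq notifications is meant to avoid. At 3 GHz the integer division truncates the factor to 341, so 3,000,000,000 cycles come out as 999,023,437 ns, roughly 0.1% slow; that small error is presumably accepted in exchange for keeping the per-call conversion cheap.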
[PATCH] fs-writeback: handle errors in sync_sb_inodes()
Currently it is possible for some errors to be detected at write-back time but not reported to the program, as shown by the following script using the included make_file.c.

-8<-8<-8<-8<-8<-8<-
#!/bin/sh

# We binary search the size of a file in 40M filesystem that can cause
# the missed error.

MIN=500
MAX=5000

rm fs.40M
dd if=/dev/zero of=fs.40M bs=40M count=0 seek=1 status=noxfer
#mkfs.ext2 -F fs.40M
mkfs.ext3 -F fs.40M
#mkfs.jfs -q fs.40M
#mkfs.reiserfs -fq fs.40M
#mkfs.xfs fs.40M

attempt()
{
	SIZE=$1
	RES=0
	./make_file valid_file $SIZE
	mount fs.40M /mnt -o loop
	if ! ./make_file /mnt/not_enough_space $SIZE; then
		# We could not create the file as the requested size
		# was clearly too big
		RES=1
	fi
	umount /mnt
	if [ $RES -eq 0 ]; then
		mount fs.40M /mnt -o loop
		if cmp valid_file /mnt/not_enough_space; then
			# The file was too small, it fitted in the filesystem
			RES=-1
		fi
		umount /mnt
	fi
	if [ $RES -eq 0 ]; then
		echo "Undetected ENOSPC with SIZE=$SIZE"
		exit
	fi
	return $RES
}

while [ $((MAX - MIN)) -gt 1 ]; do
	SIZE=$(((MIN + MAX) / 2))
	attempt $SIZE
	RES=$?
	if [ $RES -eq 1 ]; then
		MAX=$SIZE
	else
		MIN=$SIZE
	fi
done

echo "Could not reproduce the problem"
-8<-8<-8<-8<-8<-8<-

/* make_file.c */
#include <unistd.h>
#include <sys/fcntl.h>
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	int size, fd;
	char *mapping;

	if (argc != 3) {
		fprintf(stderr, "Usage: %s FILE SIZE\n", argv[0]);
		return 1;
	}

	size = atoi(argv[2]);

	fd = open(argv[1], O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		perror(argv[1]);
		return 1;
	}

	if (ftruncate(fd, size) < 0) {
		perror("ftruncate");
		return 1;
	}

	mapping = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
	if (mapping == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(mapping, 0xFF, size);

	/* Force a write-back */
	sync();

	if (msync(mapping, size, MS_SYNC) < 0) {
		perror("msync");
		return 1;
	}

	if (close(fd) < 0) {
		perror("close");
		return 1;
	}

	printf("%s: successfully written %d bytes\n", argv[1], size);
	return 0;
}
-8<-8<-8<-8<-8<-8<-

make_file.c mmaps a hole, performs some writeback (memset + sync) and then expects to find an error code in msync(). The script mounts a 40M loopback filesystem and does a binary search to find the size of a file big enough to provoke an ENOSPC, but small enough to show the error not being detected at msync() time. The error window is large enough for such a size to be found quickly, but with this patch, no such file size can be found.

All mmap-capable filesystems I tested are affected (ext2, ext3, jfs, reiserfs, xfs). XFS is special in that it survives the test thanks to the page_mkwrite() work, i.e. it gets SIGBUS during the memset. Anyway, this behaviour solves ENOSPC but does nothing for EIO.

The offending code is in fs/fs-writeback.c:

sync_sb_inodes(...)
{
	...
	__writeback_single_inode(inode, wbc);
	...
}

__writeback_single_inode() gets the error from mapping->flags, clears it and returns it. But sync_sb_inodes() ignores this return value.

In -mm there is sync_sb_inodes-propagate-errors.patch that propagates the error from __writeback_single_inode() upwards in the call stack. IMHO, this propagation is useless because:
- the error is combined from the errors in all the synced inodes, so it just tells that some inode in a specific fs got an error,
- nobody in the call stack is interested in this error: certainly not pdflush, or 'void sync(2)'.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---
 fs/fs-writeback.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0fca820..88bb3c4 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -417,6 +417,7 @@ sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
 		struct address_space *mapping = inode->i_mapping;
 		struct backing_dev_info *bdi = mapping->backing_dev_info;
 		long pages_skipped;
+		int err;
 
 		if (!bdi_cap_writeback_dirty(bdi)) {
 			redirty_tail(inode);
@@ -461,7 +462,8 @@ sync_sb_inodes(struct
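The failure mode described above is a test-and-clear error flag whose return value is dropped. The following is a simplified user-space model, not kernel code: struct model_mapping, the MODEL_AS_* bits and every model_* function are hypothetical stand-ins for mapping->flags, __writeback_single_inode() and msync(). As one possible fix, it assumes the caller records the error on the mapping again; the truncated patch above may handle it differently.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the error bits kept in mapping->flags. */
#define MODEL_AS_EIO    (1u << 0)
#define MODEL_AS_ENOSPC (1u << 1)

struct model_mapping {
	unsigned int flags;
};

/* Record an error on the mapping (ENOSPC gets its own bit, anything else is EIO). */
static void model_set_error(struct model_mapping *m, int err)
{
	if (err == -ENOSPC)
		m->flags |= MODEL_AS_ENOSPC;
	else if (err)
		m->flags |= MODEL_AS_EIO;
}

/* Test and clear: return the recorded error (EIO wins) and forget it. */
static int model_take_error(struct model_mapping *m)
{
	int err = 0;

	if (m->flags & MODEL_AS_ENOSPC)
		err = -ENOSPC;
	if (m->flags & MODEL_AS_EIO)
		err = -EIO;
	m->flags &= ~(MODEL_AS_EIO | MODEL_AS_ENOSPC);
	return err;
}

/* Stand-in for the writeback step: hands back the error and clears it. */
static int model_writeback(struct model_mapping *m)
{
	return model_take_error(m);
}

/* Behaviour described in the report: the return value is dropped. */
static void model_sync_ignoring(struct model_mapping *m)
{
	model_writeback(m);
}

/* Assumed fix: put the error back so a later msync() can still see it. */
static void model_sync_preserving(struct model_mapping *m)
{
	int err = model_writeback(m);

	if (err)
		model_set_error(m, err);
}

/* Stand-in for msync(MS_SYNC): reports whatever error is still recorded. */
static int model_msync(struct model_mapping *m)
{
	return model_take_error(m);
}
```

With the error dropped, the msync() model returns 0 even though ENOSPC occurred; with the error recorded again, it returns -ENOSPC. Keeping the error on the per-mapping flags, rather than combining it across inodes, leaves it attached to the file that actually failed, which seems in line with the objection above to propagating a single combined error up the call stack.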
Re: [PATCH] proc: return -EPERM when preventing read of /proc/*/maps
Al Viro <[EMAIL PROTECTED]> wrote:
> How about this:

At least the task_mmu part works fine.

Tested-by: Guillaume Chazarain <[EMAIL PROTECTED]>

--
Guillaume
Re: [PATCH] proc: return -EPERM when preventing read of /proc/*/maps
Al Viro <[EMAIL PROTECTED]> wrote:
> vma_stop() doesn't need changes either...

Hmmm, not sure ;-)

$ cat /proc/1/maps
Pid: 2282, comm: cat Not tainted (2.6.24-rc6-gc2 #185)
EIP: 0060:[<c01a4080>] EFLAGS: 00010286 CPU: 0
EIP is at vma_stop+0xd/0x21
EAX: f7c90360 EBX: f7c90360 ECX: c042b5f0 EDX:
ESI: f62aa240 EDI: EBP: f62daf24 ESP: f62daf20
 DS: 007b ES: 007b FS: GS: 0033 SS: 0068
Process cat (pid: 2282, ti=f62da000 task=f6264d20 task.ti=f62da000)
Stack: f7c90360 f62daf30 c01a40dc f62d0080 f62daf70 c018bdf1 0400 0804f000
       f62d0080 f62aa260 0400 f62cc000 f62dafb0 f62d0080 c018bc9e 0804f000
       f62daf90 c01751c5 f62daf9c
Call Trace:
 [<c0104e4a>] show_trace_log_lvl+0x1a/0x2f
 [<c0104efc>] show_stack_log_lvl+0x9d/0xa5
 [<c0104fa6>] show_registers+0xa2/0x1b8
 [<c01051d9>] die+0x11d/0x202
 [<c03319f9>] do_general_protection+0x1f7/0x1ff
 [<c0331172>] error_code+0x6a/0x70
 [<c01a40dc>] m_stop+0xe/0x29
 [<c018bdf1>] seq_read+0x153/0x25a
 [<c01751c5>] vfs_read+0xa6/0x158
 [<c0175583>] sys_read+0x3d/0x61
 [<c0103ea2>] sysenter_past_esp+0x6b/0xa1
 =======================
Code: 89 50 18 31 d2 89 48 1c 83 c4 5c 89 d0 5b 5e 5f 5d c3 55 31 c9 89 e5 e8 80 fd ff ff 5d c3 55 85 d2 89 e5 53 74 16 3b 50 08 74 11 <8b> 1a 8d 43 34 e8 80 ea f8 ff 89 d8 e8 16 89 f7 ff 5b 5d c3 55
EIP: [<c01a4080>] vma_stop+0xd/0x21 SS:ESP 0068:f62daf20
---[ end trace 297d07fbbfc82b7b ]---

This is an inconsistency in the handling of errors in m_start() between fs/proc/task_mmu.c and fs/proc/task_nommu.c.

task_mmu.c:
	if (IS_ERR(mm) || !mm)
		return mm;

task_nommu.c:
	if (IS_ERR(mm) || !mm) {
		put_task_struct(priv->task);
		priv->task = NULL;
		return mm;
	}

task_nommu.c does the cleanup while task_mmu.c defers it to m_stop.

--
Guillaume
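The oops comes from vma_stop() dereferencing a pointer that is really an encoded error code. The sketch below re-creates the ERR_PTR()/IS_ERR()/PTR_ERR() convention in user space, modeled on include/linux/err.h, where the top 4095 addresses are reserved for -errno values, together with a simplified seq_read() loop. The names model_m_start(), model_m_stop(), model_seq_read() and struct fake_vma are hypothetical; the assumption, consistent with the m_stop/seq_read frames in the call trace, is that the stop callback receives whatever the start callback returned, error pointers included.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* User-space re-creation of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for a VMA; "locked" models holding mmap_sem. */
struct fake_vma {
	int locked;
};

static int stop_calls;

/* Like m_start() after the patch: propagate ERR_PTR(-EPERM) to the caller. */
static void *model_m_start(int allowed, struct fake_vma *vma)
{
	if (!allowed)
		return ERR_PTR(-EPERM);
	vma->locked = 1;
	return vma;
}

/* Like vma_stop(): must not touch the pointer if it encodes an error. */
static void model_m_stop(void *v)
{
	stop_calls++;
	if (v && !IS_ERR(v))
		((struct fake_vma *)v)->locked = 0;
}

/* Simplified seq_read(): stop() is called even when start() failed. */
static long model_seq_read(int allowed, struct fake_vma *vma)
{
	void *p = model_m_start(allowed, vma);
	long err = 0;

	if (IS_ERR(p))
		err = PTR_ERR(p);
	model_m_stop(p);
	return err;
}
```

Because stop() sees the error pointer, either start() has to release everything before returning it (the task_nommu.c approach) or stop() has to check IS_ERR() before dereferencing (what the vma_stop() change in the patch does).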
[PATCH] proc: return -EPERM when preventing read of /proc/*/maps
Return an error instead of successfully reading an empty file.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
Acked-by: Al Viro <[EMAIL PROTECTED]>
---
 fs/proc/base.c       |    2 +-
 fs/proc/task_mmu.c   |    6 +++---
 fs/proc/task_nommu.c |    4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7411bfb..3aebc85 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -219,7 +219,7 @@ out:
 	task_unlock(task);
 	up_read(&mm->mmap_sem);
 	mmput(mm);
-	return NULL;
+	return ERR_PTR(-EPERM);
 }
 
 static int proc_pid_cmdline(struct task_struct *task, char * buffer)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8043a3e..74b4829 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -398,8 +398,8 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return NULL;
 
 	mm = mm_for_maps(priv->task);
-	if (!mm)
-		return NULL;
+	if (IS_ERR(mm) || !mm)
+		return mm;
 
 	priv->tail_vma = tail_vma = get_gate_vma(priv->task);
 
@@ -437,7 +437,7 @@ out:
 static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
 {
-	if (vma && vma != priv->tail_vma) {
+	if (vma && !IS_ERR(vma) && vma != priv->tail_vma) {
 		struct mm_struct *mm = vma->vm_mm;
 		up_read(&mm->mmap_sem);
 		mmput(mm);
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 1932c2c..53cb062 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -166,10 +166,10 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return NULL;
 
 	mm = mm_for_maps(priv->task);
-	if (!mm) {
+	if (IS_ERR(mm) || !mm) {
 		put_task_struct(priv->task);
 		priv->task = NULL;
-		return NULL;
+		return mm;
 	}
 
 	/* start from the Nth VMA */
Re: [PATCH] proc: advertise new restrictions on /proc/*/maps & /proc/*/smaps
Al Viro <[EMAIL PROTECTED]> wrote:
> The whole point is that we have to reject it at read() time, not open()
> time.

Yes, my patch was a complement to yours to propagate the -EPERM in easy cases. As you noted, it added restrictions on reading /proc/*/maps, even though I found them acceptable.

How about this instead? Maybe you'd prefer to propagate the actual -EPERM from __ptrace_may_attach, but that would be more invasive.

Sidenote: do you think a sparse annotation to check IS_ERR/PTR_ERR usage would make sense?

proc: return -EPERM when preventing read of /proc/*/maps

Return an error instead of successfully reading an empty file.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---
 fs/proc/base.c       |    2 +-
 fs/proc/task_mmu.c   |    8 +++++---
 fs/proc/task_nommu.c |    4 ++--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7411bfb..3aebc85 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -219,7 +219,7 @@ out:
 	task_unlock(task);
 	up_read(&mm->mmap_sem);
 	mmput(mm);
-	return NULL;
+	return ERR_PTR(-EPERM);
 }
 
 static int proc_pid_cmdline(struct task_struct *task, char * buffer)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8043a3e..db57e65 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -398,8 +398,8 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return NULL;
 
 	mm = mm_for_maps(priv->task);
-	if (!mm)
-		return NULL;
+	if (IS_ERR(mm) || !mm)
+		return mm;
 
 	priv->tail_vma = tail_vma = get_gate_vma(priv->task);
 
@@ -437,7 +437,7 @@ out:
 static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
 {
-	if (vma && vma != priv->tail_vma) {
+	if (vma && !IS_ERR(vma) && vma != priv->tail_vma) {
 		struct mm_struct *mm = vma->vm_mm;
 		up_read(&mm->mmap_sem);
 		mmput(mm);
@@ -451,6 +451,8 @@ static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 	struct vm_area_struct *tail_vma = priv->tail_vma;
 
 	(*pos)++;
+	if (IS_ERR(vma))
+		return vma;
 	if (vma && (vma != tail_vma) && vma->vm_next)
 		return vma->vm_next;
 	vma_stop(priv, vma);
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 1932c2c..53cb062 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -166,10 +166,10 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return NULL;
 
 	mm = mm_for_maps(priv->task);
-	if (!mm) {
+	if (IS_ERR(mm) || !mm) {
 		put_task_struct(priv->task);
 		priv->task = NULL;
-		return NULL;
+		return mm;
 	}
 
 	/* start from the Nth VMA */

--
Guillaume
[PATCH] proc: advertise new restrictions on /proc/*/maps & /proc/*/smaps
Now that strangers are kept out of /proc/<pid>/maps, let's welcome them with -EPERM instead of a blank file.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---
 fs/proc/base.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7411bfb..c824b23 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2207,7 +2207,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	INF("cmdline", S_IRUGO, pid_cmdline),
 	INF("stat", S_IRUGO, tgid_stat),
 	INF("statm", S_IRUGO, pid_statm),
-	REG("maps", S_IRUGO, maps),
+	REG("maps", S_IRUSR, maps),
 #ifdef CONFIG_NUMA
 	REG("numa_maps", S_IRUGO, numa_maps),
 #endif
@@ -2219,7 +2219,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("mountstats", S_IRUSR, mountstats),
 #ifdef CONFIG_MMU
 	REG("clear_refs", S_IWUSR, clear_refs),
-	REG("smaps", S_IRUGO, smaps),
+	REG("smaps", S_IRUSR, smaps),
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
@@ -2533,7 +2533,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	INF("cmdline", S_IRUGO, pid_cmdline),
 	INF("stat", S_IRUGO, tid_stat),
 	INF("statm", S_IRUGO, pid_statm),
-	REG("maps", S_IRUGO, maps),
+	REG("maps", S_IRUSR, maps),
 #ifdef CONFIG_NUMA
 	REG("numa_maps", S_IRUGO, numa_maps),
 #endif
@@ -2544,7 +2544,7 @@ static const struct pid_entry tid_base_stuff[] = {
 	REG("mounts", S_IRUGO, mounts),
 #ifdef CONFIG_MMU
 	REG("clear_refs", S_IWUSR, clear_refs),
-	REG("smaps", S_IRUGO, smaps),
+	REG("smaps", S_IRUSR, smaps),
 #endif
 #ifdef CONFIG_SECURITY
 	DIR("attr", S_IRUGO|S_IXUGO, attr_dir),
Re: separate objdir Makefile regression in 2.6.24-rc*
On Dec 13, 2007 2:48 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> 2.6.24-rc5 doesn't seem to create Makefiles in empty obj dirs anymore

Known problem ;-) See
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/188cbd12d7c0871b/194fbc7c94314b2c

--
Guillaume
[PATCH] kbuild: Re-enable Makefile generation in a new O=... directory
The patch "kbuild: fix building with O=.. options"
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=18c32dac75b187d1a4e858f3cfdf03e844129f5e
disabled the creation of a Makefile in a new O=... directory. Restore it.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---
 scripts/mkmakefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/mkmakefile b/scripts/mkmakefile
index 9ad1bd7..e0f54b9 100644
--- a/scripts/mkmakefile
+++ b/scripts/mkmakefile
@@ -13,7 +13,7 @@ test ! -r $2/Makefile -o -O $2/Makefile || exit 0
 
 # Only overwrite automatically generated Makefiles
 # (so we do not overwrite kernel Makefile)
-if ! grep -q Automatically $2/Makefile
+if test -e $2/Makefile && ! grep -q Automatically $2/Makefile
 then
 	exit 0
 fi
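The bug is in the shell condition: `grep -q` also fails when the file does not exist, so the old `! grep -q Automatically $2/Makefile` was true for a brand-new O=... directory and the script exited before writing anything. The C sketch below models the two predicates as booleans to make the truth table explicit; old_skips() and new_skips() are hypothetical helpers, not part of kbuild, and each returns true when mkmakefile would take the `exit 0` branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: `! grep -q Automatically $2/Makefile`.
 * grep fails both when the marker is absent and when the file is missing. */
static bool old_skips(bool exists, bool has_marker)
{
	bool grep_succeeds = exists && has_marker;

	return !grep_succeeds;
}

/* New condition: `test -e $2/Makefile && ! grep -q Automatically $2/Makefile`. */
static bool new_skips(bool exists, bool has_marker)
{
	return exists && !has_marker;
}
```

Only the missing-Makefile case changes: an existing hand-written Makefile without the marker is still left alone, and an existing generated one is still regenerated.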
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> the frequency of both cores is the maximum of what linux sets each core to;

Do you mean that the cpufreq code can be confused about the actual frequency of the cores? That sounds like a big problem.

Thanks for any insight.

--
Guillaume
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Stefano Brivio <[EMAIL PROTECTED]> wrote:
> Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in
> delays isn't that good when using my crappy unstable TSC (mdelay(2000)
> causes delays between 2 and 2.9 seconds) but it's not depending on frequency
> changes anymore. So I'd say it's fixed, but please tell me if you want me
> to do any other test so as to be sure it is.

Ingo, it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() based udelay), so how can udelay be affected by your proposed changes?

Thanks.

--
Guillaume
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
On Dec 10, 2007 9:42 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> although some claimed effect was on udelay()/mdelay() too.

Any specific report? The jumping sched_clock on frequency change caused some scheduling oddities for me, but CFS attenuated the effect.

Thanks.

--
Guillaume
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
On Dec 10, 2007 9:42 PM, Ingo Molnar [EMAIL PROTECTED] wrote: although some claimed effect was on udelay()/mdelay() too. Any specific report? The jumping sched_clock on frequency change caused some scheduling oddities for me, but CFS attenuated the effect. Thanks. -- Guillaume -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Stefano Brivio [EMAIL PROTECTED] wrote: Sorry for disappearing. Anyway, yes, those patches fixed it. Precision in delays isn't that good when using my crappy unstable TSC (mdelay(2000) causes delays between 2 and 2.9 seconds) but it's not depending on frequency changes anymore. So I'd say it's fixed, but please tell me if you want me to do any other test so as to be sure it is. Ingo, it seems you dropped http://lkml.org/lkml/2007/12/7/100 (cpu_clock() based udelay), so how udelay can be affected by your proposed changes? Thanks. -- Guillaume -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23
Arjan van de Ven <[EMAIL PROTECTED]> wrote: > the frequency of both cores is the maximum of what linux sets each core to; Do you mean that the cpufreq code can be confused about the actual frequency of the cores? That sounds like a big problem. Thanks for any insight. -- Guillaume
Re: [git pull] x86/hrtimer/acpi fixes
On Dec 9, 2007 7:01 PM, Pavel Machek <[EMAIL PROTECTED]> wrote: > > + * ns += offset to avoid sched_clock jumps with cpufreq > > + * > > * [EMAIL PROTECTED] "math is hard, lets go shopping!" > > */ > > Did john add the 'ns+=' or do comments need reorder? I added it, but I think it needs to be removed, since the offset is now maintained by the scheduler in __update_rq_clock(). Thanks. -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 8, 2007 9:52 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote: > the scariest bit isnt even the scaling i think - that is a fairly > straightforward and clean PER_CPU-ization of the global scaling factor, > and its hookup with cpufreq events. (and the credit for that goes to > Guillaume Chazarain) To be fair, the cpufreq hooks were already there; I just did a buggy percpu conversion and added an offset that you removed ;-) -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Fri, 7 Dec 2007 15:54:18 +0100, Ingo Molnar <[EMAIL PROTECTED]> wrote: > This is a version that > is supposed fix all known aspects of TSC and frequency-change > weirdnesses. Tested it with frequency changes; the clock is as smooth as I like it :-) The only remaining sched_clock user in need of conversion seems to be lockdep. Great work. -- Guillaume
Re: [patch] x86: scale cyc_2_nsec according to CPU frequency
On Fri, 7 Dec 2007 14:55:25 +0100, Ingo Molnar <[EMAIL PROTECTED]> wrote: > Firstly, we dont need the 'offset' anymore because cpu_clock() maintains > offsets itself. Yes, but a lower-quality one. __update_rq_clock tries to compensate for large clock jumps with jiffy resolution, while my offset arranges for a very smooth frequency transition. I agree with keeping a single offset, but I liked the fact that with my patch, a frequency change caused no clock jump at all. > + * ns += offset to avoid sched_clock jumps with cpufreq I guess this needs to go away if I don't make my point :-( > + printk("CPU#%d: changed cyc2ns scale from %ld to %ld\n", > + cpu, prev_scale, *scale); Pointing it out just to be sure it does not end up in the final version ;-) Thanks for cleaning up my mess ;-) -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 7, 2007 12:18 PM, Guillaume Chazarain <[EMAIL PROTECTED]> wrote: > Any pointer to it? Never mind, I found it ... in this same thread :-( -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 7, 2007 12:13 PM, Nick Piggin <[EMAIL PROTECTED]> wrote: > My patch should fix the worst cpufreq sched_clock jumping issue > I think. Any pointer to it? Thanks. -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Fri, 7 Dec 2007 09:51:21 +0100, Ingo Molnar <[EMAIL PROTECTED]> wrote: > yeah, we can do something like this in 2.6.25 - this will improve the > quality of sched_clock(). Thanks a lot for your interest! I'll clean it up and resend it later. As I don't have the necessary knowledge to do the tsc_{32,64}.c unification, should I copy-paste common functions into both tsc_32.c and tsc_64.c to ease later unification, or should I start a common .c file? Thanks again for showing interest. -- Guillaume
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
"Guillaume Chazarain" <[EMAIL PROTECTED]> wrote: > On Dec 7, 2007 6:51 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > Hmrpf. sched_clock() is used for the time stamp of the printks. We > > need to find some better solution other than killing off the tsc > > access completely. > > Something like http://lkml.org/lkml/2007/3/16/291 that would need some > refresh? And here is a refreshed one just for testing with 2.6-git. The 64 bit part is a shamelessly untested copy/paste as I cannot test it. diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c index 9ebc0da..d561b2f 100644 --- a/arch/x86/kernel/tsc_32.c +++ b/arch/x86/kernel/tsc_32.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -78,15 +79,32 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable); * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" */ -unsigned long cyc2ns_scale __read_mostly; #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline void set_cyc2ns_scale(unsigned long cpu_khz) +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; + +static void set_cyc2ns_scale(unsigned long cpu_khz) { - cyc2ns_scale = (100 << CYC2NS_SCALE_FACTOR)/cpu_khz; + struct cyc2ns_params *params; + unsigned long flags; + unsigned long long tsc_now, ns_now; + + rdtscll(tsc_now); + params = _cpu_var(cyc2ns); + + local_irq_save(flags); + ns_now = __cycles_2_ns(params, tsc_now); + + params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + params->offset += ns_now - __cycles_2_ns(params, tsc_now); + local_irq_restore(flags); + + put_cpu_var(cyc2ns); } /* diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c index 9c70af4..93e7a06 100644 --- a/arch/x86/kernel/tsc_64.c +++ b/arch/x86/kernel/tsc_64.c @@ -10,6 +10,7 @@ #include #include +#include static int notsc __initdata = 0; @@ -18,16 +19,25 @@ 
EXPORT_SYMBOL(cpu_khz); unsigned int tsc_khz; EXPORT_SYMBOL(tsc_khz); -static unsigned int cyc2ns_scale __read_mostly; +DEFINE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; -static inline void set_cyc2ns_scale(unsigned long khz) +static void set_cyc2ns_scale(unsigned long cpu_khz) { - cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / khz; -} + struct cyc2ns_params *params; + unsigned long flags; + unsigned long long tsc_now, ns_now; -static unsigned long long cycles_2_ns(unsigned long long cyc) -{ - return (cyc * cyc2ns_scale) >> NS_SCALE; + rdtscll(tsc_now); + params = _cpu_var(cyc2ns); + + local_irq_save(flags); + ns_now = __cycles_2_ns(params, tsc_now); + + params->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz; + params->offset += ns_now - __cycles_2_ns(params, tsc_now); + local_irq_restore(flags); + + put_cpu_var(cyc2ns); } unsigned long long sched_clock(void) diff --git a/include/asm-x86/timer.h b/include/asm-x86/timer.h index 0db7e99..ff4f2a3 100644 --- a/include/asm-x86/timer.h +++ b/include/asm-x86/timer.h @@ -2,6 +2,7 @@ #define _ASMi386_TIMER_H #include #include +#include #define TICK_SIZE (tick_nsec / 1000) @@ -16,7 +17,7 @@ extern int recalibrate_cpu_khz(void); #define calculate_cpu_khz() native_calculate_cpu_khz() #endif -/* Accellerators for sched_clock() +/* Accelerators for sched_clock() * convert from cycles(64bits) => nanoseconds (64bits) * basic equation: * ns = cycles / (freq / ns_per_sec) @@ -31,20 +32,44 @@ extern int recalibrate_cpu_khz(void); * And since SC is a constant power of two, we can convert the div * into a shift. * - * We can use khz divisor instead of mhz to keep a better percision, since + * We can use khz divisor instead of mhz to keep a better precision, since * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. * ([EMAIL PROTECTED]) * + * ns += offset to avoid sched_clock jumps with cpufreq + * * [EMAIL PROTECTED] "math is hard, lets go shopping!" 
*/ -extern unsigned long cyc2ns_scale __read_mostly; + +struct cyc2ns_params { + unsigned long scale; + unsigned long long offset; +}; + +DECLARE_PER_CPU(struct cyc2ns_params, cyc2ns) __read_mostly; #define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ -static inline unsigned long long cycles_2_ns(unsigned long long cyc) +static inline unsigned long long __cycles_2_ns(struct cyc2ns_params *params, + unsigned long long cyc) { - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; + return ((cyc * params->scale) >> CYC2NS_SCALE_
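The heart of the patch is that set_cyc2ns_scale() recomputes offset so that the nanosecond value at tsc_now is identical before and after the scale change. Below is a user-space sketch of just that arithmetic, with the per-CPU variable, rdtscll() and IRQ disabling left out; the toy_* names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10      /* 2^10, as in the patch */
#define NSEC_PER_MSEC 1000000ULL

/* Stand-in for the patch's per-CPU struct cyc2ns_params. */
struct toy_cyc2ns_params {
	uint64_t scale;   /* ns per cycle, fixed point with 10 fractional bits */
	uint64_t offset;  /* keeps the ns value continuous across scale changes */
};

static uint64_t toy_cycles_2_ns(const struct toy_cyc2ns_params *p, uint64_t cyc)
{
	return ((cyc * p->scale) >> CYC2NS_SCALE_FACTOR) + p->offset;
}

/*
 * Switch to the scale for a new CPU frequency while keeping the value at
 * tsc_now unchanged. When the new scale is larger, the offset becomes
 * "negative"; unsigned wrap-around makes that work modulo 2^64.
 */
static void toy_set_cyc2ns_scale(struct toy_cyc2ns_params *p,
				 unsigned long cpu_khz, uint64_t tsc_now)
{
	uint64_t ns_now;

	assert(cpu_khz > 0);
	ns_now = toy_cycles_2_ns(p, tsc_now);
	p->scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR) / cpu_khz;
	p->offset += ns_now - toy_cycles_2_ns(p, tsc_now);
}
```

For example, at 2 GHz (cpu_khz = 2000000) the scale is 512, so 2,000,000,000 cycles convert to 1,000,000,000 ns. After switching to 1 GHz at that same TSC value, the conversion still returns 1,000,000,000 ns, and one more billion cycles then add exactly one second.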
Re: [PATCH] scheduler: fix x86 regression in native_sched_clock
On Dec 7, 2007 6:51 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote: > Hmrpf. sched_clock() is used for the time stamp of the printks. We > need to find some better solution other than killing off the tsc > access completely. Something like http://lkml.org/lkml/2007/3/16/291, which would need a refresh? -- Guillaume
Re: 2.6.24-rc3: find complains about /proc/net
On 11/21/07, Ingo Molnar <[EMAIL PROTECTED]> wrote: > i guess it was a v2.6.24 change, hence a regression that needs to be > fixed? It seems to be http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=01660410 so it dates back to Linux 2.6.0-test6. -- Guillaume
Re: [PATCH] proc: Fix the threaded /proc/self.
Hello Eric, This fills a need I had to get the current TID in a Java program, so I'm very interested in this change. OTOH, how will someone not reading LKML discover that the current TID is now in /proc/self, and that this was not always the case? My 2 cents: I would put it in /proc/self/task/self; this way TGIDs are always in /proc and TIDs in /proc/TGID/task. -- Guillaume
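For comparison, a native program can already obtain both identifiers without relying on a new /proc entry. This is a minimal Linux-only sketch: older glibc versions lack a gettid() wrapper, so it goes through syscall(SYS_gettid), and it reads the /proc/self symlink, which on mainline kernels resolves to the TGID. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread ID of the calling thread (equal to the PID in the main thread). */
static long current_tid(void)
{
	long tid = syscall(SYS_gettid);

	assert(tid > 0);
	return tid;
}

/*
 * Thread group ID as seen through /proc/self, a symlink to /proc/<tgid>.
 * Returns -1 if /proc is not available.
 */
static long proc_self_tgid(void)
{
	char buf[32];
	ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return strtol(buf, NULL, 10);
}
```

In any thread other than the main one, current_tid() differs from getpid() while proc_self_tgid() does not, which is exactly the TID/TGID distinction that the /proc/self/task/self layout would keep visible.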
Re: [PATCH] kconfig: use $K64BIT to set 64BIT with all*config targets
On 11/11/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote: > > So it's not strictly an > > output directory, more a build directory. > The opposite > All output is placed there - including the configuration generated by > the *config frontends. I meant that it's not strictly an output directory: if I do make O=dir oldconfig, it will _read_ dir/.config, so the O= directory is also used for input. And yes, I was splitting hairs ;-) Sorry for the confusion. -- Guillaume
Re: [PATCH] kconfig: use $K64BIT to set 64BIT with all*config targets
On 11/11/07, Adrian Bunk <[EMAIL PROTECTED]> wrote: > Another important point is that users that know about and see CONFIG_* > variables are kernel hackers, not the normal kconfig users. But kconfig is mainly for kernel hackers, otherwise it would be called CML2 ;-) > > Also, when working on a specific feature of the kernel, I tend to > > install both a kernel with the CONFIG_ option set and one with > > the option unset. Scripts to do that can twiddle the .config file, > > but it would be more convenient if kbuild could avoid that. > > I'm wondering why you don't use two different O= output directories > instead? > > Depending on the CONFIG_ option in question this might even greatly > reduce your compile times. /me is filled with wonder at the discovery that .config is saved in the O= directory. Thanks a lot, Adrian, for this time saver. So it's not strictly an output directory, more a build directory. I still think "make oldconfig CONFIG_FOO=bar" is useful for the occasional config change, but thanks again for this great tip. -- Guillaume
Re: [PATCH] kconfig: use $K64BIT to set 64BIT with all*config targets
Hi Adrian, On 11/11/07, Adrian Bunk <[EMAIL PROTECTED]> wrote: > What exactly are the use cases where someone would need this? Glad you asked. Today, when I want to recompile a kernel while changing a CONFIG_ option, I manually edit the .config, remove the appropriate line and then run make oldconfig. I'd like to be able to do: make oldconfig CONFIG_FOO=bar. Also, when working on a specific feature of the kernel, I tend to install both a kernel with the CONFIG_ option set and one with the option unset. Scripts to do that can twiddle the .config file, but it would be more convenient if kbuild could avoid that. As you see, I'm more interested in make oldconfig than make all*config. Cheers. -- Guillaume
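The manual workflow described above (drop the symbol's line from .config, then let make oldconfig ask about it again) is easy to script. Here is a minimal C sketch of the filtering step, assuming the usual .config spellings "CONFIG_FOO=value" and "# CONFIG_FOO is not set"; is_config_entry() and drop_config_entry() are hypothetical helpers, not part of kbuild.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` is the .config entry for `sym` (e.g. "CONFIG_FOO"),
 * either "CONFIG_FOO=<value>" or "# CONFIG_FOO is not set"; 0 otherwise.
 * "CONFIG_FOOBAR=y" does not match "CONFIG_FOO".
 */
static int is_config_entry(const char *line, const char *sym)
{
	size_t len = strlen(sym);

	assert(len > 0);
	if (strncmp(line, sym, len) == 0 && line[len] == '=')
		return 1;
	if (strncmp(line, "# ", 2) == 0 &&
	    strncmp(line + 2, sym, len) == 0 &&
	    strncmp(line + 2 + len, " is not set", 11) == 0)
		return 1;
	return 0;
}

/*
 * Copy `in` to `out`, dropping the entry for `sym`. Returns the number of
 * lines dropped. Lines longer than the buffer are not handled specially.
 */
static int drop_config_entry(FILE *in, FILE *out, const char *sym)
{
	char line[4096];
	int dropped = 0;

	while (fgets(line, sizeof(line), in)) {
		if (is_config_entry(line, sym)) {
			dropped++;
			continue;
		}
		fputs(line, out);
	}
	return dropped;
}
```

After writing the filtered file back as .config, make oldconfig prompts only for the symbol that no longer has a value.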
Re: [PATCH] kconfig: use $K64BIT to set 64BIT with all*config targets
Hi, On 11/10/07, Sam Ravnborg <[EMAIL PROTECTED]> wrote: > The variable K64BIT can now be used to select the > value of CONFIG_64BIT. Why not call the environment variable CONFIG_64BIT, in preparation for the day when all CONFIG_ variables can be passed as environment variables? -- Guillaume
Re: [PATCH] replace "make ARCH=i386/x86_64" with "make ARCH=x86"
On 11/6/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote: > The issue with "make allyesconfig" concerns me, although the same > situation already exists with any multiple-choice configuration. What I > guess we really want is to be able to specify a few specific choices. I don't know enough about Kbuild to know whether it's possible, but I would find it great if the *config targets could take CONFIG_ variables on the command line, like: make oldconfig CONFIG_SMP=y. If it's not possible, why not inherit the CONFIG_ options from environment variables, as we already do for $CFLAGS, but only at make *config time in this case? -- Guillaume
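The environment-variable idea could look roughly like this on the kconfig side. This is a hypothetical sketch only: config_value() is not an existing kconfig function, and the rule "a non-empty environment value overrides .config" is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: return the value to use for a Kconfig symbol, preferring
 * a non-empty environment override (e.g. CONFIG_SMP=y) over the value
 * read from .config.
 */
static const char *config_value(const char *sym, const char *dotconfig_value)
{
	const char *env;

	assert(sym != NULL && strncmp(sym, "CONFIG_", 7) == 0);
	env = getenv(sym);
	if (env != NULL && *env != '\0')
		return env;
	return dotconfig_value;
}
```

One design note: the GNU make manual states that make exports variables set on its command line to the commands it runs, so a getenv()-based lookup like this could, in principle, serve both the "make oldconfig CONFIG_SMP=y" spelling and a plain environment variable.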
Re: [PATCH] Fix delay accounting regression
On 11/2/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> What user-space tools are utilizing delay-accounting by the way?

Thanks for the plugging opportunity ;-)

http://guichaz.free.fr/misc/#iotop uses the I/O side of delay-accounting.

--
Guillaume
[PATCH] sched: CONFIG_FAIR_USER_SCHED: auto adjust users weights
CONFIG_FAIR_USER_SCHED is great and I'm happy to see it is enabled by
default, but IMHO it currently suffers from some limitations:

- on a single-user system, it's useful to have root processes given twice
  as much CPU as user processes, but I don't want nice 19 cron jobs like
  updatedb or rpmq to get twice as much CPU as my nice -20 tasks.

- on a multi-user system, a user should be able to give back its CPU share
  to other users. This is not possible for now with CONFIG_FAIR_USER_SCHED.
  This implies that returning EPERM on nice(<0) becomes worthless, as it is
  equivalent to nice(>0) for every other process of the user, ignoring the
  limits of the nice range.

To address these problems, this patch changes the weight of the cfs_rq of
each user to the maximum weight of the processes on this cfs_rq, scaled
with /sys/kernel/uids/UID/cpu_share.

It's possible that more elaborate mathematics than taking the max are
needed, but basic testing showed the expected fairness.

Signed-off-by: Guillaume Chazarain <[EMAIL PROTECTED]>
---
 include/linux/sched.h |    4 ++
 kernel/sched.c        |   50 +++
 kernel/sched_fair.c   |  108 +
 3 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 155d743..d6d2db9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -908,6 +908,10 @@ struct sched_entity {
 	/* rq "owned" by this entity/group: */
 	struct cfs_rq *my_q;
 #endif
+#ifdef CONFIG_FAIR_USER_SCHED
+	/* used to track the max load.weight */
+	struct rb_node max_load;
+#endif
 };
 
 struct task_struct {
diff --git a/kernel/sched.c b/kernel/sched.c
index 3f6bd11..df8114b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -260,6 +260,10 @@ struct cfs_rq {
 	struct list_head leaf_cfs_rq_list; /* Better name : task_cfs_rq_list? */
 	struct task_group *tg;	/* group that "owns" this runqueue */
 #endif
+#ifdef CONFIG_FAIR_USER_SCHED
+	/* used to track the sched_entity with the max load in this cfs_rq */
+	struct rb_root max_load_se;
+#endif
 };
 
 /* Real-Time classes' related field in a runqueue: */
@@ -7094,14 +7098,12 @@ done:
 	task_rq_unlock(rq, &flags);
 }
 
+/* cfs_rq->rq->lock must be taken */
 static void set_se_shares(struct sched_entity *se, unsigned long shares)
 {
 	struct cfs_rq *cfs_rq = se->cfs_rq;
-	struct rq *rq = cfs_rq->rq;
 	int on_rq;
 
-	spin_lock_irq(&rq->lock);
-
 	on_rq = se->on_rq;
 	if (on_rq)
 		dequeue_entity(cfs_rq, se, 0);
@@ -7111,22 +7113,54 @@ static void set_se_shares(struct sched_entity *se, unsigned long shares)
 
 	if (on_rq)
 		enqueue_entity(cfs_rq, se, 0);
+}
 
-	spin_unlock_irq(&rq->lock);
+#ifdef CONFIG_FAIR_USER_SCHED
+static void update_group_share(struct task_group *tg, int cpu)
+{
+	struct rb_node *max_load_node = rb_last(&tg->cfs_rq[cpu]->max_load_se);
+	struct sched_entity *max_load_entry;
+	unsigned long shares;
+
+	if (!max_load_node)
+		/* empty cfs_rq */
+		return;
+
+	max_load_entry = rb_entry(max_load_node, struct sched_entity, max_load);
+	shares = scale_tg_weight(tg, max_load_entry->load.weight);
+	set_se_shares(tg->se[cpu], shares);
+}
+#else
+static void update_group_share(struct task_group *tg, int cpu)
+{
+	set_se_shares(tg->se[cpu], tg->shares);
 }
+#endif
 
 int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 {
-	int i;
+	int cpu;
+	unsigned long flags;
+
+	if (shares <= 1)
+		return -EINVAL;
+
+#ifdef CONFIG_FAIR_USER_SCHED
+	if ((shares * prio_to_weight[0]) / prio_to_weight[0] != shares)
+		/* The provided value would overflow in scale_tg_weight() */
+		return -EINVAL;
+#endif
 
 	spin_lock(&tg->lock);
 	if (tg->shares == shares)
 		goto done;
 
 	tg->shares = shares;
-	for_each_possible_cpu(i)
-		set_se_shares(tg->se[i], shares);
-
+	for_each_possible_cpu(cpu) {
+		spin_lock_irqsave(&tg->cfs_rq[cpu]->rq->lock, flags);
+		update_group_share(tg, cpu);
+		spin_unlock_irqrestore(&tg->cfs_rq[cpu]->rq->lock, flags);
+	}
 done:
 	spin_unlock(&tg->lock);
 	return 0;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 01859f6..70ed34e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -135,6 +135,112 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	return se->vruntime - cfs_rq->min_vruntime;
 }
 
+#ifdef CONFIG_FAIR_USER_SCHED
+static void set_se_shares(struct sched_entity *se, unsigned long shares);
+
+static unsigned long scale_tg_weight(struct task_group *tg, unsigned long weight)
+{
+	unsigned long scaled_weight = (weight * tg->shares) / NICE_0_LOAD;
+	return max(scaled_weight, 2UL
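The weighting rule in the description above (each user's per-CPU group weight becomes the weight of its heaviest runnable task, scaled by that user's cpu_share) can be sanity-checked with a small user-space sketch. It is not kernel code: nice_to_weight(), group_weight() and shares_would_overflow() are hypothetical helpers, the table reproduces CFS's nice-to-weight values (nice 0 = 1024), and a root cpu_share of 2048 versus 1024 for other users is an assumption matching the "twice as much CPU" remark. The floor of 2 is also an assumption, because the quoted scale_tg_weight() is cut off at "max(scaled_weight, 2UL".

```c
#include <assert.h>

#define NICE_0_LOAD 1024UL

/* CFS nice-to-weight table (index 0 = nice -20, index 39 = nice 19);
 * each nice step changes the weight by roughly 1.25x. */
static const unsigned long prio_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/* Hypothetical helper: weight of a task at the given nice level. */
static unsigned long nice_to_weight(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return prio_to_weight[nice + 20];
}

/* Hypothetical helper: weight of one user's group on one CPU, given the
 * nice levels of its runnable tasks there and the user's cpu_share.
 * Mirrors the visible part of scale_tg_weight(); the floor of 2 is an
 * assumption because the quoted code is truncated. */
static unsigned long group_weight(const int *nices, int n, unsigned long shares)
{
	unsigned long max_weight = 0, scaled;
	int i;

	assert(n > 0);	/* the patch leaves an empty cfs_rq untouched */
	for (i = 0; i < n; i++) {
		unsigned long w = nice_to_weight(nices[i]);

		if (w > max_weight)
			max_weight = w;
	}
	scaled = max_weight * shares / NICE_0_LOAD;
	return scaled > 2UL ? scaled : 2UL;
}

/* Same multiply-then-divide test as sched_group_set_shares() in the patch:
 * nonzero if shares * prio_to_weight[0] wraps around. */
static int shares_would_overflow(unsigned long shares)
{
	return (shares * prio_to_weight[0]) / prio_to_weight[0] != shares;
}
```

Under these assumptions, a root cron job at nice 19 gives root's group a weight of 15 * 2048 / 1024 = 30, while a user's nice -20 task gives that user's group 88761, so the cron job no longer receives twice the CPU of the interactive task.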