Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init
On 01/31/2008 01:36 AM, Jan Kiszka was caught saying: > Jan Kiszka wrote: >> George Anzinger wrote: >>> On 01/30/2008 04:08 PM, Jan Kiszka was caught saying: >>>> [Here comes a rebased version against latest x86/mm] >>>> >>>> In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up >>>> and connect to the front-end already during early_param evaluation. >>>> This >>>> fails on x86 as the exception stack is not yet initialized, effectively >>>> delaying kgdbwait until late-init. >>> >>> I wonder how much work it would take to just set up the exception >>> stack and proceed. After all the kgbdwait is there to help debug >>> very early kernel code... >> >> In principle a valid question, but I'm not the one to answer it. I >> would not feel very well if I had to reorder this critical setup code. >> Look, we would have to move trap_init in start_kernel before >> parse_early_param, and that would affect _every_ arch... I can not speak to other archs, but for x86 I called trap_init from the code that caught the kgdbwait. At that time (since I retired, I have not looked at the actual kernel code) it could be called again later by the kernel code. I.e. I did not try to reorder the kernel bring up code, but just added an additional call to trap_init and then only in the case of finding a kgdbwait. As such, this would need to be arch specific... >> > > BTW, do you know if EXCEPTION_STACK_READY fails for other archs in > parse_early_param as well? It should, because my under standing of > trap_init is that it's the functions to arm things like... exception > handlers? And that raises the question of the deeper purpose of this > check (and the invocation of kgdb_early_init from the argument parsing > function). Sigh, KGDB is still a quite improvable piece of code. Likely. Once you get it in the main line kernel, one would hope that other arch code would be forth coming as many more "eyes" will be in play. > > Jan > > PS: Can we move this to some public list? Sure, sorry I picked the wrong reply button, never intended it to be private. > -- George Anzinger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote: mercredi 7 Septembre 2005 23:16, George Anzinger wrote/a écrit : Serge Noiraud wrote: ... I'm trying this kgdb patch with 2.6.13 and I get the following errors. Is there something I forgot ? Where did you get the kgdb you are using? It looks like kgdb_ts is in this version, but it it not in the one on my website http://source.mvista.com/~ganzinger/ This related to kgdb? I.e. does it go away if you either turn off kgdb at configure time or just don't patch with kgdb? (It sure seems unrelated, but...) I don't get those errors with CONFIG_KGDB=n bellow I put the diff between a working . config and a non working .config George ... INSTALL sound/usb/snd-usb-audio.ko INSTALL sound/usb/snd-usb-lib.ko INSTALL sound/usb/usx2y/snd-usb-usx2y.ko if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi WARNING: ... If I redo the make command only ( not make rpm ) I obtain the following : # make CHK include/linux/version.h make[1]: « arch/i386/kernel/asm-offsets.s » est à jour. CHK include/linux/compile.h CHK usr/initramfs_list Kernel: arch/i386/boot/bzImage is ready (#1) Building modules, stage 2. MODPOST *** Warning: "preempt_locks" [net/sunrpc/sunrpc.ko] undefined! *** Warning: "preempt_locks" [net/appletalk/appletalk.ko] undefined! *** Warning: "preempt_locks" [fs/reiserfs/reiserfs.ko] undefined! *** Warning: "preempt_locks" [fs/ntfs/ntfs.ko] undefined! *** Warning: "preempt_locks" [fs/nfs/nfs.ko] undefined! *** Warning: "preempt_locks" [fs/minix/minix.ko] undefined! *** Warning: "preempt_locks" [fs/jbd/jbd.ko] undefined! *** Warning: "preempt_locks" [fs/ext3/ext3.ko] undefined! *** Warning: "preempt_locks" [fs/cifs/cifs.ko] undefined! *** Warning: "preempt_locks" [fs/affs/affs.ko] undefined! *** Warning: "preempt_locks" [drivers/scsi/libata.ko] undefined! *** Warning: "preempt_locks" [drivers/scsi/ide-scsi.ko] undefined! *** Warning: "preempt_locks" [drivers/scsi/gdth.ko] undefined! *** Warning: "preempt_locks" [drivers/md/raid6.ko] undefined! *** Warning: "preempt_locks" [drivers/md/raid5.ko] undefined! *** Warning: "preempt_locks" [drivers/ide/ide-floppy.ko] undefined! *** Warning: "preempt_locks" [drivers/block/pktcdvd.ko] undefined! *** Warning: "preempt_locks" [drivers/block/loop.ko] undefined! preempt_locks is being accessed from a module but is not exported. This is turned on with CONFIG_DEBUG_RT_LOCKING_MODE so change that and it should build. # ~ -# CONFIG_EARLY_PRINTK is not set -# CONFIG_DEBUG_STACKOVERFLOW is not set +CONFIG_LATENCY_TRACE=y +CONFIG_RT_DEADLOCK_DETECT=y +CONFIG_DEBUG_RT_LOCKING_MODE=y <- This one is doing it +CONFIG_DEBUG_KOBJECT=y +CONFIG_DEBUG_HIGHMEM=y ~ +CONFIG_KGDB=y +CONFIG_KGDB_9600BAUD=y +# CONFIG_KGDB_19200BAUD is not set +# CONFIG_KGDB_38400BAUD is not set +# CONFIG_KGDB_57600BAUD is not set +# CONFIG_KGDB_115200BAUD is not set +CONFIG_KGDB_PORT=0x3f8 +CONFIG_KGDB_IRQ=4 +CONFIG_KGDB_MORE=y +CONFIG_KGDB_OPTIONS="-O1" +CONFIG_NO_KGDB_CPUS=8 The following are not in the latest kgdb... +CONFIG_KGDB_TS=y +# CONFIG_KGDB_TS_64 is not set +CONFIG_KGDB_TS_128=y +# CONFIG_KGDB_TS_256 is not set +# CONFIG_KGDB_TS_512 is not set +# CONFIG_KGDB_TS_1024 is not set . +CONFIG_STACK_OVERFLOW_TEST=y +CONFIG_TRAP_BAD_SYSCALL_EXITS=y <--- I recommend against this one, see notes at front of kgdb patch +CONFIG_KGDB_CONSOLE=y<--- Likewise use this only if you have only one serial port and no VGA +CONFIG_KGDB_SYSRQ=y # - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote: mercredi 17 Août 2005 02:53, George Anzinger wrote/a écrit : I have put a version of KGDB for x86 RT kernels here: http://source.mvista.com/~ganzinger/ The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them. Apply in this order: Ingo's patch kgdb-ga-rt.patch common_kgdb_cfi_annotations.patch This is, more or less, the same kgdb that is in Andrew's mm tree changed to fix the RT issues. Hi, everybody I found two bugs in kgdb-ga-rt patch. The first one : if CONFIG_SMP is not set, we have a compile error The second one : if CONFIG_KGDB is not set, we have a link error I send you a diff patch to correct this. I am not sure the last patch is correct, but it works. The reported bugs are now rolled into the kgdb patch. Also, there is a new README.txt. I also included, in the kgdb patch, an updated gdb macro file (Documentation/i386/kgdb/gdbinit.hw) which has a per_cpu macro to: given a per_cpu structure name and the cpu number returns the address of that structure, properly typed. I am also putting my current version of time_stamp_tool. This is the replacement for kgdb_ts() which I have removed from the kgdb patch. Still a little rough but it has promise of being arch independent. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote: mercredi 17 Août 2005 02:53, George Anzinger wrote/a écrit : I have put a version of KGDB for x86 RT kernels here: http://source.mvista.com/~ganzinger/ The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them. Apply in this order: Ingo's patch kgdb-ga-rt.patch common_kgdb_cfi_annotations.patch This is, more or less, the same kgdb that is in Andrew's mm tree changed to fix the RT issues. I'm trying this kgdb patch with 2.6.13 and I get the following errors. Is there something I forgot ? This related to kgdb? I.e. does it go away if you either turn off kgdb at configure time or just don't patch with kgdb? (It sure seems unrelated, but...) George ... INSTALL sound/usb/snd-usb-audio.ko INSTALL sound/usb/snd-usb-lib.ko INSTALL sound/usb/usx2y/snd-usb-usx2y.ko if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/sunrpc/sunrpc.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/appletalk/appletalk.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/reiserfs/reiserfs.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ntfs/ntfs.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/nfs/nfs.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/minix/minix.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/jbd/jbd.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ext3/ext3.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/cifs/cifs.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/affs/affs.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/libata.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/ide-scsi.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/gdth.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid6.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid5.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/ide/ide-floppy.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/pktcdvd.ko needs unknown symbol preempt_locks WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/loop.ko needs unknown symbol preempt_locks make[3]: *** [_modinst_post] Erreur 1 erreur: Mauvais status de sortie pour /var/tmp/rpm-tmp.51405 (%install) Erreur de construction de RPM: Mauvais status de sortie pour /var/tmp/rpm-tmp.51405 (%install) make[2]: *** [rpm] Erreur 1 make[1]: *** [rpm] Erreur 2 make: *** [rpm] Erreur 2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] Use proper casting with signed timespec.tv_nsec values
john stultz wrote: All, I recently ran into a bug with an older kernel where xtime's tv_nsec field had accumulated more then 2 seconds worth of time. The timespec's tv_nsec is a signed long, however gettimeofday() treats it as an unsigned long. Thus when the failure occured, very strange and difficult to debug time problems occurred. The main cause of the problem I was seeing is already fixed in mainline, however just to be safe, I figured the following patch would be wise. I only audited i386 and x86_64, however other arches probably could have similar signed problems as well. Please let me know if you have any further comments or feedback. John, There is a problem in the way this code handles the conversion to usec. There is a conversion here and also in the get_offset code. If the nanoseconds are carrier until after the addition of the two about 25% of the time you will end up with an additional usec in time. I strongly suggest changing to convert to usec after the addition of xtime and get_offset time to avoid this. If the "correct" thing is done in clock_gettime() (i.e. get_offset is in nanoseconds) this actually turns up as a back step in time WRT gettimeofday and clock_gettime(). George -- thanks -john linux-2.6.13_signed-tv_nsec_A0.patch diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c --- a/arch/i386/kernel/time.c +++ b/arch/i386/kernel/time.c @@ -156,7 +156,7 @@ void do_gettimeofday(struct timeval *tv) usec += lost * (USEC_PER_SEC / HZ); sec = xtime.tv_sec; - usec += (xtime.tv_nsec / 1000); + usec += (unsigned long)xtime.tv_nsec / 1000; } while (read_seqretry(&xtime_lock, seq)); while (usec >= 100) { diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c --- a/arch/x86_64/kernel/time.c +++ b/arch/x86_64/kernel/time.c @@ -128,7 +128,7 @@ void do_gettimeofday(struct timeval *tv) seq = read_seqbegin(&xtime_lock); sec = xtime.tv_sec; - usec = xtime.tv_nsec / 1000; + usec = (unsigned long)xtime.tv_nsec / 1000; /* i386 does some correction here to keep the clock monotonous even when ntpd is fixing drift. diff --git a/kernel/timer.c b/kernel/timer.c --- a/kernel/timer.c +++ b/kernel/timer.c @@ -824,7 +824,7 @@ static void update_wall_time(unsigned lo do { ticks--; update_wall_time_one_tick(); - if (xtime.tv_nsec >= 10) { + if ((unsigned long)xtime.tv_nsec >= 10) { xtime.tv_nsec -= 10; xtime.tv_sec++; second_overflow(); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()
Tom Rini wrote: On Tue, Aug 30, 2005 at 12:33:25AM -0700, George Anzinger wrote: Tom Rini wrote: CC: Andi Kleen <[EMAIL PROTECTED]> This adds a call to notify_die() in the "no context" portion of do_page_fault() as someone on the chain might care and want to do a fixup. --- linux-2.6.13-trini/arch/x86_64/mm/fault.c |4 1 files changed, 4 insertions(+) diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c --- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook 2005-08-29 11:09:13.0 -0700 +++ linux-2.6.13-trini/arch/x86_64/mm/fault.c 2005-08-29 11:09:13.0 -0700 @@ -514,6 +514,10 @@ no_context: if (is_errata93(regs, address)) return; + if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14, + SIGSEGV) == NOTIFY_STOP) + return; + /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice. Please use a more descriptive text than "no context". This bit of info SHOULD be available to the gdb/kgdb user and should indicate why kgdb was entered. It thus should be something like "bad kernel address" or "illegal kernel address". "no context" is the label we're in, in the code. What it's actually used for is "hey, we (== kgdb) tried to read/write a very very bogus addr, time to longjmp". If it's not true that kgdb is at fault then we drop to the debugger anyhow, and the user can see where they came from. No. What the user sees is the offending code (i.e. prior to the trap to page_fault), NOT how kgdb happend to be called. The "no_context" is IN the _context_ of page_fault, but that is lost by the time you get to kgdb and ask to see _why_ (via, hint, hint: "p kgdb_info"). -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()
Tom Rini wrote: CC: Andi Kleen <[EMAIL PROTECTED]> This adds a call to notify_die() in the "no context" portion of do_page_fault() as someone on the chain might care and want to do a fixup. --- linux-2.6.13-trini/arch/x86_64/mm/fault.c |4 1 files changed, 4 insertions(+) diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c --- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook 2005-08-29 11:09:13.0 -0700 +++ linux-2.6.13-trini/arch/x86_64/mm/fault.c 2005-08-29 11:09:13.0 -0700 @@ -514,6 +514,10 @@ no_context: if (is_errata93(regs, address)) return; + if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14, + SIGSEGV) == NOTIFY_STOP) + return; + /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice. Please use a more descriptive text than "no context". This bit of info SHOULD be available to the gdb/kgdb user and should indicate why kgdb was entered. It thus should be something like "bad kernel address" or "illegal kernel address". -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: when or where can the case occur in "linux kernel development " about "kernel preemption"?
linux-os (Dick Johnson) wrote: On Sat, 27 Aug 2005, Sat. wrote: 2005/8/27, Christopher Friesen <[EMAIL PROTECTED]>: Sat. wrote: the case about kernel preemption as follow : the book said "when a process that has a higher priority than the currenty running process is awakened ". but I can think about when such case can occur , could you give me an example ? There may be others, but one common case is when a hardware interrupt causes the higher priority process to become runnable. Some examples of this would be a network packet arriving, or the expiry of a hardware timer. Chris unfortunately, I cannot agree with you , normally ,when the kernel runs in interrupt context , the schedule() should not be invoked --my views . then,could anyone give me a definite example about network like above or anything else to eluminate this , ok? thanks ! -- Sat. Schedule is never executed from an interrupt, BUT, there may be kernel threads or even user tasks that are sleeping, waiting to be awakened when some preliminary interrupt processing has occurred. The interrupt code may execute one of the wake-up calls which will cause the target to be put into the run queue as soon as possible. Actually, this is not completly true. The kernel sets a flag while handling interrupts that says it is within an interrupt. This flag is cleared on the way out of the interrupt but prior to the return from interrupt (rfi) instruction. Between this flag clearing and the rfi, there is a check made to see if the kernel is preemptable and, if so, if it is desired (i.e. something more important should run NOW). If both of these are true, schedule is called to do the context switch. So, schedule IS called from within the interrupt, but NOT within the area the kernel flags as being in an interrupt which is a subset of the actual interrupt. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kgdb on EM64T
Wilkerson, Bryan P wrote: George Anzinger [mailto:[EMAIL PROTECTED] wrote: Well, I checked, it is "int $3". Why then the panic? If you try the boot with kgdb (i.e. wait) and the do: (gdb) disass gdb_interrupt What do you find at +75? Below is the console from the session it is interesting that gdb is not able to access the memory. I let it continue and then ctrl-c broke it later in the boot cycle and tried disass again with the same result. Feel free to flog me if this is stupid but I have just one EM64T machine (test) and I'm using a regular P4 machine as dev. I build the test kernel on the EM64T machine and then copy the updated sources, object files, and images via NFS to the dev machine. I believe I read in the kgdb doc that it was possible to use to different architecture machines for test and dev although there wasn't any information about how to do it. This is probably the source of the OS/ABI warning. I can probably get the mothership to send me another EM64T machine if need be. What you need is a cross development environment. Not having that, your gdb is likely not aware of how to talk to the hardware you are using. The cross develoment should cost a whole lot less than another machine. George -- vincent:/home/bwilkers/proj/linux-2.6.13-rc4-mm1 # gdb vmlinux GNU gdb 6.3 Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i586-suse-linux"... warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration of GDB. Attempting to continue with the default i386:x86-64 settings. Using host libthread_db library "/lib/tls/libthread_db.so.1". (gdb) target remote /dev/ttyS0 Remote debugging using /dev/ttyS0 0x80503b50 in ?? () warning: no shared library support for this OS / ABI (gdb) disass gdb_interrupt Dump of assembler code for function gdb_interrupt: 0x80247009 : Cannot access memory at address 0x80247009 (gdb) c Continuing. Bootdata ok (command line is root=/dev/sda2 kgdb console=kgdb) Linux version 2.6.13-rc4-mm1-perfmon-em64t ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #43 SMP Sat Aug 27 15:56:14 MDT 2005 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e6000 - 0010 (reserved) BIOS-e820: 0010 - 3fe2f800 (usable) BIOS-e820: 3fe2f800 - 3fe3f832 (ACPI NVS) BIOS-e820: 3ff1 - 3ff3 (reserved) BIOS-e820: 3ff3 - 3ff4 (ACPI data) BIOS-e820: 3ff4 - 3fff (ACPI NVS) BIOS-e820: 3fff - 4000 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fed13000 - fed1a000 (reserved) BIOS-e820: fed1c000 - feda (reserved) ACPI: PM-Timer IO Port: 0x408 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kgdb on EM64T
George Anzinger wrote: Wilkerson, Bryan P wrote: Thanks you Tom and George for the tips on using kgdb with 2.6.13-rc4-mm1. I almost have it working but kgdb seems to have a few issues. I can get it running from the dev machine using the kgdb and console=kgdb boot options on the test kernel. The kernel waits as it should and when I attach with "target remote /dev/ttyS0" and I can continue the boot but eventually it gets to a point in the boot where it frees unused kernel memory successfully and then a warning, "unable to open an initial console", followed by, "Kernel panic - not syncing: Attempted to kill init!" Removing the console=kgdb boot option and the machine boots all the way to run level 5. I tried to break into kgdb at this point using the $echo -e "\003" > /dev/ttyS0 from the dev machine but the test kernel panics at gdb_interrupt+75 when it receives anything on the serial port. Hmmm... I'm wondering if I'm maybe just the first to try this on EM64T (kernel builds in the arch/x86_64 tree). Possibly:). Since the serial port seems to work (i.e. the first test above), the fault seems to be in handling the int3. Is int3 the right instruction for this machine? If not you would make the change in kgdb.h. I think that is the only place it is defined. Well, I checked, it is "int $3". Why then the panic? If you try the boot with kgdb (i.e. wait) and the do: (gdb) disass gdb_interrupt What do you find at +75? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Need better is_better_time_interpolator() algorithm
Christoph Lameter wrote: On Fri, 26 Aug 2005, Alex Williamson wrote: Would we ever want to favor a frequency shifting timer over anything else in the system? If it was noticeable perhaps we'd just need a callback to re-evaluate the frequency and rescan for the best timer. If it happens without notice, a flag that statically assigns it the lowest priority will due. Or maybe if the driver factored the frequency shifting into the drift it would make the timer undesirable without resorting to flags. Thanks, Timers are usually constant. AFAIK Frequency shifts only occur through power management. In that case we usually have some notifiers running before the change. These notifiers need to switch to a different time source if the timer frequency will be shifting or the timer will become unavailable. If there is a notifier, I presume we can track it. We might want to refine things so as to not hit too big a bump when the shift occures, but I think it is doable. The desirability of doing it, I think, depends on the availablity of something better. The access time of the TSC is "really" enticing. Even so, I think a _good_ clock would not depend on long term accuracy of something as fast as the TSC. Vendors are even modulating these to reduce RFI, but still, because of its speed, it makes the best interpolator for the jiffie to jiffie times. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kgdb on EM64T
Wilkerson, Bryan P wrote: Thanks you Tom and George for the tips on using kgdb with 2.6.13-rc4-mm1. I almost have it working but kgdb seems to have a few issues. I can get it running from the dev machine using the kgdb and console=kgdb boot options on the test kernel. The kernel waits as it should and when I attach with "target remote /dev/ttyS0" and I can continue the boot but eventually it gets to a point in the boot where it frees unused kernel memory successfully and then a warning, "unable to open an initial console", followed by, "Kernel panic - not syncing: Attempted to kill init!" Removing the console=kgdb boot option and the machine boots all the way to run level 5. I tried to break into kgdb at this point using the $echo -e "\003" > /dev/ttyS0 from the dev machine but the test kernel panics at gdb_interrupt+75 when it receives anything on the serial port. Hmmm... I'm wondering if I'm maybe just the first to try this on EM64T (kernel builds in the arch/x86_64 tree). Possibly:). Since the serial port seems to work (i.e. the first test above), the fault seems to be in handling the int3. Is int3 the right instruction for this machine? If not you would make the change in kgdb.h. I think that is the only place it is defined. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Need better is_better_time_interpolator() algorithm
Alex Williamson wrote: On Fri, 2005-08-26 at 08:39 -0700, Christoph Lameter wrote: I think a priority is something useful for the interpolators. Some of the decisions about which time sources to use also have criteria different from drift/latency/jitter/cpu. F.e. timers may not survive various power-saving configurations. Thus I would think that we need a priority plus some flags. Some of the criteria for choosing a time source may be: Hi Christoph, I sent another followup to this thread with a patch containing a fairly crude algorithm that I think better explains my starting point. I'm sure the weighting and scaling factors need work, but I think many of the criteria you describe will favor the right clock. 1. If a system boots up with a single cpu then there is no question that the ITC/TSC should be used because of the fast access. We need to factor in frequency shifting here, especially if it happens with out notice. ~ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Inotify problem [was Re: 2.6.13-rc6-mm1]
John McCutchan wrote: On Thu, 2005-08-25 at 11:54 -0700, George Anzinger wrote: Robert Love wrote: On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote: On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote: ~ I think the best thing is to take idr into user space and emulate the problem usage. To this end, from the log it appears that you _might_ be moving between 0, 1 and 2 entries increasing the number each time. It also appears that the failure happens here: add 1023 add 1024 find 1024 or is it the remove that fails? It also looks like 1024 got allocated twice. Am I reading the log correctly? You are reading the log correctly. There are two bugs. One is that if we pass X to idr_get_new_above, it can return X again (doesn't ever seem to return < X). The other problem is that the find fails on 1024 (and 2048 if we skip 1024). That IS strange. 1024 is on a "level" boundry, but then next level is 2**15, not 2**11. I will take a look. So, is it correct to assume that the tree is empty save these two at this time? I am just trying to figure out what the test program needs to do. Yes that is the exact scenario. Only 2 id's are used at any given time, and once we hit 1024 things break. This doesn't happen when the tree is not empty. Thanks for looking at this! -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Inotify problem [was Re: 2.6.13-rc6-mm1]
Robert Love wrote: On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote: On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote: ~ dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4 dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023 dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024 dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4 dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4 dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024 dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024 Note the incrementing wd value even though we are removing them as we go.. What kernel are you running? The wd's should ALWAYS be incrementing, you should never get the same wd as you did before. From your log, you are getting the same wd (after you inotify_rm_watch it). I can reproduce this bug on 2.6.13-rc7. idr_get_new_above isn't returning something above. Also, the idr layer seems to be breaking when we pass in 1024. I can reproduce that on my 2.6.13-rc7 system as well. This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel. Robert, John, what do you think? Is this possibly related to the oops seen in the log that I reported earlier? (Which is still showing up 2-3 times per day, btw) There is definitely something broken here. Jim, George- We are seeing a problem in the idr layer. If we do idr_find(1024) when, say, a low valued idr, like, zero, is unallocated, NULL is returned. I think the best thing is to take idr into user space and emulate the problem usage. To this end, from the log it appears that you _might_ be moving between 0, 1 and 2 entries increasing the number each time. It also appears that the failure happens here: add 1023 add 1024 find 1024 or is it the remove that fails? It also looks like 1024 got allocated twice. Am I reading the log correctly? So, is it correct to assume that the tree is empty save these two at this time? I am just trying to figure out what the test program needs to do. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NTP ntp-helper functions
john stultz wrote: Andrew, All, This patch cleans up a commonly repeated set of changes to the NTP state variables by adding two helper inline functions: ntp_clear(): Clears the ntp state variables How many places is this called in any given arch? I ask because it _may_ save space if it is NOT inlined. I don't think it is ever in a critical code path... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
john stultz wrote: On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote: john stultz wrote: On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote: Roman Zippel wrote: Hi, On Tue, 23 Aug 2005, john stultz wrote: I'm assuming gettimeofday()/clock_gettime() looks something like: xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example. If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick). Not quite. The issue that I'm trying to describe is that if, we inconsistently calculate time intervals in gettimeofday and the timer interrupt, we have the possibility for time inconsistencies. The trivial example using the current code would be something like: Again with my 2 cyc per tick clock, HZ=1000. gettimeofday(): xtime + offset_ns timer_interrupt: xtime += tick_length + ntp_adj offset_ns = 0 0: gettimeofday: 0 + 0 = 0 ns 1: gettimeofday: 0 + 500k ns = 500k ns 2: gettimeofday: 0 + 1M ns = 1M ns 2: timer_interrupt: 2: gettimeofday: 1M ns + 0 ns = 1M ns 3: gettimeofday: 1M ns + 500k ns = 1.5M ns 4: gettimeofday: 1M ns + 1M ns = 2 ns 4: timer_interrupt (using -500ppm adjustment) 4: gettimeofday: 1,999,500 ns + 0 ns = 1,999,500 ns At point 4 you are introducing a NEW ntp adjustment. This, I submit, needs to actually have been introduced to the system prior to the interrupt at point 2 with the first xtime change at point 4. However, gettimeofday() should be aware of it from the interrupt at point 2 and be doing corrections from that time forward. Thus when the point 4 interrutp happens xtime will be the same at the gettimeofday a ns earlier. Yes, clearly a forward knowledge of the NTP adjustment is necessary for gettimeofday(), because after the NTP adjustment has been accumulated into xtime, there's nothing left for gettimeofday to adjust (its already been applied). :) Likewise, gettimeofday() needs to know when to stop apply the correction so that if a tick is late, it will apply the correction only for those times that it was needed. This, could be done by figuring the offset thusly: offset = (offset from last tick to end of ntp period * ntp_adj1) + (offset from end of ntp period to now) Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so it would be added to the nanosecond offset from the last tick (which is how the current code works). If you are doing scaling (as you have in the equation above), then the problem goes away, since you can apply the adjustment consistently through any interval. Until the end of the correction time... I suppose it is possible that the latter part of the offset is also under a different ntp correction which would mean a "* ntp_adj2" is needed. Ok, so your forcing gettimeofday to be interval aware, so its applying different fixed NTP adjustments to different chunks of the current interval. The issue of course is if you're using fixed adjustments, is that you have to have n ntp adjustments for n intervals, or you have to apply the same ntp adjustment to multiple intervals. Uh, are you saying that one ntpd call can set up several different adjustments? I was assuming that any given call would set up either a fixed adjustment for ever or a fixed adjustment to be applied for a fixed number of ticks (or until so much correcting was done, which, in the end is the same thing at this point in the code). If ntpd has to come back to change the adjustment, I am assuming that some kernel action can be taken at that time to sync the xtime clock and the gettimeofday reading of it. I.e. we would only have to keep track of one adjustment with a possible pre specified end time. I would argue that only two terms are needed here regardless of how late a tick is. This is because, I would expect the ntp system call to sync the two clocks. This means in your example, the ntp call would have been made at, or prior to the timer interrupt at 2 and this is the same edge that gettimeofday is to used to start applying the correction. If you argue that we only need two adjustments, why not argue for only one? You're saying have one adjustment that you apply for the first tick's worth of time, and a second adjustment that you apply for the following N ticks' worth of time in the interval. Why the odd base case? Correct me if I am wrong here, but I am assuming that ntpd can ask for an adjustment of X amou
Re: Incorrect CLOCK_TICK_RATE in 2.6 kernel
john stultz wrote: On Wed, 2005-08-24 at 17:24 -0700, George Anzinger wrote: CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and tick_nsec. This latter is used to update xtime each tick. TICK_NSEC is then used to compute (at compile time) the conversion constants needed to convert to/from jiffies from/to timespec and timeval (and others). The problem is that, if the timer being used is either Cyclone or HPET, the wrong CLOCK_TICK_RATE is used. Err, the Cyclone does not generate interrupts. So this issue does not affect those systems. As for the HPET, it sets its own interrupt frequency based off of KERNEL_TICK_USEC (which you're right, isn't quite what is used in the jiffies conversions). Would it be easier to just adjust that value to use ACTHZ or CLOCK_TICK_RATE? If you want to take that approach you would want the HPET to interrupt every TICK_NSEC nanoseconds, that being what xtime is pushed by each tick. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Incorrect CLOCK_TICK_RATE in 2.6 kernel
CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and tick_nsec. This latter is used to update xtime each tick. TICK_NSEC is then used to compute (at compile time) the conversion constants needed to convert to/from jiffies from/to timespec and timeval (and others). The problem is that, if the timer being used is either Cyclone or HPET, the wrong CLOCK_TICK_RATE is used. This means that systems using these interrupt sources will be doing a) incorrect update of xtime and b) incorrect conversion of jiffies. Since these two values will track each other this will not be seen by simple gettimeofday(); sleep();gettimeofday() tests, but will be seen as a system clock drift (without NTP) or with NTP, a somewhat high drift rate (to the point of loosing sync at HZ=1000). The fact that the user/ system chooses the clock to use at boot time and can change the clock after boot means that it is not possible to pin down CLOCK_TICK_RATE at compile time. However, since the computation of TICK_NSEC and the conversion constants is rather involved it is clear that we REALLY do want to compute these at compile time. The suggested solution is to a) set up a structure with the default (clock of choice at config time) conversion constants in it at compile time. Then b) at clock init time, populate the structure with the proper constants for the given clock. These can be computed at compile time, but from the correct CLOCK_TICK_RATE for the given clock. Switching to a fall back clock would also require an update of this structure. Commits? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kgdbwait in 2.6.13-rc4-mm1?
Wilkerson, Bryan P wrote: Is there an equivalent kernel boot option for kgdbwait in 2.6.13-rc4-mm1? I grep'd the kernel source but didn't find kgdbwait. Is there any documentation other than the source for the flavor of KGDB that is included in the akpm kernel patch? The patch has some documentation at Documentation/i386/kgdb/* as well as a couple of gdb macros... The wait option is "gdb". This has been in flux so, to be absolutely sure, look at include/asm-i386/bugs.h -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
john stultz wrote: On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote: Roman Zippel wrote: Hi, On Tue, 23 Aug 2005, john stultz wrote: I'm assuming gettimeofday()/clock_gettime() looks something like: xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example. If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick). Not quite. The issue that I'm trying to describe is that if, we inconsistently calculate time intervals in gettimeofday and the timer interrupt, we have the possibility for time inconsistencies. The trivial example using the current code would be something like: Again with my 2 cyc per tick clock, HZ=1000. gettimeofday(): xtime + offset_ns timer_interrupt: xtime += tick_length + ntp_adj offset_ns = 0 0: gettimeofday: 0 + 0 = 0 ns 1: gettimeofday: 0 + 500k ns = 500k ns 2: gettimeofday: 0 + 1M ns = 1M ns 2: timer_interrupt: 2: gettimeofday: 1M ns + 0 ns = 1M ns 3: gettimeofday: 1M ns + 500k ns = 1.5M ns 4: gettimeofday: 1M ns + 1M ns = 2 ns 4: timer_interrupt (using -500ppm adjustment) 4: gettimeofday: 1,999,500 ns + 0 ns = 1,999,500 ns At point 4 you are introducing a NEW ntp adjustment. This, I submit, needs to actually have been introduced to the system prior to the interrupt at point 2 with the first xtime change at point 4. However, gettimeofday() should be aware of it from the interrupt at point 2 and be doing corrections from that time forward. Thus when the point 4 interrutp happens xtime will be the same at the gettimeofday a ns earlier. Likewise, gettimeofday() needs to know when to stop apply the correction so that if a tick is late, it will apply the correction only for those times that it was needed. This, could be done by figuring the offset thusly: offset = (offset from last tick to end of ntp period * ntp_adj1) + (offset from end of ntp period to now) I suppose it is possible that the latter part of the offset is also under a different ntp correction which would mean a "* ntp_adj2" is needed. I would argue that only two terms are needed here regardless of how late a tick is. This is because, I would expect the ntp system call to sync the two clocks. This means in your example, the ntp call would have been made at, or prior to the timer interrupt at 2 and this is the same edge that gettimeofday is to used to start applying the correction. It would appear that gettimeofday would need to know that the NTP adjustment is changing (and to what). It would also appear that this is known by the ntp code and could be made available to gettimeofday. If it is changing due to an NTP call, that system call, itself, should/must force synchronization. So the only case gettimeofday needs to worry/know about is that an adjustment is to change at time X to value Y. Also, me thinks there is only one such change that can be present at any given time. Well, in many arches gettimeofday() works around the above issue by capping the offset_ns value as such: I think this may have been done with only usec gettimeofday. Now that we have clock_gettime() returning nsec, we need to be a bit more careful. gettimeofday: xtime + min(offset_ns, tick_len + ntp_adj) The problem with this is that when we have lost or late ticks, or if we are using dynamic ticks you have granularity problems. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
Roman Zippel wrote: Hi, On Tue, 23 Aug 2005, john stultz wrote: I'm assuming gettimeofday()/clock_gettime() looks something like: xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example. John, If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick). It would appear that gettimeofday would need to know that the NTP adjustment is changing (and to what). It would also appear that this is known by the ntp code and could be made available to gettimeofday. If it is changing due to an NTP call, that system call, itself, should/must force synchronization. So the only case gettimeofday needs to worry/know about is that an adjustment is to change at time X to value Y. Also, me thinks there is only one such change that can be present at any given time. Hope this helps... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/3] Add disk hotswap support to libata RESEND #2
Jim Ramsay wrote: On 8/23/05, Jim Ramsay <[EMAIL PROTECTED]> wrote: Then I must have found an undocumented feature! I've applied this set of patches to a 2.6.11 kernel (with few problems) and ran into a bunch of "scheduling while atomic" errors when hotplugging a drive, culprit being probably scsi_sysfs.c where scsi_remove_device locks a mutex, or perhaps when it then calls class_device_unregister, which does a 'down_write'. After further debugging, it appears that the problem is the debounce timer in libata-core.c. Timers appear to operate in an atomic context, so timers should not be allowed to call scsi_remove_device, which eventually schedules. Any suggestions on the best way to fix this? Workqueue, perhaps. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
return(ret); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: ~ 2. Drift of cyclic timers (armed by set_timer()): Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsac) units, so this effect will go away. But this is waste of time until (1.) is not solved. George ??? Could I (we) see what you have in mind? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote: George, On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote: 2. Drift of cyclic timers (armed by set_timer()): Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsac) units, so this effect will go away. But this is waste of time until (1.) is not solved. George ??? Could I (we) see what you have in mind? Nothing which applies clean at the moment and I have no access to the box where the patch floats around. It's simply explained. Current code: set_timer() calc interval->jiffies / interval->arch_cycles; based on it.interval rearm() timer->expires += interval->jiffies; timer->arch_cycle_expires += interval->arch_cycles; normalize(timer); Patched code: set_timer() timer.interval = it.interval; timer.next_expire = it.value; both stored as timespec rearm() next_expire += interval; calc timer->expires/arch_cycle_expires; So on each rearm we eliminate the rounding errors and take the drift adjustment into account. It adds some calculation overhead to each rearm, but I think the standard was written to eliminate the need for this. The notion is that we have a resolution which we use in the calculations so while there may be drift WRT his request, there should be no drift WRT the requested value rounded up to the next resolution. Still, if we can't keep that resolution in arch_cycles... On another issue along this line, I have been thinking of changing the x86 TSC arch cycle size to 1ns. (NOT the resolution, the units for the arch cycle.) The reason to do this is to correctly track changes in cpu frequency as it is today, we would need to track down and update all pending HR timers when ever the frequency changed. By using a common unit all we need to do is change the conversion constants (well I guess they would not be constants any more :). I REALLY don't want to do this as it does add conversion overhead, but I can not think of another clean way to track TSC frequency changes. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Latency with Real-Time Preemption with 2.6.12
Steven Rostedt wrote: On Wed, 2005-08-17 at 19:38 -0700, Sundar Narayanaswamy wrote: Hi, I am trying to experiment using 2.6.12 kernel with the realtime-preempt V0.7.51-38 patch to determine the kernel preemption latencies with the CONFIG_PREEMPT_RT mode. The test program I wrote does the following on a thread with highest priority (99) and SCHED_FIFO policy to simulate a real time thread. t1 = gettimeofday nanosleep(for 3 ms) t2 = gettimeofday I was expecting to see the difference t2-t1 to be close to 3 ms. However, the smallest difference I see is 4 milliseconds under no system load, and the difference is as high as 25 milliseconds under moderate to heavy system load (mostly performing disk I/O). That version of Ingo's patch does not implement High-Resolution Timers. Thomas worked on merging this into the latest RT patch. Without high-res timers, the best you may ever get is 4ms. This is because nanosleep is to guarantee _at_least_ 3 ms. So you have the following situation: 0 1 23 4 (ms) +---+---++---+---> ^^ || Start here 0+3 = 3 here we have the response If we look at this in smaller units than ms, we started on 0.8ms and responded at 3.2ms where we have 3.2 - 0.8 = 2.4 which is less than 3ms. So since Ingo's patch doesn't increase the resolution of the timers from a jiffy (which is currently 1ms) Linux is forced to add one more than you need. Based on the articles and the mails I read on this list, I understand that worst case latencies of 1 ms (or less) should be possible using the RT Preemption patch, but I am unable to get anything less than 4 millseconds even with sleep times smaller than 3 ms. I am running the tests on a SBC with a 1.4G Pentium M, 512M RAM, 1GB compact flash (using IDE). I believe I have the high resolution timer working correctly, because if I comment out the sleep line above t2-t1 is consistenly 0 or 1 microsecond. I don't think you have the high res timer working, since there is no high res timer in that kernel. Following earlier discussions (in July) in this list, I tried to set kernel configuration parameters like CONFIG_LATENCY_TRACE to get tracing/debug information, but I didn't find these parameters in my .config file. I appreciate your suggestions/insights into the situation and steps that I should try to get more debug/tracing information that might help to understand the cause of high latency. It's not a high latency. It's doing exactly as it is suppose to, since the nanosleep doesn't have high-res support (in that kernel). If you really want to measure latency, you need to add a device or something and see what the response time of an interrupt going off to the time a thread is woken to respond to it. Now with Ingo's that is really fast. Another way to do it is to set up a repeating timer. You _must_ read back the timer to get the repeat time it is really using, and then measure how well it does giving signals at these repeat times. FAR FAR more than three lines of code... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Multiple virtual address mapping for the same code on IA-64 linux kernel.
David S. Miller wrote: From: Anton Blanchard <[EMAIL PROTECTED]> Date: Fri, 19 Aug 2005 04:29:55 +1000 Calling itanium the "fastest 64bit processor at any given clock frequency" on lkml is likewise inflammatory :) I totally agree. Since the itanium off loads a lot of its instruction steam decisions on to the compiler(s), where other processors just do it, one might argue that you can not even characterize the itanium without bundling in the compilers... Not to say that is wrong but just to make it clear that saying the itanium speed is is like saying that a cummings diesel is fast with out saying what sort of car/truck it is mounted in. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote: ~ IMNSHO we should not get too parental with kernel only interfaces. Adding 1 is easy enough for the caller and even easier to explain in the instructions (i.e. this call sleeps for X jiffies edges). This allows the caller to do more if needed and, should he ever just want to sync to the next jiffie he does not have to deal with backing out that +1. I don't want to be too parental either, but I also am trying to avoid code duplication. Lots of drivers basically do something like poll_event() does (or could do with some changes), i.e. looping a constant amount multiple times, checking something every so often. The patch was just a thought, though. I will keep evaluating drivers and see if it's a useful interface to have eventually. I guess I'm just concerned with making an unintuitive interface. As was brought up at OLS, drivers are a major source of bugs/buggy code. The simpler, more useful we can make interfaces, the better, I think. I'm not claiming you disagree, I just want to make my own motives clear. While fixing up the schedule_timeout() comment would make it clear what schedule_timeout() achieves, I'm not sure how useful such an interface is, if every caller adds 1 :) I need to mull it over, though... Lots to consider. I also, of course, want to stay flexible for the reasons you mention (letting the driver adjust the timeout as they expect to). I would leave the +1 alone and put in the correct documentation. This way _more_ folks will be made aware of the mid jiffie issue. Far to often we see (and let get in) patches that mess up user interfaces around this issue. The recent changes to itimer come to mind... ~ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] KGDB for Real-Time Preemption systems
Ingo Molnar wrote: * George Anzinger wrote: I have put a version of KGDB for x86 RT kernels here: http://source.mvista.com/~ganzinger/ The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them. Apply in this order: Ingo's patch kgdb-ga-rt.patch common_kgdb_cfi_annotations.patch This is, more or less, the same kgdb that is in Andrew's mm tree changed to fix the RT issues. great. For the time being i wont add it to the -RT tree (because KGDB is not destined for upstream merging it seems), but it sure is a useful development/debugging add-on. I agree on not adding it. Tom Rini is working on a version the Andrew seems inclined to merge. When that happens I will most likely put together enhancements to it to bring it up to what this one does. Meanwhile I am trying to capture some of Tom's changes in this one. Also, it is MUCH easier for me to maintain as a seperate patch. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
Roman Zippel wrote: ~ The thing that worries me about this function is that it does every thing in usec. We are using nsec in xtime now and I wonder if it would not be more accurate to do the math in nsecs. Even tick size (tick_nsec) does not translate well to usec, it currently being 999849 nsecs. George --- kernel/time.c |3 ++- kernel/timer.c | 53 + 2 files changed, 55 insertions(+), 1 deletion(-) Index: linux-2.6/kernel/time.c === --- linux-2.6.orig/kernel/time.c2005-07-13 03:18:04.0 +0200 +++ linux-2.6/kernel/time.c 2005-08-16 01:37:20.0 +0200 @@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc) } /* txc->modes & ADJ_OFFSET */ if (txc->modes & ADJ_TICK) { tick_usec = txc->tick; - tick_nsec = TICK_USEC_TO_NSEC(tick_usec); } + if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK)) + time_recalc(); } /* txc->modes */ leave: if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0 || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0 Index: linux-2.6/kernel/timer.c === --- linux-2.6.orig/kernel/timer.c 2005-07-13 03:18:04.0 +0200 +++ linux-2.6/kernel/timer.c2005-08-16 23:10:53.0 +0200 @@ -559,6 +559,7 @@ found: */ unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */ unsigned long tick_nsec = TICK_NSEC; /* ACTHZ period (nsec) */ +unsigned long tick_nsec2 = TICK_NSEC; /* * The current time @@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC; /* * the usual normalization. */ struct timespec xtime __attribute__ ((aligned (16))); +struct timespec xtime2 __attribute__ ((aligned (16))); struct timespec wall_to_monotonic __attribute__ ((aligned (16))); EXPORT_SYMBOL(xtime); @@ -596,6 +598,33 @@ static long time_adj; /* tick adjust ( long time_reftime; /* time at last adjustment (s) */ long time_adjust; long time_next_adjust; +static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2; + +void time_recalc(void) +{ + long f, t; + tick_nsec = TICK_USEC_TO_NSEC(tick_usec); This leaves bits on the floor. Is it not possible to do this whole calculation in nano seconds? Currently, for example, tick_nsec is 999849... + + t = time_freq >> (SHIFT_USEC + 8); + if (t) { + time_freq -= t << (SHIFT_USEC + 8); + t *= 1000 << 8; + } + f = time_freq * 125; + t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3)); + f &= (1 << (SHIFT_USEC - 3)) - 1; + tick_nsec2 = t / HZ; + f += (t % HZ) << (SHIFT_USEC - 3); + f <<= 5; + time_adj2 = f / HZ; + time_freq_adj2 = f % HZ; + + printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n", + xtime.tv_sec, xtime.tv_sec, + tick_nsec, time_freq, time_offset, time_next_adjust, + xtime2.tv_sec, xtime2.tv_nsec, + tick_nsec2, time_adj2, time_freq_adj2); +} /* * this routine handles the overflow of the microsecond field @@ -739,6 +768,16 @@ static void second_overflow(void) #endif } +static void second_overflow2(void) +{ + time_adj2_cur = time_adj2; + time_freq_phase2 += time_freq_adj2; + if (time_freq_phase2 > HZ) { + time_freq_phase2 -= HZ; + time_adj2_cur++; + } +} + /* in the NTP reference this is called "hardclock()" */ static void update_wall_time_one_tick(void) { @@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo time_adjust = time_next_adjust; time_next_adjust = 0; } + + delta_nsec = tick_nsec2; + time_phase2 += time_adj2_cur; + if (time_phase2 >= (1 << (SHIFT_USEC + 2))) { + long ltemp = time_phase2 >> (SHIFT_USEC + 2); + time_phase2 -= ltemp << (SHIFT_USEC + 2); + delta_nsec += ltemp; + } + xtime2.tv_nsec += delta_nsec; + if (xtime2.tv_nsec >= NSEC_PER_SEC) { + xtime2.tv_nsec -= NSEC_PER_SEC; + xtime2.tv_sec++; + second_overflow2(); + } } /* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "un
[patch] KGDB for Real-Time Preemption systems
I have put a version of KGDB for x86 RT kernels here: http://source.mvista.com/~ganzinger/ The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them. Apply in this order: Ingo's patch kgdb-ga-rt.patch common_kgdb_cfi_annotations.patch This is, more or less, the same kgdb that is in Andrew's mm tree changed to fix the RT issues. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote: On 04.08.2005 [09:45:55 -0700], George Anzinger wrote: Uh... PLEASE tell me you are NOT changing timespec_to_jiffies() (and timeval_to_jiffies() to add 1. This is NOT the right thing to do. For repeating times (see setitimer code) we need the actual time as we KNOW where the jiffies edge is in the repeating case. The +1 is needed ONLY for the initial time, not the repeating time. See: http://marc.theaimsgroup.com/?l=linux-kernel&m=112208357906156&w=2 I followed that thread, George, but I think it's a different case with schedule_timeout() [maybe this indicates drivers/other users should maybe be using itimers, but I'll get to that in a sec]. I think I miss understood back then :). With schedule_timeout(), we are just given a relative jiffies value. We have no context as to which task is requesting the delay, per se, meaning we don't (can't) know from the interface whether this is the first delay in a sequence, or a brand new one, without changing all users to have some sort of control structure. The callers of schedule_timeout() don't even get a pointer to the timer added internally. So, adding 1 to all sleeps seems like it might be reasonable, as looping sleeps probably need to use a different interface. I had worked a bit ago on something like poll_event() with the kernel-janitors group, which would abstract out the repeated sleeps. Basically wait_event() without wait-queues... Maybe we could make such an interface just use itimers? I've attached my old patch for poll_event(), just for reference. I think not. itimers is really pointed at a particular system call and has resources in the task structure to do it. These would be hard to share... My point, I guess, is that in the schedule_timeout() case, we don't know where the jiffies edge is, as we either expire or receive a wait-queue event/signal, we never mod_timer() the internal timer... So we have to assume that we need to sleep the request. But maybe Roman's idea of sleeping a certain number of jiffy edges is sufficient. I am not yet convinced driver authors want/need such an interface, though, still thinking it over. IMNSHO we should not get too parental with kernel only interfaces. Adding 1 is easy enough for the caller and even easier to explain in the instructions (i.e. this call sleeps for X jiffies edges). This allows the caller to do more if needed and, should he ever just want to sync to the next jiffie he does not have to deal with backing out that +1. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features
Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: * George Anzinger wrote: Ingo, all I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. Someone put code in the NMI path to modify the preempt count which, often as not will generate a PREEMPT_DEBUG message as there is no tell what state the preempt count is in on an NMI interrupt. I have sent the attached patch to Andrew on this, but meanwhile, if you want RT, SMP, PREEMPT_DEBUG you will be much better off with this. ah - thanks, applied. Might explain some of the recent SMP weirdnesses i'm seeing. Attributed them to the HRT patch ;-) i'm still seeing weird crashes under SMP, which go away if i disable CONFIG_HIGH_RES_TIMERS. (this after i fixed a couple of other SMP bugs in the HRT code) It happens sometime during the bootup, after enabling the network but before users can log in. There's no good debug info, just a hang that comes from all CPUs trying to get some debug info out but crashing deeply. I haven't looked at this new code all that closely as yet. One thing I did notice is that there is an assumption that the "timer being delivered flag" can be shared between LR timers and HR timers. I suspect this is wrong as the delivery code is in seperate threads (I assume). This could lead to del_timer_async missing a timer. In the prior patch we just ignored the del_timer_async issue for HR timers (code I plan to do soon). This WAS taken care of in earlier kernels by a reuse of one of the list link fields, but Andrew convince me that this was _not_ good. So, my guess, a nanosleep for an RT task (I think you said these are promoted to HR) is completing and over writing the deliver in progress flag for a LR timer which just happens to have a del_timer_sync going on at the same time. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] eliminte NMI entry/ exit code
Zachary Amsden wrote: George Anzinger wrote: Nick Piggin wrote: George Anzinger wrote: The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better. Humour me for a minute here... NMI restores preempt_count back to its old value upon exit, right? So what does a race case look like? Normal code NMI fetch preempt_count add <- interrupt here add and store then subtract and store, darn! store preempt_count Ok, no problem. The problem is in the RT code when PREEMPT_DEBUG is on. The tests for reasonable counts fail because of the rather undefined state when NMI picks up the word. The failure is on the NMI side... So NMI changing the preempt count and restoring in the middle of a RWM is not the problem. Thus I don't understand what the issue is. NMI must undo all side effects. Does the PREEMPT_DEBUG code check the count somewhere within the NMI handler? If so, shouldn't the proper fix be to make that code aware that it could be running inside of an NMI and/or ensure that code is not called from within the NMI handler? Yes that is the problem. The sanity check in PREEMPT_DEBUG fails when called from the NMI handler. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] eliminte NMI entry/ exit code
Nick Piggin wrote: George Anzinger wrote: The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better. Humour me for a minute here... NMI restores preempt_count back to its old value upon exit, right? So what does a race case look like? Normal code NMI fetch preempt_count add <- interrupt here add and store then subtract and store, darn! store preempt_count Ok, no problem. The problem is in the RT code when PREEMPT_DEBUG is on. The tests for reasonable counts fail because of the rather undefined state when NMI picks up the word. The failure is on the NMI side... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features
Ingo, all I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. Someone put code in the NMI path to modify the preempt count which, often as not will generate a PREEMPT_DEBUG message as there is no tell what state the preempt count is in on an NMI interrupt. I have sent the attached patch to Andrew on this, but meanwhile, if you want RT, SMP, PREEMPT_DEBUG you will be much better off with this. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: Modifying a word from NMI code runs the very real risk of loosing either then new or the old bits. Remember, we can not prevent an NMI interrupt from ANYWHERE, inparticular between the read and the write of a read modify write sequence. This patch removes the update of the preempt count from the NMI path. Signed-off-by: George Anzinger hardirq.h |9 ++--- 1 files changed, 6 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/include/linux/hardirq.h === --- linux-2.6.13-rc.orig/include/linux/hardirq.h +++ linux-2.6.13-rc/include/linux/hardirq.h @@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int #else # define synchronize_irq(irq) barrier() #endif - -#define nmi_enter()irq_enter() -#define nmi_exit() sub_preempt_count(HARDIRQ_OFFSET) +/* + * Re think these. NMI _must_not_ share data words with non-nmi code + * Meanwhile, just do a no-op. + */ +#define nmi_enter()/* irq_enter() */ +#define nmi_exit() /* sub_preempt_count(HARDIRQ_OFFSET) */ #ifndef CONFIG_VIRT_CPU_ACCOUNTING static inline void account_user_vtime(struct task_struct *tsk)
[PATCH] eliminte NMI entry/ exit code
The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: Modifying a word from NMI code runs the very real risk of loosing either then new or the old bits. Remember, we can not prevent an NMI interrupt from ANYWHERE, inparticular between the read and the write of a read modify write sequence. This patch removes the update of the preempt count from the NMI path. Signed-off-by: George Anzinger hardirq.h |9 ++--- 1 files changed, 6 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/include/linux/hardirq.h === --- linux-2.6.13-rc.orig/include/linux/hardirq.h +++ linux-2.6.13-rc/include/linux/hardirq.h @@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int #else # define synchronize_irq(irq) barrier() #endif - -#define nmi_enter()irq_enter() -#define nmi_exit() sub_preempt_count(HARDIRQ_OFFSET) +/* + * Re think these. NMI _must_not_ share data words with non-nmi code + * Meanwhile, just do a no-op. + */ +#define nmi_enter()/* irq_enter() */ +#define nmi_exit() /* sub_preempt_count(HARDIRQ_OFFSET) */ #ifndef CONFIG_VIRT_CPU_ACCOUNTING static inline void account_user_vtime(struct task_struct *tsk)
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Bill Davidsen wrote: George Anzinger wrote: Srivatsa Vaddagiri wrote: On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote: IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time. George, Can't TSC (or equivalent) serve as a backup while PIT is disabled, especially considering that we disable PIT only for short duration in practice (few seconds maybe) _and_ that we don't have HRT support yet? I think it really depends on what you want. If you really want to keep good time, the only rock in town is the one connected to the PIT (and the pmtimer). The problem is, if you want the jiffie edge to be stable, there is just now way to reprogram the PIT to get it back to where it was. In an old version of HRT I did a trick of loading a short count (based on reading the TSC or pmtimer) and then put the LATCH count on top of it. In a correctly performing PIT, it should count down the short count, interrupt, load the long count and continue from there. Aside from the machines that had BAD PITs (they reset on the load instead of the expiry of the current count) there were other problems that, in the end, cause loss of time (too fast, too slow, take your pick). I also found PITs that signaled that they had loaded the count (they set a status bit) prior to actually loading it. All in all, I find the PIT is just an ugly beast to try to program. On the other hand, if you want regular interrupts at some fixed period, it will do this forever (give or take a epoch or two;) with out touching anything after the initial program set up. In the end, I concluded that, for the community kernel, it is really best to just interrupt the irq line and leave the PIT run. Then you use the TSC or pmtimer to figure the gross loss of interrupts and leave the PIT interrupt again to define the jiffie edge. If you have other, more pressing, concerns I suppose you can program the PIT, but don't expect your wall clock to be as stable as it is now. What are the portability and scaling issues if it were done this way? It clearly looks practical on x86 uni, but if we want per-CPU non-tick, I'm less sure how it would work. I am not sure how much is involved. For VST I disabled the tick generated NMI watchdog interrupt on a per cpu basis but stopped the PIT tick only when all cpus were idle. The next step would be to mess with the interrupt steering logic to keep the tick away from idle cpus. I did not get into this level in my work, being mainly interested in embedded systems. But when you go to non-x86 hardware, is there always going to be another source of wakeup available if the PIT is blocked instead of reset? I have to go back and look at how SPARC hardware works, I don't remember enough to be useful. Most (all) other archs don't have PITs. The x86 sucks big time when it comes to time keeping hardware. The most common hardware is a counter that runs forever (much as the TSC but FIXED in frequency). Interrupts are generated either by comparing a register to this or using companion counters that just count down to zero. In either case you don't loose time because you can always precisely set up an interrupt. To sleep, then, you just set your sleep time in the normal time base interrupt counter. At the end, you know exactly what to set to get back to the regular tick. These other platforms make VST and High Res Timers so easy... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3
Tony Lindgren wrote: ~ Do you have a patch around for improving next_timer_interrupt()? Well, sort of. The code in the VST patch does the right thing. Problem is it does a bit more than the timer.c code. You can find that code on the HRT site CVS. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Srivatsa Vaddagiri wrote: On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote: IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time. George, Can't TSC (or equivalent) serve as a backup while PIT is disabled, especially considering that we disable PIT only for short duration in practice (few seconds maybe) _and_ that we don't have HRT support yet? I think it really depends on what you want. If you really want to keep good time, the only rock in town is the one connected to the PIT (and the pmtimer). The problem is, if you want the jiffie edge to be stable, there is just now way to reprogram the PIT to get it back to where it was. In an old version of HRT I did a trick of loading a short count (based on reading the TSC or pmtimer) and then put the LATCH count on top of it. In a correctly performing PIT, it should count down the short count, interrupt, load the long count and continue from there. Aside from the machines that had BAD PITs (they reset on the load instead of the expiry of the current count) there were other problems that, in the end, cause loss of time (too fast, too slow, take your pick). I also found PITs that signaled that they had loaded the count (they set a status bit) prior to actually loading it. All in all, I find the PIT is just an ugly beast to try to program. On the other hand, if you want regular interrupts at some fixed period, it will do this forever (give or take a epoch or two;) with out touching anything after the initial program set up. In the end, I concluded that, for the community kernel, it is really best to just interrupt the irq line and leave the PIT run. Then you use the TSC or pmtimer to figure the gross loss of interrupts and leave the PIT interrupt again to define the jiffie edge. If you have other, more pressing, concerns I suppose you can program the PIT, but don't expect your wall clock to be as stable as it is now. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3
Tony Lindgren wrote: * Srivatsa Vaddagiri <[EMAIL PROTECTED]> [050805 05:37]: On Wed, Aug 03, 2005 at 06:05:28AM +, Con Kolivas wrote: This is the dynamic ticks patch for i386 as written by Tony Lindgen <[EMAIL PROTECTED]> and Tuukka Tikkanen <[EMAIL PROTECTED]>. Patch for 2.6.13-rc5 There were a couple of things that I wanted to change so here is an updated version. This code should have stabilised enough for general testing now. Con, I have been looking at some of the requirement of tickless idle CPUs in core kernel areas like scheduler and RCU. Basically, both power management and virtualization benefit if idle CPUs can cut off useless timer ticks. Especially from a virtualization standpoint, I think it makes sense that we enable this feature on a per-CPU basis i.e let individual CPUs cut off their ticks as and when they become idle. The benefit of this is more visible in platforms that host lot of (SMP) VMs on the same machine. Most of the time, these VMs may be partially idle (some CPUs in it are idle, some not) and it is good that we quiesce the timer ticks on the partial set of idle CPUs. Both S390 and Xen ports of Linux kernel have this ability today (S390 has it in mainline already and Xen has it out of tree). Good point, and it would be nice to have it resolved for systems that support idling individual CPUs. The current setup was done because when I was tinkering with the amd76x_pm patch a while a back, I noticed that idling the cpu disconnects all cpus from the bus. (As far as I remember) So this may need to be configured depending on the system. From this viewpoint, I think the current implementation of dynamic tick falls short of this requirement. It cuts of the timer ticks only when all CPUs go idle. Apart from this observation, I have some others about the current dynamic tick patch: - All CPUs seem to cut off the same number of ticks (dyn_tick->skip). Isn't this wrong, considering that the timer list is per-CPU? This will cause some timers to be serviced much later than usual. Yes if it's done on per-CPU basis. In the current setup the first interrupt will kick the system off the dyn-tick state and the timers get checked again. - The fact that dyn_tick_state is global and accessed from all CPUs is probably a scalability concern, especially if we allow the ticks to be cut off on per-CPU basis. From idling devices point of view, we still need some global variable I believe. How else would you be able to tell all devices that the whole system does not have any timers for next 2 seconds? - Again, when we allow this on a per-CPU basis, subsystems like RCU need to know the partial set of idle CPUs. RCU already does that thr' nohz_cpu_mask (which will need to replace dyn_cpu_map). Sounds like that could work for dyn-tick too. - Looking at dyn_tick_timer_interrupt, would it be nice if we avoid calling do_timer_interrupt so many times and instead update jiffies to (skipped_ticks - 1) and then call do_timer_interrupt once? I think VST does it that way. In the long run we would do the calculations in usecs and just emulate jiffies from the hw timer. But yes, optimizing updating the time would be great. - dyn_tick->max_skip = 0xff / apic_timer_val; From my reading of Intel docs, APIC_TMICT is 32-bit. So why does the above calculation take only 24-bits into account? What am I missing here? Hmm, could be a bug here, needs to be checked. Maybe 32-bit APIC timer is optional support, or maybe I accidentally pulled the optional 24-bit support from the ACPI PM timer. But in any case on P4 systems the APIC timer is not the bottleneck as stopping or reprogramming PIT also kills APIC. (This does not happen on P3 systems). So the bottleneck most likely is the length of PIT. I can take a shot at addressing these concerns in dynamic_tick patch, but it seems to me that VST has already addressed all these to a big extent. Had you considered VST before? The biggest bottleneck I see in VST going mainline is its dependency on HRT patch but IMO it should be possible to write a small patch to support VST w/o HRT. George, what do you think? HRT + VST depend on APIC only, and does not use next_timer_interrupt(). I convinced my self that the next_timer... code in timer.c misses timers (i.e. gives the wrong answer). I did this (after wondering due to performance) by scanning the whole timer list after I had the next_timer... answer and finding a better answer, not always, but some times. That code does not address the cascade list correctly. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Srivatsa Vaddagiri wrote: On Sun, Aug 07, 2005 at 03:12:21PM +1000, Con Kolivas wrote: Respin of the dynamic ticks patch for i386 by Tony Lindgen and Tuukka Tikkanen with further code cleanups. Are were there yet? Con, I am afraid until SMP correctness is resolved, then this is not in a position to go in (unless you want to enable it only for UP, which I think should not be our target). I am working on making this work correctly on SMP systems. Hopefully I will post a patch soon. Another observation I have made regarding dynamic tick patch is that PIT is being reprogrammed whenever the CPUs are coming out of sleep state (because of an interrupt say). This can happen at any arbitary time, not necessarily on jiffy boundaries. As a result, there will be an offset between when jiffy interrupts will now occur vs when they would have originally occured had PIT never been stopped. Not sure if having this offset is good, but atleast one necessary change that I foresee if zeroing delay_at_last_interrupt when disabling dynamic tick. For that matter, it may be easier to disable the PIT timer by just masking PIT interrupts (instead of changing its mode). IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time. My VST patch just turns masks the interrupt. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Roland McGrath wrote: There are other concerns. Let me see if I understand this. A thread (other than the leader) can exec and we then need to change the real_timer to wake the new task which will NOT be using the same task struct. That's correct. de_thread will turn the thread calling exec into the new leader and kill off all the other threads, including the old leader. The exec'ing thread's existing task_struct is reassigned to the PID of the original leader. My looking at the code shows that the thread leader can exit and then stays around as a zombi until the last thread in the group exits. That is correct. If an alarm comes during this wait I suspect it will wake this zombi and cause problems. You are mistaken. The signal code handles process signals sent when the leader is a zombie. The group leader sticks around with the PID that matches the TGID, until there are no live threads with its TGID. That is how process-wide kill can still work. Yes, I see, traced through the signal delivery. So Linus' patch as well as the regression of Ingo's will fix all of this. Right? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Gerd Knorr wrote: On Thu, Aug 04, 2005 at 03:02:51PM -0700, Andrew Morton wrote: Roland McGrath <[EMAIL PROTECTED]> wrote: That's wrong. It has to be done only by the last thread in the group to go. Just revert Ingo's change. OK.. +++ 25-akpm/kernel/exit.c Thu Aug 4 15:01:06 2005 @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co - if (group_dead) + if (group_dead) { + del_timer_sync(&tsk->signal->real_timer); acct_process(code); + } +++ 25-akpm/kernel/posix-timers.c Thu Aug 4 15:01:06 2005 @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct * - del_timer_sync(&sig->real_timer); That one fixes it for me. There are other concerns. Let me see if I understand this. A thread (other than the leader) can exec and we then need to change the real_timer to wake the new task which will NOT be using the same task struct. My looking at the code shows that the thread leader can exit and then stays around as a zombi until the last thread in the group exits. If an alarm comes during this wait I suspect it will wake this zombi and cause problems. So, don't we need to also change real_timer's task when the exiting task is the real_timer wake up task, assigning it to some other member of the group? Note, I don't say just if it is the group leader... Then when we finally release the signal structure, we can "del" the timer. Did I miss something here? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Andrew Morton wrote: Roland McGrath <[EMAIL PROTECTED]> wrote: That's wrong. It has to be done only by the last thread in the group to go. Just revert Ingo's change. Hm... I was looking at 2.6.10 to figure it out. This looks more correct. OK.. --- 25/kernel/exit.c~revert-timer-exit-cleanup Thu Aug 4 15:00:55 2005 +++ 25-akpm/kernel/exit.c Thu Aug 4 15:01:06 2005 @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co acct_update_integrals(tsk); update_mem_hiwater(tsk); group_dead = atomic_dec_and_test(&tsk->signal->live); - if (group_dead) + if (group_dead) { + del_timer_sync(&tsk->signal->real_timer); acct_process(code); + } exit_mm(tsk); exit_sem(tsk); diff -puN kernel/posix-timers.c~revert-timer-exit-cleanup kernel/posix-timers.c --- 25/kernel/posix-timers.c~revert-timer-exit-cleanup Thu Aug 4 15:00:55 2005 +++ 25-akpm/kernel/posix-timers.c Thu Aug 4 15:01:06 2005 @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct * tmr = list_entry(sig->posix_timers.next, struct k_itimer, list); itimer_delete(tmr); } - del_timer_sync(&sig->real_timer); } /* _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Gerd Knorr wrote: Hi, Somewhere between 2.6.11 and 2.6.12 the regression in $subject was added to the linux kernel. Testcase below. Yep. The itimer changes got a bit carried away. Here is a fix. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: The changes to itimer of late (after 2.6.11) cause itimers not to survive the exec* calls. Standard says they should. Signed-off-by: George Anzinger exit.c |1 + posix-timers.c |4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) Index: linux-2.6.13-rc/kernel/exit.c === --- linux-2.6.13-rc.orig/kernel/exit.c +++ linux-2.6.13-rc/kernel/exit.c @@ -794,6 +794,7 @@ fastcall NORET_TYPE void do_exit(long co } tsk->flags |= PF_EXITING; + del_timer_sync(&tsk->signal->real_timer); /* * Make sure we don't try to process any timer firings Index: linux-2.6.13-rc/kernel/posix-timers.c === --- linux-2.6.13-rc.orig/kernel/posix-timers.c +++ linux-2.6.13-rc/kernel/posix-timers.c @@ -1183,10 +1183,10 @@ void exit_itimers(struct signal_struct * struct k_itimer *tmr; while (!list_empty(&sig->posix_timers)) { - tmr = list_entry(sig->posix_timers.next, struct k_itimer, list); + tmr = list_entry(sig->posix_timers.next, +struct k_itimer, list); itimer_delete(tmr); } - del_timer_sync(&sig->real_timer); } /*
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote: ~ Sorry, I forgot that sys_nanosleep() also always adds 1 to the request (to account for this same issue, I believe, as POSIX demands no early return from nanosleep() calls). There are some other locations where similar + (t.tv_sec || t.tv_nsec) This is not the same as "always add 1". We don't do it this way just to have fun with C. If you change schedule_timeout() to add the 1, nanosleep() will need to do things differently to get the same behavior. (And, YES users do pass in zero sleep times.) -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
e(unsigned int msecs) { - unsigned long timeout = msecs_to_jiffies(msecs) + 1; + unsigned long timeout = msecs_to_jiffies(msecs); while (timeout && !signal_pending(current)) { set_current_state(TASK_INTERRUPTIBLE); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
Keith Owens wrote: On Tue, 02 Aug 2005 18:12:27 -0700, George Anzinger wrote: How about something like: if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC) current points to the current struct task, regs points to the kernel stack. Those two data areas can be completely separate, as they are on i386. Also i386 uses a separate kernel stack for interrupts. Acually I must mean the thread_info and not current. i386 only uses a seperate stack if you use 4K stacks. I think others use seperate interrupt stacks, however :(. Also, on thinking on it, I think some archs don't call the registers pt_regs either. Oh, well, it was a thought... Waiting for its brother... :) -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
Steven Rostedt wrote: On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote: Couldn't you just do some math off current->timestamp to see how long the task has been running? This per arch stuff seems a bit invasive.. The thing is, I'm tracking how long the task is running in the kernel without doing a schedule. That's actually easy, but I don't want to count when the task is in userspace. The per-arch is only updating so that we don't count user space, otherwise the count could be in the task_struct. If there is an arch-independent way to tell if a task is running in user-space or kernel when an interrupt goes off then I would use it. The per arch is actually easy, and I would write it, but I don't have the hardware now to test it. I could at least do PPC and MIPS since I'm quite familiar with both, but I don't currently have a cross compiler to compile it. I understand your point, I would really prefer an arch independent solution, but the timestamp from current just wont cut it. Have another idea, I'm all open for it. How about something like: if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC) The idea is that an interrupt from user space will be the ONLY thing on the stack while an interrupt from the kernel will have kernel stack under it. Current is the bottom end of the kernel stack and regs + sizeof(pt_regs) is where the interrupt context started. Assumptions a) stack grows down, b) no switch stack at interrupt. MAGIC is some small number. For x86 user it is actually zero, don't know about others but the saved context should be the first thing on the stack so a minimun frame size should do. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Re: [PATCH] NMI watch dog notify patch
It seems that the subject patch generates a warning (missed it on the compile). Here is a patch to eliminate the warning. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: This patch eliminates the warning generated in die_nmi() when calling notify_die() by adding "const" to notify_die()'s definition. Signed-off-by: George Anzinger Index: linux-2.6.13-rc/include/asm-i386/kdebug.h === --- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h +++ linux-2.6.13-rc/include/asm-i386/kdebug.h @@ -41,7 +41,7 @@ enum die_val { DIE_PAGE_FAULT, }; -static inline int notify_die(enum die_val val,char *str,struct pt_regs *regs,long err,int trap, int sig) +static inline int notify_die(enum die_val val, const char *str,struct pt_regs *regs,long err,int trap, int sig) { struct die_args args = { .regs=regs, .str=str, .err=err, .trapnr=trap,.signr=sig }; return notifier_call_chain(&i386die_chain, val, &args);
Re: Clock resolution / RT preemption
greg wrote: Hi folks, I'm looking for a timer resolution lower than 1 ms (and monotonic clock rate) destined to be used with some network code running on x86 platforms. Would you please provide me with informations about how to get/implement this. AFAIK, there's a "high resultion timer" patch hanging around, but there's not much informations with regard to portability (specific hardware requirements ?), scalability, integration with RT patches. I understand the POSIX 1003.1b Clocks and Timers system calls are not fully available within the linux kernel (and libc ?), am I right on that ? On the HRT web site (see signature) there is a CVS repository. In there is a special version for the RT kernel. As to porting it to other archs, have a look at the include/linux/hrtimer.h file. It has (or should have) all you need to know. Please pass back any port you do. One more question : I believe Ingo's preemption patch run timers/interrupt handlers within kernel threads, how should I assign specific priority to address my goals without compromising system stability ? Carefully :) -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Fri, 29 Jul 2005 13:55:23 -0700, George Anzinger wrote: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(&nmi_print_lock); /* * We are in trouble anyway, lets at least try Minor nitpick. die_nmi() already gets a message passed in to distinguish between different types of nmi. Pass that message to notify_die(), on the off chance that the notified routines can use that difference. Excellent idea! Also your patch adds a trailing whitespace on the call to notify_die(). Fixed. This should do it. - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, "NMI Watchdog detected LOCKUP"); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) == + NOTIFY_STOP) + return; + spin_lock(&nmi_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: Keith Owens <[EMAIL PROTECTED]> wrote: I had though that too, but it does not allow recovery (i.e. lets reset >the watchdog and try again). die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the reset and continues. Patch below. >Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi >trap, only from the watchdog. Also, there is a notify in the path to >the other nmi stuff. I was looking at unknown_nmi_panic_callback(), which also calls die_nmi(). traps.c already has several notify_die() calls, nmi.c has none. It is cleaner to keep all the notification in traps.c, with this small change to nmi.c to cope with die_nmi() returning. Index: linux/arch/i386/kernel/nmi.c === --- linux.orig/arch/i386/kernel/nmi.c 2005-07-28 17:22:06.735038510 +1000 +++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000 @@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) + if (alert_counter[cpu] == 5*nmi_hz) { die_nmi(regs, "NMI Watchdog detected LOCKUP"); + alert_counter[cpu] = 0; + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; That all makes sense - let's go that way? Looks good to me. Trimed a bit more fat too. Here is the complete patch. - - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c |5 - traps.c |4 2 files changed, 8 insertions(+), 1 deletion(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs * */ alert_counter[cpu]++; if (alert_counter[cpu] == 5*nmi_hz) + /* +* die_nmi will return ONLY if NOTIFY_STOP happens.. +*/ die_nmi(regs, "NMI Watchdog detected LOCKUP"); - } else { + last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; } Index: linux-2.6.13-rc/arch/i386/kernel/traps.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c +++ linux-2.6.13-rc/arch/i386/kernel/traps.c @@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock); void die_nmi (struct pt_regs *regs, const char *msg) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, + 0, 0, SIGINT) == NOTIFY_STOP) + return; + spin_lock(&nmi_print_lock); /* * We are in trouble anyway, lets at least try
Re: [PATCH] NMI watch dog notify patch
Keith Owens wrote: On Thu, 28 Jul 2005 13:31:58 -0700, George Anzinger wrote: I have been doing some work on kgdb to pull a few of it "fingers" out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. I like the idea, but the hook should be in die_nmi(), not in the watchdog, using the reason that is already passed into die_nmi. die_nmi() is also called for a real NMI. I had though that too, but it does not allow recovery (i.e. lets reset the watchdog and try again). Hmm.. just looked at traps.c. Seems die_nmi is NOT called from the nmi trap, only from the watchdog. Also, there is a notify in the path to the other nmi stuff. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] NMI watch dog notify patch
Andrew Morton wrote: George Anzinger wrote: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. It looks sensible, but as there aren't actually any in-kernel uses for this I'd have thought it would be better for it to live out-of-tree? I should just bundle it with the kgdb patch then? -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] NMI watch dog notify patch
Andrew, I have been doing some work on kgdb to pull a few of it "fingers" out of various places in the kernel. This is the final location where we have a kgdb intercept not covered by a notify. On a related issue, I feel very queasy with sending nmi interrupts and non-nmi events to the same notify code. Would you be open to a patch to create a seperate notify list for nmi events? - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Enhancement Description: This patch adds a notify to the nmi watchdog to notify that the system is about to be taken down by the watchdog. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger nmi.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c === --- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c +++ linux-2.6.13-rc/arch/i386/kernel/nmi.c @@ -26,11 +26,13 @@ #include #include #include +#include #include #include #include #include +#include #include "mach_traps.h" @@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs * * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; - if (alert_counter[cpu] == 5*nmi_hz) - die_nmi(regs, "NMI Watchdog detected LOCKUP"); + if (alert_counter[cpu] == 5*nmi_hz) { + if (notify_die(DIE_NMIWATCHDOG, "nmi_ipi_watchdog", + regs, 0, 0, SIGINT) == NOTIFY_STOP) { + last_irq_sums[cpu] = sum; + alert_counter[cpu] = 0; + } else { + die_nmi(regs, "NMI Watchdog detected LOCKUP"); + } + } } else { last_irq_sums[cpu] = sum; alert_counter[cpu] = 0; @@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta return -EBUSY; } else { set_nmi_callback(unknown_nmi_panic_callback); - } + } } else { release_lapic_nmi(); unset_nmi_callback();
[PATCH] fix normalize problem in posix timers.
We found this (after a customer complained) and it is in the kernel.org kernel. Seems that for CLOCK_MONOTONIC absolute timers and clock_nanosleep calls both the request time and wall_to_monotonic are subtracted prior to the normalize resulting in an overflow in the existing normalize test. This causes the result to be shifted ~4 seconds ahead instead of ~2 seconds back in time. Patch is attached. - George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger Type: Defect Fix Description: The normalize code in posix-timers.c fails when the tv_nsec member is ~1.2 seconds negative. This can happen on absolute timers (and clock_nanosleeps) requested on CLOCK_MONOTONIC (both the request time and wall_to_monotonic are subtracted resulting in the possibility of a number close to -2 seconds.) This fix uses the set_normalized_timespec() (which does not have an overflow problem) to fix the problem and as a side effect makes the code cleaner. Signed-off-by: George Anzinger posix-timers.c | 17 +++-- 1 files changed, 3 insertions(+), 14 deletions(-) Index: linux-2.6.13-rc/kernel/posix-timers.c === --- linux-2.6.13-rc.orig/kernel/posix-timers.c +++ linux-2.6.13-rc/kernel/posix-timers.c @@ -915,21 +915,10 @@ static int adjust_abs_time(struct k_cloc jiffies_64_f = get_jiffies_64(); } /* -* Take away now to get delta +* Take away now to get delta and normalize */ - oc.tv_sec -= now.tv_sec; - oc.tv_nsec -= now.tv_nsec; - /* -* Normalize... -*/ - while ((oc.tv_nsec - NSEC_PER_SEC) >= 0) { - oc.tv_nsec -= NSEC_PER_SEC; - oc.tv_sec++; - } - while ((oc.tv_nsec) < 0) { - oc.tv_nsec += NSEC_PER_SEC; - oc.tv_sec--; - } + set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec, + oc.tv_nsec - now.tv_nsec); }else{ jiffies_64_f = get_jiffies_64(); }
Re: [PATCH] Re: itimer oddness in 2.6.12
Andrew Morton wrote: George Anzinger wrote: + while (time_before_eq(p->signal->real_timer.expires, jiffies)) + p->signal->real_timer.expires += inc; It gives me the creeps when I see timer code doing this, and it seems to be done relatively frequently. Surely it can be calculated arithmetically? If not, are you really sure that it is not exploitable by malicious code? Hm.. the system only falls into a loop here if the system is loaded to the point where we are a jiffie or more late. The prior code just did the "+=" and called add_timer, possibly with a time in the past. I suspect that way of doing this would never catch up if the user asked for a one jiffie repeat time. Also, this is faster than the div, mpy if you are not late (or even if you are several jiffies late). A possible alternative might be: p->signal->real_timer.expires += inc; if (time_before_eq(p->signal->real_timer.expires, jiffies)) p->signal->real_timer.expires += ((jiffies - p->signal->real_timer.expires + inc -1) / inc) * inc; Both a div and a mpy in there. I really think the "while" is ok, but if you prefer... The last time you questioned this sort of thing was in the code to correct an absolute timer. In that case we were adjusting after a clock set and, yes, it was possibly exploitable (assuming you could set the clock). Here we don't have that possibility, i.e. we only get into the loop if the system is late. - -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Re: itimer oddness in 2.6.12
Tom Marshall wrote: On Fri, Jul 22, 2005 at 08:21:25PM +0100, Paulo Marques wrote: Tom Marshall wrote: The patch to fix "setitimer timer expires too early" is causing issues for the Helix server. We have a timer processs that updates the server's timestamp on an itimer and it expects the signal to be delivered at roughly the interval retrieved from getitimer. This is very consistent on every platform, including Linux up to 2.6.11, but breaks on 2.6.12. On 2.6.12, setting the itimer to 10ms and retrieving the actual interval from getitimer reports 10.998ms, but the timer interrupts are consistently delivered at roughly 11.998ms. Unfortunately, this is not so clear cut as it seems :( Oops! That patch is wrong. The +1 should be applied to the initial interval _only_. We KNOW when the repeating intervals start (i.e. at the jiffie edge) and don't need to adjust them. The patch, however, incorrectly, rolls them all into one. The attach patch should fix the problem. Warnning, it compiles and boots, but I have not tested it. George -- Yes, I am sure that it is not a simple problem. I am not a kernel developer but I imagine that issues such as NTP adjustments would complicate this issue. I must also admit that I am not intimately familiar with the POSIX spec regarding itimers. Our current code does a setitimer followed by getitimer, then uses the actual interval retrieved by getitimer to set a global timer delta. On each timer signal, it updates the notion of the current time by the timer delta. As mentioned, this works on every other platform (Solaris, BSD, HPUX, AIX, DGUX, IRIX, Tru64, and Linux up to 2.6.11) but breaks on 2.6.12. This is not an insurmountable problem for userspace. It can be easily solved by using gettimeofday in the timer interrupt instead of adding the delta to the current time blindly. No big deal. I just wanted to point this issue out and ensure that (1) it was a known issue, and (2) it is the direction that the Linux kernel intends to take. If so, no big deal and we can modify the timer code to take that into account. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. Type: Defect Fix Disposition: Description: This changes setitimer as follows: 1. The repeating timer is figured using the requested time (not +1 as we know where we are in the jiffie). 2. The tests for interval too large are left to the time_val to jiffie code. Signed-off-by: George Anzinger itimer.c | 37 - 1 files changed, 16 insertions(+), 21 deletions(-) Index: linux-2.6.13-rc/kernel/itimer.c === --- linux-2.6.13-rc.orig/kernel/itimer.c +++ linux-2.6.13-rc/kernel/itimer.c @@ -112,28 +112,11 @@ asmlinkage long sys_getitimer(int which, return error; } -/* - * Called with P->sighand->siglock held and P->signal->real_timer inactive. - * If interval is nonzero, arm the timer for interval ticks from now. - */ -static inline void it_real_arm(struct task_struct *p, unsigned long interval) -{ - p->signal->it_real_value = interval; /* XXX unnecessary field?? */ - if (interval == 0) - return; - if (interval > (unsigned long) LONG_MAX) - interval = LONG_MAX; - /* the "+ 1" below makes sure that the timer doesn't go off before -* the interval requested. This could happen if -* time requested % (usecs per jiffy) is more than the usecs left -* in the current jiffy */ - p->signal->real_timer.expires = jiffies + interval + 1; - add_timer(&p->signal->real_timer); -} void it_real_fn(unsigned long __data) { struct task_struct * p = (struct task_struct *) __data; + unsigned long inc = p->signal->it_real_incr; send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p); @@ -141,14 +124,23 @@ void it_real_fn(unsigned long __data) * Now restart the timer if necessary. We don't need any locking * here because do_setitimer makes sure we have finished running * before it touches anything. +* Note, we KNOW we are (or should be) at a jiffie edge here so +* we don't need the +1 stuff. Also, we want to use the prior +* expire value so as to not "slip" a jiffie if we are late. +* Deal with requesting a time prior to "now" here rather than +* in add_timer. */ - it_real_arm(p, p->signal->it_real_incr); + if (!inc) + return; + while (time_before_eq(p->signal->real_timer.expires, jiffies)) + p->signal->real_timer.expires += inc; + add_timer(&p->signal->real_timer); } int do_setitimer(int which, struct itimerval *v
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Con Kolivas wrote: On Thu, 14 Jul 2005 05:10, Linus Torvalds wrote: On Wed, 13 Jul 2005, Vojtech Pavlik wrote: No, but 1/1000Hz = 100ns, while 1/864Hz = 1157407.407ns. If you have a counter that counts the ticks in nanoseconds (xtime ...), the first will be exact, the second will be accumulating an error. It's not even that we have a counter like that, it's the simple fact that we have a standard interface to user space that is based on milli-, micro- and nanoseconds. (For "poll()", "struct timeval" and "struct timespec" respectively). It's totally pointless saying that we can do 864 Hz "exactly", when the fact is that all the timeouts we ever get from user space aren't in that format. So the only thing that matters is how close to a millisecond we can get, not how close to some random number. That may be the case but when I've measured the actual delay of schedule timeout when using nanosleep from userspace, the average at 1000Hz was 1.4ms +/- 1.5 sd . When we're expecting a sleep of "up to 1ms" we're getting 50% longer than the longest expected. Purely mathematically the accuracy of changing HZ from 1000 -> 864 will not bring with it any significant change to the accuracy. This can easily be measured as well to confirm. Using schedule timeout as an argument against it doesn't hold for me. Vojtech's comment of : "No, but 1/1000Hz = 100ns, while 1/864Hz = 1157407.407ns. If you have a counter that counts the ticks in nanoseconds (xtime ...), the first will be exact, the second will be accumulating an error." is probably the most valid argument against such a funky number. No, that doesn't hold water either. We already jigger jiffie to be _close_ to 1/HZ and closer still to what we can get from the PIT as its true period (for example, today the jiffie is 999849 nanoseconds) and this too is only accurate to the nanosecond. Here are the jiffie values for several HZ values using the formulas in the code which use the TICK_RATE as given by the hardware. Note the error here is the difference between an asked for repeating timer of 1 second and what the system clock on the same system says, NOT what real time is in either case, just relative between the two. In otherwords, if you set up an itimer to signal every second and looked at the long term drift between the signals it gives and the system clock you would see the itimer drifting by ~914ppm (with HZ = 846). HZ TICK RATE jiffie(ns) second(ns) error (ppbillion) 100 1193182100010 0 200 1193182 598119600 19600 250 1193182 4000250162500 62500 500 1193182 19996881001843688 1843688 1000 1193182 999848 1000847848 847848 846 1193182 11817171000914299914299 Cheers, Con -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Lee Revell wrote: On Wed, 2005-07-13 at 14:16 -0700, Chris Wedgwood wrote: Both can be detected from you .config and we could see HZ as needed there and everyone else could avoid this surely? Does anyone object to setting HZ at boot? I suspect nothing else will make everyone happy. This will really mess up the jiffie_to_* and *_to_jiffie conversions. They rely in a rather large way on the complier doing all the heavy lifting. If HZ is a variable we introduce a LOT of runtime overhead here. (Try make kernel/itimer.i and look for jiffies_to_t* and friends.) -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Linus Torvalds wrote: On Wed, 13 Jul 2005, Vojtech Pavlik wrote: No, but 1/1000Hz = 100ns, while 1/864Hz = 1157407.407ns. If you have a counter that counts the ticks in nanoseconds (xtime ...), the first will be exact, the second will be accumulating an error. It's not even that we have a counter like that, it's the simple fact that we have a standard interface to user space that is based on milli-, micro- and nanoseconds. (For "poll()", "struct timeval" and "struct timespec" respectively). It's totally pointless saying that we can do 864 Hz "exactly", when the fact is that all the timeouts we ever get from user space aren't in that format. So the only thing that matters is how close to a millisecond we can get, not how close to some random number. So we do a lot of conversions from "struct timeval" to "jiffies", and if you don't take the error in that conversion into account, then you're ignoring what is likely a _bigger_ error. Long-term time drift is a known issue, and is unavoidable since you don't even know the exact frequency of the crystal, since that is not only not that exact in the first place, it depends on temperature etc. So long-term time drift is something that we inevitably have to use things like NTP to handle, if you want an exact clock. And in short-term things, the timeval/jiffie conversion is likely to be a _bigger_ issue than the crystal frequency conversion. So we should aim for a HZ value that makes it easy to convert to and from the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all good values for that reason. 864 is not.Linus Torvalds wrote: On Wed, 13 Jul 2005, Vojtech Pavlik wrote: No, but 1/1000Hz = 100ns, while 1/864Hz = 1157407.407ns. If you have a counter that counts the ticks in nanoseconds (xtime ...), the first will be exact, the second will be accumulating an error. It's not even that we have a counter like that, it's the simple fact that we have a standard interface to user space that is based on milli-, micro- and nanoseconds. (For "poll()", "struct timeval" and "struct timespec" respectively). It's totally pointless saying that we can do 864 Hz "exactly", when the fact is that all the timeouts we ever get from user space aren't in that format. So the only thing that matters is how close to a millisecond we can get, not how close to some random number. So we do a lot of conversions from "struct timeval" to "jiffies", and if you don't take the error in that conversion into account, then you're ignoring what is likely a _bigger_ error. Long-term time drift is a known issue, and is unavoidable since you don't even know the exact frequency of the crystal, since that is not only not that exact in the first place, it depends on temperature etc. So long-term time drift is something that we inevitably have to use things like NTP to handle, if you want an exact clock. And in short-term things, the timeval/jiffie conversion is likely to be a _bigger_ issue than the crystal frequency conversion. So we should aim for a HZ value that makes it easy to convert to and from the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all good values for that reason. 864 is not. Uh, WAIT A NANOSECOND! Look at what we are doing today in that department. The key is not the ability to convert based on the value of HZ but on the implied value of jiffie given CLOCK_TICK_RATE. Today the value we use for jiffie is 999849 nanoseconds which is what the given CLOCK_TICK_RATE and HZ end up getting from the PIT. By the time the user comes along we have TICK_NSEC and the current conversion routines which are not exactly simple but they are correct. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Con Kolivas wrote: On Tue, 12 Jul 2005 22:39, Con Kolivas wrote: On Tue, 12 Jul 2005 22:10, Vojtech Pavlik wrote: The PIT crystal runs at 14.3181818 MHz (CGA dotclock, found on ISA, ...) and is divided by 12 to get PIT tick rate 14.3181818 MHz / 12 = 1193182 Hz Yes, but the current code uses 1193180. Wonder why that is... The reality is that the crystal is usually off by 50-100 ppm from the standard value, depending on temperature. HZ ticks/jiffie 1 second error (ppm) --- 100 11932 1.15238 15.2 200 5966 1.15238 15.2 250 4773 1.57143 57.1 300 3977 0.31429 -68.6 333 3583 0.64114 -35.9 500 2386 0.999847619-152.4 1000 1193 0.999847619-152.4 If we are following the standard and trying to set up a timer, the 1 second time MUST be >= 1 second. Thus the values for 300 and above in this table don't fly. If we are trying to keep system time, well we do just fine at that by using the actual value of the jiffie (NOT 1/HZ) when we update time (one of the reasons for going to nanoseconds in xtime). The observable thing the user sees is best seen by setting up an itimer to repeat every second. Then you will see the drift AND it will be against the system clock which itself is quite accurate (the 50-100ppm you mention), even without ntp. And the error really is in the range of 848ppm for HZ=1000 BECAUSE we need to follow the standard. You can easily see this with the current 2.6 kernel. We even have a bug report on it: http://bugzilla.kernel.org/show_bug.cgi?id=3289 ~ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Martin J. Bligh wrote: Lots of people have switched from 2.4 to 2.6 (100 Hz to 1000 Hz) with no impact in stability, AFAIK. (I only remember some weird warning about HZ with debian woody's ps). Yes, that's called "progress" so no one complained. Going back is called a "regression". People don't like those as much. That's a very subjective viewpoint. Realize that this is a balancing act between latency and overhead ... and you're firmly only looking at one side of the argument, instead of taking a compromise in the middle ... If I start arguing for 100HZ on the grounds that it's much more efficient, will that make 250/300 look much better to you? ;-) I would like to interject an addition data point, and I will NOT be subjective. The nature of the PIT is that it can _hit_ some frequencies better than others. We have had complaints about repeating timers not keeping good time. These are not jitter issues, but drift issues. The standard says we may not return early from a timer so any timer will either be on time or late. The amount of lateness depends very much on the HZ value. Here is what the values are for the standard CLOCK_TICK_RATE: HZ TICK RATE jiffie(ns) second(ns) error (ppbillion) 100 1193180100010 0 200 1193180 598119600 19600 250 1193180 4000250162500 62500 500 1193180 19997031001851203 1851203 1000 1193180 9998481000847848847848 The jiffie values here are exactly what the kernel uses and are based on the best one can do with the PIT hardware. I am not suggesting any given default HZ, but rather an argumentation of the help text that goes with it. For those who want timers to repeat at one second (or multiples there of) this is useful info. For you enjoyment I have attached the program used to print this. It allows you to try additional values... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ #define NSEC_PER_SEC 10 //#define CLOCK_TICK_RATE /*1 */ 1193180 #define LATCH(CLOCK_TICK_RATE,HZ) ((CLOCK_TICK_RATE + HZ/2) / HZ) #define SH_DIV(NOM,DEN,LSH) ( ((NOM / DEN) << LSH) \ + (((NOM % DEN) << LSH) + DEN / 2) / DEN) #define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH(CLOCK_TICK_RATE,HZ), 8)) #define TICK_NSEC (SH_DIV (100UL * 1000, ACTHZ, 8)) struct { int hz; int clocktickrate; } vals[] = {{100, 1193180}, {200, 1193180}, {250, 1193180}, {500, 1193180}, {1000, 1193180},{0,0}}; void do_it(int hz,int tickrate) { int HZ = hz; int CLOCK_TICK_RATE = tickrate; int tick_nsec = TICK_NSEC; int ticks_per_sec = NSEC_PER_SEC/tick_nsec; int sec_size = ticks_per_sec * tick_nsec; int one_sec_p; int err; if (sec_size < NSEC_PER_SEC) sec_size += tick_nsec; one_sec_p = sec_size; err = one_sec_p - NSEC_PER_SEC; printf( "%4d\t%8d\t%8d\t%10d\t%8d\n",hz, tickrate, tick_nsec, one_sec_p, err); } void bail(void) { printf("run as: as [hz [clock_tick_rate]]\n"); exit(1); } main(int argc, char** argv) { int i = 0; int phz = 0; int pcr = vals[0].clocktickrate; if (argc > 1) { phz = atoi(argv[1]); if (!phz) bail(); } if (argc > 2) { pcr = atoi(argv[2]); if (!pcr) bail(); } printf("HZ \tTICK RATE\tjiffie(ns)\tsecond(ns)\t error (ppbillion)\n"); while(vals[i].hz) { do_it(vals[i].hz, vals[i].clocktickrate); i++; } if (phz) do_it(phz, pcr); }
Re: Build TAGS problem with O=
Cleaned up to be a standard "p 1" patch. Make the comments more concise. make O=/dir TAGS fails with: MAKE TAGS find: security/selinux/include: No such file or directory find: include: No such file or directory find: include/asm-i386: No such file or directory find: include/asm-generic: No such file or directory The problem is in this line: ifeq ($(KBUILD_OUTPUT),) KBUILD_OUTPUT is not defined (ever) after make reruns itself. This line is used in the TAGS, tags, and cscope makes. Here is a fix: Signed-off-by: George Anzinger --- linux-2.6.12-org/Makefile 2005-07-01 14:37:44.0 -0700 +++ linux-2.6.13-rc/Makefile2005-07-05 19:45:00.588314304 -0700 @@ -1149,7 +1149,7 @@ #(which is the most common case IMHO) to avoid unneeded clutter in the big tags file. #Adding $(srctree) adds about 20M on i386 to the size of the output file! -ifeq ($(KBUILD_OUTPUT),) +ifeq ($(src),$(obj)) __srctree = else __srctree = $(srctree)/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Build TAGS problem with O=
George Anzinger wrote: If you try: make O=/usr/src/ver/2.6.13-rc/obj/ -j5 LOCALVERSION=_2.6.13-rc TAGS ARCH=i386 it fails with: MAKE TAGS find: security/selinux/include: No such file or directory find: include: No such file or directory find: include/asm-i386: No such file or directory find: include/asm-generic: No such file or directory The problem seems to be this bit of the topdir Makefile: #We want __srctree to totally vanish out when KBUILD_OUTPUT is not set #(which is the most common case IMHO) to avoid unneeded clutter in the big tags file. #Adding $(srctree) adds about 20M on i386 to the size of the output file! ifeq ($(KBUILD_OUTPUT),) __srctree = else __srctree = $(srctree)/ endif It would appear that the "ifeq ($(KBUILD_OUTPUT),)" is doing the wrong thing. I am not a make expert, but I have had a lot of BAD experience trying to use this construct. Any one up to proposing a fix? The problem appears to be that KBUILD_OUTPUT is NOT defined after make reruns itself. Here is a fix: Signed-off-by: George Anzinger --- /usr/src/linux-2.6.12-org/Makefile 2005-07-01 14:37:44.0 -0700 +++ /usr/src/linux-2.6.13-rc/Makefile 2005-07-05 19:45:00.588314304 -0700 @@ -1149,7 +1149,7 @@ #(which is the most common case IMHO) to avoid unneeded clutter in the big tags file. #Adding $(srctree) adds about 20M on i386 to the size of the output file! -ifeq ($(KBUILD_OUTPUT),) +ifeq ($(src),$(obj)) __srctree = else __srctree = $(srctree)/ -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Maintainers list update: linux-net -> netdev
Horms wrote: On Tue, Apr 12, 2005 at 12:14:56PM -0700, George Anzinger wrote: Horms wrote: Use netdev as the mailing list contact instead of the mostly dead linux-net list. ~ PHRAM MTD DRIVER @@ -1795,7 +1795,7 @@ POSIX CLOCKS and TIMERS P: George Anzinger M: george@mvista.com -L: linux-net@vger.kernel.org +L: netdev@oss.sgi.com S: Supported I don't really know about the rest of them, but I think this should be: L: linux-kernel@vger.kernel.org Least wise that is where I look... Yes, I was wondering about that one. Here is a patch that adds to my previous patch. Trivial to say the least. I can re-diff the whole thing if that is more convenient. Looks good to me. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Maintainers list update: linux-net -> netdev
Horms wrote: On Sat, Apr 09, 2005 at 03:52:05PM +0200, Jörn Engel wrote: On Fri, 8 April 2005 22:16:07 +0200, Pavel Machek wrote: More importantly, it is still listed as "the list" for network drivers... NETWORK DEVICE DRIVERS P: Andrew Morton M: [EMAIL PROTECTED] P: Jeff Garzik M: [EMAIL PROTECTED] L: linux-net@vger.kernel.org S: Maintained Maybe one of the two maintainers might want to change that? ;) Use netdev as the mailing list contact instead of the mostly dead linux-net list. ~ PHRAM MTD DRIVER @@ -1795,7 +1795,7 @@ POSIX CLOCKS and TIMERS P: George Anzinger M: george@mvista.com -L: linux-net@vger.kernel.org +L: netdev@oss.sgi.com S: Supported I don't really know about the rest of them, but I think this should be: L: linux-kernel@vger.kernel.org Least wise that is where I look... ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] clean up FIXME in do_timer_interrupt-lock fix
Andrew Morton wrote: George Anzinger wrote: Did you pick this up? First sent on 3-11. I did, although now looking at it I have issues. I was not happy with the locking on this. Two changes: 1) Turn off irq while setting the clock. 2) Call the timer code only through the timer interface (set a short timer to do it from the ntp call). I wanted the calls to sync_cmos_clock() to be made in a consistent environment. This was not true when calling it directly from the NTP call code. The change means that sync_cmos_clock() is ALWAYS called from run_timers(), i.e. as a timer call back function. I would consider this to be an inadequate description :( Signed-off-by: George Anzinger time.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c === --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no int retval; /* gets recalled with irq locally disabled */ - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_set_rtc_mmss(nowtime); else retval = mach_set_rtc_mmss(nowtime); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } If the comment is correct, and this code is called with local irq's disabled then this patch should be using spin_lock_irqsave() With the change below, it is always called from the timer call back code which, I believe, is always called with irq on. Looks like I missed the comment :( @@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon } void notify_arch_cmos_timer(void) { - sync_cmos_clock(0); + mod_timer(&sync_cmos_timer, jiffies + 1); } static long clock_cmos_diff, sleep_start; Your description says what this does, but it doesn't way why it was done? -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] clean up FIXME in do_timer_interrupt-lock fix
Did you pick this up? First sent on 3-11. Andrew Morton wrote: Lee Revell <[EMAIL PROTECTED]> wrote: On Thu, 2005-03-10 at 00:42 -0800, George Anzinger wrote: This patch changes the update of the cmos clock to be timer driven rather than poll driven by the timer interrupt function. If the clock is not being synced to an outside source the timer is removed and thus system overhead is nill in that case. The update frequency is still ~11 minutes and missing the update window still causes a retry in 60 seconds. No replies yet. Are there any objections to this patch? Nope. I think it's neat. I queued it up. I had a nightmare about ntp coming in at the "wrong" time resulting in a deadlock. Attached locking changes will make me sleep better :) -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. Type: Defect Fix Disposition: Pending Description: I was not happy with the locking on this. Two changes: 1) Turn off irq while setting the clock. 2) Call the timer code only through the timer interface (set a short timer to do it from the ntp call). Signed-off-by: George Anzinger time.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c === --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no int retval; /* gets recalled with irq locally disabled */ - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_set_rtc_mmss(nowtime); else retval = mach_set_rtc_mmss(nowtime); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon } void notify_arch_cmos_timer(void) { - sync_cmos_clock(0); + mod_timer(&sync_cmos_timer, jiffies + 1); } static long clock_cmos_diff, sleep_start;
Re: [PATCH 2.6] fix POSIX timers expire before their scheduled time
Liu, Hong wrote: POSIX says: POSIX timers should not expire before their scheduled time. Due to the timer started between jiffies, there are cases that the timer will expire before its scheduled time. This patch ensures timers will not expire early. --- a/kernel/posix-timers.c 2005-03-10 15:46:27.329333664 +0800 +++ b/kernel/posix-timers.c 2005-03-10 15:50:11.884196136 +0800 @@ -957,7 +957,8 @@ &expire_64, &(timr->wall_to_prev))) { return -EINVAL; } - timr->it_timer.expires = (unsigned long)expire_64; + timr->it_timer.expires = (unsigned long)expire_64 + 1; tstojiffie(&new_setting->it_interval, clock->res, &expire_64); timr->it_incr = (unsigned long)expire_64; Has this happened?? The following code (in adjust_abs_time()) is supposed to prevent this sort of thing: if (oc.tv_sec | oc.tv_nsec) { oc.tv_nsec += clock->res; timespec_norm(&oc); } Also, we run rather extensive tests for this sort of thing. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: tvtime audio vs pcHDTV-3000 card and pvHDTV-1.6 software
Heavens, no need to clean the tree at all. Just add "-X to your diff. I have attached what I use for . It is likely over kill, but should do... -g Gene Heskett wrote: Greetings; I've spent a goodly part of the last 3 hours rebooting, to find out where this audio control function died, and I think now I can point an accusatory finger at the 2.6.11.2 patch with some degree of certainty. The scenario goes like this: reboot to 2.6.11-rc5, everything works flawlessly except the 1394 stuff, that kernel didn't have it built in yet. reboot to 2.6.11+bk-ieee1394.patch everything works flawlessly reboot to 2.6.11.1+bk-ieee1394.patch everything works flawlessly reboot to 2.6.11.2+bk-ieee1394.patch tvtime has no volume control, and the sound gets very very tinny about 1 second after it starts This scenario continues up to and includeing 2.6.11.4. So now my next question is, how to I clean up those src trees so that a diff actually outputs only the src code differences, thereby allowing a simple diff -urN (or whatever is the recommended command line to do a recursive diff on the whole maryann) to disclose the real diffs. In other words, is a simple 'make clean' sufficient? I got the impression from a comment that was made, that quite a body of work was actually done, in the i2c area, that somehow does not show in the changelog, nor in that simple little 10 line patch that was 2.6.11.2. And how that little patch could be responsible for breaking this boggles what tiny little miniscule piece of a mind I have left at this point. If thats the case, then how did it get into my src code tree since the exact same 2.6.11.tar.gz was used as the base for applying each of the incrementals to each of the src trees I now have sitting in /usr/src? Good question that... Unforch, the 2.6.11 plain tree has not, in this case been built yet as it got accidently nuked by a missfire of my 'buildit26' script, which normally moves a base version tree out of the way before it unpacks a fresh copy, and then renames that tree to be the current version and then restores the base tree to its original name. Thats not the one I want to use as the 'gold standard' anyway. 2.6.11.1 works, and 2.6.11.2 doesn't. So at this point, 2.6.11.1 is the 'gold standard'. But, both the 2.6.11.1 and the 2.6.11.2 trees are as built, and the diff I got was far larger than forgetting to apply the bk-ieee1394.patch to one of them would account for. Many tens of kilobytes in fact. Please throw me a bone here folks. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ *.o *.i .* *.*~ *~ *.rej *.orig *.orig.* #* *# *.ver ETAGS TAGS tags *.map *.s *.a *X *Y *.*X *.*Y SCCS CVS *.*,* dwarf2-defs.h kconfig configs.c defconfig mkdep split-include tkparse vmlinux consolemap_deftbl.c tkparse.c classlist.h crc32table.h devlist.h config autoconf.h compile.h version.h kconfig.tk soundmodem defkeymap.c patest asm boot conmakehash gen-devlist modversions.h elfconfig.h asm_offsets.h *.old cscope.* *.so gen_crc32table docproc fixdep kallsyms mk_elfconfig modpost pnmtologo initramfs_data.* gen_init_cpio
Re: [topic change] jiffies as a time value
john stultz wrote: On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote: john stultz wrote: On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote: + /* finally, update legacy time values */ + write_seqlock_irqsave(&xtime_lock, x_flags); + xtime = ns2timespec(system_time + wall_time_offset); + wall_to_monotonic = ns2timespec(wall_time_offset); + wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec; + wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec; + /* XXX - should jiffies be updated here? */ Excellent question. Indeed. Currently jiffies is used as both a interrupt counter and a time unit, and I'm trying make it just the former. If I emulate it then it stops functioning as a interrupt counter, and if I don't then I'll probably break assumptions about jiffies being a time unit. So I'm not sure which is the easiest path to go until all the users of jiffies are audited for intent. Really? Who counts interrupts??? The timer code treats jiffies as a unit of time. You will need to rewrite that to make it otherwise. Ug. I'm thin on time this week, so I was hoping to save this discussion for later, but I guess we can get into it now. Well, assuming timer interrupts actually occur HZ times a second, yes one could (and current practice, one does) implicitly interpret jiffies as being a valid notion of time. However with SMIs, bad drivers that disable interrupts for too long, and virtualization the reality is that that assumption doesn't hold. We do have the lost-ticks compensation code that tries to help this, but that conflicts with some virtualization implementations. Suspend/resume tries to compensate jiffies for ticks missed over time suspended, but I'm not sure how accurate it really is (additionally, looking at it now, it assumes jiffies is only 32bits). Adding to that, the whole jiffies doesn't really increment at HZ, but ACTHZ confusion, or bad drivers that assume HZ=100, we get a fair amount of trouble stemming from folks using jiffies as a time value. Because in reality, it is just a interrupt counter. Well, currently, in x86 systems it causes wall clock to advance a very well defined amount. That it is not exactly 1/HZ is something we need to live with... So now, if new timeofday code emulates jiffies, we have to decide if it emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies possibly jittering from it being incremented every tick and then set to the proper time when the timekeeping code runs. I think your overlooking timers. We have a given resolution for timers and some code, at least, expects timers to run with that resolution. This REQUIRES interrupts at resolution frequency. We can argue about what that interrupt event is called (currently a jiffies interrupt) and disparage the fact that hardware can not give us "nice" numbers for the resolution, but we do need the interrupts. That there are bad places in the code where interrupts are delayed is not really important in this discussion. For what it worth, the RT patch Ingo is working on is getting latencies down in the 10s of microseconds region. We also need, IMNSHO to recognize that, at lest with some hardware, that interrupt IS in fact the clock and is the only reasonable way we have of reading it. This is true, for example, on the x86. The TSC we use as a fill in for between interrupts is not stable in the long term and should only be used to interpolate over 1 to 10 ticks or so. I'm not sure which is the best way to go, but it sounds that emulating it is probably the easiest. I just deferred the question with a comment until now because its not completely obvious. Any suggestions on the above questions (I'm guessing the answers are: use ACTHZ, and the jitter won't hurt that bad). But then you have another problem. To correctly function, times need to expire on time (hay how bout that) not some time later. To do this we need an interrupt source. To this point in time, the jiffies interrupt has been the indication that one or more timer may have expired. While we don't need to "count" the interrupts, we DO need them to expire the timers AND they need to be on time. Well, something Nish Aravamudan has been working on is converting the common users of jiffies (drivers) to start using human time units. These very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can then be accurately changed to jiffies (or possibly some other time unit) internally. It would even be possible for soft-timers to expire based upon the actual high-res time value, rather then the low-res tick- counter(which is something else Nish has been playing with). When that occurs we can easily start doing other interesting things that I believe you've already been working on in your HRT code, such as changing the timer interrupt frequency dynamically, or
Re: [RFC][PATCH] new timeofday core subsystem (v. A3)
john stultz wrote: On Mon, 2005-03-14 at 21:37 -0800, Christoph Lameter wrote: Note that similarities exist between the posix clock and the time sources. Will all time sources be exportable as posix clocks? At this point I'm not familiar enough with the posix clocks interface to say, although its probably outside the scope of the initial timeofday rework. I do think we need to consider the needs of that subsystem. Clock wise, it makes a monotonic and a real time clock available to the user. The real time clock is just a timespec version of the timeval gettimeofday clock. At the current time, the monotonic clock is the real time clock plus wall_to_monotonic. All that is rather simple and straight forward, an I don't recommend adding any other clocks unless there is a real need. The interesting thing is that the posix timers are based on the posix clocks which are base on wall_clock, and the jiffies clock which is what runs the timers. In order to make sense of timer requests it is neccessary to, atomically, grab all three clocks (i.e. wall_clock aka gettimeofday, wall_to_monotonic, and jiffies with the jiffies offset). The code can then figure out when a timer needs to expire in jiffies time in order to expire at a given wall or monotonic time. Currently the xtime_time sequence lock is used to do this. Another issue that posix timers brings forward is the need to know when the clock is set. This is needed to cause timers that were requested to expire at some absolute wall_time to do so even if time is set while they are running. A word on how this is done is in order... Since the processing of a clock set by the posix timers code may, in fact, allow the time to be set more than once before the affected timers are adjusted (or rather to avoid the locking rats nest not allowing this would cause), the wall_to_monotonic value is exploited. In particular, a clock setting changes this value by the exact amount that time was adjusted. So, each posix timer carries the value of wall_to_monotonic that was in use when the timer was started. The clock_was_set code uses this to compute the clock movement and thus the adjustment needed to make the timer expire at the right time. What this translates to in the new code is a) the need for a way to atomically get all the key times (wall, monotonic, jiffie) and b) access to a value that will allow it to compute the amount of time a clock set, or a series of clock settings, changed time by. Of course, it also needs the clock_was_set() notify call. Do you have a link that might explain the posix clocks spec and its intent? Well, there is my signature :) Really, on the high-res-timers project site you want to download the support patch. In there, among other things, is a set of man pages on posix clocks & timers. The patch applies to any kernel and just adds a new set of directories off of Documentation. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock error in arch/i386/kernel/time.c on APM resume
Pavel Machek wrote: Hi! I agree. Still in all that follows, no one has addressed the apparent race described above. The reason the system reported the errors that started this thread is that the APM restore code was trying to read the cmos clock (I assume to set the xtime clock) WHILE the timer interrupt code what trying to set the cmos clock from xtime. In other words, it is destroying the time it is trying to read. I repeat "Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep." I am not sure how ntp is supposed to react to the resume but I suspect that the system time is rather out of sync... It needs to work without NTP, too. You don't get NTP on plane (etc) where suspend is most usefull. We have CMOS clock, it should be possible to get time from there without resorting to NTP.. Eh... sure, but... the bug was reported because the system was attempting to update the cmos clock (which it does every ~11 min.) during APM exit. It does this IF AND ONLY IF the system is synced to an external source as indicated by the STA_UNSYNC bit being cleared in the time_state. Now, I don't know what or how APM and NTP are supposed to play together, but I suspect that on entry to APM time is no longer synced, thus my comment. As to your comment, the bug would never have shown its ugly face if the system wasn't using NTP. Uh, ok, you are right. We should set time to STA_UNSYNC so that we do not write back to CMOS during/shortly after resume. I did not realize what STA_UNSYNC means. Perhaps you have patch to do that somewhere? ;- Zwane has convinced me that the real problem is doing the right thing (tm) in the APM code, i.e. not allowing the timer interrupt until after reading the cmos clock, for which he has a patch tendered. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock error in arch/i386/kernel/time.c on APM resume
Pavel Machek wrote: Hi! And more... That this occures implies we are attempting to update the cmos clock on resume seems wrong. One would presume that the time is wrong at this time and we are about to save that wrong time. Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep (or what ever it is called). Who should we ping with this? timer_resume, which appears to be the problem, wants to calculate amount of time was spent suspended, also your unconditional irq enable in get_cmos_time breaks the atomicity of device_power_up and would deadlock in sections of code which call get_time_diff() with xtime_lock held. I sent a patch subject "APM: fix interrupts enabled in device_power_up" which should address this. I agree. Still in all that follows, no one has addressed the apparent race described above. The reason the system reported the errors that started this thread is that the APM restore code was trying to read the cmos clock (I assume to set the xtime clock) WHILE the timer interrupt code what trying to set the cmos clock from xtime. In other words, it is destroying the time it is trying to read. I repeat "Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep." I am not sure how ntp is supposed to react to the resume but I suspect that the system time is rather out of sync... It needs to work without NTP, too. You don't get NTP on plane (etc) where suspend is most usefull. We have CMOS clock, it should be possible to get time from there without resorting to NTP.. Eh... sure, but... the bug was reported because the system was attempting to update the cmos clock (which it does every ~11 min.) during APM exit. It does this IF AND ONLY IF the system is synced to an external source as indicated by the STA_UNSYNC bit being cleared in the time_state. Now, I don't know what or how APM and NTP are supposed to play together, but I suspect that on entry to APM time is no longer synced, thus my comment. As to your comment, the bug would never have shown its ugly face if the system wasn't using NTP. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] new timeofday core subsystem (v. A3)
john stultz wrote: On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote: ~ + /* finally, update legacy time values */ + write_seqlock_irqsave(&xtime_lock, x_flags); + xtime = ns2timespec(system_time + wall_time_offset); + wall_to_monotonic = ns2timespec(wall_time_offset); + wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec; + wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec; + /* XXX - should jiffies be updated here? */ Excellent question. Indeed. Currently jiffies is used as both a interrupt counter and a time unit, and I'm trying make it just the former. If I emulate it then it stops functioning as a interrupt counter, and if I don't then I'll probably break assumptions about jiffies being a time unit. So I'm not sure which is the easiest path to go until all the users of jiffies are audited for intent. Really? Who counts interrupts??? The timer code treats jiffies as a unit of time. You will need to rewrite that to make it otherwise. But then you have another problem. To correctly function, times need to expire on time (hay how bout that) not some time later. To do this we need an interrupt source. To this point in time, the jiffies interrupt has been the indication that one or more timer may have expired. While we don't need to "count" the interrupts, we DO need them to expire the timers AND they need to be on time. ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock error in arch/i386/kernel/time.c on APM resume
Zwane Mwaikambo wrote: On Sat, 12 Mar 2005, George Anzinger wrote: I agree. Still in all that follows, no one has addressed the apparent race described above. The reason the system reported the errors that started this thread is that the APM restore code was trying to read the cmos clock (I assume to set the xtime clock) WHILE the timer interrupt code what trying to set the cmos clock from xtime. Doesn't my reply explain the actual problem? The code path being; Sorry, I just didn't look at the apm code. My bad. -g arch/i386/kernel/apm.c suspend() write_seqlock_irq(xtime_lock) ... write_sequnlock_irq(xtime_lock) device_power_up() timer_resume() get_cmos_time(); S So this covers the problem that the reporter reported, so yes it's setting xtime but we shouldn't be taking interrupts in the first place, so i posted the patch to cover that. APM was clearly violating PM resume procedures. Thanks, Zwane -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock error in arch/i386/kernel/time.c on APM resume
Zwane Mwaikambo wrote: On Sat, 12 Mar 2005, George Anzinger wrote: Looks like we need the irq on the read clock also. This is true both before and after the prior cmos_time changes. The attached replaces the patch I sent yesterday. For those wanting to fix the kernel with out those patches, all that is needed its the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks. And more... That this occures implies we are attempting to update the cmos clock on resume seems wrong. One would presume that the time is wrong at this time and we are about to save that wrong time. Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep (or what ever it is called). Who should we ping with this? timer_resume, which appears to be the problem, wants to calculate amount of time was spent suspended, also your unconditional irq enable in get_cmos_time breaks the atomicity of device_power_up and would deadlock in sections of code which call get_time_diff() with xtime_lock held. I sent a patch subject "APM: fix interrupts enabled in device_power_up" which should address this. I agree. Still in all that follows, no one has addressed the apparent race described above. The reason the system reported the errors that started this thread is that the APM restore code was trying to read the cmos clock (I assume to set the xtime clock) WHILE the timer interrupt code what trying to set the cmos clock from xtime. In other words, it is destroying the time it is trying to read. I repeat "Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep." I am not sure how ntp is supposed to react to the resume but I suspect that the system time is rather out of sync... -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock error in arch/i386/kernel/time.c on APM resume
J. Bruce Fields wrote: On APM resume this morning on my Thinkpad X31, I got a "spin_lock is already locked" error; see below. This doesn't happen on every resume, though it's happened before. The kernel is 2.6.11 plus a bunch of (hopefully unrelated...) NFS patches. Any ideas? Yesterday's night mare, todays bug :( Looks like we need the irq on the read clock also. This is true both before and after the prior cmos_time changes. Andrew, The attached replaces the patch I sent yesterday. For those wanting to fix the kernel with out those patches, all that is needed its the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks. And more... That this occures implies we are attempting to update the cmos clock on resume seems wrong. One would presume that the time is wrong at this time and we are about to save that wrong time. Possibly the APM code should change time_status to STA_UNSYNC on the way into the sleep (or what ever it is called). Who should we ping with this? ~ Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device :00:1f.5 to 64 Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309 Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked ~ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. Type: Defect Fix Disposition: Pending Description: I was not happy with the locking on this. Two changes: 1) Turn off irq while setting the clock. 2) Call the timer code only through the timer interface (set a short timer to do it from the ntp call). Signed-off-by: George Anzinger time.c | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c === --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no int retval; /* gets recalled with irq locally disabled */ - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_set_rtc_mmss(nowtime); else retval = mach_set_rtc_mmss(nowtime); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -282,14 +282,14 @@ unsigned long get_cmos_time(void) { unsigned long retval; - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_get_time(); else retval = mach_get_cmos_time(); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon } void notify_arch_cmos_timer(void) { - sync_cmos_clock(0); + mod_timer(&sync_cmos_timer, jiffies + 1); } static long clock_cmos_diff, sleep_start;
Re: [PATCH] clean up FIXME in do_timer_interrupt
Andrew Morton wrote: Lee Revell <[EMAIL PROTECTED]> wrote: On Thu, 2005-03-10 at 00:42 -0800, George Anzinger wrote: This patch changes the update of the cmos clock to be timer driven rather than poll driven by the timer interrupt function. If the clock is not being synced to an outside source the timer is removed and thus system overhead is nill in that case. The update frequency is still ~11 minutes and missing the update window still causes a retry in 60 seconds. No replies yet. Are there any objections to this patch? Nope. I think it's neat. I queued it up. I had a nightmare about ntp coming in at the "wrong" time resulting in a deadlock. Attached locking changes will make me sleep better :) -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. Type: Defect Fix Disposition: Pending Description: I was not happy with the locking on this. Two changes: 1) Turn off irq while setting the clock. 2) Call the timer code only through the timer interface (set a short timer to do it from the ntp call). Signed-off-by: George Anzinger time.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c === --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no int retval; /* gets recalled with irq locally disabled */ - spin_lock(&rtc_lock); + spin_lock_irq(&rtc_lock); if (efi_enabled) retval = efi_set_rtc_mmss(nowtime); else retval = mach_set_rtc_mmss(nowtime); - spin_unlock(&rtc_lock); + spin_unlock_irq(&rtc_lock); return retval; } @@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon } void notify_arch_cmos_timer(void) { - sync_cmos_clock(0); + mod_timer(&sync_cmos_timer, jiffies + 1); } static long clock_cmos_diff, sleep_start;
Re: [PATCH] more reliable system timer for SC1100 CPU
Ted Phelps wrote: First, procedure... patches should be *.patch and not compressed. If too long they need to be broken up. Lately, folks have said they should be inline in the email text, but watch out for your mailer doing UGLY things with white space. Hello, The attached patch is an attempt to work around the buggy timestamp counter on the NatSemi SC1100 CPU by using the on-board 27MHz high-resolution timer as an alternative time source. It should, in theory, work with any of the SCx200 CPUs as well, though I have been unable to test this. I have tested it fairly thoroughly with NTP on an SC1100 and it seems to behave sanely. That said, there are three things about it that I'm not entirely comfortable with: (1) The high-resolution timer is driven by a separate crystal than the CPU's timer interrupt, and on the SC1100 I have access to, it's consistently slower. I've found that it is necessary to periodically *decrement* the jiffies_64 counter in mark_offset in order to make gettimeofday produce anything reasonable. In practice jiffies_64 is incremented again in do_timer before anything else reads it, so the net effect is minimal. I don't think this is what your seeing. As I read the code, if an interrupt gets delayed and the next one is not, you will determine that you should decrement jiffies. Interrupts DO get delayed. This counter is only being used to cover the jiffie to jiffie time. I suspect that any systemic errors such as different rocks are not really important (but drift needs to be accounted for, see below). The better thing to do here is to figure some arbitrary start time when a jiffies edge is "close" to the actually interrupt time and use the counter time at that time as the "base" time. Each jiffie you then bump this by the counts per jiffie. (By the way, this should be calculated using TICK_NSEC (nsecs per tick) and NOT HZ. TICK_NSEC accounts for the fact that the PIT does not produce exactly 1/HZ ticks.) In addition to this, at each interrupt, to account for drift, I have been using code that, on each interrupt, checks if it is early (i.e.: base + ticks_per_jiffy > now) if so adjust base to make it on time. If it is late, I keep the minimum amount it is late for several ticks and then adjust base to make it on time. This ends up making small changes in "base" to account for any drift. It also ends up ignoring occasional late times caused by normal interrupt latency. If it is late by over a tick, jiffies is adjusted for the lost tick. (All this code is in the high-res-timers patch, see signature.) Do note this assumes (and IMHO rightly so) that the PIT is the system time gold standard. George (2) The 27MHz timer is accessed via the PCI bus, which is not available when the system clock is initialized. To work around this, I've written the init function to always fail so that loops_per_jiffy is computed using another timer (the TSC in my case). Once the high-resolution timer is accessible, the kernel will switch to using it to compute gettimeofday and the monotonic clock, but still use the original timer's delay function. This is somewhat kludgy, but I can't see a cleaner way. (3) The timer depends on CONFIG_SCx200, which appears later in the configuration hierarchy to the timers, and in an entirely different part. For now I've kept its Kconfig with the other timers, but I'm not entirely happy with this choice. The patch is against linux-2.6.11-mm2 as it relies on the 'determine-scx200-cb-address-at-run-time.patch' patch which has not made it into in the mainline. Please CC me if you reply as I'm not subscribed to LKML. Cheers, -Ted -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] clean up FIXME in do_timer_interrupt
Ok, here is a patch. See what you think. This patch assumes that Lee's patch has been merged (although it eliminates all of it). George George Anzinger wrote: Lee Revell wrote: On Fri, 2005-03-04 at 12:58 -0800, George Anzinger wrote: Lee Revell wrote: On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote: The thing that brought this code to my attention is that with PREEMPT_RT this happens to be the longest non-preemptible code path in the kernel. On my 1.3 Ghz machine set_rtc_mmss takes about 50 usecs, combined with the rest of timer irq we end up disabling preemption for about 90 usecs. Unfortunately I don't have the trace anymore. Anyway the upshot is if we hung this off a timer it looks like we would improve the worst case latency with PREEMPT_RT by almost 50%. Unless there is some reason it has to be done synchronously of course. Well, it does have to be done at the right WRT the second, but I suspect we can hit that as well with a timer as it is hit now. Also, if we are _really_ off the mark, this can be defered till the next second. Do you have a patch? Not at the moment, but I will work one up. Andrew merged my trivial patch to clean up the logic, but a real fix would be better. Lee -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger george@mvista.com Type: Enhancement Disposition: pending Description: This patch changes the update of the cmos clock to be timer driven rather than poll driven by the timer interrupt function. If the clock is not being synced to an outside source the timer is removed and thus system overhead is nill in that case. The update frequency is still ~11 minutes and missing the update window still causes a retry in 60 seconds. signed off by George Anzinger george@mvista.com arch/i386/kernel/time.c | 67 +--- kernel/time.c |9 ++ 2 files changed, 56 insertions(+), 20 deletions(-) Index: linux-2.6.12-rc/arch/i386/kernel/time.c === --- linux-2.6.12-rc.orig/arch/i386/kernel/time.c +++ linux-2.6.12-rc/arch/i386/kernel/time.c @@ -186,8 +186,6 @@ static int set_rtc_mmss(unsigned long no return retval; } -/* last time the cmos clock got updated */ -static long last_rtc_update; int timer_ack; @@ -239,24 +237,6 @@ static inline void do_timer_interrupt(in do_timer_interrupt_hook(regs); - /* -* If we have an externally synchronized Linux clock, then update -* CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be -* called as close as possible to 500 ms before the new second starts. -*/ - if ((time_status & STA_UNSYNC) == 0 && - xtime.tv_sec > last_rtc_update + 660 && - (xtime.tv_nsec / 1000) - >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && - (xtime.tv_nsec / 1000) - <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { - last_rtc_update = xtime.tv_sec; - if (efi_enabled) { - if (efi_set_rtc_mmss(xtime.tv_sec)) - last_rtc_update -= 600; - } else if (set_rtc_mmss(xtime.tv_sec)) - last_rtc_update -= 600; - } if (MCA_bus) { /* The PS/2 uses level-triggered interrupts. You can't @@ -313,7 +293,54 @@ unsigned long get_cmos_time(void) return retval; } +static void sync_cmos_clock(unsigned long dummy); +static struct timer_list sync_cmos_timer = + TIMER_INITIALIZER(sync_cmos_clock, 0, 0); + +static void sync_cmos_clock(unsigned long dummy) +{ + struct timeval now, next; + int fail = 1; + /* +* If we have an externally synchronized Linux clock, then update +* CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be +* called as close as possible to 500 ms before the new second starts. +* This code is run on a timer. If the clock is set, that timer +* may not expire at the correct time. Thus, we adjust... +*/ + if ((time_status & STA_UNSYNC) != 0) + /* +* Not synced, exit, do not restart a timer (if one is +* running, let it run out). +*/ + return; + + do_gettimeofday(&now); + if (now.tv_usec >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && + now.tv_usec <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { + fail = set_rtc_mmss(now.tv_sec); + } + next.tv_usec = USEC_AFTER - now.tv_usec; + if (next.tv_usec <= 0) + next.tv_usec += USEC_PER_SEC; + if (!fail) { + next.tv_sec = 659; +
Re: [PATCH] clean up FIXME in do_timer_interrupt
Lee Revell wrote: On Fri, 2005-03-04 at 12:58 -0800, George Anzinger wrote: Lee Revell wrote: On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote: The thing that brought this code to my attention is that with PREEMPT_RT this happens to be the longest non-preemptible code path in the kernel. On my 1.3 Ghz machine set_rtc_mmss takes about 50 usecs, combined with the rest of timer irq we end up disabling preemption for about 90 usecs. Unfortunately I don't have the trace anymore. Anyway the upshot is if we hung this off a timer it looks like we would improve the worst case latency with PREEMPT_RT by almost 50%. Unless there is some reason it has to be done synchronously of course. Well, it does have to be done at the right WRT the second, but I suspect we can hit that as well with a timer as it is hit now. Also, if we are _really_ off the mark, this can be defered till the next second. Do you have a patch? Not at the moment, but I will work one up. Andrew merged my trivial patch to clean up the logic, but a real fix would be better. Lee -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] clean up FIXME in do_timer_interrupt
Lee Revell wrote: On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote: Lee Revell wrote: On Thu, 2005-03-03 at 16:45 -0800, Andrew Morton wrote: If efi_enabled is true and efi_set_rtc_mmss(xtime.tv_sec) returns zero, the new code will run set_rtc_mmss(xtime.tv_sec) whereas the old code won't. Argh, I should know better then to send patches before having coffee. Here's a new patch. Still ugly, but might be a worthwhile cleanup. Lets ask the obvious question: Why isn't this update hung on a timer? It seems silly to check this 6000 times per update. I am sure we can sync a timer to the same degree we do timer interrupts, so there _must_ be some other reason. Right? Thanks George, I knew there was an obvious question here, I just didn't know what it was ;-). The thing that brought this code to my attention is that with PREEMPT_RT this happens to be the longest non-preemptible code path in the kernel. On my 1.3 Ghz machine set_rtc_mmss takes about 50 usecs, combined with the rest of timer irq we end up disabling preemption for about 90 usecs. Unfortunately I don't have the trace anymore. Anyway the upshot is if we hung this off a timer it looks like we would improve the worst case latency with PREEMPT_RT by almost 50%. Unless there is some reason it has to be done synchronously of course. Well, it does have to be done at the right WRT the second, but I suspect we can hit that as well with a timer as it is hit now. Also, if we are _really_ off the mark, this can be defered till the next second. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, deactivate() scheduling issue
Eugeny S. Mints wrote: Esben Nielsen wrote: As I read the code the driver task (A) should _not_ be removed from the runqueue. It has to be waken up to call schedule_timeout() such it gets back on the runqueue after 10 ms. If it is taken out of the runqueue at line 76 it will stay off the runqueue forever in the TASK_UNINTERRUBTIBLE state! Exactly. This is definilty the bug in the driver code - a developer just didn;t care about proper utilization of set_current_state(). The driver works just because as you have described - his fortune that scheduler doesn't remove task in not TASK_RUNNING state from a run queue. And my main question was - does everybody think it's ok have task in not TASK_RUNNING state in run queue. My current feeling is that this should not be allowed. This is the normal and specified way to handle this sort of thing. There is a race issue that coding in this way avoids. The coding sequence is: a) set the task state to some state other than TASK_RUNNING. b) do what ever triggers the wake up. This may be several things, for example, an interrupt from some device OR a timeout. c) call schedule to wait. The race is getting to the schedule call before the wake up happens. If, for some reason, the wake up condition happens prior to the schedule call, it will set the task state back to TASK_RUNNING so that when the schedule() call is made the scheduler will just return which is the right thing (tm) to do as the condition being waited on has happened. We also note that disabling interrupts or preemption will NOT avoid the race unless you disable interrupts on ALL cpus, which is a VERY expensive cross cpu call. As I read the use PREEMPT_ACTIVE, it is there to test on whether this rescheduling is voluntary or forced (a preemption). If it is forced the task shall of course not go off the runqueue but stay there to run again when it gets the highest priority. That is why PREEMPT_ACTIVE is set in preempt_schedule() and preempt_schedule_irq(). On the other hand if the task itself has called schedule() or schedule_timeout() it has to go out of the runqueue and wait for some event to wake it up. You right - it works perfectly - but not for my test case - I believe task in not TASK_RUNNING state should be removed from a run queue by the first (any - voluntary or forced) execution of the schedule() which detects the task state is not TASK_RUNNIG. This would cause the task to loose control prior to its setting up the needed wakeup events. Yes there will be tasks in state other that TASK_RUNNING on the runqueue. The "bug" as I see it is in the scheduler interface: There is no way to set the task state and call schedule() or schedule_timeout() atomicly. Therefore you can be preempted while the state is not TASK_RUNNING. Exactly. IMO this interface is weird and needs rework. I don;t understand what the reason to set task state before schedule_timeout() call but not inside, right before the schedule(). The actual task state may be passed as a parameter. You are assuming that the task ONLY wants to do a timeout. Most of the time the timeout indicates an error condition. The timeout bounds the wait for what is really desired, i.e. a device interrupt, some other task signaling, or some such. Surly this is covered in the various driver writing guides... -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] clean up FIXME in do_timer_interrupt
Lee Revell wrote: On Thu, 2005-03-03 at 16:45 -0800, Andrew Morton wrote: If efi_enabled is true and efi_set_rtc_mmss(xtime.tv_sec) returns zero, the new code will run set_rtc_mmss(xtime.tv_sec) whereas the old code won't. Argh, I should know better then to send patches before having coffee. Here's a new patch. Still ugly, but might be a worthwhile cleanup. Lets ask the obvious question: Why isn't this update hung on a timer? It seems silly to check this 6000 times per update. I am sure we can sync a timer to the same degree we do timer interrupts, so there _must_ be some other reason. Right? George Lee --- linux-2.6.11-rc4-V0.7.39-02/arch/i386/kernel/time.c 2005-02-14 18:10:49.0 -0500 +++ linux-2.6.11-rc4/arch/i386/kernel/time.c 2005-03-03 20:15:39.0 -0500 @@ -254,16 +254,12 @@ >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 && (xtime.tv_nsec / 1000) <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) { - /* horrible...FIXME */ + last_rtc_update = xtime.tv_sec; if (efi_enabled) { - if (efi_set_rtc_mmss(xtime.tv_sec) == 0) -last_rtc_update = xtime.tv_sec; - else -last_rtc_update = xtime.tv_sec - 600; - } else if (set_rtc_mmss(xtime.tv_sec) == 0) - last_rtc_update = xtime.tv_sec; - else - last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */ + if (efi_set_rtc_mmss(xtime.tv_sec)) + last_rtc_update -= 600; + } else if (set_rtc_mmss(xtime.tv_sec)) + last_rtc_update -= 600; } if (MCA_bus) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: realtime patch
Fabian Fenaut wrote: shabanip a ecrit le 25.02.2005 00:37: where can i find realtime patchs to kernel 2.6? http://sourceforge.net/projects/realtime-lsm/ ? What?? NO, they are here: http://redhat.com/~mingo/realtime-preempt/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Needed faster implementation of do_gettimeofday()
Puneet Kaushik wrote: Hello Parag and George, Thanks for immediate reply. The main problem is I am working on a SMP system. I have written a small program that just calls the gettimeofday(), one billion times. I have run it with time utility and it takes almost double time on SMP then a UP. with kernel 2.6.10 on UP real4m5.495s user1m17.088s sys 2m48.046s With Kernel 2.6.10 on SMP real6m24.485s user1m43.723s sys 4m30.749s And the fact is this SMP machine is faster and with more memory than the UP one. In SMP systems it make a spinlock every time it got called, synchronizes both the processors, and unlock them. Thats all I know about it. On 2.6 the lock is a r/w sequence lock. The machines are not synchronized or locked, but some of the sequence lock instructions around the locking are "locked". I find it hard to believe that this would double the time, however. Ah..., now I remember. On SMP x86 boxen, the accounting/ run_timer interrupt comes from the lapic timer. This is triggered at a 1/HZ rate and means that there is an additional time keeping interrupt. Actually, over the box, you get (N+1)/HZ interrupts where N is the number of cpus. Assuming that the PIT and the lapic interrupt take about the same amount of time and that the PIT interrupt is evenly distributed on the CPUs, the interrupt contention should go from 1 to 1.5. This alone would take your 4.084 sec UP time to 6.125 sec on an SMP boxen (that is amazingly close to what you are seeing if you ask me). Again, I recommend my HRT patch. There the accounting interrupt is generated by an "all-but-self" IPI. This is generated by the PIT interrupt code which also does the accounting on the cpu handling the PIT interrupt. Result: total time keeping interrupts N/HZ where N is the number of CPUs. George I am just working on your suggestion, let me know if it will work for SMPs. See above. Should solve your problem. If there is some good implementation for SMP, please let me know. Thanks, - Puneet On Tue, 2005-02-22 at 08:36, George Anzinger wrote: Parag Warudkar wrote: On Sunday 20 February 2005 05:58 am, [EMAIL PROTECTED] wrote: 9859138.6083 vmlinux mark_offset_tsc 5844735.1032 libc-2.3.2.sogetc What makes you think mark_offset_tsc is slow? Do you have any comparative numbers? It might just be that the workload you are throwing at it justifies it. (For e.g. if your workload does a zillion system calls, system_call will show up as a hot spot in oprofile - doesn't necessarily mean it is slow - it's just overused.) Can you post the relevant code? He really is right. Mark offset is reading the PIT counter and that is not only rather dumb but dog slow. A suggestion, try the high res timers patch. Even if you don't use the timers the mark offset there is MUCH faster. It does not read the PIT. The difference is where we assume the jiffie bump is in time. If we assume it is at the point that the PIT interrupts, well then the only way to get to that is to read the PIT. If, on the other hand, we assume it is at the time after the interrrupt where we mark offset, we can observe the "best" time for this event based on the TSC and avoid reading the PIT. Try the HRT patch (see signature below) and see if if doesn't do better. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Needed faster implementation of do_gettimeofday()
Parag Warudkar wrote: On Sunday 20 February 2005 05:58 am, [EMAIL PROTECTED] wrote: 9859138.6083 vmlinux mark_offset_tsc 5844735.1032 libc-2.3.2.sogetc What makes you think mark_offset_tsc is slow? Do you have any comparative numbers? It might just be that the workload you are throwing at it justifies it. (For e.g. if your workload does a zillion system calls, system_call will show up as a hot spot in oprofile - doesn't necessarily mean it is slow - it's just overused.) Can you post the relevant code? He really is right. Mark offset is reading the PIT counter and that is not only rather dumb but dog slow. A suggestion, try the high res timers patch. Even if you don't use the timers the mark offset there is MUCH faster. It does not read the PIT. The difference is where we assume the jiffie bump is in time. If we assume it is at the point that the PIT interrupts, well then the only way to get to that is to read the PIT. If, on the other hand, we assume it is at the time after the interrrupt where we mark offset, we can observe the "best" time for this event based on the TSC and avoid reading the PIT. Try the HRT patch (see signature below) and see if if doesn't do better. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: queue_work from interrupt Real time preemption2.6.11-rc2-RT-V0.7.37-03
David S. Miller wrote: On Wed, 16 Feb 2005 06:16:45 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: Maybe the networking stack would break if we allowed the TIMER softirq (thread) to preempt the NET softirq (threads) (and vice versa)? The major assumption is that softirq's run indivisibly per-cpu. Otherwise the per-cpu queues of RX and TX packet work would get corrupted. For what its worth, I, a short while ago, put together a workqueue package to a) allow easy priority setting for work queues and b) change either softirq, tasklet or bh code to use workqueues. This was done mostly with CPP macros and a few conversion routines. I then converted the network code to use this package simply by adding a key include to a couple of files. The result worked on UP but ended up hanging the network code on SMP. Everything else still worked, but not the net stuff. I never ran down the problem as the "boss" was not interested in SMP... George -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Sven Dietrich wrote: Hi George, you may want to use this for reference. This patch adds a config option to allow you to select whether timer IRQ runs in thread or not. I'm not totally happy with the #ifdefs, but it may make witching back and forth easier. Thanks, but... You are addressing a different problem than I. I want to code the VST patch to work in a system with or without the RT patch (it is easy to work with the RT option on or off). The problem is setting up the spin locks it needs. My solution assumes that RAW_SPIN_LOCK_UNLOCKED will not be defined unless the RT patch is applied. As to your patch, in most archs the timer interrupt does accounting which requires input on just who was interrupted on the interrupt. This is lost when threading the timer IRQ. I think it was problems of this sort that caused Ingo to back away... George PS By the way, your mailer (Microsoft Outlook) set up your attachment in such a way that my mailer would not inline it. You might want to look into this. Sven -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of George Anzinger Sent: Thursday, February 10, 2005 12:21 PM To: Ingo Molnar Cc: William Weston; linux-kernel@vger.kernel.org Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01 If I want to write a patch that will work with or without the RT patch applied is the following enough? #ifndef RAW_SPIN_LOCK_UNLOCKED typedef raw_spinlock_t spinlock_t #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED #endif -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
I am seeing: kernel/built-in.o(.text+0x4974): In function `copy_mm': /usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/fork.c:493: undefined reference to `__spin_is_locked' kernel/built-in.o(.text+0x9f5a): In function `next_thread': /usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/exit.c:877: undefined reference to `__raw_rwlock_is_locked' net/built-in.o(.text+0x1258): In function `__sock_create': /usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/socket.c:175: undefined reference to `__spin_is_locked' net/built-in.o(.text+0x16b54): In function `dev_deactivate': /usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/sched/sch_generic.c:594: undefined reference to `__spin_is_locked' make[1]: *** [vmlinux] Error 1 make: *** [bzImage] Error 2 Possibly from: define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0) #define __raw_spin_unlock_wait(x) \ do { barrier(); } while(__spin_is_locked(x)) in asm/spinlock.h should that be __raw_spin_is_locked(x) instead? -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
If I want to write a patch that will work with or without the RT patch applied is the following enough? #ifndef RAW_SPIN_LOCK_UNLOCKED typedef raw_spinlock_t spinlock_t #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED #endif -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Dynamic tick, version 050127-1
Pavel Machek wrote: Hi! I do have CONFIG_X86_PM_TIMER enabled, but it seems by board does not have such piece of hardware: [EMAIL PROTECTED]:/usr/src/linux-mm$ dmesg | grep -i "time\|tick\|apic" PCI: Setting latency timer of device :00:11.5 to 64 [EMAIL PROTECTED]:/usr/src/linux-mm$ If you are sure that machine supports ACPI, maybe this is your problem (from the POSIX high res timer patch): If you enable the ACPI pm timer and it cannot be found, it is possible that your BIOS is not producing the ACPI table or that your machine does not support ACPI. In the former case, see "Default ACPI pm timer address". If the timer is not found the boot will fail when trying to calibrate the 'delay' loop. Well, but how do I get the address? I'll try looking at BIOS options... Pavel In my machine, if I turned off the PM code (in the BIOS) (or possibly turning on the ACPI, again in the BIOS) it did produce the address. Booting then would put that address in the dmesg file. You can then change the BIOS back to what it was and use the address found in the dmesg file. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: High resolution timers and BH processing on -RT
Ingo Molnar wrote: * Thomas Gleixner <[EMAIL PROTECTED]> wrote: or is it that we have a 'group' of normal timers expiring, which, if they happen to occur _just_ prior a HRT event will generate a larger delay? Yep. The timers expire at random times. So it's likely to have short sequences of timer interrupts going off. This needs reprogramming of the PIT and processing of the expired timers. If you can use a machine that has a local apic we can leave the PIT out of it. Really this is MUCH preferred. If your box has a LAPIC, make sure it is not disabled by your config setup. Leaving the PIT out of it, the structure is that HRT timers are put in the normal timer list and, when they expire, are moved to a HRT list which only contains timers that will expire prior to the next jiffie. This list is managed by interrupt, ideally from the LAPIC, or the PIT is need be. Aside from the PIT reprograming (once per HRT timer plus once to get back to the 1/HZ period), there can be delays in getting the timer out of the normal timer list. The main thing here is that the list MUST be processed as close to the jiffie edge as possible as any timers due shortly after the jiffie edge will be shadowed by this regardless of the HRT interrupt. Of course, it an expired timer is presented to the HRT code by the normal timer expire code, it is expired immeadiatly. A quick comment here on the current RT code. It looks to me like there is a race in timer delivery. It looks like the softirq is "raised" by the PIT interrupt code and the jiffie is bumped by the timer thread. If the softirq gets to run prior to the PIT interrupt thread we could end up in the run_timer list code with a stale jiffie value and do nothing. This would delay normal timers for a jiffie and HRT timers for some time less than a jiffie, depending on when they were really due. I thing we should move the raising of the timer softirq to the PIT interrupt thread after we release the xtime_lock. i dont really like the static splitup of 'normal' vs. 'HRT' timers - there might in fact be separate priority requirements between HRT timers too. Yes, and high priority tasks might want low res timers... i think one possible solution would be to introduce some notion of 'timer priority', and to expire each timer priority level in a separate timer expiry thread. Priority 0 (lowest) would be expired in ksoftirqd, and there would be 3 separate threads for say priorities 1-3. Or something like this. Potentially exposed to user-space as well, via new APIs. Hm? To push this even further: in theory timers could inherit the priority of the task that starts them, and they would be expired in that priority order - but this probably needs a pretty clever (and most likely complex) data-structure ... A long time ago in another land, I did such a system. The timer priority was taken from the calling task. At that time (and now, till convinced otherwise) I thought it a _good thing_ to expire timers in order, regardless of their priority, so all timers pending delivery were delivered at the priority of the highest priority timer in the "batch". The basic idea was that the interrupt code pulled expired timers from the timer list and pushed them into the pending list. In the process it found the highest priority timer in the list. The timer delivery thread was then run at that priority. This thread adjusted its priority downward as needed, but in all cases the timers were delivered in strict time order. Since then, as now, the timer delivery usually just _notified_ a task of a pending signal, the low priority timers did not really hold up things for long. Once the high priority timer was delivered and the thread either finished or dropped its priority, the waiting task (having been wakened by the signal delivery) could switch in. The primary thing needed for this is a simple and quick way to switch a tasks priority, both from outside and from the task itself. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] to fix xtime lock for in the RT kernel patch
George Anzinger wrote: Ingo Molnar wrote: * George Anzinger wrote: What I am suggesting is spliting the mark code so that it would only grap the offset (current TSC in most systems) during interrupt processing. Applying this would be done later in the thread. Since it is not applying the offset, the xtime_lock would not need to be taken. ok, you are right, and this would be fine with me. Wanna take a shot at it? I've uploaded the -03 patch which is my most current tree. (with the do_timer() moving done already.) I've reviewed the TSC offset codepath again and i'm not sure where i got the 10 usecs from ... it's a pretty cheap codepath that can be done in the direct interrupt just fine. Tomorrow, uh, later today. Need some sleep now... Ingo, I have been looking at the code being proposed by John Stultz. It looks like it handles all the issues I am talking about here. I think it would be best to leave the RT patch as it is WRT this issue and work on getting John's patch ready for prime time as any work I would do here will just get tossed when his patch hits the steet. Meanwhile, I will (already have) get HRT working on RT and make that available in the next few days. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/7] posix-timers: CPU clock support for POSIX timers
.clock = TIMER_OFF; + timr->it.mmtimer.expires = 0; spin_unlock_irqrestore(&t->lock, irqflags); } return 0; @@ -558,7 +558,7 @@ static int sgi_timer_del(struct k_itimer static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) { - if (timr->it_timer.magic == TIMER_OFF) { + if (timr->it.mmtimer.clock == TIMER_OFF) { cur_setting->it_interval.tv_nsec = 0; cur_setting->it_interval.tv_sec = 0; cur_setting->it_value.tv_nsec = 0; @@ -566,8 +566,8 @@ static void sgi_timer_get(struct k_itime return; } - ns_to_timespec(cur_setting->it_interval, timr->it_incr * sgi_clock_period); - ns_to_timespec(cur_setting->it_value, (timr->it_timer.expires - rtc_time())* sgi_clock_period); + ns_to_timespec(cur_setting->it_interval, timr->it.mmtimer.incr * sgi_clock_period); + ns_to_timespec(cur_setting->it_value, (timr->it.mmtimer.expires - rtc_time())* sgi_clock_period); return; } @@ -640,19 +640,19 @@ retry: base[i].timer = timr; base[i].cpu = smp_processor_id(); - timr->it_timer.magic = i; - timr->it_timer.data = nodeid; - timr->it_incr = period; - timr->it_timer.expires = when; + timr->it.mmtimer.clock = i; + timr->it.mmtimer.node = nodeid; + timr->it.mmtimer.incr = period; + timr->it.mmtimer.expires = when; if (period == 0) { if (mmtimer_setup(i, when)) { mmtimer_disable_int(-1, i); posix_timer_event(timr, 0); - timr->it_timer.expires = 0; + timr->it.mmtimer.expires = 0; } } else { - timr->it_timer.expires -= period; + timr->it.mmtimer.expires -= period; if (reschedule_periodic_timer(base+i)) err = -EINVAL; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/