Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init
On 01/31/2008 01:36 AM, Jan Kiszka was caught saying:
> Jan Kiszka wrote:
>> George Anzinger wrote:
>>> On 01/30/2008 04:08 PM, Jan Kiszka was caught saying:
>>>> [Here comes a rebased version against latest x86/mm]
>>>>
>>>> In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up
>>>> and connect to the front-end already during early_param evaluation.
>>>> This fails on x86 as the exception stack is not yet initialized,
>>>> effectively delaying kgdbwait until late-init.
>>>
>>> I wonder how much work it would take to just set up the exception
>>> stack and proceed. After all, kgdbwait is there to help debug
>>> very early kernel code...
>>
>> In principle a valid question, but I'm not the one to answer it. I
>> would not feel very well if I had to reorder this critical setup code.
>> Look, we would have to move trap_init in start_kernel before
>> parse_early_param, and that would affect _every_ arch...

I cannot speak for other archs, but for x86 I called trap_init from the
code that caught the kgdbwait. At that time (I have not looked at the
actual kernel code since I retired) it could be called again later by
the kernel code. I.e. I did not try to reorder the kernel bring-up code,
but just added an additional call to trap_init, and then only in the
case of finding a kgdbwait. As such, this would need to be arch
specific...

> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
> parse_early_param as well? It should, because my understanding of
> trap_init is that it's the function to arm things like... exception
> handlers? And that raises the question of the deeper purpose of this
> check (and the invocation of kgdb_early_init from the argument parsing
> function). Sigh, KGDB is still a quite improvable piece of code.

Likely. Once you get it into the mainline kernel, one would hope that
other arch code would be forthcoming, as many more "eyes" will be in
play.

> Jan
>
> PS: Can we move this to some public list?

Sure, sorry I picked the wrong reply button, never intended it to be
private.

--
George Anzinger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
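The approach George describes (an extra, arch-specific call to trap_init made only when kgdbwait is seen, with the regular call left where it is) can be sketched in userspace C. This is only a model under stated assumptions: the struct and all sketch_* names are hypothetical stand-ins, not kernel symbols, and it assumes the arch setup is harmless to run twice, as George reports it was on x86 at the time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of boot state; not a real kernel structure. */
struct sketch_boot_state {
    int  trap_init_calls;        /* how often the arch trap setup ran */
    bool exception_stack_ready;  /* models the EXCEPTION_STACK_READY check */
    bool kgdb_connected;         /* did the early debugger hookup succeed? */
};

/* Models an arch trap_init() that is assumed safe to call more than once. */
static void sketch_trap_init(struct sketch_boot_state *st)
{
    assert(st != NULL);
    st->trap_init_calls++;
    st->exception_stack_ready = true;
}

/* Models early parameter parsing: only "kgdbwait" triggers the extra call. */
static void sketch_early_param(struct sketch_boot_state *st, const char *param)
{
    assert(st != NULL && param != NULL);
    if (strcmp(param, "kgdbwait") != 0)
        return;
    if (!st->exception_stack_ready)
        sketch_trap_init(st);    /* additional early call, arch specific */
    st->kgdb_connected = st->exception_stack_ready;
}

/* Models the normal bring-up order, which is left unchanged. */
static void sketch_start_kernel(struct sketch_boot_state *st, const char *param)
{
    sketch_early_param(st, param);
    sketch_trap_init(st);        /* the regular call still happens later */
}
```

Calling sketch_start_kernel() on a zeroed state with "kgdbwait" runs the trap setup twice and connects early; with any other parameter it runs once and nothing changes for the normal boot path.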
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote:
Wednesday 7 September 2005 23:16, George Anzinger wrote:
Serge Noiraud wrote:
...
I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot?

Where did you get the kgdb you are using? It looks like kgdb_ts is in
this version, but it is not in the one on my website
http://source.mvista.com/~ganzinger/

Is this related to kgdb? I.e. does it go away if you either turn off
kgdb at configure time or just don't patch with kgdb? (It sure seems
unrelated, but...)

I don't get those errors with CONFIG_KGDB=n.
Below I put the diff between a working .config and a non-working .config.

George

...
INSTALL sound/usb/snd-usb-audio.ko
INSTALL sound/usb/snd-usb-lib.ko
INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING: ...

If I redo the make command only (not make rpm) I obtain the following:

# make
CHK include/linux/version.h
make[1]: `arch/i386/kernel/asm-offsets.s' is up to date.
CHK include/linux/compile.h
CHK usr/initramfs_list
Kernel: arch/i386/boot/bzImage is ready (#1)
Building modules, stage 2.
MODPOST
*** Warning: "preempt_locks" [net/sunrpc/sunrpc.ko] undefined!
*** Warning: "preempt_locks" [net/appletalk/appletalk.ko] undefined!
*** Warning: "preempt_locks" [fs/reiserfs/reiserfs.ko] undefined!
*** Warning: "preempt_locks" [fs/ntfs/ntfs.ko] undefined!
*** Warning: "preempt_locks" [fs/nfs/nfs.ko] undefined!
*** Warning: "preempt_locks" [fs/minix/minix.ko] undefined!
*** Warning: "preempt_locks" [fs/jbd/jbd.ko] undefined!
*** Warning: "preempt_locks" [fs/ext3/ext3.ko] undefined!
*** Warning: "preempt_locks" [fs/cifs/cifs.ko] undefined!
*** Warning: "preempt_locks" [fs/affs/affs.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/libata.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/ide-scsi.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/gdth.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid6.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid5.ko] undefined!
*** Warning: "preempt_locks" [drivers/ide/ide-floppy.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/pktcdvd.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/loop.ko] undefined!

preempt_locks is being accessed from a module but is not exported. This
is turned on with CONFIG_DEBUG_RT_LOCKING_MODE, so change that and it
should build.

#
~
-# CONFIG_EARLY_PRINTK is not set
-# CONFIG_DEBUG_STACKOVERFLOW is not set
+CONFIG_LATENCY_TRACE=y
+CONFIG_RT_DEADLOCK_DETECT=y
+CONFIG_DEBUG_RT_LOCKING_MODE=y <- This one is doing it
+CONFIG_DEBUG_KOBJECT=y
+CONFIG_DEBUG_HIGHMEM=y
~
+CONFIG_KGDB=y
+CONFIG_KGDB_9600BAUD=y
+# CONFIG_KGDB_19200BAUD is not set
+# CONFIG_KGDB_38400BAUD is not set
+# CONFIG_KGDB_57600BAUD is not set
+# CONFIG_KGDB_115200BAUD is not set
+CONFIG_KGDB_PORT=0x3f8
+CONFIG_KGDB_IRQ=4
+CONFIG_KGDB_MORE=y
+CONFIG_KGDB_OPTIONS="-O1"
+CONFIG_NO_KGDB_CPUS=8

The following are not in the latest kgdb...

+CONFIG_KGDB_TS=y
+# CONFIG_KGDB_TS_64 is not set
+CONFIG_KGDB_TS_128=y
+# CONFIG_KGDB_TS_256 is not set
+# CONFIG_KGDB_TS_512 is not set
+# CONFIG_KGDB_TS_1024 is not set
.
+CONFIG_STACK_OVERFLOW_TEST=y
+CONFIG_TRAP_BAD_SYSCALL_EXITS=y <--- I recommend against this one, see notes at front of kgdb patch
+CONFIG_KGDB_CONSOLE=y <--- Likewise use this only if you have only one serial port and no VGA
+CONFIG_KGDB_SYSRQ=y
#

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote:
Wednesday 17 August 2005 02:53, George Anzinger wrote:
I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and friends
so that you can "bt" through them.

Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.

Hi, everybody
I found two bugs in the kgdb-ga-rt patch.
The first one: if CONFIG_SMP is not set, we have a compile error.
The second one: if CONFIG_KGDB is not set, we have a link error.
I send you a diff patch to correct this.
I am not sure the last patch is correct, but it works.

Fixes for the reported bugs are now rolled into the kgdb patch. Also,
there is a new README.txt.

I also included, in the kgdb patch, an updated gdb macro file
(Documentation/i386/kgdb/gdbinit.hw) which has a per_cpu macro that,
given a per_cpu structure name and the cpu number, returns the address
of that structure, properly typed.

I am also putting up my current version of time_stamp_tool. This is the
replacement for kgdb_ts(), which I have removed from the kgdb patch.
Still a little rough, but it has promise of being arch independent.

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [patch] KGDB for Real-Time Preemption systems
Serge Noiraud wrote:
Wednesday 17 August 2005 02:53, George Anzinger wrote:
I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and friends
so that you can "bt" through them.

Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.

I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot?

Is this related to kgdb? I.e. does it go away if you either turn off
kgdb at configure time or just don't patch with kgdb? (It sure seems
unrelated, but...)

George

...
INSTALL sound/usb/snd-usb-audio.ko
INSTALL sound/usb/snd-usb-lib.ko
INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/sunrpc/sunrpc.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/appletalk/appletalk.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/reiserfs/reiserfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ntfs/ntfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/nfs/nfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/minix/minix.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/jbd/jbd.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ext3/ext3.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/cifs/cifs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/affs/affs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/libata.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/ide-scsi.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/gdth.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid6.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid5.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/ide/ide-floppy.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/pktcdvd.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/loop.ko needs unknown symbol preempt_locks
make[3]: *** [_modinst_post] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.51405 (%install)

RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.51405 (%install)
make[2]: *** [rpm] Error 1
make[1]: *** [rpm] Error 2
make: *** [rpm] Error 2

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [RFC][PATCH] Use proper casting with signed timespec.tv_nsec values
john stultz wrote:
All,
I recently ran into a bug with an older kernel where xtime's tv_nsec
field had accumulated more than 2 seconds worth of time. The timespec's
tv_nsec is a signed long, however gettimeofday() treats it as an
unsigned long. Thus when the failure occurred, very strange and
difficult to debug time problems occurred.

The main cause of the problem I was seeing is already fixed in mainline,
however just to be safe, I figured the following patch would be wise. I
only audited i386 and x86_64, however other arches probably could have
similar signed problems as well.

Please let me know if you have any further comments or feedback.

John,
There is a problem in the way this code handles the conversion to usec.
There is a conversion here and also in the get_offset code. If the
nanoseconds are carried until after the addition of the two, about 25%
of the time you will end up with an additional usec in time. I strongly
suggest changing to convert to usec after the addition of xtime and
get_offset time to avoid this. If the "correct" thing is done in
clock_gettime() (i.e. get_offset is in nanoseconds) this actually turns
up as a back step in time WRT gettimeofday and clock_gettime().

George
--

thanks
-john

linux-2.6.13_signed-tv_nsec_A0.patch

diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -156,7 +156,7 @@ void do_gettimeofday(struct timeval *tv)
 			usec += lost * (USEC_PER_SEC / HZ);
 		sec = xtime.tv_sec;
-		usec += (xtime.tv_nsec / 1000);
+		usec += (unsigned long)xtime.tv_nsec / 1000;
 	} while (read_seqretry(&xtime_lock, seq));
 	while (usec >= 1000000) {
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -128,7 +128,7 @@ void do_gettimeofday(struct timeval *tv)
 		seq = read_seqbegin(&xtime_lock);
 		sec = xtime.tv_sec;
-		usec = xtime.tv_nsec / 1000;
+		usec = (unsigned long)xtime.tv_nsec / 1000;
 		/* i386 does some correction here to keep the clock
 		   monotonous even when ntpd is fixing drift.
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -824,7 +824,7 @@ static void update_wall_time(unsigned lo
 	do {
 		ticks--;
 		update_wall_time_one_tick();
-		if (xtime.tv_nsec >= 1000000000) {
+		if ((unsigned long)xtime.tv_nsec >= 1000000000) {
 			xtime.tv_nsec -= 1000000000;
 			xtime.tv_sec++;
 			second_overflow();

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
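George's point about where the nanosecond-to-microsecond conversion happens can be illustrated with a small userspace sketch. The two helper names and the sample values below are assumptions made for illustration; they are not taken from the kernel sources or from the patch.

```c
#include <assert.h>

#define SKETCH_NSEC_PER_USEC 1000L

/* Convert each part to microseconds first, then add: two truncations. */
static long sketch_usec_convert_then_add(long xtime_nsec, long offset_nsec)
{
    assert(xtime_nsec >= 0 && offset_nsec >= 0);
    return xtime_nsec / SKETCH_NSEC_PER_USEC
         + offset_nsec / SKETCH_NSEC_PER_USEC;
}

/* Add in nanoseconds first, then convert once: a single truncation. */
static long sketch_usec_add_then_convert(long xtime_nsec, long offset_nsec)
{
    assert(xtime_nsec >= 0 && offset_nsec >= 0);
    return (xtime_nsec + offset_nsec) / SKETCH_NSEC_PER_USEC;
}
```

With 1600 ns and 1700 ns the first helper yields 2 usec and the second yields 3 usec, because the two discarded fractions (600 ns and 700 ns) together exceed a full microsecond. With 1200 ns and 1300 ns both yield 2 usec. A reader that sums in nanoseconds and one that truncates each part separately can therefore disagree by one microsecond for the same instant.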
Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()
Tom Rini wrote: On Tue, Aug 30, 2005 at 12:33:25AM -0700, George Anzinger wrote: Tom Rini wrote: CC: Andi Kleen <[EMAIL PROTECTED]> This adds a call to notify_die() in the "no context" portion of do_page_fault() as someone on the chain might care and want to do a fixup. --- linux-2.6.13-trini/arch/x86_64/mm/fault.c |4 1 files changed, 4 insertions(+) diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c --- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook 2005-08-29 11:09:13.0 -0700 +++ linux-2.6.13-trini/arch/x86_64/mm/fault.c 2005-08-29 11:09:13.0 -0700 @@ -514,6 +514,10 @@ no_context: if (is_errata93(regs, address)) return; + if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14, + SIGSEGV) == NOTIFY_STOP) + return; + /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice. Please use a more descriptive text than "no context". This bit of info SHOULD be available to the gdb/kgdb user and should indicate why kgdb was entered. It thus should be something like "bad kernel address" or "illegal kernel address". "no context" is the label we're in, in the code. What it's actually used for is "hey, we (== kgdb) tried to read/write a very very bogus addr, time to longjmp". If it's not true that kgdb is at fault then we drop to the debugger anyhow, and the user can see where they came from. No. What the user sees is the offending code (i.e. prior to the trap to page_fault), NOT how kgdb happend to be called. The "no_context" is IN the _context_ of page_fault, but that is lost by the time you get to kgdb and ask to see _why_ (via, hint, hint: "p kgdb_info"). 
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
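The disagreement above is about what the reason string is for: it is the only part of the page-fault context that travels down the chain to the debugger. A small userspace model can make that concrete. This is a sketch under stated assumptions, not kernel code: `struct die_event`, `run_chain`, both handlers and the `MODEL_NOTIFY_*` values are hypothetical, and only the "stop when a handler claims the event" idea is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel's notify_die() differs. */
enum { MODEL_NOTIFY_DONE = 0, MODEL_NOTIFY_STOP = 1 };

struct die_event {
	const char *reason;	/* text handed to every handler on the chain */
	long err;
	int trapnr;
};

typedef int (*die_handler)(const struct die_event *ev);

/* What a debugger stub would be able to show the user later on. */
static const char *last_reason = "";

/* Records why it was entered, then lets the chain continue. */
static int debugger_handler(const struct die_event *ev)
{
	assert(ev != NULL);
	last_reason = ev->reason;
	return MODEL_NOTIFY_DONE;
}

/* Claims page faults (trap 14 in the patch) and stops the chain. */
static int fixup_handler(const struct die_event *ev)
{
	assert(ev != NULL);
	return ev->trapnr == 14 ? MODEL_NOTIFY_STOP : MODEL_NOTIFY_DONE;
}

/* Walk the chain in order; stop as soon as a handler claims the event. */
static int run_chain(die_handler *chain, size_t n, const struct die_event *ev)
{
	for (size_t i = 0; i < n; i++)
		if (chain[i](ev) == MODEL_NOTIFY_STOP)
			return MODEL_NOTIFY_STOP;
	return MODEL_NOTIFY_DONE;
}
```

In this model the handler never sees the code label it was called from, only the string, which is why a label name such as "no context" tells the debugger user little.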
Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()
Tom Rini wrote:
> CC: Andi Kleen <[EMAIL PROTECTED]>
> This adds a call to notify_die() in the "no context" portion of
> do_page_fault() as someone on the chain might care and want to do a fixup.
>
> ---
>
>  linux-2.6.13-trini/arch/x86_64/mm/fault.c |    4 ++++
>  1 files changed, 4 insertions(+)
>
> diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c
> --- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook 2005-08-29 11:09:13.0 -0700
> +++ linux-2.6.13-trini/arch/x86_64/mm/fault.c 2005-08-29 11:09:13.0 -0700
> @@ -514,6 +514,10 @@ no_context:
>  	if (is_errata93(regs, address))
>  		return;
>
> +	if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14,
> +				SIGSEGV) == NOTIFY_STOP)
> +		return;
> +
>  /*
>   * Oops. The kernel tried to access some bad page. We'll have to
>   * terminate things with extreme prejudice.

Please use a more descriptive text than "no context". This bit of info SHOULD be available to the gdb/kgdb user and should indicate why kgdb was entered. It thus should be something like "bad kernel address" or "illegal kernel address".

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: when or where can the case occur in "linux kernel development " about "kernel preemption"?
linux-os (Dick Johnson) wrote:
> On Sat, 27 Aug 2005, Sat. wrote:
>> 2005/8/27, Christopher Friesen <[EMAIL PROTECTED]>:
>>> Sat. wrote:
>>>> The case about kernel preemption is as follows: the book said "when a
>>>> process that has a higher priority than the currently running process
>>>> is awakened". But I cannot think of when such a case can occur. Could
>>>> you give me an example?
>>>
>>> There may be others, but one common case is when a hardware interrupt
>>> causes the higher priority process to become runnable. Some examples of
>>> this would be a network packet arriving, or the expiry of a hardware
>>> timer.
>>>
>>> Chris
>>
>> Unfortunately, I cannot agree with you. Normally, when the kernel runs in
>> interrupt context, schedule() should not be invoked (my view). Could
>> anyone give me a definite example, about the network as above or anything
>> else, to illuminate this? Thanks!
>> --
>> Sat.
>
> Schedule is never executed from an interrupt, BUT, there may be kernel
> threads or even user tasks that are sleeping, waiting to be awakened when
> some preliminary interrupt processing has occurred. The interrupt code may
> execute one of the wake-up calls which will cause the target to be put
> into the run queue as soon as possible.

Actually, this is not completely true. The kernel sets a flag while handling interrupts that says it is within an interrupt. This flag is cleared on the way out of the interrupt but prior to the return from interrupt (rfi) instruction. Between this flag clearing and the rfi, there is a check made to see if the kernel is preemptable and, if so, if it is desired (i.e. something more important should run NOW). If both of these are true, schedule is called to do the context switch.

So, schedule IS called from within the interrupt, but NOT within the area the kernel flags as being in an interrupt, which is a subset of the actual interrupt.
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
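The ordering George describes (flag set, handler runs, flag cleared, preemption check, return from interrupt) can be modelled in a few lines of userspace C. This is a minimal sketch, not the kernel's interrupt exit path: `struct cpu_model` and every `model_*` name are hypothetical, and a real kernel tracks this state in per-task and per-CPU data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not the kernel's layout. */
struct cpu_model {
	int irq_depth;		/* > 0 while the kernel flags "in interrupt" */
	int preempt_disable;	/* > 0 while preemption is not allowed */
	bool need_resched;	/* set when something more important woke up */
	int schedule_calls;	/* how often the model scheduler has run */
};

/* Stands in for schedule(); must never run inside the flagged region. */
static void model_schedule(struct cpu_model *c)
{
	assert(c->irq_depth == 0);
	c->need_resched = false;
	c->schedule_calls++;
}

/* One interrupt, from entry to the point just before the rfi. */
static void model_irq(struct cpu_model *c, bool wakes_higher_prio)
{
	c->irq_depth++;			/* flag set on the way in */
	if (wakes_higher_prio)
		c->need_resched = true;	/* the handler only marks the need */
	c->irq_depth--;			/* flag cleared on the way out */

	/* Between clearing the flag and the rfi: preemptable and desired? */
	if (c->irq_depth == 0 && c->preempt_disable == 0 && c->need_resched)
		model_schedule(c);
	/* the return from interrupt would follow here */
}
```

In the model, an interrupt that wakes a higher priority task leads to one `model_schedule()` call before the return, unless `preempt_disable` is nonzero, in which case the request stays pending.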
Re: kgdb on EM64T
Wilkerson, Bryan P wrote:
> George Anzinger [mailto:[EMAIL PROTECTED] wrote:
>> Well, I checked, it is "int $3". Why then the panic? If you try the boot
>> with kgdb (i.e. wait) and then do:
>>
>> (gdb) disass gdb_interrupt
>>
>> What do you find at +75?
>
> Below is the console from the session. It is interesting that gdb is not
> able to access the memory. I let it continue and then ctrl-c broke it
> later in the boot cycle and tried disass again with the same result.
>
> Feel free to flog me if this is stupid but I have just one EM64T machine
> (test) and I'm using a regular P4 machine as dev. I build the test kernel
> on the EM64T machine and then copy the updated sources, object files, and
> images via NFS to the dev machine. I believe I read in the kgdb doc that
> it was possible to use two different architecture machines for test and
> dev although there wasn't any information about how to do it. This is
> probably the source of the OS/ABI warning. I can probably get the
> mothership to send me another EM64T machine if need be.

What you need is a cross development environment. Not having that, your gdb is likely not aware of how to talk to the hardware you are using. The cross development should cost a whole lot less than another machine.

George
--
> vincent:/home/bwilkers/proj/linux-2.6.13-rc4-mm1 # gdb vmlinux
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i586-suse-linux"...
> warning: A handler for the OS ABI "GNU/Linux" is not built into this
> configuration of GDB. Attempting to continue with the default i386:x86-64
> settings.
>
> Using host libthread_db library "/lib/tls/libthread_db.so.1".
> (gdb) target remote /dev/ttyS0
> Remote debugging using /dev/ttyS0
> 0x80503b50 in ?? ()
> warning: no shared library support for this OS / ABI
> (gdb) disass gdb_interrupt
> Dump of assembler code for function gdb_interrupt:
> 0x80247009 <gdb_interrupt+0>: Cannot access memory at address 0x80247009
> (gdb) c
> Continuing.
> Bootdata ok (command line is root=/dev/sda2 kgdb console=kgdb)
> Linux version 2.6.13-rc4-mm1-perfmon-em64t ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #43 SMP Sat Aug 27 15:56:14 MDT 2005
> BIOS-provided physical RAM map:
>  BIOS-e820: - 0009fc00 (usable)
>  BIOS-e820: 0009fc00 - 000a (reserved)
>  BIOS-e820: 000e6000 - 0010 (reserved)
>  BIOS-e820: 0010 - 3fe2f800 (usable)
>  BIOS-e820: 3fe2f800 - 3fe3f832 (ACPI NVS)
>  BIOS-e820: 3ff1 - 3ff3 (reserved)
>  BIOS-e820: 3ff3 - 3ff4 (ACPI data)
>  BIOS-e820: 3ff4 - 3fff (ACPI NVS)
>  BIOS-e820: 3fff - 4000 (reserved)
>  BIOS-e820: e000 - f000 (reserved)
>  BIOS-e820: fed13000 - fed1a000 (reserved)
>  BIOS-e820: fed1c000 - feda (reserved)
> ACPI: PM-Timer IO Port: 0x408
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: kgdb on EM64T
George Anzinger wrote:
> Wilkerson, Bryan P wrote:
>> Thank you Tom and George for the tips on using kgdb with 2.6.13-rc4-mm1.
>>
>> I almost have it working but kgdb seems to have a few issues. I can get
>> it running from the dev machine using the kgdb and console=kgdb boot
>> options on the test kernel. The kernel waits as it should and when I
>> attach with "target remote /dev/ttyS0" I can continue the boot, but
>> eventually it gets to a point in the boot where it frees unused kernel
>> memory successfully and then a warning, "unable to open an initial
>> console", followed by, "Kernel panic - not syncing: Attempted to kill
>> init!"
>>
>> Removing the console=kgdb boot option and the machine boots all the way
>> to run level 5. I tried to break into kgdb at this point using
>>
>> $echo -e "\003" > /dev/ttyS0
>>
>> from the dev machine but the test kernel panics at gdb_interrupt+75 when
>> it receives anything on the serial port.
>>
>> Hmmm... I'm wondering if I'm maybe just the first to try this on EM64T
>> (kernel builds in the arch/x86_64 tree).
>
> Possibly :). Since the serial port seems to work (i.e. the first test
> above), the fault seems to be in handling the int3. Is int3 the right
> instruction for this machine? If not you would make the change in kgdb.h.
> I think that is the only place it is defined.

Well, I checked, it is "int $3". Why then the panic? If you try the boot with kgdb (i.e. wait) and then do:

(gdb) disass gdb_interrupt

What do you find at +75?

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: Need better is_better_time_interpolator() algorithm
Christoph Lameter wrote:
> On Fri, 26 Aug 2005, Alex Williamson wrote:
>> Would we ever want to favor a frequency shifting timer over anything else
>> in the system? If it was noticeable perhaps we'd just need a callback to
>> re-evaluate the frequency and rescan for the best timer. If it happens
>> without notice, a flag that statically assigns it the lowest priority
>> will do. Or maybe if the driver factored the frequency shifting into the
>> drift it would make the timer undesirable without resorting to flags.
>> Thanks,
>
> Timers are usually constant. AFAIK Frequency shifts only occur through
> power management. In that case we usually have some notifiers running
> before the change. These notifiers need to switch to a different time
> source if the timer frequency will be shifting or the timer will become
> unavailable.

If there is a notifier, I presume we can track it. We might want to refine things so as to not hit too big a bump when the shift occurs, but I think it is doable. The desirability of doing it, I think, depends on the availability of something better. The access time of the TSC is "really" enticing. Even so, I think a _good_ clock would not depend on long term accuracy of something as fast as the TSC. Vendors are even modulating these to reduce RFI, but still, because of its speed, it makes the best interpolator for the jiffie to jiffie times.

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
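The notifier idea discussed above (switch away from a time source before its frequency shifts, otherwise prefer the cheapest one to read) can be sketched as follows. This is an illustration only: `struct time_source`, `pick_source` and `on_freq_change` are hypothetical names, the access costs are invented numbers, and none of this is the kernel's time interpolator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one time source. */
struct time_source {
	const char *name;
	unsigned access_ns;	/* assumed cost of a single read */
	bool usable;		/* false while its frequency is shifting */
};

/* Prefer the cheapest source that is currently usable; NULL if none. */
static struct time_source *pick_source(struct time_source *s, size_t n)
{
	struct time_source *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!s[i].usable)
			continue;
		if (best == NULL || s[i].access_ns < best->access_ns)
			best = &s[i];
	}
	return best;
}

/*
 * Called from a (hypothetical) power-management notifier before the
 * change happens: retire the shifting source and rescan for the best
 * remaining one.
 */
static struct time_source *on_freq_change(struct time_source *s, size_t n,
					  struct time_source *changing)
{
	assert(changing != NULL);
	changing->usable = false;
	return pick_source(s, n);
}
```

A fuller version would also have to smooth the hand-over so the clock does not jump when the source changes, which is the "bump" George mentions; the sketch leaves that out.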
Re: kgdb on EM64T
Wilkerson, Bryan P wrote:
> Thank you Tom and George for the tips on using kgdb with 2.6.13-rc4-mm1.
>
> I almost have it working but kgdb seems to have a few issues. I can get it
> running from the dev machine using the kgdb and console=kgdb boot options
> on the test kernel. The kernel waits as it should and when I attach with
> "target remote /dev/ttyS0" I can continue the boot, but eventually it gets
> to a point in the boot where it frees unused kernel memory successfully
> and then a warning, "unable to open an initial console", followed by,
> "Kernel panic - not syncing: Attempted to kill init!"
>
> Removing the console=kgdb boot option and the machine boots all the way to
> run level 5. I tried to break into kgdb at this point using
>
> $echo -e "\003" > /dev/ttyS0
>
> from the dev machine but the test kernel panics at gdb_interrupt+75 when
> it receives anything on the serial port.
>
> Hmmm... I'm wondering if I'm maybe just the first to try this on EM64T
> (kernel builds in the arch/x86_64 tree).

Possibly :). Since the serial port seems to work (i.e. the first test above), the fault seems to be in handling the int3. Is int3 the right instruction for this machine? If not you would make the change in kgdb.h. I think that is the only place it is defined.

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: Need better is_better_time_interpolator() algorithm
Alex Williamson wrote:
> On Fri, 2005-08-26 at 08:39 -0700, Christoph Lameter wrote:
>> I think a priority is something useful for the interpolators. Some of the
>> decisions about which time sources to use also have criteria different
>> from drift/latency/jitter/cpu. F.e. timers may not survive various
>> power-saving configurations. Thus I would think that we need a priority
>> plus some flags.
>>
>> Some of the criteria for choosing a time source may be:
>
> Hi Christoph,
>
> I sent another followup to this thread with a patch containing a fairly
> crude algorithm that I think better explains my starting point. I'm sure
> the weighting and scaling factors need work, but I think many of the
> criteria you describe will favor the right clock.
>
>> 1. If a system boots up with a single cpu then there is no question that
>> the ITC/TSC should be used because of the fast access.

We need to factor in frequency shifting here, especially if it happens without notice.
~
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: Inotify problem [was Re: 2.6.13-rc6-mm1]
John McCutchan wrote:
> On Thu, 2005-08-25 at 11:54 -0700, George Anzinger wrote:
>> Robert Love wrote:
>>> On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:
>>>> On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:
>> ~
>> I think the best thing is to take idr into user space and emulate the
>> problem usage. To this end, from the log it appears that you _might_ be
>> moving between 0, 1 and 2 entries increasing the number each time. It
>> also appears that the failure happens here:
>>
>>   add 1023
>>   add 1024
>>   find 1024
>>
>> or is it the remove that fails? It also looks like 1024 got allocated
>> twice. Am I reading the log correctly?
>
> You are reading the log correctly. There are two bugs. One is that if we
> pass X to idr_get_new_above, it can return X again (doesn't ever seem to
> return < X). The other problem is that the find fails on 1024 (and 2048 if
> we skip 1024).

That IS strange. 1024 is on a "level" boundary, but then the next level is 2**15, not 2**11. I will take a look.

>> So, is it correct to assume that the tree is empty save these two at this
>> time? I am just trying to figure out what the test program needs to do.
>
> Yes that is the exact scenario. Only 2 id's are used at any given time,
> and once we hit 1024 things break. This doesn't happen when the tree is
> not empty.
>
> Thanks for looking at this!

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
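The remark about level boundaries can be checked with a little arithmetic. The sketch below assumes a radix tree with 5 bits per level, which is only inferred from the 2**15 figure in the reply above (levels would then fill at 2**5, 2**10 and 2**15); `levels_needed` is a hypothetical helper, not an idr function.

```c
#include <assert.h>

/*
 * Number of tree levels needed to address `id` when each level consumes
 * `bits_per_level` bits of the id. With 5 bits per level, ids 0..31 need
 * one level, 32..1023 need two, and 1024..32767 need three.
 */
static int levels_needed(unsigned id, unsigned bits_per_level)
{
	int levels = 1;

	assert(bits_per_level > 0 && bits_per_level < 32);
	while ((id >> bits_per_level) != 0) {
		id >>= bits_per_level;
		levels++;
	}
	return levels;
}
```

Under that assumption 1023 still fits in a two-level tree while 1024 is the first id that forces a third level, and 2048 does not cross any further boundary.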
Re: Inotify problem [was Re: 2.6.13-rc6-mm1]
Robert Love wrote: On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote: On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote: ~

dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4
dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024

Note the incrementing wd value even though we are removing them as we go..

What kernel are you running? The wd's should ALWAYS be incrementing, you should never get the same wd as you did before. From your log, you are getting the same wd (after you inotify_rm_watch it). I can reproduce this bug on 2.6.13-rc7.

idr_get_new_above isn't returning something above. Also, the idr layer seems to be breaking when we pass in 1024. I can reproduce that on my 2.6.13-rc7 system as well.

This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel. Robert, John, what do you think? Is this possibly related to the oops seen in the log that I reported earlier? (Which is still showing up 2-3 times per day, btw)

There is definitely something broken here. Jim, George- We are seeing a problem in the idr layer. If we do idr_find(1024) when, say, a low valued idr, like, zero, is unallocated, NULL is returned.

I think the best thing is to take idr into user space and emulate the problem usage. To this end, from the log it appears that you _might_ be moving between 0, 1 and 2 entries increasing the number each time. It also appears that the failure happens here:

add 1023
add 1024
find 1024

or is it the remove that fails? It also looks like 1024 got allocated twice. Am I reading the log correctly?

So, is it correct to assume that the tree is empty save these two at this time? I am just trying to figure out what the test program needs to do.

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
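The user-space emulation proposed in this message could start from a driver like the sketch below. It replays only the access pattern visible in the log: at most two ids live at a time, each new id requested above the last one handed out. The `model_*` functions are a hypothetical stand-in backed by a flat array, not the kernel's idr code, and the assumption that each request starts at "last id + 1" is taken from the "wd's should ALWAYS be incrementing" remark rather than from the inotify source.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX 4096          /* arbitrary bound for this sketch */

/* Hypothetical reference model: a flat table instead of the idr tree. */
struct model {
    void *slot[MODEL_MAX];
};

/* Store ptr at the lowest free id >= start; return the id, or -1 if full. */
static int model_get_new_above(struct model *m, void *ptr, int start)
{
    assert(ptr != NULL && start >= 0);
    for (int id = start; id < MODEL_MAX; id++) {
        if (m->slot[id] == NULL) {
            m->slot[id] = ptr;
            return id;
        }
    }
    return -1;
}

static void *model_find(const struct model *m, int id)
{
    return (id >= 0 && id < MODEL_MAX) ? m->slot[id] : NULL;
}

static void model_remove(struct model *m, int id)
{
    if (id >= 0 && id < MODEL_MAX)
        m->slot[id] = NULL;
}

/*
 * Replay the pattern from the log: add two ids, each above the last one
 * returned, look both up, then remove both (higher id first, as in the log).
 * Returns the first id at which a check fails, or -1 if every round passes.
 */
static int replay_pattern(struct model *m, int rounds)
{
    static int cookie;          /* any non-NULL pointer will do */
    int last = -1;

    for (int i = 0; i < rounds; i++) {
        int a = model_get_new_above(m, &cookie, last + 1);
        if (a <= last)
            return last + 1;    /* id not above the previous one */
        int b = model_get_new_above(m, &cookie, a + 1);
        if (b <= a)
            return a + 1;
        if (model_find(m, a) != &cookie)
            return a;           /* the "find 1024" style failure */
        if (model_find(m, b) != &cookie)
            return b;
        model_remove(m, b);
        model_remove(m, a);
        last = b;
    }
    return -1;
}
```

Running `replay_pattern` for 600 rounds walks the ids past 1024. Swapping the three `model_*` calls for the real idr routines built in user space would then show whether the add, the find, or the remove is the step that breaks at the boundary.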
Re: [PATCH] NTP ntp-helper functions
john stultz wrote:

Andrew, All, This patch cleans up a commonly repeated set of changes to the NTP state variables by adding two helper inline functions: ntp_clear(): Clears the ntp state variables

How many places is this called in any given arch? I ask because it _may_ save space if it is NOT inlined. I don't think it is ever in a critical code path...

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
john stultz wrote: On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote: john stultz wrote: On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote: Roman Zippel wrote: Hi, On Tue, 23 Aug 2005, john stultz wrote:

I'm assuming gettimeofday()/clock_gettime() looks something like: xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift

Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example.

If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick).

Not quite. The issue that I'm trying to describe is that if we inconsistently calculate time intervals in gettimeofday and the timer interrupt, we have the possibility for time inconsistencies. The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday(): xtime + offset_ns
timer_interrupt: xtime += tick_length + ntp_adj
                 offset_ns = 0

0: gettimeofday: 0 + 0 = 0 ns
1: gettimeofday: 0 + 500k ns = 500k ns
2: gettimeofday: 0 + 1M ns = 1M ns
2: timer_interrupt:
2: gettimeofday: 1M ns + 0 ns = 1M ns
3: gettimeofday: 1M ns + 500k ns = 1.5M ns
4: gettimeofday: 1M ns + 1M ns = 2M ns
4: timer_interrupt (using -500ppm adjustment)
4: gettimeofday: 1,999,500 ns + 0 ns = 1,999,500 ns

At point 4 you are introducing a NEW ntp adjustment. This, I submit, needs to actually have been introduced to the system prior to the interrupt at point 2 with the first xtime change at point 4. However, gettimeofday() should be aware of it from the interrupt at point 2 and be doing corrections from that time forward. Thus when the point 4 interrupt happens xtime will be the same as the gettimeofday a ns earlier.

Yes, clearly a forward knowledge of the NTP adjustment is necessary for gettimeofday(), because after the NTP adjustment has been accumulated into xtime, there's nothing left for gettimeofday to adjust (it's already been applied). :)

Likewise, gettimeofday() needs to know when to stop applying the correction so that if a tick is late, it will apply the correction only for those times that it was needed. This could be done by figuring the offset thusly:

offset = (offset from last tick to end of ntp period * ntp_adj1) + (offset from end of ntp period to now)

Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so it would be added to the nanosecond offset from the last tick (which is how the current code works). If you are doing scaling (as you have in the equation above), then the problem goes away, since you can apply the adjustment consistently through any interval.

Until the end of the correction time...

I suppose it is possible that the latter part of the offset is also under a different ntp correction which would mean a "* ntp_adj2" is needed.

Ok, so you're forcing gettimeofday to be interval aware, so it's applying different fixed NTP adjustments to different chunks of the current interval. The issue of course, if you're using fixed adjustments, is that you have to have n ntp adjustments for n intervals, or you have to apply the same ntp adjustment to multiple intervals.

Uh, are you saying that one ntpd call can set up several different adjustments? I was assuming that any given call would set up either a fixed adjustment for ever or a fixed adjustment to be applied for a fixed number of ticks (or until so much correcting was done, which, in the end is the same thing at this point in the code). If ntpd has to come back to change the adjustment, I am assuming that some kernel action can be taken at that time to sync the xtime clock and the gettimeofday reading of it. I.e. we would only have to keep track of one adjustment with a possible pre specified end time.

I would argue that only two terms are needed here regardless of how late a tick is. This is because I would expect the ntp system call to sync the two clocks. This means in your example, the ntp call would have been made at, or prior to, the timer interrupt at 2 and this is the same edge that gettimeofday is to use to start applying the correction.

If you argue that we only need two adjustments, why not argue for only one? You're saying have one adjustment that you apply for the first tick's worth of time, and a second adjustment that you apply for the following N ticks' worth of time in the interval. Why the odd base case?

Correct me if I am wrong here, but I am assuming that ntpd can ask for an adjustment of X amount which the kernel changes into N adjustments of X/N amount spread over the next N
Re: Incorrect CLOCK_TICK_RATE in 2.6 kernel
john stultz wrote: On Wed, 2005-08-24 at 17:24 -0700, George Anzinger wrote:

CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and tick_nsec. This latter is used to update xtime each tick. TICK_NSEC is then used to compute (at compile time) the conversion constants needed to convert to/from jiffies from/to timespec and timeval (and others). The problem is that, if the timer being used is either Cyclone or HPET, the wrong CLOCK_TICK_RATE is used.

Err, the Cyclone does not generate interrupts. So this issue does not affect those systems. As for the HPET, it sets its own interrupt frequency based off of KERNEL_TICK_USEC (which you're right, isn't quite what is used in the jiffies conversions). Would it be easier to just adjust that value to use ACTHZ or CLOCK_TICK_RATE?

If you want to take that approach you would want the HPET to interrupt every TICK_NSEC nanoseconds, that being what xtime is pushed by each tick.

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Incorrect CLOCK_TICK_RATE in 2.6 kernel
CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and tick_nsec. This latter is used to update xtime each tick. TICK_NSEC is then used to compute (at compile time) the conversion constants needed to convert to/from jiffies from/to timespec and timeval (and others).

The problem is that, if the timer being used is either Cyclone or HPET, the wrong CLOCK_TICK_RATE is used. This means that systems using these interrupt sources will be doing a) incorrect update of xtime and b) incorrect conversion of jiffies. Since these two values will track each other this will not be seen by simple gettimeofday(); sleep(); gettimeofday() tests, but will be seen as a system clock drift (without NTP) or, with NTP, a somewhat high drift rate (to the point of losing sync at HZ=1000).

The fact that the user/system chooses the clock to use at boot time and can change the clock after boot means that it is not possible to pin down CLOCK_TICK_RATE at compile time. However, since the computation of TICK_NSEC and the conversion constants is rather involved it is clear that we REALLY do want to compute these at compile time.

The suggested solution is to a) set up a structure with the default (clock of choice at config time) conversion constants in it at compile time. Then b) at clock init time, populate the structure with the proper constants for the given clock. These can be computed at compile time, but from the correct CLOCK_TICK_RATE for the given clock. Switching to a fall back clock would also require an update of this structure.

Commits?

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
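As a rough illustration of the proposal in this message, the sketch below derives LATCH and TICK_NSEC from a clock's tick rate and keeps them in a per-clock structure. The arithmetic follows the shifted-divide form used in 2.6-era include/linux/jiffies.h as best recalled, so treat it as an assumption; `struct tick_consts` and `tick_consts_for()` are hypothetical names, and a function is used instead of compile-time macros only so the numbers can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded divide scaled by 2^lsh, split so the shift cannot overflow. */
static uint64_t sh_div(uint64_t nom, uint64_t den, unsigned lsh)
{
    assert(den != 0);
    return ((nom / den) << lsh) + (((nom % den) << lsh) + den / 2) / den;
}

/* Hypothetical per-clock constants, filled in when a clock is selected. */
struct tick_consts {
    uint64_t latch;      /* timer counts per tick */
    uint64_t tick_nsec;  /* nanoseconds xtime advances per tick */
};

static struct tick_consts tick_consts_for(uint64_t clock_tick_rate, uint64_t hz)
{
    struct tick_consts c;

    assert(hz != 0 && clock_tick_rate >= hz);
    c.latch = (clock_tick_rate + hz / 2) / hz;

    /* Interrupt rate actually achieved with an integer latch, times 2^8. */
    uint64_t acthz = sh_div(clock_tick_rate, c.latch, 8);

    c.tick_nsec = sh_div(1000000000ULL, acthz, 8);
    return c;
}
```

With a tick rate of 1193182 Hz (the commonly quoted i386 PIT rate) and HZ=1000 this gives a latch of 1193 and a tick of 999848 ns rather than a round 1000000 ns. A clock with a different rate yields different constants, which is why a single compile-time CLOCK_TICK_RATE cannot serve every interrupt source.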
Re: kgdbwait in 2.6.13-rc4-mm1?
Wilkerson, Bryan P wrote:

Is there an equivalent kernel boot option for kgdbwait in 2.6.13-rc4-mm1? I grep'd the kernel source but didn't find kgdbwait. Is there any documentation other than the source for the flavor of KGDB that is included in the akpm kernel patch?

The patch has some documentation at Documentation/i386/kgdb/* as well as a couple of gdb macros... The wait option is "gdb". This has been in flux so, to be absolutely sure, look at include/asm-i386/bugs.h

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
john stultz wrote: On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote: Roman Zippel wrote: Hi, On Tue, 23 Aug 2005, john stultz wrote:

I'm assuming gettimeofday()/clock_gettime() looks something like: xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift

Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example.

If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick).

Not quite. The issue that I'm trying to describe is that if we inconsistently calculate time intervals in gettimeofday and the timer interrupt, we have the possibility for time inconsistencies. The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday(): xtime + offset_ns
timer_interrupt: xtime += tick_length + ntp_adj
                 offset_ns = 0

0: gettimeofday: 0 + 0 = 0 ns
1: gettimeofday: 0 + 500k ns = 500k ns
2: gettimeofday: 0 + 1M ns = 1M ns
2: timer_interrupt:
2: gettimeofday: 1M ns + 0 ns = 1M ns
3: gettimeofday: 1M ns + 500k ns = 1.5M ns
4: gettimeofday: 1M ns + 1M ns = 2M ns
4: timer_interrupt (using -500ppm adjustment)
4: gettimeofday: 1,999,500 ns + 0 ns = 1,999,500 ns

At point 4 you are introducing a NEW ntp adjustment. This, I submit, needs to actually have been introduced to the system prior to the interrupt at point 2 with the first xtime change at point 4. However, gettimeofday() should be aware of it from the interrupt at point 2 and be doing corrections from that time forward. Thus when the point 4 interrupt happens xtime will be the same as the gettimeofday a ns earlier.

Likewise, gettimeofday() needs to know when to stop applying the correction so that if a tick is late, it will apply the correction only for those times that it was needed. This could be done by figuring the offset thusly:

offset = (offset from last tick to end of ntp period * ntp_adj1) + (offset from end of ntp period to now)

I suppose it is possible that the latter part of the offset is also under a different ntp correction which would mean a "* ntp_adj2" is needed.

I would argue that only two terms are needed here regardless of how late a tick is. This is because I would expect the ntp system call to sync the two clocks. This means in your example, the ntp call would have been made at, or prior to, the timer interrupt at 2 and this is the same edge that gettimeofday is to use to start applying the correction.

It would appear that gettimeofday would need to know that the NTP adjustment is changing (and to what). It would also appear that this is known by the ntp code and could be made available to gettimeofday. If it is changing due to an NTP call, that system call, itself, should/must force synchronization. So the only case gettimeofday needs to worry/know about is that an adjustment is to change at time X to value Y. Also, me thinks there is only one such change that can be present at any given time.

Well, in many arches gettimeofday() works around the above issue by capping the offset_ns value as such:

I think this may have been done with only usec gettimeofday. Now that we have clock_gettime() returning nsec, we need to be a bit more careful.

gettimeofday: xtime + min(offset_ns, tick_len + ntp_adj)

The problem with this is that when we have lost or late ticks, or if we are using dynamic ticks, you have granularity problems.

-- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
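The numbered trace in this message can be replayed with a few lines of C. This is only a sketch of the arithmetic under the thread's toy assumptions (HZ=1000, a -500ppm adjustment expressed as -500 ns per tick); `struct clk` and the three functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NS 1000000LL       /* one tick at HZ=1000 */

/* Hypothetical, heavily simplified clock state. */
struct clk {
    int64_t xtime_ns;           /* time accumulated at the last interrupt */
};

/* Reader as in the trace: xtime plus the raw offset since the last tick. */
static int64_t read_time(const struct clk *c, int64_t offset_ns)
{
    assert(offset_ns >= 0);
    return c->xtime_ns + offset_ns;
}

/* Capped reader: xtime + min(offset_ns, tick_len + ntp_adj). */
static int64_t read_time_capped(const struct clk *c, int64_t offset_ns,
                                int64_t ntp_adj_ns)
{
    int64_t cap = TICK_NS + ntp_adj_ns;

    assert(offset_ns >= 0);
    return c->xtime_ns + (offset_ns < cap ? offset_ns : cap);
}

/* Timer interrupt: fold one tick plus the fixed adjustment into xtime. */
static void timer_interrupt(struct clk *c, int64_t ntp_adj_ns)
{
    c->xtime_ns += TICK_NS + ntp_adj_ns;
}
```

Starting from zero, one unadjusted interrupt puts xtime at 1M ns. A read a full tick later returns 2,000,000 ns, and the adjusted interrupt that follows leaves xtime at 1,999,500 ns, so the next read is 500 ns earlier than the previous one. The capped reader returns 1,999,500 ns for that same pre-interrupt read, which avoids the backward step but, as noted above, stops time from advancing past the cap when a tick is late or lost.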
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
Roman Zippel wrote:
> Hi,
>
> On Tue, 23 Aug 2005, john stultz wrote:
>> I'm assuming gettimeofday()/clock_gettime() looks something like:
>>     xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
>
> Where did you get the ntp_adj from? It's not in my example. gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult + error) >> shift". The difference between system time and reference time is really important. gettimeofday() returns the system time, NTP controls the reference time and these two are synchronized regularly. I didn't see that anywhere in your example.

John,

If I read your example right, the problem is when the NTP adjustment changes while the two clocks are out of sync (because of a late tick). It would appear that gettimeofday would need to know that the NTP adjustment is changing (and to what). It would also appear that this is known by the ntp code and could be made available to gettimeofday. If it is changing due to an NTP call, that system call, itself, should/must force synchronization. So the only case gettimeofday needs to worry/know about is that an adjustment is to change at time X to value Y. Also, methinks there is only one such change that can be present at any given time.

Hope this helps...

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
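[Editor's sketch] The idea in the reply above, that gettimeofday only needs to know about a single pending change "at time X to value Y", can be illustrated with a small user-space model. This is not the timekeeping code under discussion: `struct clk_model` and `model_gettime_ns()` are hypothetical, the adjustment is simply added to `mult` as in the quoted formula, and X is expressed in cycles rather than in time to keep the arithmetic short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot taken at the last tick update. */
struct clk_model {
    uint64_t xtime_ns;    /* system time at the last update, in ns */
    uint64_t last_cycles; /* cycle counter value at the last update */
    uint32_t mult;        /* cycles -> ns multiplier */
    uint32_t shift;       /* cycles -> ns shift */
    int32_t  ntp_adj;     /* adjustment currently added to mult */
    int      has_pending; /* nonzero if one adjustment change is scheduled */
    uint64_t change_at;   /* "time X": cycle count where the change applies */
    int32_t  next_adj;    /* "value Y": adjustment used from change_at on */
};

/* Converts a cycle delta to ns with the given adjustment. */
static uint64_t cycles_to_ns(const struct clk_model *c, uint64_t delta, int32_t adj)
{
    uint64_t m = (uint64_t)((int64_t)c->mult + adj);
    return (delta * m) >> c->shift;
}

/* xtime + (cycle_offset * (mult + adj)) >> shift, split at the pending change. */
uint64_t model_gettime_ns(const struct clk_model *c, uint64_t now)
{
    assert(now >= c->last_cycles);
    uint64_t ns = c->xtime_ns;
    uint64_t from = c->last_cycles;
    int32_t adj = c->ntp_adj;

    if (c->has_pending && c->change_at >= from && now > c->change_at) {
        /* Part of the interval ran under the old adjustment. */
        ns += cycles_to_ns(c, c->change_at - from, adj);
        from = c->change_at;
        adj = c->next_adj;
    }
    return ns + cycles_to_ns(c, now - from, adj);
}
```

Because at most one change is pending, the read path needs one extra comparison and, in the worst case, one extra multiply, even when the tick that would normally apply the change is late.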
Re: [PATCH 3/3] Add disk hotswap support to libata RESEND #2
Jim Ramsay wrote:
> On 8/23/05, Jim Ramsay <[EMAIL PROTECTED]> wrote:
>> Then I must have found an undocumented feature! I've applied this set of patches to a 2.6.11 kernel (with few problems) and ran into a bunch of "scheduling while atomic" errors when hotplugging a drive, culprit being probably scsi_sysfs.c where scsi_remove_device locks a mutex, or perhaps when it then calls class_device_unregister, which does a 'down_write'.
>
> After further debugging, it appears that the problem is the debounce timer in libata-core.c. Timers appear to operate in an atomic context, so timers should not be allowed to call scsi_remove_device, which eventually schedules.
>
> Any suggestions on the best way to fix this?

Workqueue, perhaps.

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote:
~
> 2. Drift of cyclic timers (armed by set_timer()):
>
> Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsec) units, so this effect will go away. But this is waste of time until (1.) is not solved.
>
> George ???

Could I (we) see what you have in mind?

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment
Thomas Gleixner wrote:
> George,
>
> On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:
>>> 2. Drift of cyclic timers (armed by set_timer()):
>>>
>>> Due to rounding errors and the drift adjustment code, the fixed increment which is precalculated when the timer is set up and added on rearm, I see creeping deviation from the timeline. I have a patch lined up to base the rearm on human (nsec) units, so this effect will go away. But this is waste of time until (1.) is not solved.
>>>
>>> George ???
>>
>> Could I (we) see what you have in mind?
>
> Nothing which applies clean at the moment and I have no access to the box where the patch floats around. It's simply explained.
>
> Current code:
>
> set_timer()
>     calc interval->jiffies / interval->arch_cycles; based on it.interval
>
> rearm()
>     timer->expires += interval->jiffies;
>     timer->arch_cycle_expires += interval->arch_cycles;
>     normalize(timer);
>
> Patched code:
>
> set_timer()
>     timer.interval = it.interval;
>     timer.next_expire = it.value;
>     both stored as timespec
>
> rearm()
>     next_expire += interval;
>     calc timer->expires/arch_cycle_expires;
>
> So on each rearm we eliminate the rounding errors and take the drift adjustment into account.

It adds some calculation overhead to each rearm, but I think the standard was written to eliminate the need for this. The notion is that we have a resolution which we use in the calculations so while there may be drift WRT his request, there should be no drift WRT the requested value rounded up to the next resolution. Still, if we can't keep that resolution in arch_cycles...

On another issue along this line, I have been thinking of changing the x86 TSC arch cycle size to 1ns. (NOT the resolution, the units for the arch cycle.) The reason to do this is to correctly track changes in cpu frequency. As it is today, we would need to track down and update all pending HR timers whenever the frequency changed. By using a common unit all we need to do is change the conversion constants (well I guess they would not be constants any more :).

I REALLY don't want to do this as it does add conversion overhead, but I can not think of another clean way to track TSC frequency changes.

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
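[Editor's sketch] The difference between the "current" and "patched" rearm schemes quoted above can be reproduced with a few lines of arithmetic. This is a user-space model, not the HRT patch: the function names are hypothetical, a single `cycle_ns` unit stands in for the jiffies/arch_cycles pair, and the 838 ns cycle used in the note below is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a time in ns up to whole timer cycles. */
uint64_t ns_to_cycles(uint64_t ns, uint64_t cycle_ns)
{
    assert(cycle_ns > 0);
    return (ns + cycle_ns - 1) / cycle_ns;
}

/* "Current code": round the interval once, then add it on every rearm. */
uint64_t expiry_fixed_increment(uint64_t interval_ns, uint64_t cycle_ns, unsigned n)
{
    uint64_t inc = ns_to_cycles(interval_ns, cycle_ns);
    uint64_t expires = 0;

    for (unsigned i = 0; i < n; i++)
        expires += inc;             /* the rounding error accumulates */
    return expires * cycle_ns;      /* expiry of the n-th period, in ns */
}

/* "Patched code": keep the timeline in ns and convert on every rearm. */
uint64_t expiry_recomputed(uint64_t interval_ns, uint64_t cycle_ns, unsigned n)
{
    uint64_t next_expire_ns = 0;
    uint64_t expires = 0;

    for (unsigned i = 0; i < n; i++) {
        next_expire_ns += interval_ns;
        expires = ns_to_cycles(next_expire_ns, cycle_ns); /* error stays below one cycle */
    }
    return expires * cycle_ns;
}
```

With a 1 ms interval and an 838 ns cycle, the first variant arms every period 572 ns long, so the error grows linearly with the number of rearms, while the second never deviates from the ideal timeline by a full cycle. Whether the first behaviour counts as drift depends on the point made in the reply: measured against the interval rounded up to the resolution, it is exact.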
Re: Latency with Real-Time Preemption with 2.6.12
Steven Rostedt wrote:
> On Wed, 2005-08-17 at 19:38 -0700, Sundar Narayanaswamy wrote:
>> Hi,
>> I am trying to experiment using 2.6.12 kernel with the realtime-preempt V0.7.51-38 patch to determine the kernel preemption latencies with the CONFIG_PREEMPT_RT mode. The test program I wrote does the following on a thread with highest priority (99) and SCHED_FIFO policy to simulate a real time thread.
>>
>>     t1 = gettimeofday
>>     nanosleep(for 3 ms)
>>     t2 = gettimeofday
>>
>> I was expecting to see the difference t2-t1 to be close to 3 ms. However, the smallest difference I see is 4 milliseconds under no system load, and the difference is as high as 25 milliseconds under moderate to heavy system load (mostly performing disk I/O).
>
> That version of Ingo's patch does not implement High-Resolution Timers. Thomas worked on merging this into the latest RT patch. Without high-res timers, the best you may ever get is 4ms. This is because nanosleep is to guarantee _at_least_ 3 ms. So you have the following situation:
>
>     0         1         2         3         4  (ms)
>     +---------+---------+---------+---------+--->
>             ^                       ^
>             |                       |
>         Start here              0+3 = 3 here we have the response
>
> If we look at this in smaller units than ms, we started on 0.8ms and responded at 3.2ms where we have 3.2 - 0.8 = 2.4 which is less than 3ms. So since Ingo's patch doesn't increase the resolution of the timers from a jiffy (which is currently 1ms) Linux is forced to add one more than you need.
>
>> Based on the articles and the mails I read on this list, I understand that worst case latencies of 1 ms (or less) should be possible using the RT Preemption patch, but I am unable to get anything less than 4 milliseconds even with sleep times smaller than 3 ms. I am running the tests on a SBC with a 1.4G Pentium M, 512M RAM, 1GB compact flash (using IDE). I believe I have the high resolution timer working correctly, because if I comment out the sleep line above t2-t1 is consistently 0 or 1 microsecond.
>
> I don't think you have the high res timer working, since there is no high res timer in that kernel.
>
>> Following earlier discussions (in July) in this list, I tried to set kernel configuration parameters like CONFIG_LATENCY_TRACE to get tracing/debug information, but I didn't find these parameters in my .config file. I appreciate your suggestions/insights into the situation and steps that I should try to get more debug/tracing information that might help to understand the cause of high latency.
>
> It's not a high latency. It's doing exactly as it is supposed to, since the nanosleep doesn't have high-res support (in that kernel). If you really want to measure latency, you need to add a device or something and see what the response time of an interrupt going off to the time a thread is woken to respond to it. Now with Ingo's that is really fast.

Another way to do it is to set up a repeating timer. You _must_ read back the timer to get the repeat time it is really using, and then measure how well it does giving signals at these repeat times. FAR FAR more than three lines of code...

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
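[Editor's sketch] The "add one more than you need" rule explained in the quoted reply above can be written down as a small model. It is only a sketch of the rounding, assuming a tick-based nanosleep that rounds the request up to whole ticks and then adds one; `sleep_ticks()` and `elapsed_ns()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Whole ticks a relative sleep is armed for: round the request up, then add one. */
uint64_t sleep_ticks(uint64_t request_ns, uint64_t tick_ns)
{
    assert(tick_ns > 0);
    uint64_t ticks = (request_ns + tick_ns - 1) / tick_ns; /* round up */
    return ticks + 1; /* the current tick is already partly used up */
}

/* Wall time that passes when the sleep starts offset_ns into the current tick. */
uint64_t elapsed_ns(uint64_t request_ns, uint64_t tick_ns, uint64_t offset_ns)
{
    assert(offset_ns < tick_ns);
    return sleep_ticks(request_ns, tick_ns) * tick_ns - offset_ns;
}
```

For a 3 ms request and an exact 1 ms tick this arms 4 ticks, so a sleep that starts 0.8 ms into a tick returns after 3.2 ms, which honours the "at least 3 ms" guarantee. If the tick is slightly shorter than 1 ms (an assumed 999848 ns, say), the round-up step already yields 4 ticks and the total becomes 5, which would be consistent with never observing less than about 4 ms.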
Re: Multiple virtual address mapping for the same code on IA-64 linux kernel.
David S. Miller wrote:
> From: Anton Blanchard <[EMAIL PROTECTED]>
> Date: Fri, 19 Aug 2005 04:29:55 +1000
>
>> Calling itanium the "fastest 64bit processor at any given clock frequency" on lkml is likewise inflammatory :)
>
> I totally agree.

Since the itanium off loads a lot of its instruction stream decisions on to the compiler(s), where other processors just do it, one might argue that you can not even characterize the itanium without bundling in the compilers... Not to say that is wrong but just to make it clear that saying the itanium speed is X is like saying that a Cummins diesel is fast without saying what sort of car/truck it is mounted in.

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote:
~
>> IMNSHO we should not get too parental with kernel only interfaces. Adding 1 is easy enough for the caller and even easier to explain in the instructions (i.e. this call sleeps for X jiffies edges). This allows the caller to do more if needed and, should he ever just want to sync to the next jiffie he does not have to deal with backing out that +1.
>
> I don't want to be too parental either, but I also am trying to avoid code duplication. Lots of drivers basically do something like poll_event() does (or could do with some changes), i.e. looping a constant amount multiple times, checking something every so often. The patch was just a thought, though. I will keep evaluating drivers and see if it's a useful interface to have eventually.
>
> I guess I'm just concerned with making an unintuitive interface. As was brought up at OLS, drivers are a major source of bugs/buggy code. The simpler, more useful we can make interfaces, the better, I think. I'm not claiming you disagree, I just want to make my own motives clear.
>
> While fixing up the schedule_timeout() comment would make it clear what schedule_timeout() achieves, I'm not sure how useful such an interface is, if every caller adds 1 :) I need to mull it over, though... Lots to consider. I also, of course, want to stay flexible for the reasons you mention (letting the driver adjust the timeout as they expect to).

I would leave the +1 alone and put in the correct documentation. This way _more_ folks will be made aware of the mid jiffie issue. Far too often we see (and let get in) patches that mess up user interfaces around this issue. The recent changes to itimer come to mind...
~
-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [patch] KGDB for Real-Time Preemption systems
Ingo Molnar wrote:
> * George Anzinger wrote:
>> I have put a version of KGDB for x86 RT kernels here: http://source.mvista.com/~ganzinger/
>>
>> The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them.
>>
>> Apply in this order:
>> Ingo's patch
>> kgdb-ga-rt.patch
>> common_kgdb_cfi_annotations.patch
>>
>> This is, more or less, the same kgdb that is in Andrew's mm tree changed to fix the RT issues.
>
> great. For the time being i wont add it to the -RT tree (because KGDB is not destined for upstream merging it seems), but it sure is a useful development/debugging add-on.

I agree on not adding it. Tom Rini is working on a version that Andrew seems inclined to merge. When that happens I will most likely put together enhancements to it to bring it up to what this one does. Meanwhile I am trying to capture some of Tom's changes in this one. Also, it is MUCH easier for me to maintain as a separate patch.

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
Roman Zippel wrote:
~

The thing that worries me about this function is that it does every thing in usec. We are using nsec in xtime now and I wonder if it would not be more accurate to do the math in nsecs. Even tick size (tick_nsec) does not translate well to usec, it currently being 999849 nsecs.

George

> ---
>  kernel/time.c  |    3 ++-
>  kernel/timer.c |   53 +
>  2 files changed, 55 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/kernel/time.c
> ===
> --- linux-2.6.orig/kernel/time.c 2005-07-13 03:18:04.0 +0200
> +++ linux-2.6/kernel/time.c 2005-08-16 01:37:20.0 +0200
> @@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
>          } /* txc->modes & ADJ_OFFSET */
>          if (txc->modes & ADJ_TICK) {
>                  tick_usec = txc->tick;
> -                tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
>          }
> +        if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
> +                time_recalc();
>      } /* txc->modes */
>  leave: if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
>              || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
> Index: linux-2.6/kernel/timer.c
> ===
> --- linux-2.6.orig/kernel/timer.c 2005-07-13 03:18:04.0 +0200
> +++ linux-2.6/kernel/timer.c 2005-08-16 23:10:53.0 +0200
> @@ -559,6 +559,7 @@ found:
>   */
>  unsigned long tick_usec = TICK_USEC; /* USER_HZ period (usec) */
>  unsigned long tick_nsec = TICK_NSEC; /* ACTHZ period (nsec) */
> +unsigned long tick_nsec2 = TICK_NSEC;
>
>  /*
>   * The current time
> @@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC; /*
>   * the usual normalization.
>   */
>  struct timespec xtime __attribute__ ((aligned (16)));
> +struct timespec xtime2 __attribute__ ((aligned (16)));
>  struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
>
>  EXPORT_SYMBOL(xtime);
> @@ -596,6 +598,33 @@ static long time_adj; /* tick adjust (
>  long time_reftime; /* time at last adjustment (s) */
>  long time_adjust;
>  long time_next_adjust;
> +static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
> +
> +void time_recalc(void)
> +{
> +        long f, t;
> +        tick_nsec = TICK_USEC_TO_NSEC(tick_usec);

This leaves bits on the floor. Is it not possible to do this whole calculation in nano seconds? Currently, for example, tick_nsec is 999849...

> +
> +        t = time_freq >> (SHIFT_USEC + 8);
> +        if (t) {
> +                time_freq -= t << (SHIFT_USEC + 8);
> +                t *= 1000 << 8;
> +        }
> +        f = time_freq * 125;
> +        t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
> +        f &= (1 << (SHIFT_USEC - 3)) - 1;
> +        tick_nsec2 = t / HZ;
> +        f += (t % HZ) << (SHIFT_USEC - 3);
> +        f <<= 5;
> +        time_adj2 = f / HZ;
> +        time_freq_adj2 = f % HZ;
> +
> +        printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
> +                xtime.tv_sec, xtime.tv_sec,
> +                tick_nsec, time_freq, time_offset, time_next_adjust,
> +                xtime2.tv_sec, xtime2.tv_nsec,
> +                tick_nsec2, time_adj2, time_freq_adj2);
> +}
>
>  /*
>   * this routine handles the overflow of the microsecond field
> @@ -739,6 +768,16 @@ static void second_overflow(void)
>  #endif
>  }
>
> +static void second_overflow2(void)
> +{
> +        time_adj2_cur = time_adj2;
> +        time_freq_phase2 += time_freq_adj2;
> +        if (time_freq_phase2 > HZ) {
> +                time_freq_phase2 -= HZ;
> +                time_adj2_cur++;
> +        }
> +}
> +
>  /* in the NTP reference this is called "hardclock()" */
>  static void update_wall_time_one_tick(void)
>  {
> @@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
>                  time_adjust = time_next_adjust;
>                  time_next_adjust = 0;
>          }
> +
> +        delta_nsec = tick_nsec2;
> +        time_phase2 += time_adj2_cur;
> +        if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
> +                long ltemp = time_phase2 >> (SHIFT_USEC + 2);
> +                time_phase2 -= ltemp << (SHIFT_USEC + 2);
> +                delta_nsec += ltemp;
> +        }
> +        xtime2.tv_nsec += delta_nsec;
> +        if (xtime2.tv_nsec >= NSEC_PER_SEC) {
> +                xtime2.tv_nsec -= NSEC_PER_SEC;
> +                xtime2.tv_sec++;
> +                second_overflow2();
> +        }
>  }
>
>  /*

-- 
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
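[Editor's sketch] The remark that converting from tick_usec "leaves bits on the floor" can be made concrete by looking at the step size of the conversion. The helpers below are modeled on the editor's reading of the 2.6-era SH_DIV-style macros and should be treated as an assumption rather than a copy of kernel source; the 1193182 Hz tick source, HZ=1000 and USER_HZ=100 used in the note are likewise assumed example values.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded division that keeps lsh extra fractional bits, in the style of SH_DIV(). */
uint64_t sh_div(uint64_t nom, uint64_t den, unsigned lsh)
{
    assert(den > 0);
    return ((nom / den) << lsh) + (((nom % den) << lsh) + den / 2) / den;
}

/* Actual tick rate scaled by 2^8, for a given tick source rate and requested HZ. */
uint64_t acthz_shifted(uint64_t clock_tick_rate, uint64_t hz)
{
    uint64_t latch = (clock_tick_rate + hz / 2) / hz; /* input clocks per tick */
    return sh_div(clock_tick_rate, latch, 8);
}

/* Tick length in ns derived from a USER_HZ tick length given in whole usec. */
uint64_t tick_usec_to_nsec(uint64_t tick_usec, uint64_t user_hz, uint64_t acthz)
{
    return sh_div(tick_usec * user_hz * 1000, acthz, 8);
}
```

Under those assumptions, moving tick_usec from 10000 to 10001 moves the resulting tick_nsec by about 100 ns, so any correction smaller than that cannot be expressed through tick_usec at all. That granularity is what doing the whole calculation in nanoseconds would avoid.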
Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)
Roman Zippel wrote:
~

The thing that worries me about this function is that it does everything in usec. We are using nsec in xtime now, and I wonder if it would not be more accurate to do the math in nsecs. Even the tick size (tick_nsec) does not translate well to usec, it currently being 999849 nsecs.

George

> ---
>  kernel/time.c  |  3 ++-
>  kernel/timer.c | 53 +
>  2 files changed, 55 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/kernel/time.c
> ===
> --- linux-2.6.orig/kernel/time.c	2005-07-13 03:18:04.0 +0200
> +++ linux-2.6/kernel/time.c	2005-08-16 01:37:20.0 +0200
> @@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
>  	    } /* txc->modes & ADJ_OFFSET */
>  	    if (txc->modes & ADJ_TICK) {
>  		tick_usec = txc->tick;
> -		tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
>  	    }
> +	    if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
> +		time_recalc();
>  	} /* txc->modes */
>  leave:	if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
>  	    || ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
> Index: linux-2.6/kernel/timer.c
> ===
> --- linux-2.6.orig/kernel/timer.c	2005-07-13 03:18:04.0 +0200
> +++ linux-2.6/kernel/timer.c	2005-08-16 23:10:53.0 +0200
> @@ -559,6 +559,7 @@ found:
>   */
>  unsigned long tick_usec = TICK_USEC;		/* USER_HZ period (usec) */
>  unsigned long tick_nsec = TICK_NSEC;		/* ACTHZ period (nsec) */
> +unsigned long tick_nsec2 = TICK_NSEC;
>
>  /*
>   * The current time
> @@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC; /*
>   * the usual normalization.
>   */
>  struct timespec xtime __attribute__ ((aligned (16)));
> +struct timespec xtime2 __attribute__ ((aligned (16)));
>  struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
>
>  EXPORT_SYMBOL(xtime);
> @@ -596,6 +598,33 @@ static long time_adj; /* tick adjust (
>  long time_reftime;			/* time at last adjustment (s) */
>  long time_adjust;
>  long time_next_adjust;
> +static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
> +
> +void time_recalc(void)
> +{
> +	long f, t;
> +	tick_nsec = TICK_USEC_TO_NSEC(tick_usec);

This leaves bits on the floor. Is it not possible to do this whole calculation in nanoseconds? Currently, for example, tick_nsec is 999849...

> +
> +	t = time_freq >> (SHIFT_USEC + 8);
> +	if (t) {
> +		time_freq -= t << (SHIFT_USEC + 8);
> +		t *= 1000 << 8;
> +	}
> +	f = time_freq * 125;
> +	t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
> +	f &= (1 << (SHIFT_USEC - 3)) - 1;
> +	tick_nsec2 = t / HZ;
> +	f += (t % HZ) << (SHIFT_USEC - 3);
> +	f <<= 5;
> +	time_adj2 = f / HZ;
> +	time_freq_adj2 = f % HZ;
> +
> +	printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
> +		xtime.tv_sec, xtime.tv_sec,
> +		tick_nsec, time_freq, time_offset, time_next_adjust,
> +		xtime2.tv_sec, xtime2.tv_nsec,
> +		tick_nsec2, time_adj2, time_freq_adj2);
> +}
>
>  /*
>   * this routine handles the overflow of the microsecond field
> @@ -739,6 +768,16 @@ static void second_overflow(void)
>  #endif
>  }
>
> +static void second_overflow2(void)
> +{
> +	time_adj2_cur = time_adj2;
> +	time_freq_phase2 += time_freq_adj2;
> +	if (time_freq_phase2 > HZ) {
> +		time_freq_phase2 -= HZ;
> +		time_adj2_cur++;
> +	}
> +}
> +
>  /* in the NTP reference this is called "hardclock()" */
>  static void update_wall_time_one_tick(void)
>  {
> @@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
>  		time_adjust = time_next_adjust;
>  		time_next_adjust = 0;
>  	}
> +
> +	delta_nsec = tick_nsec2;
> +	time_phase2 += time_adj2_cur;
> +	if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
> +		long ltemp = time_phase2 >> (SHIFT_USEC + 2);
> +		time_phase2 -= ltemp << (SHIFT_USEC + 2);
> +		delta_nsec += ltemp;
> +	}
> +	xtime2.tv_nsec += delta_nsec;
> +	if (xtime2.tv_nsec >= NSEC_PER_SEC) {
> +		xtime2.tv_nsec -= NSEC_PER_SEC;
> +		xtime2.tv_sec++;
> +		second_overflow2();
> +	}
>  }
>
>  /*

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
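The concern about doing the math in usec can be illustrated with a small sketch. It assumes a plain divide-by-1000 round trip rather than the kernel's actual TICK_USEC_TO_NSEC macro, and the helper names are hypothetical; only the 999849 ns tick length comes from the message.

```c
#include <assert.h>

/*
 * Nanoseconds dropped per tick when a tick length given in ns is stored
 * as whole microseconds and converted back. Integer division truncates,
 * so the sub-microsecond part is lost.
 * (Hypothetical helper; not the kernel's TICK_USEC_TO_NSEC.)
 */
static long usec_roundtrip_loss_ns(long tick_ns)
{
	long tick_us = tick_ns / 1000;	/* truncating conversion to usec */

	return tick_ns - tick_us * 1000;
}

/* Accumulated error in ns after the given number of ticks. */
static long long drift_ns(long tick_ns, long long ticks)
{
	assert(tick_ns > 0 && ticks >= 0);
	return (long long)usec_roundtrip_loss_ns(tick_ns) * ticks;
}
```

With a 999849 ns tick, usec_roundtrip_loss_ns() reports 849 ns dropped on every tick, which is the kind of error that keeping the whole calculation in nanoseconds would avoid.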
Re: [patch] KGDB for Real-Time Preemption systems
Ingo Molnar wrote:
> * George Anzinger <george@mvista.com> wrote:
>
>> I have put a version of KGDB for x86 RT kernels here:
>> http://source.mvista.com/~ganzinger/
>>
>> The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them.
>>
>> Apply in this order:
>> Ingo's patch
>> kgdb-ga-rt.patch
>> common_kgdb_cfi_annotations.patch
>>
>> This is, more or less, the same kgdb that is in Andrew's mm tree, changed to fix the RT issues.
>
> great. For the time being i won't add it to the -RT tree (because KGDB is not destined for upstream merging it seems), but it sure is a useful development/debugging add-on.

I agree on not adding it. Tom Rini is working on a version that Andrew seems inclined to merge. When that happens I will most likely put together enhancements to it to bring it up to what this one does. Meanwhile I am trying to capture some of Tom's changes in this one.

Also, it is MUCH easier for me to maintain as a separate patch.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote:
~
>> IMNSHO we should not get too parental with kernel-only interfaces. Adding 1 is easy enough for the caller and even easier to explain in the instructions (i.e. this call sleeps for X jiffy edges). This allows the caller to do more if needed and, should he ever just want to sync to the next jiffy, he does not have to deal with backing out that +1.
>
> I don't want to be too parental either, but I also am trying to avoid code duplication. Lots of drivers basically do something like poll_event() does (or could do with some changes), i.e. looping a constant amount multiple times, checking something every so often. The patch was just a thought, though. I will keep evaluating drivers and see if it's a useful interface to have eventually.
>
> I guess I'm just concerned with making an unintuitive interface. As was brought up at OLS, drivers are a major source of bugs/buggy code. The simpler, more useful we can make interfaces, the better, I think. I'm not claiming you disagree, I just want to make my own motives clear. While fixing up the schedule_timeout() comment would make it clear what schedule_timeout() achieves, I'm not sure how useful such an interface is, if every caller adds 1 :)
>
> I need to mull it over, though... Lots to consider. I also, of course, want to stay flexible for the reasons you mention (letting the driver adjust the timeout as they expect to).

I would leave the +1 alone and put in the correct documentation. This way _more_ folks will be made aware of the mid-jiffy issue. Far too often we see (and let get in) patches that mess up user interfaces around this issue. The recent changes to itimer come to mind...
~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
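The mid-jiffy issue behind the +1 can be sketched with a little arithmetic. This is a simplified model, not kernel code: it assumes a timer that fires on the Nth tick edge after it is armed, and the function name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Real time (ns) that passes when a sleep of `edges` tick edges is
 * started `offset_ns` after the most recent edge. The first edge arrives
 * after only (period_ns - offset_ns), so the sleep is shorter than
 * edges * period_ns whenever it starts mid-tick.
 * (Hypothetical model of the behaviour discussed in this thread.)
 */
static long long sleep_for_edges_ns(long long period_ns,
				    long long offset_ns, long edges)
{
	assert(period_ns > 0);
	assert(offset_ns >= 0 && offset_ns < period_ns);
	assert(edges >= 1);
	return edges * period_ns - offset_ns;
}
```

With a 1 ms tick and a request made 0.6 ms into the current tick, waiting 2 edges gives only 1.4 ms, while waiting 2 + 1 edges gives 2.4 ms, which is why a caller that needs at least 2 full jiffies adds 1.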
[patch] KGDB for Real-Time Preemption systems
I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and friends so that you can "bt" through them.

Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree, changed to fix the RT issues.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote:
> On 04.08.2005 [09:45:55 -0700], George Anzinger wrote:
>
>> Uh... PLEASE tell me you are NOT changing timespec_to_jiffies() (and timeval_to_jiffies()) to add 1. This is NOT the right thing to do. For repeating times (see setitimer code) we need the actual time as we KNOW where the jiffies edge is in the repeating case. The +1 is needed ONLY for the initial time, not the repeating time.
>>
>> See: http://marc.theaimsgroup.com/?l=linux-kernel&m=112208357906156&w=2
>
> I followed that thread, George, but I think it's a different case with schedule_timeout() [maybe this indicates drivers/other users should maybe be using itimers, but I'll get to that in a sec].

I think I misunderstood back then :).

> With schedule_timeout(), we are just given a relative jiffies value. We have no context as to which task is requesting the delay, per se, meaning we don't (can't) know from the interface whether this is the first delay in a sequence, or a brand new one, without changing all users to have some sort of control structure. The callers of schedule_timeout() don't even get a pointer to the timer added internally.
>
> So, adding 1 to all sleeps seems like it might be reasonable, as looping sleeps probably need to use a different interface. I had worked a bit ago on something like poll_event() with the kernel-janitors group, which would abstract out the repeated sleeps. Basically wait_event() without wait-queues... Maybe we could make such an interface just use itimers? I've attached my old patch for poll_event(), just for reference.

I think not. itimers is really pointed at a particular system call and has resources in the task structure to do it. These would be hard to share...

> My point, I guess, is that in the schedule_timeout() case, we don't know where the jiffies edge is, as we either expire or receive a wait-queue event/signal, we never mod_timer() the internal timer... So we have to assume that we need to sleep the request. But maybe Roman's idea of sleeping a certain number of jiffy edges is sufficient. I am not yet convinced driver authors want/need such an interface, though, still thinking it over.

IMNSHO we should not get too parental with kernel-only interfaces. Adding 1 is easy enough for the caller and even easier to explain in the instructions (i.e. this call sleeps for X jiffy edges). This allows the caller to do more if needed and, should he ever just want to sync to the next jiffy, he does not have to deal with backing out that +1.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features
Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> * George Anzinger <george@mvista.com> wrote:
>>
>>> Ingo, all
>>>
>>> I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. Someone put code in the NMI path to modify the preempt count which, often as not, will generate a PREEMPT_DEBUG message, as there is no telling what state the preempt count is in on an NMI interrupt.
>>>
>>> I have sent the attached patch to Andrew on this, but meanwhile, if you want RT, SMP, PREEMPT_DEBUG you will be much better off with this.
>>
>> ah - thanks, applied. Might explain some of the recent SMP weirdnesses i'm seeing. Attributed them to the HRT patch ;-)
>
> i'm still seeing weird crashes under SMP, which go away if i disable CONFIG_HIGH_RES_TIMERS. (this after i fixed a couple of other SMP bugs in the HRT code) It happens sometime during the bootup, after enabling the network but before users can log in. There's no good debug info, just a hang that comes from all CPUs trying to get some debug info out but crashing deeply.

I haven't looked at this new code all that closely as yet. One thing I did notice is that there is an assumption that the "timer being delivered" flag can be shared between LR timers and HR timers. I suspect this is wrong, as the delivery code is in separate threads (I assume). This could lead to del_timer_async missing a timer. In the prior patch we just ignored the del_timer_async issue for HR timers (code I plan to do soon). This WAS taken care of in earlier kernels by a reuse of one of the list link fields, but Andrew convinced me that this was _not_ good.

So, my guess: a nanosleep for an RT task (I think you said these are promoted to HR) is completing and overwriting the "delivery in progress" flag for an LR timer which just happens to have a del_timer_sync going on at the same time.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] eliminte NMI entry/ exit code
Zachary Amsden wrote:
> George Anzinger wrote:
>
>> Nick Piggin wrote:
>>
>>> George Anzinger wrote:
>>>
>>>> The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better.
>>>
>>> Humour me for a minute here...
>>> NMI restores preempt_count back to its old value upon exit, right? So what does a race case look like?
>>
>> Normal code                      NMI
>> fetch preempt_count
>> add      <- interrupt here
>>                                  add and store
>>                                  then subtract and store, darn!
>> store preempt_count
>>
>> Ok, no problem. The problem is in the RT code when PREEMPT_DEBUG is on. The tests for reasonable counts fail because of the rather undefined state when NMI picks up the word. The failure is on the NMI side...
>
> So NMI changing the preempt count and restoring it in the middle of an RMW is not the problem. Thus I don't understand what the issue is. NMI must undo all side effects. Does the PREEMPT_DEBUG code check the count somewhere within the NMI handler? If so, shouldn't the proper fix be to make that code aware that it could be running inside of an NMI and/or ensure that code is not called from within the NMI handler?

Yes, that is the problem. The sanity check in PREEMPT_DEBUG fails when called from the NMI handler.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
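The interleaving in the two-column table of this message can be replayed in a single-threaded sketch. It is only a simulation under stated assumptions: demo_count stands in for preempt_count, DEMO_HARDIRQ_OFFSET is an arbitrary illustrative value, and the simulated NMI is an ordinary function call placed between the fetch and the store.

```c
#include <assert.h>

#define DEMO_HARDIRQ_OFFSET 0x10000	/* hypothetical value, for illustration */

static int demo_count;		/* stand-in for preempt_count */
static int seen_by_nmi;		/* value the simulated NMI found on entry */

/* Simulated NMI handler: raise the count, then put it back. */
static void demo_nmi(void)
{
	seen_by_nmi = demo_count;		/* what a sanity check here would see */
	demo_count += DEMO_HARDIRQ_OFFSET;	/* add and store */
	demo_count -= DEMO_HARDIRQ_OFFSET;	/* subtract and store */
}

/* Non-atomic increment with the NMI landing between fetch and store. */
static int interrupted_increment(int start)
{
	int tmp;

	demo_count = start;
	tmp = demo_count;		/* fetch preempt_count */
	tmp += 1;			/* add  <- NMI lands here */
	demo_nmi();
	assert(demo_count == start);	/* the NMI undid its own change */
	demo_count = tmp;		/* store preempt_count */
	return demo_count;
}
```

The interrupted code still ends with start + 1, matching the "Ok, no problem" conclusion. What the NMI sees on entry is the stale pre-increment value, which is the kind of in-between state a debug sanity check run from the handler would be looking at.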
Re: [PATCH] eliminte NMI entry/ exit code
Nick Piggin wrote:
> George Anzinger wrote:
>
>> The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better.
>
> Humour me for a minute here...
> NMI restores preempt_count back to its old value upon exit, right? So what does a race case look like?

Normal code                      NMI
fetch preempt_count
add      <- interrupt here
                                 add and store
                                 then subtract and store, darn!
store preempt_count

Ok, no problem. The problem is in the RT code when PREEMPT_DEBUG is on. The tests for reasonable counts fail because of the rather undefined state when NMI picks up the word. The failure is on the NMI side...

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features
Ingo, all

I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. Someone put code in the NMI path to modify the preempt count which, often as not, will generate a PREEMPT_DEBUG message, as there is no telling what state the preempt count is in on an NMI interrupt.

I have sent the attached patch to Andrew on this, but meanwhile, if you want RT, SMP, PREEMPT_DEBUG you will be much better off with this.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

Source: MontaVista Software, Inc. George Anzinger <george@mvista.com>
Type: Defect Fix
Description:
	Modifying a word from NMI code runs the very real risk of losing either the new or the old bits. Remember, we can not prevent an NMI interrupt from ANYWHERE, in particular between the read and the write of a read-modify-write sequence.

	This patch removes the update of the preempt count from the NMI path.

Signed-off-by: George Anzinger <george@mvista.com>

 hardirq.h | 9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)	barrier()
 #endif
-
-#define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()		/* irq_enter() */
+#define nmi_exit()		/* sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)
[PATCH] eliminte NMI entry/ exit code
The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

Source: MontaVista Software, Inc. George Anzinger <george@mvista.com>
Type: Defect Fix
Description:
	Modifying a word from NMI code runs the very real risk of losing either the new or the old bits. Remember, we can not prevent an NMI interrupt from ANYWHERE, in particular between the read and the write of a read-modify-write sequence.

	This patch removes the update of the preempt count from the NMI path.

Signed-off-by: George Anzinger <george@mvista.com>

 hardirq.h | 9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)	barrier()
 #endif
-
-#define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()		/* irq_enter() */
+#define nmi_exit()		/* sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Bill Davidsen wrote:
> George Anzinger wrote:
>
>> Srivatsa Vaddagiri wrote:
>>
>>> On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:
>>>
>>>> IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time.
>>>
>>> George,
>>> Can't TSC (or equivalent) serve as a backup while PIT is disabled, especially considering that we disable PIT only for short duration in practice (few seconds maybe) _and_ that we don't have HRT support yet?
>>
>> I think it really depends on what you want. If you really want to keep good time, the only rock in town is the one connected to the PIT (and the pmtimer). The problem is, if you want the jiffy edge to be stable, there is just no way to reprogram the PIT to get it back to where it was. In an old version of HRT I did a trick of loading a short count (based on reading the TSC or pmtimer) and then put the LATCH count on top of it. In a correctly performing PIT, it should count down the short count, interrupt, load the long count and continue from there. Aside from the machines that had BAD PITs (they reset on the load instead of the expiry of the current count) there were other problems that, in the end, caused loss of time (too fast, too slow, take your pick). I also found PITs that signaled that they had loaded the count (they set a status bit) prior to actually loading it. All in all, I find the PIT is just an ugly beast to try to program. On the other hand, if you want regular interrupts at some fixed period, it will do this forever (give or take an epoch or two ;) without touching anything after the initial program set up.
>>
>> In the end, I concluded that, for the community kernel, it is really best to just interrupt the irq line and let the PIT run. Then you use the TSC or pmtimer to figure the gross loss of interrupts and leave the PIT interrupt again to define the jiffy edge.
>>
>> If you have other, more pressing, concerns I suppose you can program the PIT, but don't expect your wall clock to be as stable as it is now.
>
> What are the portability and scaling issues if it were done this way? It clearly looks practical on x86 uni, but if we want per-CPU non-tick, I'm less sure how it would work.

I am not sure how much is involved. For VST I disabled the tick-generated NMI watchdog interrupt on a per-cpu basis but stopped the PIT tick only when all cpus were idle. The next step would be to mess with the interrupt steering logic to keep the tick away from idle cpus. I did not get into this level in my work, being mainly interested in embedded systems.

> But when you go to non-x86 hardware, is there always going to be another source of wakeup available if the PIT is blocked instead of reset? I have to go back and look at how SPARC hardware works, I don't remember enough to be useful.

Most (all) other archs don't have PITs. The x86 sucks big time when it comes to timekeeping hardware. The most common hardware is a counter that runs forever (much as the TSC but FIXED in frequency). Interrupts are generated either by comparing a register to this or using companion counters that just count down to zero. In either case you don't lose time because you can always precisely set up an interrupt.

To sleep, then, you just set your sleep time in the normal time base interrupt counter. At the end, you know exactly what to set to get back to the regular tick. These other platforms make VST and High Res Timers so easy...

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
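The approach of leaving the PIT running and using a free-running counter to work out how many tick interrupts were skipped can be sketched as follows. This is an illustration only: the structure, the function name and the cycle values are hypothetical, and the counter stands in for something like the TSC or pmtimer mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Result of catching up after tick interrupts were held off. */
struct tick_catchup {
	uint64_t missed_ticks;		/* whole ticks that elapsed unseen */
	uint64_t remainder_cycles;	/* cycles since the latest tick edge */
};

/*
 * Given the counter value at the last handled tick edge and the value
 * now, work out how many whole ticks went by. The tick source itself is
 * never reprogrammed, so its next interrupt still marks a true edge.
 * (Hypothetical helper, not taken from any kernel tree.)
 */
static struct tick_catchup catch_up(uint64_t last_edge_cycles,
				    uint64_t now_cycles,
				    uint64_t cycles_per_tick)
{
	struct tick_catchup r;
	uint64_t elapsed;

	assert(cycles_per_tick > 0);
	/* Unsigned subtraction stays correct across counter wrap-around. */
	elapsed = now_cycles - last_edge_cycles;
	r.missed_ticks = elapsed / cycles_per_tick;
	r.remainder_cycles = elapsed % cycles_per_tick;
	return r;
}
```

Because only the counter is read and the periodic timer is left alone, the jiffy count can be advanced by missed_ticks while the next periodic interrupt continues to define the edge.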
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Bill Davidsen wrote: George Anzinger wrote: Srivatsa Vaddagiri wrote: On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote: IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time. George, Can't TSC (or equivalent) serve as a backup while PIT is disabled, especially considering that we disable PIT only for short duration in practice (few seconds maybe) _and_ that we don't have HRT support yet? I think it really depends on what you want. If you really want to keep good time, the only rock in town is the one connected to the PIT (and the pmtimer). The problem is, if you want the jiffie edge to be stable, there is just now way to reprogram the PIT to get it back to where it was. In an old version of HRT I did a trick of loading a short count (based on reading the TSC or pmtimer) and then put the LATCH count on top of it. In a correctly performing PIT, it should count down the short count, interrupt, load the long count and continue from there. Aside from the machines that had BAD PITs (they reset on the load instead of the expiry of the current count) there were other problems that, in the end, cause loss of time (too fast, too slow, take your pick). I also found PITs that signaled that they had loaded the count (they set a status bit) prior to actually loading it. All in all, I find the PIT is just an ugly beast to try to program. On the other hand, if you want regular interrupts at some fixed period, it will do this forever (give or take a epoch or two;) with out touching anything after the initial program set up. In the end, I concluded that, for the community kernel, it is really best to just interrupt the irq line and leave the PIT run. Then you use the TSC or pmtimer to figure the gross loss of interrupts and leave the PIT interrupt again to define the jiffie edge. 
If you have other, more pressing, concerns I suppose you can program the PIT, but don't expect your wall clock to be as stable as it is now. What are the portability and scaling issues if it were done this way? It clearly looks practical on x86 uni, but if we want per-CPU non-tick, I'm less sure how it would work. I am not sure how much is involved. For VST I disabled the tick generated NMI watchdog interrupt on a per cpu basis but stopped the PIT tick only when all cpus were idle. The next step would be to mess with the interrupt steering logic to keep the tick away from idle cpus. I did not get into this level in my work, being mainly interested in embedded systems. But when you go to non-x86 hardware, is there always going to be another source of wakeup available if the PIT is blocked instead of reset? I have to go back and look at how SPARC hardware works, I don't remember enough to be useful. Most (all) other archs don't have PITs. The x86 sucks big time when it comes to time keeping hardware. The most common hardware is a counter that runs forever (much as the TSC but FIXED in frequency). Interrupts are generated either by comparing a register to this or using companion counters that just count down to zero. In either case you don't loose time because you can always precisely set up an interrupt. To sleep, then, you just set your sleep time in the normal time base interrupt counter. At the end, you know exactly what to set to get back to the regular tick. These other platforms make VST and High Res Timers so easy... -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] eliminte NMI entry/ exit code
The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better. -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ Source: MontaVista Software, Inc. George Anzinger george@mvista.com Type: Defect Fix Description: Modifying a word from NMI code runs the very real risk of loosing either then new or the old bits. Remember, we can not prevent an NMI interrupt from ANYWHERE, inparticular between the read and the write of a read modify write sequence. This patch removes the update of the preempt count from the NMI path. Signed-off-by: George Anzingergeorge@mvista.com hardirq.h |9 ++--- 1 files changed, 6 insertions(+), 3 deletions(-) Index: linux-2.6.13-rc/include/linux/hardirq.h === --- linux-2.6.13-rc.orig/include/linux/hardirq.h +++ linux-2.6.13-rc/include/linux/hardirq.h @@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int #else # define synchronize_irq(irq) barrier() #endif - -#define nmi_enter()irq_enter() -#define nmi_exit() sub_preempt_count(HARDIRQ_OFFSET) +/* + * Re think these. NMI _must_not_ share data words with non-nmi code + * Meanwhile, just do a no-op. + */ +#define nmi_enter()/* irq_enter() */ +#define nmi_exit() /* sub_preempt_count(HARDIRQ_OFFSET) */ #ifndef CONFIG_VIRT_CPU_ACCOUNTING static inline void account_user_vtime(struct task_struct *tsk)
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers RCU-tasklist features
Ingo, all

I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. Someone put code in the NMI path to modify the preempt count which, often as not, will generate a PREEMPT_DEBUG message, as there is no telling what state the preempt count is in on an NMI interrupt. I have sent the attached patch to Andrew on this, but meanwhile, if you want RT, SMP, PREEMPT_DEBUG you will be much better off with this.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

Source: MontaVista Software, Inc. George Anzinger <george@mvista.com>
Type: Defect Fix
Description:
	Modifying a word from NMI code runs the very real risk of losing
	either the new or the old bits. Remember, we can not prevent an NMI
	interrupt from ANYWHERE, in particular between the read and the
	write of a read-modify-write sequence. This patch removes the
	update of the preempt count from the NMI path.

Signed-off-by: George Anzinger <george@mvista.com>

 hardirq.h |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===================================================================
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)	barrier()
 #endif
-
-#define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()	/* irq_enter() */
+#define nmi_exit()	/* sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)
Re: [PATCH] eliminate NMI entry/exit code
Nick Piggin wrote:
> George Anzinger wrote:
>> The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better.
>
> Humour me for a minute here...
> NMI restores preempt_count back to its old value upon exit, right?
> So what does a race case look like?

Normal code                 NMI
fetch preempt_count
add
   - interrupt here         add and store
                            then subtract and store, darn!
store preempt_count

Ok, no problem. The problem is in the RT code when PREEMPT_DEBUG is on. The tests for reasonable counts fail because of the rather undefined state when NMI picks up the word. The failure is on the NMI side...

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
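The interleaving discussed in this exchange can be replayed in user space. The sketch below is an illustration only: the variable names mirror the kernel's, but HARDIRQ_OFFSET is given an arbitrary value and the "NMI" is just a function called between the add and the store. It shows both halves of the conclusion: the interrupted code's final count is correct, yet the NMI samples the word while the update is half done.

```c
/* Illustrative value only; not taken from any kernel header. */
#define HARDIRQ_OFFSET 0x10000u

static unsigned int preempt_count;  /* word shared by normal and NMI code */
static unsigned int seen_by_nmi;    /* what the fake NMI saw on entry */

/* Fake NMI: add on entry and subtract on exit, like the old nmi_enter()/nmi_exit(). */
static void fake_nmi(void)
{
    seen_by_nmi = preempt_count;      /* any debug check would look at this value */
    preempt_count += HARDIRQ_OFFSET;
    /* ... handler body would run here ... */
    preempt_count -= HARDIRQ_OFFSET;
}

/* Normal code doing "preempt_count += delta" as fetch, add, store. */
static unsigned int rmw_with_nmi(unsigned int delta)
{
    unsigned int tmp = preempt_count; /* fetch */
    tmp += delta;                     /* add */
    fake_nmi();                       /* NMI lands between add and store */
    preempt_count = tmp;              /* store */
    return preempt_count;
}
```

Starting from a count of 5 and adding 1, the caller ends with 6 as expected, while the fake NMI observed the stale 5. A sanity check run from inside the NMI therefore tests a value that the interrupted code no longer considers current.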
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3
Tony Lindgren wrote:
~
> Do you have a patch around for improving next_timer_interrupt()?

Well, sort of. The code in the VST patch does the right thing. Problem is it does a bit more than the timer.c code. You can find that code on the HRT site CVS.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Srivatsa Vaddagiri wrote:
> On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:
>> IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time.
>
> George, Can't TSC (or equivalent) serve as a backup while PIT is disabled, especially considering that we disable PIT only for short duration in practice (few seconds maybe) _and_ that we don't have HRT support yet?

I think it really depends on what you want. If you really want to keep good time, the only rock in town is the one connected to the PIT (and the pmtimer). The problem is, if you want the jiffy edge to be stable, there is just no way to reprogram the PIT to get it back to where it was.

In an old version of HRT I did a trick of loading a short count (based on reading the TSC or pmtimer) and then putting the LATCH count on top of it. In a correctly performing PIT, it should count down the short count, interrupt, load the long count and continue from there. Aside from the machines that had BAD PITs (they reset on the load instead of on the expiry of the current count), there were other problems that, in the end, caused loss of time (too fast, too slow, take your pick). I also found PITs that signaled that they had loaded the count (they set a status bit) prior to actually loading it.

All in all, I find the PIT is just an ugly beast to try to program. On the other hand, if you want regular interrupts at some fixed period, it will do this forever (give or take an epoch or two ;) without touching anything after the initial program set up.

In the end, I concluded that, for the community kernel, it is really best to just interrupt the irq line and let the PIT run. Then you use the TSC or pmtimer to figure the gross loss of interrupts and leave the PIT interrupt again to define the jiffy edge.

If you have other, more pressing, concerns I suppose you can program the PIT, but don't expect your wall clock to be as stable as it is now.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
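The approach described in this message (leave the PIT running, mask its interrupt, and use a free-running counter to work out how many ticks were missed) can be sketched as follows. This is not code from the VST patch. The struct, the function name, and the 24-bit counter width are assumptions chosen to resemble the ACPI PM timer, and the counts-per-tick value is whatever the caller supplies.

```c
#include <stdint.h>

#define SKETCH_CTR_MASK 0xFFFFFFu   /* assumed 24-bit free-running counter */

struct tick_state {
    uint32_t last_edge;  /* counter value at the last accounted tick edge */
    uint32_t per_tick;   /* counter increments per jiffy (caller supplied) */
    uint64_t jiffies;    /* running tick count */
};

/*
 * Account for ticks that passed while the tick interrupt was masked.
 * Only whole ticks are added, and last_edge advances by whole ticks,
 * so the phase of the tick edge is preserved.
 * Valid only if fewer than 2^24 counts elapsed since last_edge.
 */
static uint32_t catch_up(struct tick_state *ts, uint32_t now)
{
    uint32_t elapsed = (now - ts->last_edge) & SKETCH_CTR_MASK; /* wrap-safe */
    uint32_t ticks = elapsed / ts->per_tick;

    ts->jiffies += ticks;
    ts->last_edge = (ts->last_edge + ticks * ts->per_tick) & SKETCH_CTR_MASK;
    return ticks;
}
```

The counter supplies only the gross count of missed interrupts; the fractional remainder is deliberately left alone, so the next real PIT interrupt still defines where the jiffy edge falls.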
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3
Tony Lindgren wrote:
* Srivatsa Vaddagiri <[EMAIL PROTECTED]> [050805 05:37]:
On Wed, Aug 03, 2005 at 06:05:28AM +, Con Kolivas wrote:

This is the dynamic ticks patch for i386 as written by Tony Lindgren <[EMAIL PROTECTED]> and Tuukka Tikkanen <[EMAIL PROTECTED]>. Patch for 2.6.13-rc5. There were a couple of things that I wanted to change so here is an updated version. This code should have stabilised enough for general testing now.

Con, I have been looking at some of the requirements of tickless idle CPUs in core kernel areas like scheduler and RCU. Basically, both power management and virtualization benefit if idle CPUs can cut off useless timer ticks. Especially from a virtualization standpoint, I think it makes sense that we enable this feature on a per-CPU basis, i.e. let individual CPUs cut off their ticks as and when they become idle. The benefit of this is more visible in platforms that host a lot of (SMP) VMs on the same machine. Most of the time, these VMs may be partially idle (some CPUs in it are idle, some not) and it is good that we quiesce the timer ticks on the partial set of idle CPUs. Both S390 and Xen ports of Linux kernel have this ability today (S390 has it in mainline already and Xen has it out of tree).

Good point, and it would be nice to have it resolved for systems that support idling individual CPUs. The current setup was done because when I was tinkering with the amd76x_pm patch a while back, I noticed that idling the cpu disconnects all cpus from the bus. (As far as I remember.) So this may need to be configured depending on the system.

From this viewpoint, I think the current implementation of dynamic tick falls short of this requirement. It cuts off the timer ticks only when all CPUs go idle. Apart from this observation, I have some others about the current dynamic tick patch:

- All CPUs seem to cut off the same number of ticks (dyn_tick->skip). Isn't this wrong, considering that the timer list is per-CPU? This will cause some timers to be serviced much later than usual.

Yes if it's done on per-CPU basis. In the current setup the first interrupt will kick the system off the dyn-tick state and the timers get checked again.

- The fact that dyn_tick_state is global and accessed from all CPUs is probably a scalability concern, especially if we allow the ticks to be cut off on per-CPU basis.

From idling devices point of view, we still need some global variable I believe. How else would you be able to tell all devices that the whole system does not have any timers for next 2 seconds?

- Again, when we allow this on a per-CPU basis, subsystems like RCU need to know the partial set of idle CPUs. RCU already does that through nohz_cpu_mask (which will need to replace dyn_cpu_map).

Sounds like that could work for dyn-tick too.

- Looking at dyn_tick_timer_interrupt, would it be nice if we avoid calling do_timer_interrupt so many times and instead update jiffies to (skipped_ticks - 1) and then call do_timer_interrupt once? I think VST does it that way.

In the long run we would do the calculations in usecs and just emulate jiffies from the hw timer. But yes, optimizing updating the time would be great.

- dyn_tick->max_skip = 0xffffff / apic_timer_val; From my reading of Intel docs, APIC_TMICT is 32-bit. So why does the above calculation take only 24 bits into account? What am I missing here?

Hmm, could be a bug here, needs to be checked. Maybe 32-bit APIC timer is optional support, or maybe I accidentally pulled the optional 24-bit support from the ACPI PM timer. But in any case on P4 systems the APIC timer is not the bottleneck as stopping or reprogramming PIT also kills APIC. (This does not happen on P3 systems.) So the bottleneck most likely is the length of PIT.

I can take a shot at addressing these concerns in dynamic_tick patch, but it seems to me that VST has already addressed all these to a big extent. Had you considered VST before? The biggest bottleneck I see in VST going mainline is its dependency on HRT patch but IMO it should be possible to write a small patch to support VST w/o HRT. George, what do you think?

HRT + VST depend on APIC only, and do not use next_timer_interrupt().

I convinced myself that the next_timer... code in timer.c misses timers (i.e. gives the wrong answer). I did this (after wondering due to performance) by scanning the whole timer list after I had the next_timer... answer and finding a better answer, not always, but sometimes. That code does not address the cascade list correctly.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
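The cross-check described at the end of this message (scan every pending timer and compare against the quick answer) can be sketched as a brute-force search. The struct and function names below are hypothetical, and the flat array stands in for the kernel's real timer wheel, which is organised quite differently. The comparison uses the same wrap-safe signed-difference idea as the kernel's time_before() macro.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pending timer; only the expiry matters here. */
struct sketch_timer {
    unsigned long expires;  /* expiry time in jiffies */
};

/* Wrap-safe "a is earlier than b" (same idea as the kernel's time_before()). */
static int sketch_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Exhaustive scan: return the earliest expiry among n timers, or
 * `candidate` if no timer expires before it.  Passing the fast path's
 * answer as `candidate` shows whether the fast path missed a timer.
 */
static unsigned long earliest_expiry(const struct sketch_timer *t, size_t n,
                                     unsigned long candidate)
{
    unsigned long best = candidate;

    for (size_t i = 0; i < n; i++)
        if (sketch_before(t[i].expires, best))
            best = t[i].expires;
    return best;
}
```

If the value returned differs from the candidate passed in, the faster lookup overlooked at least one timer, which is the symptom reported above.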
Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5
Srivatsa Vaddagiri wrote:
> On Sun, Aug 07, 2005 at 03:12:21PM +1000, Con Kolivas wrote:
>> Respin of the dynamic ticks patch for i386 by Tony Lindgren and Tuukka Tikkanen with further code cleanups. Are we there yet?
>
> Con, I am afraid until SMP correctness is resolved, this is not in a position to go in (unless you want to enable it only for UP, which I think should not be our target). I am working on making this work correctly on SMP systems. Hopefully I will post a patch soon.
>
> Another observation I have made regarding the dynamic tick patch is that the PIT is being reprogrammed whenever the CPUs are coming out of sleep state (because of an interrupt, say). This can happen at any arbitrary time, not necessarily on jiffy boundaries. As a result, there will be an offset between when jiffy interrupts will now occur vs when they would have originally occurred had the PIT never been stopped. Not sure if having this offset is good, but at least one necessary change that I foresee is zeroing delay_at_last_interrupt when disabling dynamic tick.
>
> For that matter, it may be easier to disable the PIT timer by just masking PIT interrupts (instead of changing its mode).

IMNOHO, this is the ONLY way to keep proper time. As soon as you reprogram the PIT you have lost track of the time. My VST patch just masks the interrupt.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Roland McGrath wrote:
>> There are other concerns. Let me see if I understand this. A thread (other than the leader) can exec and we then need to change the real_timer to wake the new task which will NOT be using the same task struct.
>
> That's correct. de_thread will turn the thread calling exec into the new leader and kill off all the other threads, including the old leader. The exec'ing thread's existing task_struct is reassigned to the PID of the original leader.
>
>> My looking at the code shows that the thread leader can exit and then stays around as a zombie until the last thread in the group exits.
>
> That is correct.
>
>> If an alarm comes during this wait I suspect it will wake this zombie and cause problems.
>
> You are mistaken. The signal code handles process signals sent when the leader is a zombie. The group leader sticks around with the PID that matches the TGID, until there are no live threads with its TGID. That is how process-wide kill can still work.

Yes, I see, traced through the signal delivery. So Linus' patch as well as reverting Ingo's change will fix all of this. Right?

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Gerd Knorr wrote:
> On Thu, Aug 04, 2005 at 03:02:51PM -0700, Andrew Morton wrote:
>> Roland McGrath <[EMAIL PROTECTED]> wrote:
>>> That's wrong. It has to be done only by the last thread in the group to go. Just revert Ingo's change.
>>
>> OK..
>>
>> +++ 25-akpm/kernel/exit.c	Thu Aug  4 15:01:06 2005
>> @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
>> -	if (group_dead)
>> +	if (group_dead) {
>> +		del_timer_sync(&tsk->signal->real_timer);
>>  		acct_process(code);
>> +	}
>>
>> +++ 25-akpm/kernel/posix-timers.c	Thu Aug  4 15:01:06 2005
>> @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
>> -	del_timer_sync(&sig->real_timer);
>
> That one fixes it for me.

There are other concerns. Let me see if I understand this. A thread (other than the leader) can exec and we then need to change the real_timer to wake the new task which will NOT be using the same task struct.

My looking at the code shows that the thread leader can exit and then stays around as a zombie until the last thread in the group exits. If an alarm comes during this wait I suspect it will wake this zombie and cause problems. So, don't we need to also change real_timer's task when the exiting task is the real_timer wake up task, assigning it to some other member of the group? Note, I don't say just if it is the group leader...

Then when we finally release the signal structure, we can "del" the timer.

Did I miss something here?

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Andrew Morton wrote:
> Roland McGrath <[EMAIL PROTECTED]> wrote:
>> That's wrong. It has to be done only by the last thread in the group to go. Just revert Ingo's change.

Hm... I was looking at 2.6.10 to figure it out. This looks more correct.

> OK..
>
> --- 25/kernel/exit.c~revert-timer-exit-cleanup	Thu Aug  4 15:00:55 2005
> +++ 25-akpm/kernel/exit.c	Thu Aug  4 15:01:06 2005
> @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
>  	acct_update_integrals(tsk);
>  	update_mem_hiwater(tsk);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
> -	if (group_dead)
> +	if (group_dead) {
> +		del_timer_sync(&tsk->signal->real_timer);
>  		acct_process(code);
> +	}
>  	exit_mm(tsk);
>  	exit_sem(tsk);
>  
> diff -puN kernel/posix-timers.c~revert-timer-exit-cleanup kernel/posix-timers.c
> --- 25/kernel/posix-timers.c~revert-timer-exit-cleanup	Thu Aug  4 15:00:55 2005
> +++ 25-akpm/kernel/posix-timers.c	Thu Aug  4 15:01:06 2005
> @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
>  		tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
>  		itimer_delete(tmr);
>  	}
> -	del_timer_sync(&sig->real_timer);
>  }
>  
>  /*
> _

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
[PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Gerd Knorr wrote:
> Hi,
>
> Somewhere between 2.6.11 and 2.6.12 the regression in $subject was added to the linux kernel. Testcase below.

Yep. The itimer changes got a bit carried away. Here is a fix.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

Source: MontaVista Software, Inc. George Anzinger
Type: Defect Fix
Description:
	The changes to itimer of late (after 2.6.11) cause itimers not to
	survive the exec* calls. Standard says they should.

Signed-off-by: George Anzinger

 exit.c         |    1 +
 posix-timers.c |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.13-rc/kernel/exit.c
===================================================================
--- linux-2.6.13-rc.orig/kernel/exit.c
+++ linux-2.6.13-rc/kernel/exit.c
@@ -794,6 +794,7 @@ fastcall NORET_TYPE void do_exit(long co
 	}
 
 	tsk->flags |= PF_EXITING;
+	del_timer_sync(&tsk->signal->real_timer);
 
 	/*
 	 * Make sure we don't try to process any timer firings
Index: linux-2.6.13-rc/kernel/posix-timers.c
===================================================================
--- linux-2.6.13-rc.orig/kernel/posix-timers.c
+++ linux-2.6.13-rc/kernel/posix-timers.c
@@ -1183,10 +1183,10 @@ void exit_itimers(struct signal_struct *
 	struct k_itimer *tmr;
 
 	while (!list_empty(&sig->posix_timers)) {
-		tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
+		tmr = list_entry(sig->posix_timers.next,
+				 struct k_itimer, list);
 		itimer_delete(tmr);
 	}
-	del_timer_sync(&sig->real_timer);
 }
 
 /*
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
Nishanth Aravamudan wrote:
~
> Sorry, I forgot that sys_nanosleep() also always adds 1 to the request (to account for this same issue, I believe, as POSIX demands no early return from nanosleep() calls). There are some other locations where similar

+ (t.tv_sec || t.tv_nsec)

This is not the same as "always add 1". We don't do it this way just to have fun with C. If you change schedule_timeout() to add the 1, nanosleep() will need to do things differently to get the same behavior. (And, YES, users do pass in zero sleep times.)

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
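The difference between "+ (t.tv_sec || t.tv_nsec)" and "always add 1" shows up only for a zero request, which is exactly the case called out above. The sketch below is a user-space illustration under stated assumptions: the struct, the SKETCH_HZ value of 1000, and the rounding helper are all hypothetical stand-ins, not the kernel's timespec_to_jiffies().

```c
/* Hypothetical stand-ins; not kernel definitions. */
struct sketch_timespec {
    long tv_sec;
    long tv_nsec;
};

#define SKETCH_HZ        1000L
#define NSEC_PER_JIFFY   (1000000000L / SKETCH_HZ)

/* Convert a request to jiffies, rounding any partial jiffy up. */
static unsigned long ts_to_jiffies_roundup(const struct sketch_timespec *t)
{
    return (unsigned long)t->tv_sec * SKETCH_HZ
         + (unsigned long)((t->tv_nsec + NSEC_PER_JIFFY - 1) / NSEC_PER_JIFFY);
}

/* nanosleep() style: add the extra jiffy only for a non-zero request. */
static unsigned long nanosleep_style(const struct sketch_timespec *t)
{
    return ts_to_jiffies_roundup(t) + (t->tv_sec || t->tv_nsec);
}

/* The proposed alternative: add the extra jiffy unconditionally. */
static unsigned long always_plus_one(const struct sketch_timespec *t)
{
    return ts_to_jiffies_roundup(t) + 1;
}
```

For a 1 ms request both variants yield 2 jiffies at the assumed HZ, but for a zero request the first yields 0 (return at once) while the second yields 1, turning a no-op sleep into a real one.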
Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()
msecs_to_jiffies(msecs) + 1; + unsigned long timeout = msecs_to_jiffies(msecs); while (timeout && !signal_pending(current)) { set_current_state(TASK_INTERRUPTIBLE); -- George Anzinger george@mvista.com HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more
Andrew Morton wrote:
> Roland McGrath [EMAIL PROTECTED] wrote:
>> That's wrong. It has to be done only by the last thread in the group to go.
>> Just revert Ingo's change.

Hm... I was looking at 2.6.10 to figure it out. This looks more correct.

> OK..
>
> --- 25/kernel/exit.c~revert-timer-exit-cleanup	Thu Aug 4 15:00:55 2005
> +++ 25-akpm/kernel/exit.c	Thu Aug 4 15:01:06 2005
> @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
>  	acct_update_integrals(tsk);
>  	update_mem_hiwater(tsk);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
> -	if (group_dead)
> +	if (group_dead) {
> +		del_timer_sync(&tsk->signal->real_timer);
>  		acct_process(code);
> +	}
>  	exit_mm(tsk);
>  
>  	exit_sem(tsk);
> diff -puN kernel/posix-timers.c~revert-timer-exit-cleanup kernel/posix-timers.c
> --- 25/kernel/posix-timers.c~revert-timer-exit-cleanup	Thu Aug 4 15:00:55 2005
> +++ 25-akpm/kernel/posix-timers.c	Thu Aug 4 15:01:06 2005
> @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
>  		tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
>  		itimer_delete(tmr);
>  	}
> -	del_timer_sync(&sig->real_timer);
>  }
>  
>  /*
> _

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
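The "only the last thread in the group" rule that the revert restores can be sketched with C11 atomics. This is a user-space illustration under stated assumptions, not the kernel implementation: `struct group_state`, its fields and `thread_exit()` are hypothetical names, the boolean stands in for the group-wide real timer, and `atomic_fetch_sub(...) == 1` plays the role that `atomic_dec_and_test()` plays in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state shared by all threads of a group. */
struct group_state {
	atomic_int live;        /* threads of the group that have not exited yet */
	bool real_timer_armed;  /* models the group-wide ITIMER_REAL timer */
};

/*
 * Called once by each exiting thread. Returns true only for the thread
 * that takes the live count to zero, i.e. the last one to go.
 */
static bool thread_exit(struct group_state *g)
{
	bool group_dead;

	assert(atomic_load(&g->live) > 0);
	/* atomic_fetch_sub() returns the value held before the decrement. */
	group_dead = (atomic_fetch_sub(&g->live, 1) == 1);

	if (group_dead)
		g->real_timer_armed = false; /* models del_timer_sync() */
	return group_dead;
}
```

In this sketch, deleting the timer from every exiting thread would disarm it while other threads of the group are still running, which matches the objection quoted above.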
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
Keith Owens wrote:
> On Tue, 02 Aug 2005 18:12:27 -0700, George Anzinger wrote:
>> How about something like:
>>
>>   if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)
>
> current points to the current struct task, regs points to the kernel stack. Those two data areas can be completely separate, as they are on i386. Also i386 uses a separate kernel stack for interrupts.

Actually I must mean the thread_info and not current. i386 only uses a separate stack if you use 4K stacks. I think others use separate interrupt stacks, however :(.

Also, on thinking on it, I think some archs don't call the registers pt_regs either. Oh, well, it was a thought... Waiting for its brother... :)

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
Steven Rostedt wrote:
> On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
>> Couldn't you just do some math off current->timestamp to see how long the task has been running? This per arch stuff seems a bit invasive..
>
> The thing is, I'm tracking how long the task is running in the kernel without doing a schedule. That's actually easy, but I don't want to count when the task is in userspace. The per-arch is only updating so that we don't count user space, otherwise the count could be in the task_struct.
>
> If there is an arch-independent way to tell if a task is running in user-space or kernel when an interrupt goes off then I would use it. The per arch is actually easy, and I would write it, but I don't have the hardware now to test it. I could at least do PPC and MIPS since I'm quite familiar with both, but I don't currently have a cross compiler to compile it.
>
> I understand your point, I would really prefer an arch independent solution, but the timestamp from current just won't cut it. Have another idea, I'm all open for it.

How about something like:

  if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)

The idea is that an interrupt from user space will be the ONLY thing on the stack while an interrupt from the kernel will have kernel stack under it. Current is the bottom end of the kernel stack and regs + sizeof(pt_regs) is where the interrupt context started. Assumptions: a) stack grows down, b) no switch stack at interrupt.

MAGIC is some small number. For x86 user it is actually zero, don't know about others, but the saved context should be the first thing on the stack, so a minimum frame size should do.

--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
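The stack-depth test proposed above can be sketched as plain address arithmetic. This is a user-space illustration, not kernel code: `SKETCH_THREAD_SIZE`, `SKETCH_MAGIC`, `struct sketch_regs` and `interrupted_user_mode()` are hypothetical, and the sketch relies on the two assumptions stated in the message (the stack grows down, and the interrupt does not switch stacks). It works in bytes on integer addresses to avoid the mixed pointer scaling in the one-line version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for this sketch; real ones are per-architecture. */
#define SKETCH_THREAD_SIZE 8192u /* bytes of kernel stack per task */
#define SKETCH_MAGIC       0u    /* slack allowed above the saved registers */

/* Hypothetical saved-register frame; real layouts differ per architecture. */
struct sketch_regs {
	unsigned long r[16];
};

/*
 * stack_base: lowest address of the task's kernel stack area.
 * regs:       address of the register frame saved at interrupt entry.
 * Returns true when nothing (beyond SKETCH_MAGIC bytes) was on the kernel
 * stack before the frame was pushed, i.e. the interrupt came from user mode.
 */
static bool interrupted_user_mode(uintptr_t stack_base, uintptr_t regs)
{
	uintptr_t stack_top = stack_base + SKETCH_THREAD_SIZE;
	uintptr_t frame_end = regs + sizeof(struct sketch_regs);

	assert(regs >= stack_base && frame_end <= stack_top);
	/* Bytes already in use on the stack when the interrupt arrived. */
	return stack_top - frame_end <= SKETCH_MAGIC;
}
```

A frame sitting exactly at the top of the stack is classified as an interrupt from user mode; any frame with kernel stack above it is classified as an interrupt from kernel mode.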