Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init

2008-01-31 Thread George Anzinger

On 01/31/2008 01:36 AM,  Jan Kiszka was caught saying:
> Jan Kiszka wrote:
>> George Anzinger wrote:
>>> On 01/30/2008 04:08 PM,  Jan Kiszka was caught saying:
>>>> [Here comes a rebased version against latest x86/mm]
>>>>
>>>> In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up
>>>> and connect to the front-end already during early_param evaluation.
>>>> This fails on x86 as the exception stack is not yet initialized,
>>>> effectively delaying kgdbwait until late-init.
>>>
>>> I wonder how much work it would take to just set up the exception
>>> stack and proceed.  After all the kgdbwait is there to help debug
>>> very early kernel code...
>>
>> In principle a valid question, but I'm not the one to answer it. I
>> would not feel very well if I had to reorder this critical setup code.
>> Look, we would have to move trap_init in start_kernel before
>> parse_early_param, and that would affect _every_ arch...

I cannot speak to other archs, but for x86 I called trap_init from the
code that caught the kgdbwait.  At that time (since I retired, I have
not looked at the actual kernel code) it could be called again later by
the kernel code.  I.e. I did not try to reorder the kernel bring-up
code, but just added an additional call to trap_init, and then only in
the case of finding a kgdbwait.

As such, this would need to be arch specific...
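
The approach described above can be sketched roughly as follows. This is a user-space model under stated assumptions, not the actual KGDB or x86 code: `traps_ready`, `early_param_seen()` and `start_kernel_sketch()` are hypothetical names, and the stand-in `trap_init()` is simply assumed to be safe to call twice.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kernel state; none of this is real KGDB code. */
bool traps_ready;      /* models "the exception stack is initialized" */
int trap_init_calls;   /* counts how often trap_init() ran */
bool kgdb_connected;   /* set once the debugger could safely be entered */

/* Stand-in for the arch trap setup; assumed safe to call more than once. */
void trap_init(void)
{
    trap_init_calls++;
    traps_ready = true;
}

/* Early parameter hook: only the kgdbwait case pays for the extra call. */
void early_param_seen(const char *param)
{
    if (strcmp(param, "kgdbwait") != 0)
        return;
    if (!traps_ready)
        trap_init();           /* the additional, early call */
    kgdb_connected = traps_ready;
}

/* Simplified boot order: parameters are parsed before the usual trap_init(). */
void start_kernel_sketch(const char *param)
{
    early_param_seen(param);
    trap_init();               /* the normal call still happens later */
}
```

The point of the design is that the generic boot order stays untouched; only the arch code that notices kgdbwait does extra work.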

>>
>
> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
> parse_early_param as well? It should, because my understanding of
> trap_init is that it's the function to arm things like... exception
> handlers? And that raises the question of the deeper purpose of this
> check (and the invocation of kgdb_early_init from the argument parsing
> function). Sigh, KGDB is still a quite improvable piece of code.

Likely.  Once you get it in the mainline kernel, one would hope that
other arch code would be forthcoming as many more "eyes" will be in play.

>
> Jan
>
> PS: Can we move this to some public list?

Sure, sorry I picked the wrong reply button, never intended it to be 
private.

>

--
George Anzinger   [EMAIL PROTECTED]


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-08 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 7 September 2005 23:16, George Anzinger wrote:


Serge Noiraud wrote:


...


I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot ?


Where did you get the kgdb you are using?  It looks like kgdb_ts is in
this version, but it is not in the one on my website
http://source.mvista.com/~ganzinger/


Is this related to kgdb?  I.e. does it go away if you either turn off kgdb
at configure time or just don't patch with kgdb?  (It sure seems
unrelated, but...)


I don't get those errors with CONFIG_KGDB=n.
Below I put the diff between a working .config and a non-working .config.


George



...
 INSTALL sound/usb/snd-usb-audio.ko
 INSTALL sound/usb/snd-usb-lib.ko
 INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F
System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING:


...
If I redo the make command only (not make rpm), I obtain the following:
# make
  CHK include/linux/version.h
make[1]: `arch/i386/kernel/asm-offsets.s' is up to date.
  CHK include/linux/compile.h
  CHK usr/initramfs_list
Kernel: arch/i386/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST
*** Warning: "preempt_locks" [net/sunrpc/sunrpc.ko] undefined!
*** Warning: "preempt_locks" [net/appletalk/appletalk.ko] undefined!
*** Warning: "preempt_locks" [fs/reiserfs/reiserfs.ko] undefined!
*** Warning: "preempt_locks" [fs/ntfs/ntfs.ko] undefined!
*** Warning: "preempt_locks" [fs/nfs/nfs.ko] undefined!
*** Warning: "preempt_locks" [fs/minix/minix.ko] undefined!
*** Warning: "preempt_locks" [fs/jbd/jbd.ko] undefined!
*** Warning: "preempt_locks" [fs/ext3/ext3.ko] undefined!
*** Warning: "preempt_locks" [fs/cifs/cifs.ko] undefined!
*** Warning: "preempt_locks" [fs/affs/affs.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/libata.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/ide-scsi.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/gdth.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid6.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid5.ko] undefined!
*** Warning: "preempt_locks" [drivers/ide/ide-floppy.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/pktcdvd.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/loop.ko] undefined!


preempt_locks is being accessed from a module but is not exported.  This
access is turned on with CONFIG_DEBUG_RT_LOCKING_MODE, so turn that option
off and it should build.



#


~

-# CONFIG_EARLY_PRINTK is not set
-# CONFIG_DEBUG_STACKOVERFLOW is not set
+CONFIG_LATENCY_TRACE=y
+CONFIG_RT_DEADLOCK_DETECT=y
+CONFIG_DEBUG_RT_LOCKING_MODE=y  <--- This one is doing it
+CONFIG_DEBUG_KOBJECT=y
+CONFIG_DEBUG_HIGHMEM=y

~

+CONFIG_KGDB=y
+CONFIG_KGDB_9600BAUD=y
+# CONFIG_KGDB_19200BAUD is not set
+# CONFIG_KGDB_38400BAUD is not set
+# CONFIG_KGDB_57600BAUD is not set
+# CONFIG_KGDB_115200BAUD is not set
+CONFIG_KGDB_PORT=0x3f8
+CONFIG_KGDB_IRQ=4
+CONFIG_KGDB_MORE=y
+CONFIG_KGDB_OPTIONS="-O1"
+CONFIG_NO_KGDB_CPUS=8


The following are not in the latest kgdb...

+CONFIG_KGDB_TS=y
+# CONFIG_KGDB_TS_64 is not set
+CONFIG_KGDB_TS_128=y
+# CONFIG_KGDB_TS_256 is not set
+# CONFIG_KGDB_TS_512 is not set
+# CONFIG_KGDB_TS_1024 is not set

.

+CONFIG_STACK_OVERFLOW_TEST=y
+CONFIG_TRAP_BAD_SYSCALL_EXITS=y  <--- I recommend against this one, see notes at front of kgdb patch
+CONFIG_KGDB_CONSOLE=y  <--- Likewise, use this only if you have only one serial port and no VGA
+CONFIG_KGDB_SYSRQ=y

 #


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-07 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 17 August 2005 02:53, George Anzinger wrote:


I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and
friends so that you can "bt" through them.  Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.



Hi, everybody

I found two bugs in kgdb-ga-rt patch.

The first one: if CONFIG_SMP is not set, we get a compile error.
The second one: if CONFIG_KGDB is not set, we get a link error.
I am sending you a diff patch to correct this. I am not sure the last patch
is correct, but it works.


The fixes for the reported bugs are now rolled into the kgdb patch.  Also,
there is a new README.txt.  I also included, in the kgdb patch, an updated
gdb macro file (Documentation/i386/kgdb/gdbinit.hw) which has a per_cpu
macro that, given a per_cpu structure name and the cpu number, returns
the address of that structure, properly typed.
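
The gdb macro itself is not shown in this thread. As a rough illustration of the idea only (base address of a per-CPU structure plus a per-CPU offset, cast back to the structure's type), here is a user-space C sketch. Every name in it (`runqueue_sketch`, `percpu_area`, `percpu_offset()`, `per_cpu_ptr_sketch()`) is hypothetical, and the back-to-back layout is an assumption made for the example.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4       /* hypothetical CPU count for the illustration */

struct runqueue_sketch {       /* hypothetical per-CPU structure */
    int nr_running;
};

/* One copy of the structure per CPU, laid out back to back (an assumption). */
struct runqueue_sketch percpu_area[NR_CPUS_SKETCH];

/* Byte offset of a CPU's copy relative to CPU 0's copy. */
size_t percpu_offset(int cpu)
{
    return (size_t)cpu * sizeof(struct runqueue_sketch);
}

/* Base address plus per-CPU offset, cast back to the structure type. */
struct runqueue_sketch *per_cpu_ptr_sketch(int cpu)
{
    char *base = (char *)&percpu_area[0];
    return (struct runqueue_sketch *)(base + percpu_offset(cpu));
}
```

Returning a typed pointer rather than a raw address is what lets the debugger user print fields directly.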

I am also putting up my current version of time_stamp_tool.  This is the
replacement for kgdb_ts() which I have removed from the kgdb patch.
It is still a little rough, but it has the promise of being arch independent.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-07 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 17 August 2005 02:53, George Anzinger wrote:


I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and
friends so that you can "bt" through them.  Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.



I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot ?


Is this related to kgdb?  I.e. does it go away if you either turn off kgdb
at configure time or just don't patch with kgdb?  (It sure seems 
unrelated, but...)


George


...
  INSTALL sound/usb/snd-usb-audio.ko
  INSTALL sound/usb/snd-usb-lib.ko
  INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/sunrpc/sunrpc.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/appletalk/appletalk.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/reiserfs/reiserfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ntfs/ntfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/nfs/nfs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/minix/minix.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/jbd/jbd.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ext3/ext3.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/cifs/cifs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/affs/affs.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/libata.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/ide-scsi.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/gdth.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid6.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid5.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/ide/ide-floppy.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/pktcdvd.ko needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/loop.ko needs unknown symbol preempt_locks

make[3]: *** [_modinst_post] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.51405 (%install)


RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.51405 (%install)
make[2]: *** [rpm] Error 1
make[1]: *** [rpm] Error 2
make: *** [rpm] Error 2

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC][PATCH] Use proper casting with signed timespec.tv_nsec values

2005-09-01 Thread George Anzinger

john stultz wrote:

All,
I recently ran into a bug with an older kernel where xtime's tv_nsec
field had accumulated more than 2 seconds worth of time. The timespec's
tv_nsec is a signed long, however gettimeofday() treats it as an
unsigned long. Thus when the failure occurred, very strange and
difficult-to-debug time problems resulted.
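
To make the signed/unsigned difference concrete, here is a small user-space sketch. It assumes a 32-bit long (as on i386), modelled with `int32_t` and `uint32_t`; the function names are hypothetical and this is not the kernel code itself.

```c
#include <stdint.h>

/* int32_t/uint32_t model a 32-bit long, as on i386 (an assumption). */

/* Signed division first, then conversion to the unsigned usec value. */
uint32_t usec_signed_div(int32_t tv_nsec)
{
    return (uint32_t)(tv_nsec / 1000);
}

/* Cast to unsigned first, then divide. */
uint32_t usec_unsigned_div(int32_t tv_nsec)
{
    return (uint32_t)tv_nsec / 1000;
}
```

For a tv_nsec that has wrapped negative after accumulating 2.5 seconds (bit pattern of 2500000000, i.e. -1794967296 as a signed 32-bit value), the signed division yields a huge bogus microsecond count, while the cast-first version yields 2500000.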

The main cause of the problem I was seeing is already fixed in mainline,
however just to be safe, I figured the following patch would be wise.

I only audited i386 and x86_64, however other arches probably could have
similar signed problems as well.

Please let me know if you have any further comments or feedback.


John,

There is a problem in the way this code handles the conversion to usec.
There is a conversion here and also in the get_offset code.  If the
nanoseconds are carried until after the addition of the two, about 25% of
the time you will end up with an additional usec in the time.  I strongly
suggest converting to usec only after adding xtime and the get_offset
time, to avoid this.  If the "correct" thing is done in clock_gettime()
(i.e. get_offset is in nanoseconds), this actually shows up as a backward
step in time between gettimeofday() and clock_gettime().
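
The truncation effect can be illustrated with a small sketch. It is a simplified user-space model in which both the xtime part and the offset are taken to be in nanoseconds (an assumption for illustration); the function names are hypothetical.

```c
/* Convert each part to usec separately, then add (truncates twice). */
unsigned long usec_convert_then_add(unsigned long xtime_nsec,
                                    unsigned long offset_nsec)
{
    return xtime_nsec / 1000 + offset_nsec / 1000;
}

/* Add in nanoseconds, then convert once (truncates once). */
unsigned long usec_add_then_convert(unsigned long xtime_nsec,
                                    unsigned long offset_nsec)
{
    return (xtime_nsec + offset_nsec) / 1000;
}
```

With 1700 ns and 1600 ns the first function returns 2 and the second returns 3, so two interfaces that disagree on where the conversion happens can report times one microsecond apart for the same instant.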


George
--


thanks
-john

linux-2.6.13_signed-tv_nsec_A0.patch

diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -156,7 +156,7 @@ void do_gettimeofday(struct timeval *tv)
 			usec += lost * (USEC_PER_SEC / HZ);
 
 		sec = xtime.tv_sec;
-		usec += (xtime.tv_nsec / 1000);
+		usec += (unsigned long)xtime.tv_nsec / 1000;
 	} while (read_seqretry(&xtime_lock, seq));
 
 	while (usec >= 1000000) {
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -128,7 +128,7 @@ void do_gettimeofday(struct timeval *tv)
 		seq = read_seqbegin(&xtime_lock);
 
 		sec = xtime.tv_sec;
-		usec = xtime.tv_nsec / 1000;
+		usec = (unsigned long)xtime.tv_nsec / 1000;
 
 		/* i386 does some correction here to keep the clock
 		   monotonous even when ntpd is fixing drift.
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -824,7 +824,7 @@ static void update_wall_time(unsigned lo
 	do {
 		ticks--;
 		update_wall_time_one_tick();
-		if (xtime.tv_nsec >= 1000000000) {
+		if ((unsigned long)xtime.tv_nsec >= 1000000000) {
 			xtime.tv_nsec -= 1000000000;
 			xtime.tv_sec++;
 			second_overflow();


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()

2005-08-30 Thread George Anzinger

Tom Rini wrote:

On Tue, Aug 30, 2005 at 12:33:25AM -0700, George Anzinger wrote:


Tom Rini wrote:


CC: Andi Kleen <[EMAIL PROTECTED]>
This adds a call to notify_die() in the "no context" portion of
do_page_fault() as someone on the chain might care and want to do a fixup.

---

 linux-2.6.13-trini/arch/x86_64/mm/fault.c |    4 ++++
 1 files changed, 4 insertions(+)

diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c
--- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook	2005-08-29 11:09:13.0 -0700
+++ linux-2.6.13-trini/arch/x86_64/mm/fault.c	2005-08-29 11:09:13.0 -0700

@@ -514,6 +514,10 @@ no_context:
if (is_errata93(regs, address))
		return; 


+   if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14,
+   SIGSEGV) == NOTIFY_STOP)
+   return;
+
/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.


Please use a more descriptive text than "no context".  This bit of info 
SHOULD be available to the gdb/kgdb user and should indicate why kgdb 
was entered.  It thus should be something like "bad kernel address" or 
"illegal kernel address".



"no context" is the label we're in, in the code.  What it's actually
used for is "hey, we (== kgdb) tried to read/write a very very bogus
addr, time to longjmp".  If it's not true that kgdb is at fault then we
drop to the debugger anyhow, and the user can see where they came from.

No.  What the user sees is the offending code (i.e. prior to the trap to 
page_fault), NOT how kgdb happend to be called.  The "no_context" is IN 
the _context_ of page_fault, but that is lost by the time you get to 
kgdb and ask to see _why_ (via, hint, hint: "p kgdb_info").
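
To make the point concrete, here is a minimal user-space sketch, not kernel code.  Everything in it (mock_notify_die, last_reason, entered_because, report_bad_kernel_access) is a hypothetical stand-in invented for illustration; the only thing taken from the thread is the idea that the string handed to the notifier is what the debugger user later gets to see as the reason for entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the notifier return codes. */
enum { MOCK_NOTIFY_DONE = 0, MOCK_NOTIFY_STOP = 1 };

/* What a debugger stub would remember as the reason it was entered. */
static const char *last_reason = "";

/* Mock die-notifier: records the reason string and claims the event. */
static int mock_notify_die(const char *reason, int trapnr, int signr)
{
	assert(reason != NULL);
	(void)trapnr;
	(void)signr;
	last_reason = reason;
	return MOCK_NOTIFY_STOP;
}

/* Models the user asking the debugger why it was entered. */
static int entered_because(const char *why)
{
	return strcmp(last_reason, why) == 0;
}

/*
 * Models the "no context" path: the reason describes the fault itself,
 * not the code label that detected it.  14 is the trap number used in
 * the patch; 11 stands in for SIGSEGV.
 */
static int report_bad_kernel_access(void)
{
	return mock_notify_die("bad kernel address", 14, 11) == MOCK_NOTIFY_STOP;
}
```

With a string like this, the stored reason still makes sense once the page_fault context is gone, which a bare label name would not.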


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()

2005-08-30 Thread George Anzinger

Tom Rini wrote:
> CC: Andi Kleen <[EMAIL PROTECTED]>
> This adds a call to notify_die() in the "no context" portion of
> do_page_fault() as someone on the chain might care and want to do a fixup.
>
> ---
>
>  linux-2.6.13-trini/arch/x86_64/mm/fault.c |4
>  1 files changed, 4 insertions(+)
>
> diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c
> --- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook  2005-08-29 11:09:13.0 -0700
> +++ linux-2.6.13-trini/arch/x86_64/mm/fault.c   2005-08-29 11:09:13.0 -0700
> @@ -514,6 +514,10 @@ no_context:
>  	if (is_errata93(regs, address))
>  		return;
>  
> +	if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14,
> +				SIGSEGV) == NOTIFY_STOP)
> +		return;
> +
>  	/*
>  	 * Oops. The kernel tried to access some bad page. We'll have to
>  	 * terminate things with extreme prejudice.

Please use a more descriptive text than "no context".  This bit of info 
SHOULD be available to the gdb/kgdb user and should indicate why kgdb 
was entered.  It thus should be something like "bad kernel address" or 
"illegal kernel address".



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: when or where can the case occur in "linux kernel development " about "kernel preemption"?

2005-08-29 Thread George Anzinger

linux-os (Dick Johnson) wrote:
> On Sat, 27 Aug 2005, Sat. wrote:
>> 2005/8/27, Christopher Friesen <[EMAIL PROTECTED]>:
>>> Sat. wrote:
>>>> the case about kernel preemption as follow :
>>>>
>>>> the book said "when a process that has a higher priority than the
>>>> currently running process is awakened ".
>>>>
>>>> but I can think about when such case can occur , could you give me an example ?
>>>
>>> There may be others, but one common case is when a hardware interrupt
>>> causes the higher priority process to become runnable.  Some examples of
>>> this would be a network packet arriving, or the expiry of a hardware timer.
>>>
>>> Chris
>>
>> unfortunately, I cannot agree with you , normally ,when the kernel
>> runs in interrupt context , the schedule() should not be invoked
>> --my views .
>>
>> then,could anyone  give me a definite example about network like above
>> or anything else to illuminate  this , ok?
>>
>> thanks !
>>
>> --
>> Sat.
>
> Schedule is never executed from an interrupt, BUT, there may be
> kernel threads or even user tasks that are sleeping, waiting
> to be awakened when some preliminary interrupt processing has
> occurred. The interrupt code may execute one of the wake-up calls
> which will cause the target to be put into the run queue as soon
> as possible.

Actually, this is not completely true.  The kernel sets a flag while 
handling interrupts that says it is within an interrupt.  This flag is 
cleared on the way out of the interrupt but prior to the return from 
interrupt (rfi) instruction.  Between this flag clearing and the rfi, 
there is a check made to see if the kernel is preemptable and, if so, if 
it is desired (i.e. something more important should run NOW).  If both 
of these are true, schedule is called to do the context switch.  So, 
schedule IS called from within the interrupt, but NOT within the area 
the kernel flags as being in an interrupt, which is a subset of the 
actual interrupt.
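
The sequence can be sketched as a toy user-space model.  This is only an illustration of the ordering described above, not the kernel's entry code: struct cpu_state, model_schedule and model_interrupt are hypothetical names, and the real exit path is architecture-specific assembly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state described above; all names are hypothetical. */
struct cpu_state {
	bool in_interrupt;    /* set while the handler proper runs */
	int  preempt_count;   /* > 0 means preemption is disabled */
	bool need_resched;    /* a more important task became runnable */
	int  schedule_calls;  /* how many times the scheduler was entered */
};

/* Stand-in for schedule(): must never run inside the flagged region. */
static void model_schedule(struct cpu_state *cpu)
{
	assert(!cpu->in_interrupt);
	cpu->need_resched = false;
	cpu->schedule_calls++;
}

/* Models one hardware interrupt whose handler may wake a more important task. */
static void model_interrupt(struct cpu_state *cpu, bool wakes_higher_prio)
{
	cpu->in_interrupt = true;          /* entry: flag set */
	if (wakes_higher_prio)
		cpu->need_resched = true;  /* the wake-up only marks the need */
	cpu->in_interrupt = false;         /* exit path: flag cleared first */

	/* Window between clearing the flag and the return from interrupt. */
	if (cpu->preempt_count == 0 && cpu->need_resched)
		model_schedule(cpu);
	/* the return-from-interrupt instruction would execute here */
}
```

The assert in model_schedule is the whole point: the scheduler runs on the interrupt's exit path, but only after the "in interrupt" flag has been dropped.  When preemption is disabled the request stays pending and is picked up at a later opportunity.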

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

Wilkerson, Bryan P wrote:
> George Anzinger [mailto:[EMAIL PROTECTED] wrote:
>> Well, I checked, it is "int $3".  Why then the panic?  If you try the
>> boot with kgdb (i.e. wait) and then do:
>> (gdb) disass gdb_interrupt
>> What do you find at +75?
>
> Below is the console from the session; it is interesting that gdb is not
> able to access the memory.   I let it continue and then ctrl-c broke it
> later in the boot cycle and tried disass again with the same result.
>
> Feel free to flog me if this is stupid but I have just one EM64T machine
> (test) and I'm using a regular P4 machine as dev.  I build the test
> kernel on the EM64T machine and then copy the updated sources, object
> files, and images via NFS to the dev machine.  I believe I read in the
> kgdb doc that it was possible to use two different architecture machines
> for test and dev although there wasn't any information about how to do
> it.  This is probably the source of the OS/ABI warning.  I can probably
> get the mothership to send me another EM64T machine if need be.

What you need is a cross development environment.  Not having that, your 
gdb is likely not aware of how to talk to the hardware you are using. 
The cross development environment should cost a whole lot less than 
another machine.

George
--

> vincent:/home/bwilkers/proj/linux-2.6.13-rc4-mm1 # gdb vmlinux
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i586-suse-linux"...
> warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
> of GDB.  Attempting to continue with the default i386:x86-64 settings.
>
> Using host libthread_db library "/lib/tls/libthread_db.so.1".
>
> (gdb) target remote /dev/ttyS0
> Remote debugging using /dev/ttyS0
> 0x80503b50 in ?? ()
> warning: no shared library support for this OS / ABI
> (gdb) disass gdb_interrupt
> Dump of assembler code for function gdb_interrupt:
> 0x80247009 <gdb_interrupt+0>:   Cannot access memory at address 0x80247009
> (gdb) c
> Continuing.
> Bootdata ok (command line is root=/dev/sda2 kgdb console=kgdb)
> Linux version 2.6.13-rc4-mm1-perfmon-em64t ([EMAIL PROTECTED]) (gcc version
> 3.3.5 20050117 (prerelease) (SUSE Linux)) #43 SMP Sat Aug 27 15:56:14 MDT 2005
> BIOS-provided physical RAM map:
>  BIOS-e820:  - 0009fc00 (usable)
>  BIOS-e820: 0009fc00 - 000a (reserved)
>  BIOS-e820: 000e6000 - 0010 (reserved)
>  BIOS-e820: 0010 - 3fe2f800 (usable)
>  BIOS-e820: 3fe2f800 - 3fe3f832 (ACPI NVS)
>  BIOS-e820: 3ff1 - 3ff3 (reserved)
>  BIOS-e820: 3ff3 - 3ff4 (ACPI data)
>  BIOS-e820: 3ff4 - 3fff (ACPI NVS)
>  BIOS-e820: 3fff - 4000 (reserved)
>  BIOS-e820: e000 - f000 (reserved)
>  BIOS-e820: fed13000 - fed1a000 (reserved)
>  BIOS-e820: fed1c000 - feda (reserved)
> ACPI: PM-Timer IO Port: 0x408
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

George Anzinger wrote:
> Wilkerson, Bryan P wrote:
>> Thank you Tom and George for the tips on using kgdb with
>> 2.6.13-rc4-mm1.
>> I almost have it working but kgdb seems to have a few issues.  I can get
>> it running from the dev machine using the kgdb and console=kgdb boot
>> options on the test kernel.  The kernel waits as it should and when I
>> attach with "target remote /dev/ttyS0" and I can continue the boot but
>> eventually it gets to a point in the boot where it frees unused kernel
>> memory successfully and then a warning, "unable to open an initial
>> console",  followed by, "Kernel panic - not syncing: Attempted to kill
>> init!"
>>
>> Removing the console=kgdb boot option and the machine boots all the way
>> to run level 5.   I tried to break into kgdb at this point using the
>> $echo -e "\003" > /dev/ttyS0
>>
>> from the dev machine but the test kernel panics at gdb_interrupt+75 when
>> it receives anything on the serial port.  Hmmm...
>>
>> I'm wondering if I'm maybe just the first to try this on EM64T (kernel
>> builds in the arch/x86_64 tree).
>
> Possibly:).  Since the serial port seems to work (i.e. the first test
> above), the fault seems to be in handling the int3.  Is int3 the right
> instruction for this machine?  If not you would make the change in
> kgdb.h.  I think that is the only place it is defined.

Well, I checked, it is "int $3".  Why then the panic?  If you try the 
boot with kgdb (i.e. wait) and then do:

(gdb) disass gdb_interrupt

What do you find at +75?






--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Need better is_better_time_interpolator() algorithm

2005-08-26 Thread George Anzinger

Christoph Lameter wrote:
> On Fri, 26 Aug 2005, Alex Williamson wrote:
>>   Would we ever want to favor a frequency shifting timer over anything
>> else in the system?  If it was noticeable perhaps we'd just need a
>> callback to re-evaluate the frequency and rescan for the best timer.  If
>> it happens without notice, a flag that statically assigns it the lowest
>> priority will do.  Or maybe if the driver factored the frequency
>> shifting into the drift it would make the timer undesirable without
>> resorting to flags.  Thanks,
>
> Timers are usually constant. AFAIK Frequency shifts only occur through
> power management. In that case we usually have some notifiers running
> before the change. These notifiers need to switch to a different time
> source if the timer frequency will be shifting or the timer will become
> unavailable.

If there is a notifier, I presume we can track it.  We might want to 
refine things so as to not hit too big a bump when the shift occurs, 
but I think it is doable.  The desirability of doing it, I think, 
depends on the availability of something better.  The access time of the 
TSC is "really" enticing.  Even so, I think a _good_ clock would not 
depend on long term accuracy of something as fast as the TSC.  Vendors 
are even modulating these to reduce RFI, but still, because of its 
speed, it makes the best interpolator for the jiffie to jiffie times.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

Wilkerson, Bryan P wrote:
> Thank you Tom and George for the tips on using kgdb with
> 2.6.13-rc4-mm1.
>
> I almost have it working but kgdb seems to have a few issues.  I can get
> it running from the dev machine using the kgdb and console=kgdb boot
> options on the test kernel.  The kernel waits as it should and when I
> attach with "target remote /dev/ttyS0" and I can continue the boot but
> eventually it gets to a point in the boot where it frees unused kernel
> memory successfully and then a warning, "unable to open an initial
> console",  followed by, "Kernel panic - not syncing: Attempted to kill
> init!"
>
> Removing the console=kgdb boot option and the machine boots all the way
> to run level 5.   I tried to break into kgdb at this point using the
> 	$echo -e "\003" > /dev/ttyS0
>
> from the dev machine but the test kernel panics at gdb_interrupt+75 when
> it receives anything on the serial port.  Hmmm...
>
> I'm wondering if I'm maybe just the first to try this on EM64T (kernel
> builds in the arch/x86_64 tree).

Possibly:).  Since the serial port seems to work (i.e. the first test 
above), the fault seems to be in handling the int3.  Is int3 the right 
instruction for this machine?  If not you would make the change in 
kgdb.h.  I think that is the only place it is defined.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Need better is_better_time_interpolator() algorithm

2005-08-26 Thread George Anzinger

Alex Williamson wrote:
> On Fri, 2005-08-26 at 08:39 -0700, Christoph Lameter wrote:
>> I think a priority is something useful for the interpolators. Some of
>> the decisions about which time sources to use also have criteria different
>> from drift/latency/jitter/cpu. F.e. timers may not survive various
>> power-saving configurations. Thus I would think that we need a priority
>> plus some flags.
>>
>> Some of the criteria for choosing a time source may be:
>
> Hi Christoph,
>
>    I sent another followup to this thread with a patch containing a
> fairly crude algorithm that I think better explains my starting point.
> I'm sure the weighting and scaling factors need work, but I think many
> of the criteria you describe will favor the right clock.
>
>> 1. If a system boots up with a single cpu then there is no question that
>> the ITC/TSC should be used because of the fast access.

We need to factor in frequency shifting here, especially if it happens 
without notice.
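
One way the "priority plus some flags" idea could take frequency shifting into account is sketched below.  This is a user-space illustration under stated assumptions, not the time interpolator code: struct time_source, the TS_* flags and both functions are hypothetical names, and priorities are assumed to be non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags; not taken from any kernel header. */
#define TS_FREQ_SHIFTS    0x1u  /* frequency may change, e.g. under power management */
#define TS_SHIFT_NOTIFIED 0x2u  /* a notifier fires before the frequency changes */

struct time_source {
	const char  *name;
	int          priority;  /* higher is better; assumed >= 0 */
	unsigned int flags;
};

/* A source whose frequency shifts without notice is only a last resort. */
static int effective_priority(const struct time_source *ts)
{
	assert(ts != NULL);
	if ((ts->flags & TS_FREQ_SHIFTS) && !(ts->flags & TS_SHIFT_NOTIFIED))
		return -1;
	return ts->priority;
}

/* Returns the best candidate, or NULL when the list is empty. */
static const struct time_source *pick_time_source(const struct time_source *list,
						  size_t count)
{
	const struct time_source *best = NULL;

	for (size_t i = 0; i < count; i++) {
		if (best == NULL ||
		    effective_priority(&list[i]) > effective_priority(best))
			best = &list[i];
	}
	return best;
}
```

Under these assumptions a fast source keeps its rank as long as its frequency changes are announced, and drops below every stable source once they are not.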



~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

Wilkerson, Bryan P wrote:


George Anzinger [mailto:[EMAIL PROTECTED] wrote:


Well, I checked, it is int $3.  Why then the panic?  If you try the
boot with kgdb (i.e. wait) and the do:
(gdb) disass gdb_interrupt
What do you find at +75?



Below is the console from the session it is interesting that gdb is not
able to access the memory.   I let it continue and then ctrl-c broke it
later in the boot cycle and tried disass again with the same result.

Feel free to flog me if this is stupid but I have just one EM64T machine
(test) and I'm using a regular P4 machine as dev.  I build the test
kernel on the EM64T machine and then copy the updated sources, object
files, and images via NFS to the dev machine.  I believe I read in the
kgdb doc that it was possible to use to different architecture machines
for test and dev although there wasn't any information about how to do
it.  This is probably the source of the OS/ABI warning.  I can probably
get the mothership to send me another EM64T machine if need be.  


What you need is a cross development environment.  Not having that, your 
gdb is likely not aware of how to talk to the hardware you are using. 
The cross develoment should cost a whole lot less than another machine.


George
--


vincent:/home/bwilkers/proj/linux-2.6.13-rc4-mm1 # gdb vmlinux
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for
details.
This GDB was configured as i586-suse-linux...
warning: A handler for the OS ABI GNU/Linux is not built into this
configuration
of GDB.  Attempting to continue with the default i386:x86-64 settings.

Using host libthread_db library /lib/tls/libthread_db.so.1.

(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
0x80503b50 in ?? ()
warning: no shared library support for this OS / ABI
(gdb) disass gdb_interrupt
Dump of assembler code for function gdb_interrupt:
0x80247009 gdb_interrupt+0:   Cannot access memory at address
0x80247009
(gdb) c
Continuing.
Bootdata ok (command line is root=/dev/sda2 kgdb console=kgdb)
Linux version 2.6.13-rc4-mm1-perfmon-em64t ([EMAIL PROTECTED]) (gcc version
3.3.5 20050117 (prerelease) (SUSE Linux)) #43 SMP Sat Aug 27 15:56:14
MDT 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - 3fe2f800 (usable)
 BIOS-e820: 3fe2f800 - 3fe3f832 (ACPI NVS)
 BIOS-e820: 3ff1 - 3ff3 (reserved)
 BIOS-e820: 3ff3 - 3ff4 (ACPI data)
 BIOS-e820: 3ff4 - 3fff (ACPI NVS)
 BIOS-e820: 3fff - 4000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fed13000 - fed1a000 (reserved)
 BIOS-e820: fed1c000 - feda (reserved)
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

2005-08-25 Thread George Anzinger

John McCutchan wrote:

On Thu, 2005-08-25 at 11:54 -0700, George Anzinger wrote:


Robert Love wrote:


On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:



On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:

~
I think the best thing is to take idr into user space and emulate the 
problem usage.  To this end, from the log it appears that you _might_ be 
moving between 0, 1 and 2 entries increasing the number each time.  It 
also appears that the failure happens here:

add 1023
add 1024
find 1024  or is it the remove that fails?  It also looks like 1024 got 
allocated twice.  Am I reading the log correctly?



You are reading the log correctly. There are two bugs. One is that if we
pass X to idr_get_new_above, it can return X again (doesn't ever seem to
return < X). The other problem is that the find fails on 1024 (and 2048
if we skip 1024).


That IS strange.  1024 is on a "level" boundary, but then the next level is 
2**15, not 2**11.  I will take a look.





So, is it correct to assume that the tree is empty save these two at 
this time?  I am just trying to figure out what the test program needs 
to do.



Yes that is the exact scenario. Only 2 id's are used at any given time,
and once we hit 1024 things break. This doesn't happen when the tree is
not empty.

Thanks for looking at this!


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

2005-08-25 Thread George Anzinger

Robert Love wrote:

On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:


On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:


~

dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4
dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument

dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024

Note the incrementing wd value even though we are removing them as we go..



What kernel are you running? The wd's should ALWAYS be incrementing, you
should never get the same wd as you did before. From your log, you are
getting the same wd (after you inotify_rm_watch it). I can reproduce
this bug on 2.6.13-rc7.

idr_get_new_above isn't returning something above.

Also, the idr layer seems to be breaking when we pass in 1024. I can
reproduce that on my 2.6.13-rc7 system as well.



This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel.

Robert, John, what do you think?   Is this possibly related to the oops seen 
in the log that I reported earlier?  (Which is still showing up 2-3 times per 
day, btw)


There is definitely something broken here.



Jim, George-

We are seeing a problem in the idr layer.  If we do idr_find(1024) when,
say, a low valued idr, like, zero, is unallocated, NULL is returned.


I think the best thing is to take idr into user space and emulate the 
problem usage.  To this end, from the log it appears that you _might_ be 
moving between 0, 1 and 2 entries increasing the number each time.  It 
also appears that the failure happens here:

add 1023
add 1024
find 1024  or is it the remove that fails?  It also looks like 1024 got 
allocated twice.  Am I reading the log correctly?


So, is it correct to assume that the tree is empty save these two at 
this time?  I am just trying to figure out what the test program needs 
to do.
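
A minimal sketch of what such a user-space test driver might look like, in Python rather than against the real lib/idr.c.  RefIdr is a hypothetical dict-backed stand-in (not the kernel code), and the assumption that get_new_above() hands out the smallest free id >= start is mine.  The driver keeps at most two ids live while the ids climb past 1024 and 2048, which is the usage pattern read from the log above:

```python
class RefIdr:
    """Hypothetical reference model; NOT the kernel's lib/idr.c."""

    def __init__(self):
        self._slots = {}

    def get_new_above(self, ptr, start):
        # Assumption: return the smallest free id >= start.
        new_id = start
        while new_id in self._slots:
            new_id += 1
        self._slots[new_id] = ptr
        return new_id

    def find(self, id_):
        return self._slots.get(id_)

    def remove(self, id_):
        del self._slots[id_]


def run_pattern(idr, count):
    """Allocate `count` ids, never keeping more than two live at once."""
    live = []
    last = -1
    failures = []
    for _ in range(count):
        new_id = idr.get_new_above(("watch", last + 1), last + 1)
        if new_id <= last:
            failures.append(("not above", last, new_id))
        if idr.find(new_id) is None:
            failures.append(("find failed", new_id))
        live.append(new_id)
        last = new_id
        if len(live) > 2:
            idr.remove(live.pop(0))
    return failures


failures = run_pattern(RefIdr(), 2100)
print(failures)
```

Swapping a user-space build of the real idr code in behind the same three methods would let the same driver show whether the find or the remove is the step that breaks at 1024.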




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] NTP ntp-helper functions

2005-08-25 Thread George Anzinger

john stultz wrote:

Andrew, All,

This patch cleans up a commonly repeated set of changes to the NTP
state variables by adding two helper inline functions:

ntp_clear(): Clears the ntp state variables


How many places is this called in any given arch?  I ask because it 
_may_ save space if it is NOT inlined.  I don't think it is ever in a 
critical code path...



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote:


john stultz wrote:


On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:



Roman Zippel wrote:



Hi,

On Tue, 23 Aug 2005, john stultz wrote:





I'm assuming gettimeofday()/clock_gettime() looks something like:
xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2M ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns



At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrupt happens, xtime will be the same as the gettimeofday a ns earlier.



Yes, clearly a forward knowledge of the NTP adjustment is necessary for
gettimeofday(), because after the NTP adjustment has been accumulated
into xtime, there's nothing left for gettimeofday to adjust (it's already
been applied). :)



Likewise, gettimeofday() needs to know when to stop applying the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)
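
A rough sketch of that two-term offset, assuming the adjustment is a rate correction expressed in ppm.  The names offset_ns(), scaled(), to_period_end_ns and the ppm units are illustrative assumptions, not anything in the kernel:

```python
def scaled(ns, adj_ppm):
    # Apply a rate correction in parts per million (integer math, floors).
    return ns + ns * adj_ppm // 1_000_000


def offset_ns(raw_offset_ns, to_period_end_ns, adj1_ppm, adj2_ppm=0):
    """Offset since the last tick, corrected piecewise.

    raw_offset_ns    -- uncorrected time since the last tick
    to_period_end_ns -- time from the last tick to the end of the
                        current ntp period
    adj1_ppm         -- correction in force until the period ends
    adj2_ppm         -- correction (if any) in force after that
    """
    first = min(raw_offset_ns, to_period_end_ns)
    rest = raw_offset_ns - first
    return scaled(first, adj1_ppm) + scaled(rest, adj2_ppm)


on_time = offset_ns(1_000_000, 1_000_000, -500)
late = offset_ns(1_500_000, 1_000_000, -500)
print(on_time, late)  # 999500 1499500
```

With an on-time tick the reading just before the interrupt matches what the interrupt adds to xtime (tick_length + ntp_adj = 999,500 ns), and a late tick only gets the correction for the part of the interval where it applied.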



Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so
it would be added to the nanosecond offset from the last tick (which is
how the current code works). If you are doing scaling (as you have in
the equation above), then the problem goes away, since you can apply the
adjustment consistently through any interval.


Until the end of the correction time...



I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a "* ntp_adj2" is 
needed.  



Ok, so you're forcing gettimeofday to be interval aware, so it's applying
different fixed NTP adjustments to different chunks of the current
interval. The issue of course is if you're using fixed adjustments, is
that you have to have n ntp adjustments for n intervals, or you have to
apply the same ntp adjustment to multiple intervals. 


Uh, are you saying that one ntpd call can set up several different 
adjustments?  I was assuming that any given call would set up either a 
fixed adjustment for ever or a fixed adjustment to be applied for a 
fixed number of ticks (or until so much correcting was done, which, in 
the end is the same thing at this point in the code).


If ntpd has to come back to change the adjustment, I am assuming that 
some kernel action can be taken at that time to sync the xtime clock and 
the gettimeofday reading of it.  I.e. we would only have to keep track 
of one adjustment with a possible pre specified end time.




I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to the timer interrupt at 2 and this is the 
same edge that gettimeofday is to use to start applying the correction.



If you argue that we only need two adjustments, why not argue for only
one? You're saying have one adjustment that you apply for the first
tick's worth of time, and a second adjustment that you apply for the
following N ticks' worth of time in the interval. Why the odd base
case? 


Correct me if I am wrong here, but I am assuming that ntpd can ask for 
an adjustment of X amount which the kernel changes into N adjustments of 
X/N amount spread over the next N

Re: Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 17:24 -0700, George Anzinger wrote:

CLOCK_TICK_RATE	is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.



Err, the Cyclone does not generate interrupts. So this issue does not
affect those systems.

As for the HPET, it sets its own interrupt frequency based off of
KERNEL_TICK_USEC (which you're right, isn't quite what is used in the
jiffies conversions).  Would it be easier to just adjust that value to
use ACTHZ or CLOCK_TICK_RATE?


If you want to take that approach you would want the HPET to interrupt 
every TICK_NSEC nanoseconds, that being what xtime is pushed by each tick.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger
CLOCK_TICK_RATE	is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).
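
For reference, a sketch of that compile-time arithmetic transcribed into Python.  The formulas follow my recollection of the LATCH, ACTHZ, SH_DIV and TICK_NSEC macros of that era, and 1193182 is the commonly quoted i386 PIT rate; treat both as assumptions and check them against the tree you are building:

```python
def sh_div(nom, den, lsh):
    # Scaled, rounded division: roughly (nom / den) << lsh.
    return ((nom // den) << lsh) + (((nom % den) << lsh) + den // 2) // den


def latch(clock_tick_rate, hz):
    # Timer input clocks per tick, rounded to nearest.
    return (clock_tick_rate + hz // 2) // hz


def acthz(clock_tick_rate, hz):
    # Actual tick frequency, scaled by 2**8.
    return sh_div(clock_tick_rate, latch(clock_tick_rate, hz), 8)


def tick_nsec(clock_tick_rate, hz):
    # Nanoseconds added to xtime per tick.
    return sh_div(1_000_000 * 1000, acthz(clock_tick_rate, hz), 8)


PIT_TICK_RATE = 1193182  # assumed i386 PIT input clock, in Hz

print(latch(PIT_TICK_RATE, 1000))      # 1193
print(tick_nsec(PIT_TICK_RATE, 1000))  # 999848
```

So at HZ=1000 the PIT-derived tick is 999,848 ns rather than 1,000,000 ns, and every jiffies conversion constant is built on that number.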


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.  This means that systems using these 
interrupt sources will be doing a) incorrect update of xtime and b) 
incorrect conversion of jiffies.  Since these two values will track each 
other this will not be seen by simple gettimeofday(); 
sleep();gettimeofday() tests, but will be seen as a system clock drift 
(without NTP) or with NTP, a somewhat high drift rate (to the point of 
losing sync at HZ=1000).
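
To put a number on it, here is a small sketch of the drift, assuming (for illustration only) an interrupt source that fires exactly every 1,000,000 ns while xtime is pushed by the PIT-derived 999,848 ns per tick.  The 1,000,000 ns period is an assumption, not a measured HPET figure:

```python
def drift_ppm(actual_period_ns, tick_nsec):
    # Positive means the system clock gains time, negative means it loses.
    return (tick_nsec - actual_period_ns) * 1_000_000 / actual_period_ns


ASSUMED_PERIOD_NS = 1_000_000  # hypothetical: exactly 1/HZ at HZ=1000
PIT_TICK_NSEC = 999_848        # what xtime is advanced by per tick

ppm = drift_ppm(ASSUMED_PERIOD_NS, PIT_TICK_NSEC)
seconds_per_day = ppm * 86_400 / 1_000_000
print(ppm, round(seconds_per_day, 1))  # -152.0 -13.1
```

Under that assumption the clock loses about 152 ppm, or roughly 13 seconds a day, before NTP does anything.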


The fact that the user/ system chooses the clock to use at boot time and 
can change the clock after boot means that it is not possible to pin 
down CLOCK_TICK_RATE at compile time.  However, since the computation of 
TICK_NSEC and the conversion constants is rather involved it is clear 
that we REALLY do want to compute these at compile time.


The suggested solution is to a) set up a structure with the default 
(clock of choice at config time) conversion constants in it at compile 
time.  Then b) at clock init time, populate the structure with the 
proper constants for the given clock.  These can be computed at compile 
time, but from the correct  CLOCK_TICK_RATE for the given clock. 
Switching to a fall back clock would also require an update of this 
structure.


Comments?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdbwait in 2.6.13-rc4-mm1?

2005-08-24 Thread George Anzinger

Wilkerson, Bryan P wrote:

Is there an equivalent kernel boot option for kgdbwait in
2.6.13-rc4-mm1?  I grep'd the kernel source but didn't find kgdbwait.

Is there any documentation other than the source for the flavor of KGDB
that is included in the akpm kernel patch?   


The patch has some documentation at Documentation/i386/kgdb/* as well as 
a couple of gdb macros...


The wait option is "gdb".  This has been in flux so, to be absolutely 
sure, look at include/asm-i386/bugs.h

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:


Roman Zippel wrote:


Hi,

On Tue, 23 Aug 2005, john stultz wrote:




I'm assuming gettimeofday()/clock_gettime() looks something like:
 xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift
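
A small sketch of that mult/shift form with explicit grouping (as written, C precedence would shift the whole sum, xtime included).  The 2 kHz clock matches the "2 cyc per tick, HZ=1000" example used in this thread; SHIFT, MULT and NTP_ADJ are illustrative values of my own choosing:

```python
def cycles_to_ns(cycles, mult, shift, ntp_adj=0):
    # ns = cycles * (mult + ntp_adj) / 2**shift, in integer math
    return (cycles * (mult + ntp_adj)) >> shift


def gettimeofday_ns(xtime_ns, now_cycles, last_update, mult, shift, ntp_adj=0):
    # Only the interval since the last update is scaled and shifted.
    return xtime_ns + cycles_to_ns(now_cycles - last_update, mult, shift, ntp_adj)


SHIFT = 8
MULT = 500_000 << SHIFT                # 500,000 ns per cycle (2 kHz clock)
NTP_ADJ = -(MULT * 500 // 1_000_000)   # -500 ppm expressed in mult units

print(cycles_to_ns(2, MULT, SHIFT))            # 1000000
print(cycles_to_ns(2, MULT, SHIFT, NTP_ADJ))   # 999500
```

Because the adjustment is folded into the multiplier, any interval, one tick or several, is scaled at the same rate.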



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2M ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns
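
The same timeline can be replayed with a few lines of Python.  This is only a model of the pseudo code above (the Clock class and its field names are made up for the illustration), with the left-hand labels taken to be cycle counts of the 2-cycles-per-tick clock:

```python
TICK_NS = 1_000_000      # HZ=1000
NS_PER_CYCLE = 500_000   # 2 cycles per tick


class Clock:
    def __init__(self):
        self.xtime_ns = 0
        self.last_update_cycle = 0

    def gettimeofday(self, cycle):
        offset_ns = (cycle - self.last_update_cycle) * NS_PER_CYCLE
        return self.xtime_ns + offset_ns

    def timer_interrupt(self, cycle, ntp_adj_ns=0):
        self.xtime_ns += TICK_NS + ntp_adj_ns
        self.last_update_cycle = cycle  # offset_ns restarts from zero


clock = Clock()
readings = [clock.gettimeofday(c) for c in (0, 1, 2)]
clock.timer_interrupt(2)
readings += [clock.gettimeofday(c) for c in (2, 3, 4)]
clock.timer_interrupt(4, ntp_adj_ns=-500)   # the -500ppm tick
readings.append(clock.gettimeofday(4))

backwards = [(a, b) for a, b in zip(readings, readings[1:]) if b < a]
print(readings)
print(backwards)   # [(2000000, 1999500)]
```

The only backwards step is the last one: the reading taken just before the adjusted interrupt is 500 ns ahead of the reading taken just after it.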

At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrupt happens, xtime will be the same as the gettimeofday a ns earlier.


Likewise, gettimeofday() needs to know when to stop applying the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)


I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a "* ntp_adj2" is 
needed.  I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to the timer interrupt at 2 and this is the 
same edge that gettimeofday is to use to start applying the correction.







It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, methinks there is only one such change that can be 
present at any given time.



Well, in many arches gettimeofday() works around the above issue by
capping the offset_ns value as such:


I think this may have been done with only usec gettimeofday.  Now that 
we have clock_gettime() returning nsec, we need to be a bit more careful.


gettimeofday:
xtime + min(offset_ns, tick_len + ntp_adj)

The problem with this is that when we have lost or late ticks, or if we
are using dynamic ticks you have granularity problems.
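
A quick sketch of the capped form and of the granularity problem it brings when a tick is late.  The function name and the numbers are illustrative only:

```python
def capped_gettimeofday(xtime_ns, offset_ns, tick_len_ns, ntp_adj_ns=0):
    # Never report more than one adjusted tick past xtime.
    return xtime_ns + min(offset_ns, tick_len_ns + ntp_adj_ns)


# One tick goes missing: the raw offset keeps growing to 2 ms, but the
# timer interrupt that should have advanced xtime has not run yet.
raw_offsets = [500_000, 1_000_000, 1_500_000, 2_000_000]
stalled = [capped_gettimeofday(0, off, 1_000_000, -500) for off in raw_offsets]
print(stalled)   # [500000, 999500, 999500, 999500]
```

Time never goes backwards this way, but it stands still from the cap until the late interrupt finally arrives.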



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote:


john stultz wrote:


On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:



Roman Zippel wrote:



Hi,

On Tue, 23 Aug 2005, john stultz wrote:





I'm assuming gettimeofday()/clock_gettime() looks something like:
xtime + (get_cycles()-last_update)*(mult+ntp_adj)shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: xtime + (cycle_offset * mult +
error)  shift. The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if, we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2 ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns



At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrutp happens xtime will be the same at the gettimeofday a ns earlier.



Yes, clearly a forward knowledge of the NTP adjustment is necessary for
gettimeofday(), because after the NTP adjustment has been accumulated
into xtime, there's nothing left for gettimeofday to adjust (its already
been applied). :)



Likewise, gettimeofday() needs to know when to stop apply the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This, could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)



Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so
it would be added to the nanosecond offset from the last tick (which is
how the current code works). If you are doing scaling (as you have in
the equation above), then the problem goes away, since you can apply the
adjustment consistently through any interval.


Until the end of the correction time...



I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a * ntp_adj2 is 
needed.  



Ok, so your forcing gettimeofday to be interval aware, so its applying
different fixed NTP adjustments to different chunks of the current
interval. The issue of course is if you're using fixed adjustments, is
that you have to have n ntp adjustments for n intervals, or you have to
apply the same ntp adjustment to multiple intervals. 


Uh, are you saying that one ntpd call can set up several different 
adjustments?  I was assuming that any given call would set up either a 
fixed adjustment for ever or a fixed adjustment to be applied for a 
fixed number of ticks (or until so much correcting was done, which, in 
the end is the same thing at this point in the code).


If ntpd has to come back to change the adjustment, I am assuming that 
some kernel action can be taken at that time to sync the xtime clock and 
the gettimeofday reading of it.  I.e. we would only have to keep track 
of one adjustment with a possible pre specified end time.




I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to the timer interrupt at 2 and this is the 
same edge that gettimeofday is to used to start applying the correction.



If you argue that we only need two adjustments, why not argue for only
one? You're saying have one adjustment that you apply for the first
tick's worth of time, and a second adjustment that you apply for the
following N ticks' worth of time in the interval. Why the odd base
case? 


Correct me if I am wrong here, but I am assuming that ntpd can ask for 
an adjustment of X amount which the kernel changes into N adjustments of 
X/N amount spread over the next N

Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:


Roman Zippel wrote:


Hi,

On Tue, 23 Aug 2005, john stultz wrote:




I'm assuming gettimeofday()/clock_gettime() looks something like:
 xtime + (get_cycles()-last_update)*(mult+ntp_adj)shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: xtime + (cycle_offset * mult +
error)  shift. The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if, we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2 ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns

At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrutp happens xtime will be the same at the gettimeofday a ns earlier.


Likewise, gettimeofday() needs to know when to stop apply the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This, could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)


I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a * ntp_adj2 is 
needed.  I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to, the timer interrupt at 2 and this is the 
same edge that gettimeofday is to use to start applying the correction.
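
A minimal sketch of the two-term offset, assuming the adjustments are expressed in ppm and that the point where the adjustment changes is known relative to the last tick. The function and parameter names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a span of raw nanoseconds by an adjustment given in ppm. */
int64_t scale_ppm(int64_t span_ns, int64_t adj_ppm)
{
    return span_ns + span_ns * adj_ppm / 1000000;
}

/*
 * raw_offset_ns: uncorrected time since the last tick
 * adj_change_ns: assumed point (relative to the last tick) at which the
 *                NTP adjustment switches from adj1 to adj2
 */
int64_t two_term_offset(int64_t raw_offset_ns, int64_t adj_change_ns,
                        int64_t adj1_ppm, int64_t adj2_ppm)
{
    int64_t first, second;

    assert(raw_offset_ns >= 0 && adj_change_ns >= 0);
    /* Part of the offset that falls before the change, then the rest. */
    first = raw_offset_ns < adj_change_ns ? raw_offset_ns : adj_change_ns;
    second = raw_offset_ns - first;
    return scale_ppm(first, adj1_ppm) + scale_ppm(second, adj2_ppm);
}
```

With adj1 = -500ppm ending one tick (1M ns) after the last interrupt and no adjustment afterwards, a tick that is 500k ns late gives 999,500 + 500,000 ns, so the correction is applied only to the span it was meant for.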







It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, me thinks there is only one such change that can be 
present at any given time.



Well, in many arches gettimeofday() works around the above issue by
capping the offset_ns value as such:


I think this may have been done with only usec gettimeofday.  Now that 
we have clock_gettime() returning nsec, we need to be a bit more careful.


gettimeofday:
xtime + min(offset_ns, tick_len + ntp_adj)

The problem with this is that when we have lost or late ticks, or if we
are using dynamic ticks you have granularity problems.
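
The capped variant can be sketched in a few lines of C. The function name and the nanosecond units are assumptions made for the illustration; the expression itself is the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* gettimeofday: xtime + min(offset_ns, tick_len + ntp_adj) */
int64_t capped_gettimeofday(int64_t xtime_ns, int64_t offset_ns,
                            int64_t tick_len_ns, int64_t ntp_adj_ns)
{
    int64_t cap = tick_len_ns + ntp_adj_ns;

    assert(offset_ns >= 0);
    return xtime_ns + (offset_ns < cap ? offset_ns : cap);
}
```

Once the offset passes the cap, every call returns the same value until the next tick arrives. That is the granularity problem with lost, late or dynamic ticks: a tick that is 1.5M ns late makes the clock appear to stand still for that whole span.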



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kgdbwait in 2.6.13-rc4-mm1?

2005-08-24 Thread George Anzinger

Wilkerson, Bryan P wrote:

Is there an equivalent kernel boot option for kgdbwait in
2.6.13-rc4-mm1?  I grep'd the kernel source but didn't find kgdbwait.

Is there any documentation other than the source for the flavor of KGDB
that is included in the akpm kernel patch?   


The patch has some documentation at Documentation/i386/kgdb/* as well as 
a couple of gdb macros...


The wait option is gdb.  This has been in flux so, to be absolutely 
sure, look at include/asm-i386/bugs.h

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger

CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.  This means that systems using these 
interrupt sources will be doing a) incorrect update of xtime and b) 
incorrect conversion of jiffies.  Since these two values will track each 
other this will not be seen by simple gettimeofday(); 
sleep(); gettimeofday() tests, but will be seen as a system clock drift 
(without NTP) or, with NTP, a somewhat high drift rate (to the point of 
losing sync at HZ=1000).


The fact that the user/system chooses the clock to use at boot time and 
can change the clock after boot means that it is not possible to pin 
down CLOCK_TICK_RATE at compile time.  However, since the computation of 
TICK_NSEC and the conversion constants is rather involved it is clear 
that we REALLY do want to compute these at compile time.


The suggested solution is to a) set up a structure with the default 
(clock of choice at config time) conversion constants in it at compile 
time.  Then b) at clock init time, populate the structure with the 
proper constants for the given clock.  These can be computed at compile 
time, but from the correct  CLOCK_TICK_RATE for the given clock. 
Switching to a fall back clock would also require an update of this 
structure.
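
A rough sketch of that structure in C follows. Everything named SKETCH_* or sketch_*, the struct layout and the second clock rate (14318180 Hz) are assumptions made for illustration, and the arithmetic is a simplified stand-in for the kernel's real LATCH and TICK_NSEC macros. The point is only that each clock's constants are evaluated at compile time and copied in at clock init.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ 1000u

/* Simplified arithmetic, not the kernel's actual LATCH/TICK_NSEC macros. */
#define SKETCH_LATCH(rate)     (((rate) + SKETCH_HZ / 2) / SKETCH_HZ)
#define SKETCH_TICK_NSEC(rate) \
    ((uint32_t)((uint64_t)SKETCH_LATCH(rate) * 1000000000u / (rate)))

struct tick_consts {
    uint32_t latch;      /* timer counts per tick */
    uint32_t tick_nsec;  /* length of one tick in ns */
};

enum sketch_clock { SKETCH_CLOCK_PIT, SKETCH_CLOCK_OTHER, SKETCH_CLOCK_COUNT };

/* One entry per clock, all evaluated by the compiler. */
const struct tick_consts sketch_table[SKETCH_CLOCK_COUNT] = {
    [SKETCH_CLOCK_PIT]   = { SKETCH_LATCH(1193182u),  SKETCH_TICK_NSEC(1193182u)  },
    [SKETCH_CLOCK_OTHER] = { SKETCH_LATCH(14318180u), SKETCH_TICK_NSEC(14318180u) },
};

/* (a) default for the clock chosen at config time ... */
struct tick_consts active_consts = {
    SKETCH_LATCH(1193182u), SKETCH_TICK_NSEC(1193182u)
};

/* ... (b) overwritten at clock init or when falling back to another clock. */
void sketch_clock_init(enum sketch_clock which)
{
    assert(which < SKETCH_CLOCK_COUNT);
    active_consts = sketch_table[which];
}
```

The jiffies conversion code would then read active_consts instead of bare compile-time constants, which costs one indirection per conversion.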


Comments?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 17:24 -0700, George Anzinger wrote:

CLOCK_TICK_RATE is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.



Err, the Cyclone does not generate interrupts. So this issue does not
affect those systems.

As for the HPET, it sets its own interrupt frequency based off of
KERNEL_TICK_USEC (which you're right, isn't quite what is used in the
jiffies conversions).  Would it be easier to just adjust that value to
use ACTHZ or CLOCK_TICK_RATE?


If you want to take that approach you would want the HPET to interrupt 
every TICK_NSEC nanoseconds, that being what xtime is pushed by each tick.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-23 Thread George Anzinger

Roman Zippel wrote:

Hi,

On Tue, 23 Aug 2005, john stultz wrote:



I'm assuming gettimeofday()/clock_gettime() looks something like:
  xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.


John,
If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 
It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, me thinks there is only one such change that can be 
present at any given time.


Hope this helps...
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 3/3] Add disk hotswap support to libata RESEND #2

2005-08-23 Thread George Anzinger

Jim Ramsay wrote:

On 8/23/05, Jim Ramsay <[EMAIL PROTECTED]> wrote:


Then I must have found an undocumented feature!  I've applied this set
of patches to a 2.6.11 kernel (with few problems) and ran into a bunch
of "scheduling while atomic" errors when hotplugging a drive, culprit
being probably scsi_sysfs.c where scsi_remove_device locks a mutex, or
perhaps when it then calls class_device_unregister, which does a
'down_write'.



After further debugging, it appears that the problem is the debounce
timer in libata-core.c.

Timers appear to operate in an atomic context, so timers should not be
allowed to call scsi_remove_device, which eventually schedules.

Any suggestions on the best way to fix this?


Workqueue, perhaps.




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-22 Thread George Anzinger



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:
~


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsec) units, so this
effect will go away. But this is a waste of time as long as (1.) is not solved.

George ???


Could I (we) see what you have in mind?




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:

George,

On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsec) units, so this
effect will go away. But this is a waste of time as long as (1.) is not solved.

George ???


Could I (we) see what you have in mind?



Nothing which applies clean at the moment and I have no access to the
box where the patch floats around.

It's simply explained.

Current code:

set_timer()
calc interval->jiffies / interval->arch_cycles;
based on it.interval

rearm()
timer->expires += interval->jiffies;
timer->arch_cycle_expires += interval->arch_cycles;
normalize(timer);

Patched code:

set_timer()
	timer.interval = it.interval; 
	timer.next_expire = it.value; 
	both stored as timespec


rearm()
next_expire += interval;
calc timer->expires/arch_cycle_expires;

So on each rearm we eliminate the rounding errors and take the drift
adjustment into account.
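
The difference between the two rearm schemes can be sketched in C. This is a deliberately coarse model with hypothetical names: it tracks only whole ticks and ignores the arch_cycles part, so the error per rearm is much larger than in the real code, but the accumulation works the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Round a nanosecond value up to whole ticks. */
int64_t ns_to_ticks_ceil(int64_t ns, int64_t tick_ns)
{
    assert(ns >= 0 && tick_ns > 0);
    return (ns + tick_ns - 1) / tick_ns;
}

/* "Current code": round the interval once, then add the rounded value. */
int64_t expiry_fixed_increment(int64_t interval_ns, int64_t tick_ns, int rearms)
{
    int64_t step = ns_to_ticks_ceil(interval_ns, tick_ns);
    int64_t expires = 0;

    for (int i = 0; i < rearms; i++)
        expires += step;
    return expires;
}

/* "Patched code": keep next_expire in ns and convert on every rearm. */
int64_t expiry_from_ns(int64_t interval_ns, int64_t tick_ns, int rearms)
{
    int64_t next_expire_ns = 0;
    int64_t expires = 0;

    for (int i = 0; i < rearms; i++) {
        next_expire_ns += interval_ns;
        expires = ns_to_ticks_ceil(next_expire_ns, tick_ns);
    }
    return expires;
}
```

With a 1.5 ms interval and a 1 ms tick, four rearms land on tick 8 with the fixed increment but on tick 6 when the expiry is recomputed from nanoseconds, so the rounding error no longer accumulates.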

It adds some calculation overhead to each rearm, but 

I think the standard was written to eliminate the need for this.  The 
notion is that we have a resolution which we use in the calculations so 
while there may be drift WRT his request, there should be no drift WRT 
the requested value rounded up to the next resolution.


Still, if we can't keep that resolution in arch_cycles...

On another issue along this line, I have been thinking of changing the 
x86 TSC arch cycle size to 1ns.  (NOT the resolution, the units for the 
arch cycle.)  The reason to do this is to correctly track changes in cpu 
frequency.  As it is today, we would need to track down and update all 
pending HR timers whenever the frequency changed.  By using a common 
unit all we need to do is change the conversion constants (well I guess 
they would not be constants any more :).  I REALLY don't want to do this 
as it does add conversion overhead, but I can not think of another clean 
way to track TSC frequency changes.
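
A minimal sketch of the idea, assuming a mult/shift style conversion; the names and the shift of 10 are illustrative and not taken from any existing patch. Pending timers would be kept in ns, and only the conversion factor is recomputed when the frequency changes.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SHIFT 10

/* ns per cycle scaled by 2^CYC2NS_SHIFT; no longer a constant. */
uint64_t tsc_mult;

/* Called on every CPU frequency change instead of touching pending timers. */
void tsc_set_khz(uint64_t cpu_khz)
{
    assert(cpu_khz > 0);
    tsc_mult = (1000000ull << CYC2NS_SHIFT) / cpu_khz;
}

/* The per-use conversion overhead: one multiply and one shift. */
uint64_t tsc_cycles_to_ns(uint64_t cycles)
{
    return (cycles * tsc_mult) >> CYC2NS_SHIFT;
}
```

At 1 GHz, 1000 cycles convert to 1000 ns; after switching to 2 GHz the same 1000 ns corresponds to 2000 cycles, and nothing stored in ns has to be updated.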


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Latency with Real-Time Preemption with 2.6.12

2005-08-18 Thread George Anzinger

Steven Rostedt wrote:

On Wed, 2005-08-17 at 19:38 -0700, Sundar Narayanaswamy wrote:


Hi,
I am trying to experiment using 2.6.12 kernel with the realtime-preempt 
V0.7.51-38 patch to determine the kernel preemption latencies with the 
CONFIG_PREEMPT_RT mode. The test program I wrote does the following on

a thread with highest priority (99) and SCHED_FIFO policy to simulate
a real time thread.

t1 = gettimeofday
nanosleep(for 3 ms)
t2 = gettimeofday

I was expecting to see the difference t2-t1 to be close to 3 ms. However, 
the smallest difference I see is 4 milliseconds under no system load, 
and the difference is as high as 25 milliseconds under moderate to 
heavy system load (mostly performing disk I/O).



That version of Ingo's patch does not implement High-Resolution Timers.
Thomas worked on merging this into the latest RT patch.  Without
high-res timers, the best you may ever get is 4ms. This is because
nanosleep is to guarantee _at_least_ 3 ms.  So you have the following
situation:

0       1       2       3       4 (ms)
+-------+-------+-------+-------+--->
      ^                   ^
      |                   |
  Start here   0+3 = 3    here we have the response

If we look at this in smaller units than ms, we started on 0.8ms and
responded at 3.2ms where we have 3.2 - 0.8 = 2.4 which is less than 3ms.
So since Ingo's patch doesn't increase the resolution of the timers from
a jiffy (which is currently 1ms) Linux is forced to add one more than
you need.
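
The jiffy arithmetic can be sketched as follows. This is a simplified model with made-up names, not the kernel's nanosleep path: the sleeper can only be woken on a jiffy edge, and the request is counted from the last edge it saw.

```c
#include <assert.h>
#include <stdint.h>

/* Jiffy edge at which a sleeper is woken; extra_jiffy models the added +1. */
int64_t wake_time_ns(int64_t start_ns, int64_t request_ns,
                     int64_t jiffy_ns, int extra_jiffy)
{
    int64_t now_jiffy, wait_jiffies;

    assert(start_ns >= 0 && request_ns >= 0 && jiffy_ns > 0);
    now_jiffy = start_ns / jiffy_ns;                        /* last edge seen */
    wait_jiffies = (request_ns + jiffy_ns - 1) / jiffy_ns;  /* round up */
    if (extra_jiffy)
        wait_jiffies += 1;
    return (now_jiffy + wait_jiffies) * jiffy_ns;
}
```

Starting at 0.8 ms with a 3 ms request and 1 ms jiffies, waking on edge 3 would give an elapsed time below 3 ms in this model. With the extra jiffy the wakeup is on edge 4, the first edge that is guaranteed to be at least 3 ms after the start.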


Based on the articles and the mails I read on this list, I understand that 
worst case latencies of 1 ms (or less) should be possible using the RT 
Preemption patch, but I am unable to get anything less than 4 milliseconds 
even with sleep times smaller than 3 ms. I am running the tests on a SBC 
with a 1.4G Pentium M, 512M RAM, 1GB compact flash (using IDE). 

I believe I have the high resolution timer working correctly, because if I 
comment out the sleep line above t2-t1 is consistently 0 or 1 microsecond.



I don't think you have the high res timer working, since there is no
high res timer in that kernel.


Following earlier discussions (in July) in this list, I tried to set kernel 
configuration parameters like CONFIG_LATENCY_TRACE to get tracing/debug 
information, but I didn't find these parameters in my .config file.


I appreciate your suggestions/insights into the situation and steps that I 
should try to get more debug/tracing information that might help to understand 
the cause of high latency.



It's not a high latency.  It's doing exactly as it is supposed to, since
the nanosleep doesn't have high-res support (in that kernel).  If you
really want to measure latency, you need to add a device or something
and see what the response time of an interrupt going off to the time a
thread is woken to respond to it.  Now with Ingo's that is really fast.


Another way to do it is to set up a repeating timer.  You _must_ read 
back the timer to get the repeat time it is really using, and then 
measure how well it does giving signals at these repeat times.  FAR FAR 
more than three lines of code...
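
The measurement step might look roughly like this in C. It covers only the bookkeeping: the timestamps are assumed to have been collected by the signal handler, and actual_interval_ns is assumed to be the it_interval value read back with timer_gettime() rather than the value that was requested. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * stamps[i] is when the i-th signal arrived; the i-th expiry is expected at
 * first_expiry_ns + i * actual_interval_ns. Returns the worst lateness seen.
 */
int64_t worst_lateness_ns(const int64_t *stamps, size_t n,
                          int64_t first_expiry_ns, int64_t actual_interval_ns)
{
    int64_t worst = 0;

    assert(stamps != NULL && actual_interval_ns > 0);
    for (size_t i = 0; i < n; i++) {
        int64_t expected = first_expiry_ns + (int64_t)i * actual_interval_ns;
        int64_t late = stamps[i] - expected;

        if (late > worst)
            worst = late;
    }
    return worst;
}
```

If the requested interval is used instead of the one read back, the difference between the two is counted as latency and grows with every expiry, which is why the read-back is not optional.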



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Multiple virtual address mapping for the same code on IA-64 linux kernel.

2005-08-18 Thread George Anzinger

David S. Miller wrote:

From: Anton Blanchard <[EMAIL PROTECTED]>
Date: Fri, 19 Aug 2005 04:29:55 +1000



Calling itanium the "fastest 64bit processor at any given clock frequency"
on lkml is likewise inflammatory :)



I totally agree.


Since the itanium offloads a lot of its instruction stream decisions 
onto the compiler(s), where other processors just do it, one might argue 
that you can not even characterize the itanium without bundling in the 
compilers...


Not to say that is wrong but just to make it clear that saying the 
itanium speed is X is like saying that a Cummins diesel is fast 
without saying what sort of car/truck it is mounted in.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-17 Thread George Anzinger

Nishanth Aravamudan wrote:
~
IMNSHO we should not get too parental with kernel only interfaces. 
Adding 1 is easy enough for the caller and even easier to explain in the 
instructions (i.e. this call sleeps for X jiffies edges).  This allows 
the caller to do more if needed and, should he ever just want to sync to 
the next jiffie he does not have to deal with backing out that +1.



I don't want to be too parental either, but I also am trying to avoid
code duplication. Lots of drivers basically do something like
poll_event() does (or could do with some changes), i.e. looping a
constant amount multiple times, checking something every so often. The
patch was just a thought, though. I will keep evaluating drivers and see
if it's a useful interface to have eventually.

I guess I'm just concerned with making an unintuitive interface. As was
brought up at OLS, drivers are a major source of bugs/buggy code. The
simpler, more useful we can make interfaces, the better, I think. I'm
not claiming you disagree, I just want to make my own motives clear.
While fixing up the schedule_timeout() comment would make it clear what
schedule_timeout() achieves, I'm not sure how useful such an interface
is, if every caller adds 1 :) I need to mull it over, though... Lots to
consider. I also, of course, want to stay flexible for the reasons you
mention (letting the driver adjust the timeout as they expect to).


I would leave the +1 alone and put in the correct documentation.  This 
way _more_ folks will be made aware of the mid jiffie issue.  Far too 
often we see (and let get in) patches that mess up user interfaces 
around this issue.  The recent changes to itimer come to mind...



~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-08-17 Thread George Anzinger

Ingo Molnar wrote:

> * George Anzinger wrote:
>
>> I have put a version of KGDB for x86 RT kernels here:
>> http://source.mvista.com/~ganzinger/
>>
>> The common_kgdb_cfi_ stuff creates debug records for entry.S and
>> friends so that you can "bt" through them.  Apply in this order:
>>
>> Ingo's patch
>> kgdb-ga-rt.patch
>> common_kgdb_cfi_annotations.patch
>>
>> This is, more or less, the same kgdb that is in Andrew's mm tree,
>> changed to fix the RT issues.
>
> great. For the time being i won't add it to the -RT tree (because KGDB is
> not destined for upstream merging it seems), but it sure is a useful
> development/debugging add-on.

I agree on not adding it.  Tom Rini is working on a version that Andrew
seems inclined to merge.  When that happens I will most likely put
together enhancements to it to bring it up to what this one does.
Meanwhile I am trying to capture some of Tom's changes in this one.
Also, it is MUCH easier for me to maintain as a separate patch.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-17 Thread George Anzinger

Roman Zippel wrote:


~

The thing that worries me about this function is that it does
everything in usec.  We are using nsec in xtime now and I wonder if it
would not be more accurate to do the math in nsecs.  Even the tick size
(tick_nsec) does not translate well to usec; it is currently 999849
nsecs.


George
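
To put a number on that concern, here is a minimal sketch of the rounding alone. It assumes HZ=1000 (1000 ticks per second), which the message does not state, and the helper names are made up for illustration; this is not the kernel's TICK_USEC_TO_NSEC() conversion, only the arithmetic of squeezing a 999849 ns tick into whole microseconds.

```c
/*
 * Illustration only: how much of a 999849 ns tick is dropped if the
 * tick has to be held as a whole number of microseconds.
 * Both helpers are hypothetical, not kernel functions.
 */

/* Nanoseconds dropped per tick when the tick is truncated to whole usec. */
long nsec_lost_per_tick(long tick_nsec)
{
	long tick_usec = tick_nsec / 1000;	/* integer division truncates */

	return tick_nsec - tick_usec * 1000;	/* the remainder "on the floor" */
}

/* Accumulated loss over hz ticks (one second if hz is the tick rate). */
long nsec_lost_per_second(long tick_nsec, long hz)
{
	return nsec_lost_per_tick(tick_nsec) * hz;
}
```

Under those assumptions the truncation drops 849 ns on every tick, or 849000 ns (roughly 0.85 ms) per second of ticks.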

---

 kernel/time.c  |3 ++-
 kernel/timer.c |   53 +
 2 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time.c
===
--- linux-2.6.orig/kernel/time.c2005-07-13 03:18:04.0 +0200
+++ linux-2.6/kernel/time.c 2005-08-16 01:37:20.0 +0200
@@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
} /* txc->modes & ADJ_OFFSET */
if (txc->modes & ADJ_TICK) {
tick_usec = txc->tick;
-   tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
}
+   if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
+   time_recalc();
} /* txc->modes */
 leave: if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
|| ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
Index: linux-2.6/kernel/timer.c
===
--- linux-2.6.orig/kernel/timer.c   2005-07-13 03:18:04.0 +0200
+++ linux-2.6/kernel/timer.c2005-08-16 23:10:53.0 +0200
@@ -559,6 +559,7 @@ found:
  */
 unsigned long tick_usec = TICK_USEC;   /* USER_HZ period (usec) */
 unsigned long tick_nsec = TICK_NSEC;   /* ACTHZ period (nsec) */
+unsigned long tick_nsec2 = TICK_NSEC;
 
 /* 
  * The current time 
@@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC;		/*
  * the usual normalization.
  */
 struct timespec xtime __attribute__ ((aligned (16)));
+struct timespec xtime2 __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 
 EXPORT_SYMBOL(xtime);
@@ -596,6 +598,33 @@ static long time_adj;  /* tick adjust (
 long time_reftime; /* time at last adjustment (s)  */
 long time_adjust;
 long time_next_adjust;
+static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
+
+void time_recalc(void)
+{
+	long f, t;
+	tick_nsec = TICK_USEC_TO_NSEC(tick_usec);


This leaves bits on the floor.  Is it not possible to do this whole
calculation in nanoseconds?  Currently, for example, tick_nsec is 999849...

+
+	t = time_freq >> (SHIFT_USEC + 8);
+	if (t) {
+		time_freq -= t << (SHIFT_USEC + 8);
+		t *= 1000 << 8;
+	}
+	f = time_freq * 125;
+	t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
+	f &= (1 << (SHIFT_USEC - 3)) - 1;
+	tick_nsec2 = t / HZ;
+	f += (t % HZ) << (SHIFT_USEC - 3);
+	f <<= 5;
+	time_adj2 = f / HZ;
+	time_freq_adj2 = f % HZ;
+
+	printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
+		xtime.tv_sec, xtime.tv_nsec,
+		tick_nsec, time_freq, time_offset, time_next_adjust,
+		xtime2.tv_sec, xtime2.tv_nsec,
+		tick_nsec2, time_adj2, time_freq_adj2);
+}
 
 /*
  * this routine handles the overflow of the microsecond field
@@ -739,6 +768,16 @@ static void second_overflow(void)
 #endif
 }
 
+static void second_overflow2(void)
+{
+	time_adj2_cur = time_adj2;
+	time_freq_phase2 += time_freq_adj2;
+	if (time_freq_phase2 > HZ) {
+		time_freq_phase2 -= HZ;
+		time_adj2_cur++;
+	}
+}
+
 /* in the NTP reference this is called "hardclock()" */
 static void update_wall_time_one_tick(void)
 {
@@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
time_adjust = time_next_adjust;
time_next_adjust = 0;
}
+
+	delta_nsec = tick_nsec2;
+	time_phase2 += time_adj2_cur;
+	if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
+		long ltemp = time_phase2 >> (SHIFT_USEC + 2);
+		time_phase2 -= ltemp << (SHIFT_USEC + 2);
+		delta_nsec += ltemp;
+	}
+	xtime2.tv_nsec += delta_nsec;
+	if (xtime2.tv_nsec >= NSEC_PER_SEC) {
+		xtime2.tv_nsec -= NSEC_PER_SEC;
+		xtime2.tv_sec++;
+		second_overflow2();
+	}
 }
 
 /*




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

[patch] KGDB for Real-Time Preemption systems

2005-08-16 Thread George Anzinger

I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and 
friends so that you can "bt" through them.  Apply in this order:

Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed 
to fix the RT issues.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-16 Thread George Anzinger

Nishanth Aravamudan wrote:

> On 04.08.2005 [09:45:55 -0700], George Anzinger wrote:
>
>> Uh... PLEASE tell me you are NOT changing timespec_to_jiffies() (and
>> timeval_to_jiffies()) to add 1.  This is NOT the right thing to do.  For
>> repeating times (see setitimer code) we need the actual time as we KNOW
>> where the jiffies edge is in the repeating case.  The +1 is needed ONLY
>> for the initial time, not the repeating time.
>>
>> See:
>> http://marc.theaimsgroup.com/?l=linux-kernel&m=112208357906156&w=2
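
The distinction between the initial and the repeating time can be sketched as follows. This is a toy model, not the setitimer code: nth_expiry() and its parameters are hypothetical, and time is counted in whole jiffies. The first expiry gets the +1 because the timer is armed somewhere inside a jiffy; later expiries are computed from the previous expiry, which sits on a known edge.

```c
/*
 * Toy model of a periodic timer, counted in whole jiffies.
 * nth_expiry() is a hypothetical helper, not a kernel interface.
 *
 * now                 - jiffy count when the timer is first armed
 * period              - requested interval in jiffies
 * n                   - which expiry to compute (1 = first), n >= 1
 * plus_one_every_time - non-zero models a conversion routine that always
 *                       adds 1, so the +1 is repeated on every rearm
 */
unsigned long nth_expiry(unsigned long now, unsigned long period,
			 unsigned int n, int plus_one_every_time)
{
	/* Initial arm: we are somewhere inside a jiffy, so round up by one. */
	unsigned long expiry = now + period + 1;

	/* Rearms start from the previous expiry, which is on a jiffy edge. */
	for (unsigned int i = 1; i < n; i++)
		expiry += period + (plus_one_every_time ? 1 : 0);

	return expiry;
}
```

With a 10-jiffy period armed at jiffy 1000, the fifth expiry lands at 1051 when the +1 is applied once, and at 1055 when it is applied on every rearm: one jiffy of drift per repeat.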



> I followed that thread, George, but I think it's a different case with
> schedule_timeout() [maybe this indicates drivers/other users should
> maybe be using itimers, but I'll get to that in a sec].

I think I misunderstood back then :).

> With schedule_timeout(), we are just given a relative jiffies value. We
> have no context as to which task is requesting the delay, per se,
> meaning we don't (can't) know from the interface whether this is the
> first delay in a sequence, or a brand new one, without changing all
> users to have some sort of control structure. The callers of
> schedule_timeout() don't even get a pointer to the timer added
> internally.
>
> So, adding 1 to all sleeps seems like it might be reasonable, as looping
> sleeps probably need to use a different interface. I had worked a bit
> ago on something like poll_event() with the kernel-janitors group, which
> would abstract out the repeated sleeps. Basically wait_event() without
> wait-queues... Maybe we could make such an interface just use itimers?
> I've attached my old patch for poll_event(), just for reference.

I think not.  itimers is really pointed at a particular system call and
has resources in the task structure to do it.  These would be hard to
share...

> My point, I guess, is that in the schedule_timeout() case, we don't know
> where the jiffies edge is, as we either expire or receive a wait-queue
> event/signal, we never mod_timer() the internal timer... So we have to
> assume that we need to sleep the request. But maybe Roman's idea of
> sleeping a certain number of jiffy edges is sufficient. I am not yet
> convinced driver authors want/need such an interface, though, still
> thinking it over.

IMNSHO we should not get too parental with kernel-only interfaces.
Adding 1 is easy enough for the caller and even easier to explain in the
instructions (i.e. this call sleeps for X jiffy edges).  This allows
the caller to do more if needed and, should he ever just want to sync to
the next jiffy, he does not have to deal with backing out that +1.
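
A small model of "sleeps for X jiffy edges" may help. It is a sketch under stated assumptions: TICK_NS is a round 1000000 ns per jiffy (not the real tick length), and wake_time_ns() and slept_ns() are hypothetical helpers rather than anything in the kernel. A sleep that starts mid-jiffy and waits for X edges lasts less than X full jiffies, so a caller who needs at least X jiffies asks for X + 1, and a caller who only wants to sync to the next edge asks for 1.

```c
/* Assumption for the sketch: one jiffy is exactly 1,000,000 ns. */
#define TICK_NS 1000000ULL

/*
 * Time of the edges-th jiffy edge strictly after start_ns.
 * Hypothetical helper; edges must be at least 1.
 */
unsigned long long wake_time_ns(unsigned long long start_ns, unsigned int edges)
{
	/* First edge after the starting point. */
	unsigned long long next_edge = (start_ns / TICK_NS + 1) * TICK_NS;

	/* Each further edge is one full tick later. */
	return next_edge + (unsigned long long)(edges - 1) * TICK_NS;
}

/* How long a sleep of the given number of edges actually lasts. */
unsigned long long slept_ns(unsigned long long start_ns, unsigned int edges)
{
	return wake_time_ns(start_ns, edges) - start_ns;
}
```

Starting half way through a jiffy (start_ns = 1500000), one edge is a 0.5 ms sleep, three edges come to 2.5 ms (short of three jiffies), and four edges come to 3.5 ms, which is why the caller who needs three whole jiffies adds 1.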




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features

2005-08-15 Thread George Anzinger

Ingo Molnar wrote:

> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> * George Anzinger wrote:
>>
>>> Ingo, all
>>>
>>> I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system.
>>> Someone put code in the NMI path to modify the preempt count which,
>>> often as not, will generate a PREEMPT_DEBUG message as there is no
>>> telling what state the preempt count is in on an NMI interrupt.  I have
>>> sent the attached patch to Andrew on this, but meanwhile, if you want
>>> RT, SMP, PREEMPT_DEBUG you will be much better off with this.
>>
>> ah - thanks, applied. Might explain some of the recent SMP weirdnesses
>> i'm seeing. Attributed them to the HRT patch ;-)
>
> i'm still seeing weird crashes under SMP, which go away if i disable
> CONFIG_HIGH_RES_TIMERS. (this after i fixed a couple of other SMP bugs
> in the HRT code) It happens sometime during the bootup, after enabling
> the network but before users can log in. There's no good debug info,
> just a hang that comes from all CPUs trying to get some debug info out
> but crashing deeply.

I haven't looked at this new code all that closely as yet.  One thing I
did notice is that there is an assumption that the "timer being
delivered flag" can be shared between LR timers and HR timers.  I
suspect this is wrong as the delivery code is in separate threads (I
assume).  This could lead to del_timer_sync missing a timer.

In the prior patch we just ignored the del_timer_sync issue for HR
timers (code I plan to do soon).  This WAS taken care of in earlier
kernels by a reuse of one of the list link fields, but Andrew convinced
me that this was _not_ good.

So, my guess: a nanosleep for an RT task (I think you said these are
promoted to HR) is completing and overwriting the delivery-in-progress
flag for an LR timer which just happens to have a del_timer_sync going on
at the same time.
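
The suspected race can be replayed deterministically in a few lines. This is only an illustration of the guess above, not the HRT or timer code: struct delivery_slot, handler_in_progress() and deleter_sees_lr_handler() are invented names, and the two "threads" are just statements executed in the order the guess describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct timer {
	int id;
};

/* Records which timer's handler is being delivered right now. */
struct delivery_slot {
	const struct timer *running;
};

/* The question a synchronous delete has to answer before returning. */
bool handler_in_progress(const struct delivery_slot *slot,
			 const struct timer *t)
{
	return slot->running == t;
}

/*
 * Replays the guessed ordering: the LR delivery starts, then the HR
 * delivery starts, then someone deleting the LR timer checks the flag.
 * The LR handler has not finished, so the right answer is "true".
 *
 * shared_slot selects whether HR delivery writes the same slot as LR
 * delivery (the suspected bug) or a slot of its own.
 */
bool deleter_sees_lr_handler(bool shared_slot)
{
	struct timer lr = { 1 }, hr = { 2 };
	struct delivery_slot lr_slot = { NULL }, hr_slot = { NULL };
	struct delivery_slot *hr_used = shared_slot ? &lr_slot : &hr_slot;

	lr_slot.running = &lr;	/* LR thread: handler for lr begins */
	hr_used->running = &hr;	/* HR thread: handler for hr begins */

	/* Deleter: is lr's handler still running? */
	return handler_in_progress(&lr_slot, &lr);
}
```

With separate slots the deleter sees the LR handler and would wait; with a shared slot the HR delivery has overwritten the record and the deleter concludes, wrongly, that nothing is running.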

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] eliminte NMI entry/ exit code

2005-08-13 Thread George Anzinger

Zachary Amsden wrote:

> George Anzinger wrote:
>
>> Nick Piggin wrote:
>>
>>> George Anzinger wrote:
>>>
>>>> The NMI entry and exit code fiddles with bits in the preempt count.
>>>> If an NMI happens while some other code is doing the same, bits will
>>>> be lost.  This patch removes this modify code from the NMI path till
>>>> we can come up with something better.
>>>
>>> Humour me for a minute here...
>>> NMI restores preempt_count back to its old value upon exit, right?
>>> So what does a race case look like?
>>
>> Normal code                 NMI
>> fetch preempt_count
>> add    <- interrupt here    add and store, then
>>                             subtract and store, darn!
>> store preempt_count
>>
>> Ok, no problem.
>>
>> The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for
>> reasonable counts fail because of the rather undefined state when NMI
>> picks up the word.  The failure is on the NMI side...
>
> So NMI changing the preempt count and restoring in the middle of a RMW
> is not the problem.  Thus I don't understand what the issue is.  NMI
> must undo all side effects.  Does the PREEMPT_DEBUG code check the count
> somewhere within the NMI handler?  If so, shouldn't the proper fix be to
> make that code aware that it could be running inside of an NMI and/or
> ensure that code is not called from within the NMI handler?

Yes, that is the problem.  The sanity check in PREEMPT_DEBUG fails when
called from the NMI handler.





--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread George Anzinger

Nick Piggin wrote:

> George Anzinger wrote:
>
>> The NMI entry and exit code fiddles with bits in the preempt count.
>> If an NMI happens while some other code is doing the same, bits will
>> be lost.  This patch removes this modify code from the NMI path till
>> we can come up with something better.
>
> Humour me for a minute here...
> NMI restores preempt_count back to its old value upon exit, right?
> So what does a race case look like?

Normal code                 NMI
fetch preempt_count
add    <- interrupt here    add and store, then
                            subtract and store, darn!
store preempt_count

Ok, no problem.

The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for
reasonable counts fail because of the rather undefined state when NMI
picks up the word.  The failure is on the NMI side...
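
The interleaving in the table can be replayed in plain C. This is a sketch, not the hardirq.h code: replay() is a hypothetical function, NMI_OFFSET merely stands in for HARDIRQ_OFFSET, and the NMI is modelled as statements placed between the normal code's add and its store. With a balanced NMI (add, then subtract) the final count is right; if the NMI leaves a change behind, the interrupted store wipes it out.

```c
/* Stand-in for HARDIRQ_OFFSET; the value is arbitrary for this sketch. */
#define NMI_OFFSET 0x10000u

/*
 * Replays "fetch, add, <NMI>, store" on a counter word.
 *
 * count        - the word in memory before the sequence starts
 * delta        - what the normal code adds
 * nmi_restores - non-zero: the NMI adds NMI_OFFSET and subtracts it again
 *                zero:     the NMI adds NMI_OFFSET and leaves it in place
 *
 * Returns the word in memory after the normal code's store.
 */
unsigned int replay(unsigned int count, unsigned int delta, int nmi_restores)
{
	unsigned int reg;

	reg = count;			/* normal code: fetch preempt_count */
	reg += delta;			/* normal code: add (register only) */

	/* NMI arrives here and operates on the word in memory. */
	count += NMI_OFFSET;		/* NMI entry: add and store */
	if (nmi_restores)
		count -= NMI_OFFSET;	/* NMI exit: subtract and store */

	count = reg;			/* normal code: store preempt_count */
	return count;
}
```

Both replay(5, 1, 1) and replay(5, 1, 0) end at 6. In the first case that is the correct result, which matches "Ok, no problem"; in the second the NMI's update has been lost, since the correct result would have been 6 + NMI_OFFSET.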




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features

2005-08-12 Thread George Anzinger

Ingo, all

I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system.
Someone put code in the NMI path to modify the preempt count which,
often as not, will generate a PREEMPT_DEBUG message as there is no telling
what state the preempt count is in on an NMI interrupt.  I have sent the
attached patch to Andrew on this, but meanwhile, if you want RT, SMP,
PREEMPT_DEBUG you will be much better off with this.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 

Description:

Modifying a word from NMI code runs the very real risk of losing
either the new or the old bits.  Remember, we cannot prevent an
NMI from arriving ANYWHERE, in particular between the read and the
write of a read-modify-write sequence.

This patch removes the update of the preempt count from the NMI
path.

Signed-off-by: George Anzinger

 hardirq.h |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)  barrier()
 #endif
-
-#define nmi_enter()		irq_enter()
-#define nmi_exit()		sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()		/*  irq_enter()  */
+#define nmi_exit()		/*  sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)


[PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread George Anzinger

The NMI entry and exit code fiddles with bits in the preempt count.  If
an NMI happens while some other code is doing the same, bits will be 
lost.  This patch removes this modify code from the NMI path till we can 
come up with something better.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 

Description:

Modifying a word from NMI code runs the very real risk of losing
either the new or the old bits.  Remember, we can not prevent an
NMI interrupt from ANYWHERE, in particular between the read and the
write of a read-modify-write sequence.

This patch removes the update of the preempt count from the NMI
path.

Signed-off-by: George Anzinger

 hardirq.h |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===================================================================
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)  barrier()
 #endif
-
-#define nmi_enter()  irq_enter()
-#define nmi_exit()   sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()  /*  irq_enter()  */
+#define nmi_exit()   /*  sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-12 Thread George Anzinger

Bill Davidsen wrote:

George Anzinger wrote:


Srivatsa Vaddagiri wrote:


On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:

IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.





George,
Can't TSC (or equivalent) serve as a backup while PIT is disabled,
especially considering that we disable PIT only for short duration in 
practice (few seconds maybe) _and_ that we don't have HRT support yet?


I think it really depends on what you want.  If you really want to 
keep good time, the only rock in town is the one connected to the PIT 
(and the pmtimer).  The problem is, if you want the jiffie edge to be 
stable, there is just no way to reprogram the PIT to get it back to 
where it was.


In an old version of HRT I did a trick of loading a short count (based 
on reading the TSC or pmtimer) and then put the LATCH count on top of 
it.  In a correctly performing PIT, it should count down the short 
count, interrupt, load the long count and continue from there.  Aside 
from the machines that had BAD PITs (they reset on the load instead of 
the expiry of the current count) there were other problems that, in 
the end, caused loss of time (too fast, too slow, take your pick).  I 
also found PITs that signaled that they had loaded the count (they set 
a status bit) prior to actually loading it.  All in all, I find the 
PIT is just an ugly beast to try to program.  On the other hand, if 
you want regular interrupts at some fixed period, it will do this 
forever (give or take an epoch or two;) without touching anything 
after the initial program set up.


In the end, I concluded that, for the community kernel, it is really 
best to just mask the irq line and let the PIT run.  Then you 
use the TSC or pmtimer to figure the gross loss of interrupts and 
let the PIT interrupt define the jiffie edge again.  If you have 
other, more pressing, concerns I suppose you can program the PIT, but 
don't expect your wall clock to be as stable as it is now.


What are the portability and scaling issues if it were done this way? It 
clearly looks practical on x86 uni, but if we want per-CPU non-tick, I'm 
less sure how it would work.


I am not sure how much is involved.  For VST I disabled the tick 
generated NMI watchdog interrupt on a per cpu basis but stopped the PIT 
tick only when all cpus were idle.  The next step would be to mess with 
the interrupt steering logic to keep the tick away from idle cpus.  I 
did not get into this level in my work, being mainly interested in 
embedded systems.


But when you go to non-x86 hardware, is there always going to be another 
source of wakeup available if the PIT is blocked instead of reset? I 
have to go back and look at how SPARC hardware works, I don't remember 
enough to be useful.


Most (all) other archs don't have PITs.  The x86 sucks big time when it 
comes to time keeping hardware.  The most common hardware is a counter 
that runs forever (much as the TSC but FIXED in frequency).  Interrupts 
are generated either by comparing a register to this or using companion 
counters that just count down to zero.  In either case you don't lose 
time because you can always precisely set up an interrupt.  To sleep, 
then, you just set your sleep time in the normal time base interrupt 
counter.  At the end, you know exactly what to set to get back to the 
regular tick.
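
The "compare against a free-running counter" scheme described above can be sketched in a few lines of C.  This is only an illustration of the arithmetic, not code from any real port: the helper name, its parameters, and the idea that tick 0 sits at a known counter value `base` are all assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given a free-running counter that never stops,
 * return the compare-register value that fires exactly on a tick
 * boundary, `ticks_ahead` ticks after the start of the current tick.
 *
 * base    counter value at tick 0 (assumed known)
 * period  counter cycles per tick (must be non-zero)
 * now     current counter value (assumed >= base)
 */
static uint64_t compare_for_tick(uint64_t base, uint64_t period,
                                 uint64_t now, uint64_t ticks_ahead)
{
    uint64_t ticks_done = (now - base) / period; /* whole ticks elapsed */

    return base + (ticks_done + ticks_ahead) * period;
}
```

Because the counter itself is never reloaded, a long sleep (say `ticks_ahead` = 5) and the return to the regular tick (`ticks_ahead` = 1) both land on the same grid of tick boundaries.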


These other platforms make VST and High Res Timers so easy...
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread George Anzinger

Nick Piggin wrote:

George Anzinger wrote:

The NMI entry and exit code fiddles with bits in the preempt count.  
If an NMI happens while some other code is doing the same, bits will 
be lost.  This patch removes this modify code from the NMI path till 
we can come up with something better.




Humour me for a minute here...
NMI restores preempt_count back to its old value upon exit, right?
So what does a race case look like?


Normal code                    NMI
fetch preempt_count
add       <- interrupt here    add and store, then subtract
                               and store, darn!

store preempt_count

Ok, no problem.

The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
reasonable counts fail because of the rather undefined state when NMI 
picks up the word.  The failure is on the NMI side...
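
The interleaving above can be replayed in plain user-space C.  This is only a sketch under stated assumptions: `preempt_count` is an ordinary variable here, the value given to `HARDIRQ_OFFSET` is made up, and `fake_nmi()` and `interrupted_increment()` are hypothetical names, not kernel functions.

```c
#include <stdint.h>

#define HARDIRQ_OFFSET 0x10000u  /* assumed value, for illustration only */

/* Stand-in for the preempt count word. */
static uint32_t preempt_count;

/* What the NMI observed while it ran; a debug check would look at this. */
static uint32_t count_seen_by_nmi;

/* NMI path: add on entry, subtract on exit, so the word is restored. */
static void fake_nmi(void)
{
    preempt_count += HARDIRQ_OFFSET;   /* nmi_enter() */
    count_seen_by_nmi = preempt_count; /* state a sanity check would test */
    preempt_count -= HARDIRQ_OFFSET;   /* nmi_exit() */
}

/* Normal path: non-atomic increment with an NMI between fetch and store. */
static uint32_t interrupted_increment(uint32_t start)
{
    uint32_t tmp;

    preempt_count = start;
    tmp = preempt_count;  /* fetch preempt_count */
    tmp += 1;             /* add */
    fake_nmi();           /* NMI lands here */
    preempt_count = tmp;  /* store preempt_count */
    return preempt_count;
}
```

The final value comes out right because the NMI restores what it changed.  What the NMI sees in between is the stale, pre-update word, which is the sort of state a PREEMPT_DEBUG sanity check could plausibly trip over.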




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3

2005-08-10 Thread George Anzinger

Tony Lindgren wrote:
~

Do you have a patch around for improving next_timer_interrupt()?

Well, sort of.  The code in the VST patch does the right thing.  Problem 
is it does a bit more than the timer.c code.  You can find that code on 
the HRT site CVS.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-10 Thread George Anzinger

Srivatsa Vaddagiri wrote:

On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:

IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.



George,
Can't TSC (or equivalent) serve as a backup while PIT is disabled,
especially considering that we disable PIT only for short duration 
in practice (few seconds maybe) _and_ that we don't have HRT support yet?


I think it really depends on what you want.  If you really want to keep 
good time, the only rock in town is the one connected to the PIT (and 
the pmtimer).  The problem is, if you want the jiffie edge to be stable, 
there is just no way to reprogram the PIT to get it back to where it was.


In an old version of HRT I did a trick of loading a short count (based 
on reading the TSC or pmtimer) and then put the LATCH count on top of 
it.  In a correctly performing PIT, it should count down the short 
count, interrupt, load the long count and continue from there.  Aside 
from the machines that had BAD PITs (they reset on the load instead of 
the expiry of the current count) there were other problems that, in the 
end, caused loss of time (too fast, too slow, take your pick).  I also 
found PITs that signaled that they had loaded the count (they set a 
status bit) prior to actually loading it.  All in all, I find the PIT is 
just an ugly beast to try to program.  On the other hand, if you want 
regular interrupts at some fixed period, it will do this forever (give 
or take an epoch or two;) without touching anything after the initial 
program set up.
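
For what it is worth, the arithmetic behind the short-count trick might look like the sketch below.  It covers only the computation of the short count; the port writes and the misbehaving PITs mentioned above are not modeled.  The 1193182 Hz input clock and HZ = 1000 are assumed numbers, and `short_count_to_next_edge()` is a hypothetical name.

```c
#include <stdint.h>

/* Assumed numbers: 1193182 Hz PIT input clock and HZ = 1000. */
#define PIT_TICK_RATE 1193182u
#define HZ            1000u
#define LATCH         ((PIT_TICK_RATE + HZ / 2) / HZ)

/*
 * Given the number of PIT clocks that have passed since a known jiffie
 * edge (derived from the TSC or pmtimer), return the short count that
 * expires on the next edge.  The full LATCH count would be loaded on
 * top of it so that the regular period resumes from that edge.
 */
static uint32_t short_count_to_next_edge(uint64_t pit_clocks_since_edge)
{
    uint32_t into_jiffy = (uint32_t)(pit_clocks_since_edge % LATCH);

    return LATCH - into_jiffy; /* a full LATCH if we are exactly on an edge */
}
```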


In the end, I concluded that, for the community kernel, it is really 
best to just mask the irq line and let the PIT run.  Then you use 
the TSC or pmtimer to figure the gross loss of interrupts and let the 
PIT interrupt define the jiffie edge again.  If you have other, more 
pressing, concerns I suppose you can program the PIT, but don't expect 
your wall clock to be as stable as it is now.
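
A rough sketch of the catch-up step, assuming a TSC that ticks at a constant, known rate: count the whole tick periods that went by while the interrupt was blocked, and leave the edge itself to the next PIT interrupt.  The struct and function names are hypothetical.

```c
#include <stdint.h>

struct tick_catchup {
    uint64_t missed_jiffies;  /* whole tick periods that went by */
    uint64_t leftover_cycles; /* cycles into the current, partial tick */
};

/*
 * tsc_at_last_tick  TSC value sampled at the last serviced tick
 * tsc_now           TSC value when ticks are turned back on
 * cycles_per_jiffy  TSC cycles per tick (assumed constant and known)
 */
static struct tick_catchup catch_up(uint64_t tsc_at_last_tick,
                                    uint64_t tsc_now,
                                    uint64_t cycles_per_jiffy)
{
    struct tick_catchup r = { 0, 0 };
    uint64_t delta;

    if (cycles_per_jiffy == 0 || tsc_now < tsc_at_last_tick)
        return r; /* nothing sensible to compute */

    delta = tsc_now - tsc_at_last_tick;
    r.missed_jiffies = delta / cycles_per_jiffy;
    r.leftover_cycles = delta % cycles_per_jiffy;
    return r;
}
```

Only `missed_jiffies` would be added to the tick count; the leftover is absorbed when the still-running PIT delivers its next interrupt on the original edge.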


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3

2005-08-09 Thread George Anzinger

Tony Lindgren wrote:

* Srivatsa Vaddagiri <[EMAIL PROTECTED]> [050805 05:37]:


On Wed, Aug 03, 2005 at 06:05:28AM +, Con Kolivas wrote:

This is the dynamic ticks patch for i386 as written by Tony Lindgren 
<[EMAIL PROTECTED]> and Tuukka Tikkanen <[EMAIL PROTECTED]>. 
Patch for 2.6.13-rc5


There were a couple of things that I wanted to change so here is an updated 
version. This code should have stabilised enough for general testing now.


Con,
I have been looking at some of the requirement of tickless idle CPUs in
core kernel areas like scheduler and RCU. Basically, both power management and 
virtualization benefit if idle CPUs can cut off useless timer ticks. Especially 
from a virtualization standpoint, I think it makes sense that we enable this 
feature on a per-CPU basis i.e let individual CPUs cut off their ticks as and 
when they become idle. The benefit of this is more visible in platforms that 
host lot of (SMP) VMs on the same machine. Most of the time, these VMs may be 
partially idle (some CPUs in it are idle, some not) and it is good that we 
quiesce the timer ticks on the partial set of idle CPUs. Both S390 and Xen ports
of Linux kernel have this ability today (S390 has it in mainline already and 
Xen has it out of tree).



Good point, and it would be nice to have it resolved for systems that support
idling individual CPUs. The current setup was done because when I was tinkering
with the amd76x_pm patch a while back, I noticed that idling the cpu
disconnects all cpus from the bus. (As far as I remember)

So this may need to be configured depending on the system.



From this viewpoint, I think the current implementation of dynamic tick
falls short of this requirement. It cuts off the timer ticks only when 
all CPUs go idle.


Apart from this observation, I have some others about the current dynamic tick
patch:

- All CPUs seem to cut off the same number of ticks (dyn_tick->skip). Isn't
 this wrong, considering that the timer list is per-CPU? This will cause
 some timers to be serviced much later than usual.



Yes if it's done on per-CPU basis. In the current setup the first interrupt
will kick the system off the dyn-tick state and the timers get checked again.



- The fact that dyn_tick_state is global and accessed from all CPUs
 is probably a scalability concern, especially if we allow the ticks
 to be cut off on per-CPU basis.




From idling devices point of view, we still need some global variable I
believe. How else would you be able to tell all devices that the whole
system does not have any timers for next 2 seconds?



- Again, when we allow this on a per-CPU basis, subsystems like
 RCU need to know the partial set of idle CPUs. RCU already does
 that thr' nohz_cpu_mask (which will need to replace dyn_cpu_map).



Sounds like that could work for dyn-tick too.


- Looking at dyn_tick_timer_interrupt, would it be nice if we avoid calling 
 do_timer_interrupt so many times and instead update jiffies to
 (skipped_ticks - 1) and then call do_timer_interrupt once? I think
 VST does it that way.



In the long run we would do the calculations in usecs and just emulate
jiffies from the hw timer. But yes, optimizing updating the time would be
great.
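
The single-call catch-up suggested above could be sketched as follows.  This is a user-space toy, not the dyn-tick or VST code: `jiffies` is a plain variable and `fake_do_timer_interrupt()` is a hypothetical stand-in that only advances the count by one tick.

```c
#include <stdint.h>

static uint64_t jiffies;            /* stand-in for the kernel tick count */
static unsigned tick_handler_calls; /* how often the handler ran */

/* Hypothetical stand-in for the per-tick handler: accounts for one tick. */
static void fake_do_timer_interrupt(void)
{
    jiffies++;
    tick_handler_calls++;
}

/*
 * Catch up after `skipped_ticks` ticks were skipped: advance jiffies by
 * (skipped_ticks - 1) directly, then let one handler call account for
 * the last tick, instead of calling the handler once per skipped tick.
 */
static void catch_up_in_one_call(uint64_t skipped_ticks)
{
    if (skipped_ticks == 0)
        return;

    jiffies += skipped_ticks - 1;
    fake_do_timer_interrupt();
}
```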



- dyn_tick->max_skip = 0xffffff / apic_timer_val;
 From my reading of Intel docs, APIC_TMICT is 32-bit. So why does the
 above calculation take only 24-bits into account? What am I missing here?



Hmm, could be a bug here, needs to be checked. Maybe 32-bit APIC timer is
optional support, or maybe I accidentally pulled the optional 24-bit support
from the ACPI PM timer.

But in any case on P4 systems the APIC timer is not the bottleneck as
stopping or reprogramming PIT also kills APIC. (This does not happen on P3
systems). So the bottleneck most likely is the length of PIT.
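
To put rough numbers on the counter-width question, here is a small sketch.  The helper name is hypothetical, and the counts-per-tick figures used with it (100000 for the APIC timer, 1193 for a PIT at HZ = 1000) are assumed purely for illustration.

```c
#include <stdint.h>

/*
 * Largest number of whole ticks that fit in a down-counter whose maximum
 * load value is `counter_max`, when one tick takes `counts_per_tick`
 * counts.  Returns 0 if counts_per_tick is 0.
 */
static uint32_t max_skip_ticks(uint64_t counter_max, uint32_t counts_per_tick)
{
    if (counts_per_tick == 0)
        return 0;

    return (uint32_t)(counter_max / counts_per_tick);
}
```

With those assumed figures a 24-bit limit allows 167 ticks, a 32-bit limit allows 42949, and a 16-bit PIT allows only 54, which fits the remark that the PIT is the likely bottleneck.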


I can take a shot at addressing these concerns in dynamic_tick patch, but it 
seems to me that VST has already addressed all these to a big extent. Had you 
considered VST before? The biggest bottleneck I see in VST going mainline is 
its dependency on HRT patch but IMO it should be possible to write a small patch
to support VST w/o HRT. 


George, what do you think?



HRT + VST depend on APIC only, and do not use next_timer_interrupt().


I convinced myself that the next_timer... code in timer.c misses timers 
(i.e. gives the wrong answer).  I did this (after wondering due to 
performance) by scanning the whole timer list after I had the 
next_timer... answer and finding a better answer, not always, but some 
times.  That code does not address the cascade list correctly.
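
The cross-check described above amounts to a brute-force scan.  A minimal sketch, assuming the pending expiry times have been collected into a flat array and ignoring jiffies wraparound; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the smaller of `candidate` and the earliest entry in
 * expires[0..n).  Wraparound of the tick counter is not handled.
 */
static uint64_t full_scan_earliest(const uint64_t *expires, size_t n,
                                   uint64_t candidate)
{
    uint64_t best = candidate;
    size_t i;

    for (i = 0; i < n; i++) {
        if (expires[i] < best)
            best = expires[i];
    }
    return best;
}

/* Non-zero if the full scan finds a timer earlier than the fast answer. */
static int scan_beats_fast_answer(const uint64_t *expires, size_t n,
                                  uint64_t fast_answer)
{
    return full_scan_earliest(expires, n, fast_answer) < fast_answer;
}
```

A non-zero result means the fast lookup would have let the system sleep past a pending timer.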



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-09 Thread George Anzinger

Srivatsa Vaddagiri wrote:

On Sun, Aug 07, 2005 at 03:12:21PM +1000, Con Kolivas wrote:

Respin of the dynamic ticks patch for i386 by Tony Lindgren and Tuukka Tikkanen 
with further code cleanups. Are we there yet?



Con,
I am afraid that until SMP correctness is resolved, this is not
in a position to go in (unless you want to enable it only for UP, which
I think should not be our target). I am working on making this work 
correctly on SMP systems. Hopefully I will post a patch soon.


Another observation I have made regarding dynamic tick patch is that PIT is 
being reprogrammed whenever the CPUs are coming out of sleep state (because of 
an interrupt say). This can happen at any arbitrary time, not necessarily on 
jiffy boundaries. As a result, there will be an offset between when jiffy 
interrupts will now occur vs when they would have originally occurred had PIT 
never been stopped. Not sure if having this offset is good, but at least one 
necessary change that I foresee is zeroing delay_at_last_interrupt when 
disabling dynamic tick.  For that matter, it may be easier to disable the PIT 
timer by just masking PIT interrupts (instead of changing its mode).


IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.


My VST patch just masks the interrupt.
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3

2005-08-09 Thread George Anzinger

Tony Lindgren wrote:

* Srivatsa Vaddagiri [EMAIL PROTECTED] [050805 05:37]:


On Wed, Aug 03, 2005 at 06:05:28AM +, Con Kolivas wrote:

This is the dynamic ticks patch for i386 as written by Tony Lindgen 
[EMAIL PROTECTED] and Tuukka Tikkanen [EMAIL PROTECTED]. 
Patch for 2.6.13-rc5


There were a couple of things that I wanted to change so here is an updated 
version. This code should have stabilised enough for general testing now.


Con,
I have been looking at some of the requirement of tickless idle CPUs in
core kernel areas like scheduler and RCU. Basically, both power management and 
virtualization benefit if idle CPUs can cut off useless timer ticks. Especially 
from a virtualization standpoint, I think it makes sense that we enable this 
feature on a per-CPU basis i.e let individual CPUs cut off their ticks as and 
when they become idle. The benefit of this is more visible in platforms that 
host lot of (SMP) VMs on the same machine. Most of the time, these VMs may be 
partially idle (some CPUs in it are idle, some not) and it is good that we 
quiesce the timer ticks on the partial set of idle CPUs. Both S390 and Xen ports
of Linux kernel have this ability today (S390 has it in mainline already and 
Xen has it out of tree).



Good point, and it would be nice to have it resolved for systems that support
idling individual CPUs. The current setup was done because when I was tinkering
with the amd76x_pm patch a while a back, I noticed that idling the cpu
disconnects all cpus from the bus. (As far as I remember)

So this may need to be configured depending on the system.



From this viewpoint, I think the current implementation of dynamic tick
falls short of this requirement. It cuts of the timer ticks only when 
all CPUs go idle.


Apart from this observation, I have some others about the current dynamic tick
patch:

- All CPUs seem to cut off the same number of ticks (dyn_tick-skip). Isn't
 this wrong, considering that the timer list is per-CPU? This will cause
 some timers to be serviced much later than usual.



Yes if it's done on per-CPU basis. In the current setup the first interrupt
will kick the system off the dyn-tick state and the timers get checked again.



- The fact that dyn_tick_state is global and accessed from all CPUs
 is probably a scalability concern, especially if we allow the ticks
 to be cut off on per-CPU basis.




From idling devices point of view, we still need some global variable I

believe. How else would you be able to tell all devices that the whole
system does not have any timers for next 2 seconds?



- Again, when we allow this on a per-CPU basis, subsystems like
 RCU need to know the partial set of idle CPUs. RCU already does
 that thr' nohz_cpu_mask (which will need to replace dyn_cpu_map).



Sounds like that could work for dyn-tick too.


- Looking at dyn_tick_timer_interrupt, would it be nice if we avoid calling 
 do_timer_interrupt so many times and instead advance jiffies by
 (skipped_ticks - 1) and then call do_timer_interrupt once? I think
 VST does it that way.



In the long run we would do the calculations in usecs and just emulate
jiffies from the hw timer. But yes, optimizing updating the time would be
great.
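
The two catch-up strategies discussed above can be compared with a minimal
userspace sketch. Everything here is a hypothetical stand-in: sketch_clock and
sketch_tick() are illustration names, not the kernel's jiffies or
do_timer_interrupt(), and the sketch ignores whatever per-tick work the real
handler does besides counting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not kernel structures. */
struct sketch_clock {
	uint64_t jiffies;           /* tick counter */
	unsigned int handler_calls; /* how often the per-tick handler ran */
};

/* Stand-in for one call to the per-tick handler. */
static void sketch_tick(struct sketch_clock *c)
{
	c->jiffies++;
	c->handler_calls++;
}

/* Catch up by calling the handler once per skipped tick. */
static void catch_up_looped(struct sketch_clock *c, unsigned int skipped)
{
	assert(c);
	while (skipped--)
		sketch_tick(c);
}

/* Catch up by advancing the counter by (skipped - 1), then one handler call. */
static void catch_up_batched(struct sketch_clock *c, unsigned int skipped)
{
	assert(c);
	if (skipped == 0)
		return;
	c->jiffies += skipped - 1;
	sketch_tick(c);
}
```

Both variants leave the counter at the same value; only the number of handler
invocations differs. Any per-tick bookkeeping the real handler performs would
have to be accounted for separately in the batched form.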



- dyn_tick->max_skip = 0xff / apic_timer_val;
From my reading of Intel docs, APIC_TMICT is 32-bit. So why does the
 above calculation take only 24-bits into account? What am I missing here?



Hmm, could be a bug here, needs to be checked. Maybe 32-bit APIC timer is
optional support, or maybe I accidentally pulled the optional 24-bit support
from the ACPI PM timer.

But in any case on P4 systems the APIC timer is not the bottleneck as
stopping or reprogramming PIT also kills APIC. (This does not happen on P3
systems). So the bottleneck most likely is the length of PIT.
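
To make the counter-width question concrete, here is a small sketch of how the
maximum number of skippable ticks scales with the width of the hardware
counter. The function name max_skip() and the value used for the per-jiffy
count are assumptions for illustration; the constants 0xffffff and 0xffffffff
are simply the largest 24-bit and 32-bit values, not figures taken from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many whole jiffies fit into one full run of a
 * down-counter whose largest programmable value is counter_max, when one
 * jiffy corresponds to counts_per_jiffy counter steps.
 */
static uint32_t max_skip(uint32_t counter_max, uint32_t counts_per_jiffy)
{
	assert(counts_per_jiffy != 0);
	return counter_max / counts_per_jiffy;
}
```

With an assumed 100000 counts per jiffy, max_skip(0xffffff, 100000) gives 167
jiffies while max_skip(0xffffffff, 100000) gives 42949, so the assumed counter
width changes the limit by a factor of roughly 256.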


I can take a shot at addressing these concerns in dynamic_tick patch, but it 
seems to me that VST has already addressed all these to a big extent. Had you 
considered VST before? The biggest bottleneck I see in VST going mainline is 
its dependency on HRT patch but IMO it should be possible to write a small patch
to support VST w/o HRT. 


George, what do you think?



HRT + VST depends only on the APIC and does not use next_timer_interrupt().


I convinced myself that the next_timer... code in timer.c misses timers 
(i.e. gives the wrong answer).  I did this (after wondering due to 
performance) by scanning the whole timer list after I had the 
next_timer... answer and finding a better answer, not always, but 
sometimes.  That code does not address the cascade list correctly.
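
The cross-check described above can be sketched in userspace as follows. This
is only an illustration of the idea, not the code that was actually used: the
flat array stands in for the kernel's timer lists, scan_finds_earlier() is a
hypothetical name, and all pending expiries are assumed to lie in the future
relative to now.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Brute-force check of a fast "next timer" answer.
 * Returns 1 and stores the earlier expiry in *best if a full scan of all
 * pending expiries finds something sooner than fast_answer; otherwise
 * returns 0 and stores fast_answer in *best.
 */
static int scan_finds_earlier(const unsigned long *expires, size_t n,
			      unsigned long now, unsigned long fast_answer,
			      unsigned long *best)
{
	unsigned long best_delta = fast_answer - now;
	int found = 0;
	size_t i;

	assert(expires && best);
	*best = fast_answer;
	for (i = 0; i < n; i++) {
		/* Unsigned subtraction keeps the distance correct across wrap. */
		unsigned long delta = expires[i] - now;

		if (delta < best_delta) {
			best_delta = delta;
			*best = expires[i];
			found = 1;
		}
	}
	return found;
}
```

A non-zero return means the fast lookup missed a timer that expires sooner,
which is the symptom reported above.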



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-05 Thread George Anzinger

Roland McGrath wrote:
>> There are other concerns.  Let me see if I understand this.  A thread 
>> (other than the leader) can exec and we then need to change the 
>> real_timer to wake the new task which will NOT be using the same task 
>> struct.
>
> That's correct.  de_thread will turn the thread calling exec into the new
> leader and kill off all the other threads, including the old leader.  The
> exec'ing thread's existing task_struct is reassigned to the PID of the
> original leader.
>
>> My looking at the code shows that the thread leader can exit and then 
>> stays around as a zombie until the last thread in the group exits.  
>
> That is correct.
>
>> If an alarm comes during this wait I suspect it will wake this zombie and
>> cause problems.
>
> You are mistaken.  The signal code handles process signals sent when the
> leader is a zombie.  The group leader sticks around with the PID that
> matches the TGID, until there are no live threads with its TGID.  That is
> how process-wide kill can still work.

Yes, I see, traced through the signal delivery.  So Linus' patch as well 
as the revert of Ingo's change will fix all of this.  Right?


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-05 Thread George Anzinger

Gerd Knorr wrote:
> On Thu, Aug 04, 2005 at 03:02:51PM -0700, Andrew Morton wrote:
>> Roland McGrath <[EMAIL PROTECTED]> wrote:
>>> That's wrong.  It has to be done only by the last thread in the group to go.
>>> Just revert Ingo's change.
>>
>> OK..
>>
>> +++ 25-akpm/kernel/exit.c   Thu Aug  4 15:01:06 2005
>> @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
>> -	if (group_dead)
>> +	if (group_dead) {
>> +		del_timer_sync(&tsk->signal->real_timer);
>>  		acct_process(code);
>> +	}
>> +++ 25-akpm/kernel/posix-timers.c   Thu Aug  4 15:01:06 2005
>> @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
>> -	del_timer_sync(&sig->real_timer);
>
> That one fixes it for me.


There are other concerns.  Let me see if I understand this.  A thread 
(other than the leader) can exec and we then need to change the 
real_timer to wake the new task which will NOT be using the same task 
struct.


My looking at the code shows that the thread leader can exit and then 
stays around as a zombie until the last thread in the group exits.  If an 
alarm comes during this wait I suspect it will wake this zombie and cause 
problems.  So, don't we need to also change real_timer's task when the 
exiting task is the real_timer wake up task, assigning it to some other 
member of the group?  Note, I don't say just if it is the group leader...


Then when we finally release the signal structure, we can "del" the timer.

Did I miss something here?



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-04 Thread George Anzinger

Andrew Morton wrote:
> Roland McGrath <[EMAIL PROTECTED]> wrote:
>> That's wrong.  It has to be done only by the last thread in the group to go.
>> Just revert Ingo's change.

Hm... I was looking at 2.6.10 to figure it out.  This looks more correct.

> OK..
>
> --- 25/kernel/exit.c~revert-timer-exit-cleanup  Thu Aug  4 15:00:55 2005
> +++ 25-akpm/kernel/exit.c   Thu Aug  4 15:01:06 2005
> @@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
>  	acct_update_integrals(tsk);
>  	update_mem_hiwater(tsk);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
> -	if (group_dead)
> +	if (group_dead) {
> +		del_timer_sync(&tsk->signal->real_timer);
>  		acct_process(code);
> +	}
>  	exit_mm(tsk);
>  
>  	exit_sem(tsk);
>
> diff -puN kernel/posix-timers.c~revert-timer-exit-cleanup kernel/posix-timers.c
> --- 25/kernel/posix-timers.c~revert-timer-exit-cleanup  Thu Aug  4 15:00:55 2005
> +++ 25-akpm/kernel/posix-timers.c   Thu Aug  4 15:01:06 2005
> @@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
>  		tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
>  		itimer_delete(tmr);
>  	}
> -	del_timer_sync(&sig->real_timer);
>  }
>  
>  /*




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-04 Thread George Anzinger

Gerd Knorr wrote:
>   Hi,
>
> Somewhere between 2.6.11 and 2.6.12 the regression in $subject
> was added to the linux kernel.  Testcase below.

Yep.  The itimer changes got a bit carried away.  Here is a fix.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger <george@mvista.com>
Type: Defect Fix 
Description:

The changes to itimer of late (after 2.6.11) cause itimers not
to survive the exec* calls.  Standard says they should.  

Signed-off-by: George Anzinger <george@mvista.com>

 exit.c |1 +
 posix-timers.c |4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)


Index: linux-2.6.13-rc/kernel/exit.c
===================================================================
--- linux-2.6.13-rc.orig/kernel/exit.c
+++ linux-2.6.13-rc/kernel/exit.c
@@ -794,6 +794,7 @@ fastcall NORET_TYPE void do_exit(long co
 	}
 
 	tsk->flags |= PF_EXITING;
+	del_timer_sync(&tsk->signal->real_timer);
 
 	/*
 	 * Make sure we don't try to process any timer firings
Index: linux-2.6.13-rc/kernel/posix-timers.c
===================================================================
--- linux-2.6.13-rc.orig/kernel/posix-timers.c
+++ linux-2.6.13-rc/kernel/posix-timers.c
@@ -1183,10 +1183,10 @@ void exit_itimers(struct signal_struct *
 	struct k_itimer *tmr;
 
 	while (!list_empty(&sig->posix_timers)) {
-		tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
+		tmr = list_entry(sig->posix_timers.next,
+				 struct k_itimer, list);
 		itimer_delete(tmr);
 	}
-	del_timer_sync(&sig->real_timer);
 }
 
 /*


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-04 Thread George Anzinger

Nishanth Aravamudan wrote:
~
> Sorry, I forgot that sys_nanosleep() also always adds 1 to the request
> (to account for this same issue, I believe, as POSIX demands no early
> return from nanosleep() calls). There are some other locations where
> similar
>
> + (t.tv_sec || t.tv_nsec)


This is not the same as "always add 1".  We don't do it this way just to 
have fun with C.  If you change schedule_timeout() to add the 1, 
nanosleep() will need to do things differently to get the same behavior. 
 (And, YES users do pass in zero sleep times.)
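
A minimal userspace sketch of the difference, under stated assumptions:
SKETCH_HZ, ts_to_jiffies() and the two expire_* helpers are hypothetical names
for illustration and are not the kernel's conversion routines. Only the
expression (t.tv_sec || t.tv_nsec) comes from the quoted code.

```c
#include <assert.h>
#include <time.h>

/* Assumed tick rate for the sketch; not taken from the thread. */
#define SKETCH_HZ 1000L
#define SKETCH_NSEC_PER_TICK (1000000000L / SKETCH_HZ)

/* Hypothetical conversion: timespec to ticks, rounding partial ticks up. */
static unsigned long ts_to_jiffies(const struct timespec *t)
{
	assert(t->tv_nsec >= 0 && t->tv_nsec < 1000000000L);
	return (unsigned long)t->tv_sec * SKETCH_HZ +
	       (unsigned long)((t->tv_nsec + SKETCH_NSEC_PER_TICK - 1) /
			       SKETCH_NSEC_PER_TICK);
}

/* "Always add 1": even a zero request waits for one tick. */
static unsigned long expire_always_plus_one(const struct timespec *t)
{
	return ts_to_jiffies(t) + 1;
}

/* The nanosleep form: add the extra tick only for a non-zero request. */
static unsigned long expire_nonzero_plus_one(const struct timespec *t)
{
	return ts_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
}
```

For any non-zero request the two helpers agree; they differ only for a zero
request, where the second form yields 0 ticks and the first yields 1.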



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-04 Thread George Anzinger
 msecs_to_jiffies(msecs) + 1;
+   unsigned long timeout = msecs_to_jiffies(msecs);
 
 	while (timeout && !signal_pending(current)) {

set_current_state(TASK_INTERRUPTIBLE);




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-02 Thread George Anzinger

Keith Owens wrote:
> On Tue, 02 Aug 2005 18:12:27 -0700, 
> George Anzinger  wrote:
>
>> How about something like:
>> if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > 
>> MAGIC)
>
> current points to the current struct task, regs points to the kernel
> stack.  Those two data areas can be completely separate, as they are on
> i386.  Also i386 uses a separate kernel stack for interrupts.

Actually I must mean the thread_info and not current.  i386 only uses a 
separate stack if you use 4K stacks.  I think others use separate 
interrupt stacks, however :(.  Also, on thinking on it, I think some 
archs don't call the registers pt_regs either.  Oh, well, it was a 
thought...


Waiting for its brother... :)
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-02 Thread George Anzinger

Steven Rostedt wrote:
> On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
>
>> Couldn't you just do some math off current->timestamp to see how long
>> the task has been running? This per arch stuff seems a bit invasive..
>
> The thing is, I'm tracking how long the task is running in the kernel
> without doing a schedule.  That's actually easy, but I don't want to
> count when the task is in userspace. The per-arch is only updating so
> that we don't count user space, otherwise the count could be in the
> task_struct.  If there is an arch-independent way to tell if a task is
> running in user-space or kernel when an interrupt goes off then I would
> use it.  The per arch is actually easy, and I would write it, but I
> don't have the hardware now to test it.  I could at least do PPC and
> MIPS since I'm quite familiar with both, but I don't currently have a
> cross compiler to compile it.
>
> I understand your point, I would really prefer an arch independent
> solution, but the timestamp from current just won't cut it.  Have another
> idea, I'm all open for it.


How about something like:
if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > 
MAGIC)

The idea is that an interrupt from user space will be the ONLY thing on 
the stack while an interrupt from the kernel will have kernel stack 
under it.  Current is the bottom end of the kernel stack and regs + 
sizeof(pt_regs) is where the interrupt context started.  Assumptions a) 
stack grows down, b) no switch stack at interrupt.
MAGIC is some small number.  For x86 user it is actually zero, don't 
know about others but the saved context should be the first thing on the 
stack so a minimum frame size should do.
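
The test can be sketched in userspace with plain byte addresses, which avoids
the pointer-scaling ambiguity in the one-liner above. All names here are
hypothetical: SKETCH_THREAD_SIZE and struct sketch_pt_regs stand in for the
per-arch THREAD_SIZE and pt_regs, and stack_base is assumed to be the lowest
address of the kernel stack area. The sketch relies on the same two
assumptions stated above: the stack grows down and there is no stack switch on
interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration only; the real values are per-arch. */
#define SKETCH_THREAD_SIZE 8192u
struct sketch_pt_regs {
	unsigned long slot[16];
};

/*
 * Returns 1 if more than `magic` bytes of stack lie between the top of the
 * stack area and the end of the saved register frame, i.e. the interrupt
 * arrived while kernel code was already using the stack. Returns 0 if the
 * saved frame is (nearly) the first thing on the stack, which suggests the
 * interrupt came from user space.
 */
static int interrupted_in_kernel(uintptr_t stack_base, uintptr_t regs,
				 size_t magic)
{
	uintptr_t stack_top = stack_base + SKETCH_THREAD_SIZE;
	uintptr_t frame_end = regs + sizeof(struct sketch_pt_regs);

	assert(frame_end <= stack_top);
	return (size_t)(stack_top - frame_end) > magic;
}
```

On an architecture that switches to a separate interrupt stack the saved frame
would not live in this area at all, so the check would not apply there.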



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

