Re: [Kgdb-bugreport] [PATCH 1/5] KGDB: improve early init

2008-01-31 Thread George Anzinger

On 01/31/2008 01:36 AM,  Jan Kiszka was caught saying:
> Jan Kiszka wrote:
>> George Anzinger wrote:
>>> On 01/30/2008 04:08 PM,  Jan Kiszka was caught saying:
>>>> [Here comes a rebased version against latest x86/mm]
>>>>
>>>> In case "kgdbwait" is passed as kernel parameter, KGDB tries to set up
>>>> and connect to the front-end already during early_param evaluation.
>>>> This fails on x86 as the exception stack is not yet initialized,
>>>> effectively delaying kgdbwait until late-init.
>>>
>>> I wonder how much work it would take to just set up the exception
>>> stack and proceed.  After all, kgdbwait is there to help debug
>>> very early kernel code...
>>
>> In principle a valid question, but I'm not the one to answer it. I
>> would not feel very well if I had to reorder this critical setup code.
>> Look, we would have to move trap_init in start_kernel before
>> parse_early_param, and that would affect _every_ arch...

I cannot speak for other archs, but for x86 I called trap_init from the
code that caught the kgdbwait.  At that time (since I retired, I have
not looked at the actual kernel code) it could be called again later by
the kernel code.  I.e. I did not try to reorder the kernel bring-up
code, but just added an additional call to trap_init, and then only in
the case of finding a kgdbwait.

As such, this would need to be arch specific...
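
A rough userspace sketch of that approach, not the actual kernel or KGDB code: the early parameter handler runs the trap setup itself when it sees kgdbwait, and the normal boot path calls it again later, unchanged. All names here (toy_trap_init, toy_early_param, toy_start_kernel) are made up for illustration, and the sketch assumes the setup routine is safe to call twice.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins for boot state; none of this is real kernel code. */
static bool traps_ready;      /* true once exception handling is set up */
static int  trap_init_calls;  /* how many times the setup ran */
static bool debugger_entered; /* true if the debugger could be entered */

/* Assumed safe to call more than once: later calls redo the same setup. */
static void toy_trap_init(void)
{
	trap_init_calls++;
	traps_ready = true;
}

/* Early parameter handler: only the kgdbwait case triggers early setup. */
static void toy_early_param(const char *arg)
{
	if (strcmp(arg, "kgdbwait") != 0)
		return;
	if (!traps_ready)
		toy_trap_init();      /* the additional, early call */
	debugger_entered = traps_ready;
}

/* Normal bring-up order stays untouched: parameters first, traps later. */
static void toy_start_kernel(const char *arg)
{
	toy_early_param(arg);
	toy_trap_init();              /* the regular call, unchanged */
}
```

With "kgdbwait" on the command line the setup runs twice; without it, only the regular call happens and the bring-up order is exactly as before.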

>>
>
> BTW, do you know if EXCEPTION_STACK_READY fails for other archs in
> parse_early_param as well? It should, because my understanding of
> trap_init is that it's the function to arm things like... exception
> handlers? And that raises the question of the deeper purpose of this
> check (and the invocation of kgdb_early_init from the argument parsing
> function). Sigh, KGDB is still a quite improvable piece of code.

Likely.  Once you get it in the mainline kernel, one would hope that
other arch code would be forthcoming as many more "eyes" will be in play.

>
> Jan
>
> PS: Can we move this to some public list?

Sure.  Sorry, I picked the wrong reply button; I never intended it to be
private.

>

--
George Anzinger   [EMAIL PROTECTED]


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-08 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 7 September 2005 23:16, George Anzinger wrote:


Serge Noiraud wrote:


...


I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot ?


Where did you get the kgdb you are using?  It looks like kgdb_ts is in
this version, but it is not in the one on my website
http://source.mvista.com/~ganzinger/


This related to kgdb?  I.e. does it go away if you either turn off kgdb
at configure time or just don't patch with kgdb?  (It sure seems
unrelated, but...)


I don't get those errors with CONFIG_KGDB=n.
Below I put the diff between a working .config and a non-working .config.


George



...
 INSTALL sound/usb/snd-usb-audio.ko
 INSTALL sound/usb/snd-usb-lib.ko
 INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F
System.map -b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING:


...
If I redo only the make command (not make rpm), I obtain the following:
# make
  CHK include/linux/version.h
make[1]: `arch/i386/kernel/asm-offsets.s' is up to date.
  CHK include/linux/compile.h
  CHK usr/initramfs_list
Kernel: arch/i386/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST
*** Warning: "preempt_locks" [net/sunrpc/sunrpc.ko] undefined!
*** Warning: "preempt_locks" [net/appletalk/appletalk.ko] undefined!
*** Warning: "preempt_locks" [fs/reiserfs/reiserfs.ko] undefined!
*** Warning: "preempt_locks" [fs/ntfs/ntfs.ko] undefined!
*** Warning: "preempt_locks" [fs/nfs/nfs.ko] undefined!
*** Warning: "preempt_locks" [fs/minix/minix.ko] undefined!
*** Warning: "preempt_locks" [fs/jbd/jbd.ko] undefined!
*** Warning: "preempt_locks" [fs/ext3/ext3.ko] undefined!
*** Warning: "preempt_locks" [fs/cifs/cifs.ko] undefined!
*** Warning: "preempt_locks" [fs/affs/affs.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/libata.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/ide-scsi.ko] undefined!
*** Warning: "preempt_locks" [drivers/scsi/gdth.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid6.ko] undefined!
*** Warning: "preempt_locks" [drivers/md/raid5.ko] undefined!
*** Warning: "preempt_locks" [drivers/ide/ide-floppy.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/pktcdvd.ko] undefined!
*** Warning: "preempt_locks" [drivers/block/loop.ko] undefined!


preempt_locks is being accessed from a module but is not exported.  This 
is turned on with CONFIG_DEBUG_RT_LOCKING_MODE so change that and it 
should build.



#


~

-# CONFIG_EARLY_PRINTK is not set
-# CONFIG_DEBUG_STACKOVERFLOW is not set
+CONFIG_LATENCY_TRACE=y
+CONFIG_RT_DEADLOCK_DETECT=y
+CONFIG_DEBUG_RT_LOCKING_MODE=y  <--- This one is doing it
+CONFIG_DEBUG_KOBJECT=y
+CONFIG_DEBUG_HIGHMEM=y

~

+CONFIG_KGDB=y
+CONFIG_KGDB_9600BAUD=y
+# CONFIG_KGDB_19200BAUD is not set
+# CONFIG_KGDB_38400BAUD is not set
+# CONFIG_KGDB_57600BAUD is not set
+# CONFIG_KGDB_115200BAUD is not set
+CONFIG_KGDB_PORT=0x3f8
+CONFIG_KGDB_IRQ=4
+CONFIG_KGDB_MORE=y
+CONFIG_KGDB_OPTIONS="-O1"
+CONFIG_NO_KGDB_CPUS=8


The following are not in the latest kgdb...

+CONFIG_KGDB_TS=y
+# CONFIG_KGDB_TS_64 is not set
+CONFIG_KGDB_TS_128=y
+# CONFIG_KGDB_TS_256 is not set
+# CONFIG_KGDB_TS_512 is not set
+# CONFIG_KGDB_TS_1024 is not set

.

+CONFIG_STACK_OVERFLOW_TEST=y
+CONFIG_TRAP_BAD_SYSCALL_EXITS=y  <--- I recommend against this one, see notes at front of kgdb patch
+CONFIG_KGDB_CONSOLE=y  <--- Likewise, use this only if you have only one serial port and no VGA
+CONFIG_KGDB_SYSRQ=y

 #




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-07 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 17 August 2005 02:53, George Anzinger wrote:


I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and
friends so that you can "bt" through them.  Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.



Hi, everybody

I found two bugs in the kgdb-ga-rt patch.

The first one: if CONFIG_SMP is not set, we get a compile error.
The second one: if CONFIG_KGDB is not set, we get a link error.
I am sending you a diff to correct this. I am not sure the last patch is
correct, but it works.


The reported bugs are now rolled into the kgdb patch.  Also, there is a
new README.txt.  I also included, in the kgdb patch, an updated gdb
macro file (Documentation/i386/kgdb/gdbinit.hw) which has a per_cpu
macro that, given a per_cpu structure name and the cpu number, returns
the address of that structure, properly typed.
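
The macro itself is not shown in this message, so the following is only a sketch of the general idea behind such a lookup, assuming the usual base-plus-offset layout for per-CPU data: take the address of the variable, add the offset of the requested CPU's copy, and cast the result back to the variable's type. The structure, array and function names are all hypothetical.

```c
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU structure; stands in for any per_cpu variable. */
struct toy_stats {
	long irqs;
	long switches;
};

/* One copy per CPU, laid out back to back in a toy "per-CPU area". */
static struct toy_stats toy_area[TOY_NR_CPUS];

/* Byte offset of a CPU's copy relative to CPU 0's copy. */
static size_t toy_cpu_offset(int cpu)
{
	return (size_t)cpu * sizeof(struct toy_stats);
}

/* Base address plus the CPU's offset, returned with the right type. */
static struct toy_stats *toy_per_cpu(int cpu)
{
	return (struct toy_stats *)((char *)&toy_area[0] + toy_cpu_offset(cpu));
}
```

Because the result is properly typed, a caller (or a debugger user) can go straight to the fields, e.g. toy_per_cpu(2)->irqs.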

I am also putting up my current version of time_stamp_tool.  This is the
replacement for kgdb_ts(), which I have removed from the kgdb patch.
It is still a little rough, but it has promise of being arch independent.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-09-07 Thread George Anzinger

Serge Noiraud wrote:

Wednesday 17 August 2005 02:53, George Anzinger wrote:


I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and
friends so that you can "bt" through them.  Apply in this order:
Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed
to fix the RT issues.



I'm trying this kgdb patch with 2.6.13 and I get the following errors.
Is there something I forgot ?


This related to kgdb?  I.e. does it go away if you either turn off kgdb 
at configure time or just don't patch with kgdb?  (It sure seems 
unrelated, but...)


George


...
  INSTALL sound/usb/snd-usb-audio.ko
  INSTALL sound/usb/snd-usb-lib.ko
  INSTALL sound/usb/usx2y/snd-usb-usx2y.ko
if [ -r System.map -a -x /sbin/depmod ]; then /sbin/depmod -ae -F System.map 
-b /var/tmp/kernel-2.6.13-rt4-root -r 2.6.13-rt4; fi
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/sunrpc/sunrpc.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/net/appletalk/appletalk.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/reiserfs/reiserfs.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ntfs/ntfs.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/nfs/nfs.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/minix/minix.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/jbd/jbd.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/ext3/ext3.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/cifs/cifs.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/fs/affs/affs.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/libata.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/ide-scsi.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/scsi/gdth.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid6.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/md/raid5.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/ide/ide-floppy.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/pktcdvd.ko 
needs unknown symbol preempt_locks
WARNING: /var/tmp/kernel-2.6.13-rt4-root/lib/modules/2.6.13-rt4/kernel/drivers/block/loop.ko 
needs unknown symbol preempt_locks

make[3]: *** [_modinst_post] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.51405 (%install)


RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.51405 (%install)
make[2]: *** [rpm] Error 1
make[1]: *** [rpm] Error 2
make: *** [rpm] Error 2



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC][PATCH] Use proper casting with signed timespec.tv_nsec values

2005-09-01 Thread George Anzinger

john stultz wrote:

All,
I recently ran into a bug with an older kernel where xtime's tv_nsec
field had accumulated more than 2 seconds worth of time. The timespec's
tv_nsec is a signed long, however gettimeofday() treats it as an
unsigned long. Thus when the failure occurred, very strange and difficult
to debug time problems occurred.

The main cause of the problem I was seeing is already fixed in mainline,
however just to be safe, I figured the following patch would be wise.

I only audited i386 and x86_64, however other arches probably could have
similar signed problems as well.

Please let me know if you have any further comments or feedback.


John,

There is a problem in the way this code handles the conversion to usec.
There is a conversion here and also in the get_offset code.  If the
nanoseconds are carried until after the addition of the two, about 25% of
the time you will end up with an additional usec in the time.  I strongly
suggest converting to usec only after adding xtime and the get_offset
time, to avoid this.  If the "correct" thing is done in clock_gettime()
(i.e. get_offset is in nanoseconds), this actually turns up as a back
step in time between gettimeofday() and clock_gettime().
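
A small sketch of the rounding effect described above, in plain userspace C rather than the kernel's own code. The two helper names are made up for illustration; the inputs stand for the nanosecond part of xtime and a nanosecond offset since the last tick.

```c
/* Nanoseconds per microsecond. */
#define TOY_NSEC_PER_USEC 1000L

/* Convert each part to usec first, then add: both remainders are dropped. */
static long usec_convert_then_add(long xtime_nsec, long offset_nsec)
{
	return xtime_nsec / TOY_NSEC_PER_USEC + offset_nsec / TOY_NSEC_PER_USEC;
}

/* Add in nanoseconds, then convert once: only one remainder is dropped. */
static long usec_add_then_convert(long xtime_nsec, long offset_nsec)
{
	return (xtime_nsec + offset_nsec) / TOY_NSEC_PER_USEC;
}
```

For 1600 ns plus 700 ns the first helper gives 1 usec and the second gives 2 usec, because the two dropped remainders (600 and 700) together exceed a full microsecond. A reader that sums in nanoseconds can therefore see a time one microsecond ahead of a reader that converts each part separately.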


George
--


thanks
-john

linux-2.6.13_signed-tv_nsec_A0.patch

diff --git a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c
+++ b/arch/i386/kernel/time.c
@@ -156,7 +156,7 @@ void do_gettimeofday(struct timeval *tv)
 			usec += lost * (USEC_PER_SEC / HZ);
 
 		sec = xtime.tv_sec;
-		usec += (xtime.tv_nsec / 1000);
+		usec += (unsigned long)xtime.tv_nsec / 1000;
 	} while (read_seqretry(&xtime_lock, seq));
 
 	while (usec >= 100) {
diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -128,7 +128,7 @@ void do_gettimeofday(struct timeval *tv)
 		seq = read_seqbegin(&xtime_lock);
 
 		sec = xtime.tv_sec;
-		usec = xtime.tv_nsec / 1000;
+		usec = (unsigned long)xtime.tv_nsec / 1000;
 
 		/* i386 does some correction here to keep the clock 
 		   monotonous even when ntpd is fixing drift.
diff --git a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -824,7 +824,7 @@ static void update_wall_time(unsigned lo
 	do {
 		ticks--;
 		update_wall_time_one_tick();
-		if (xtime.tv_nsec >= 10) {
+		if ((unsigned long)xtime.tv_nsec >= 10) {
 			xtime.tv_nsec -= 10;
 			xtime.tv_sec++;
 			second_overflow();





--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()

2005-08-30 Thread George Anzinger

Tom Rini wrote:

On Tue, Aug 30, 2005 at 12:33:25AM -0700, George Anzinger wrote:


Tom Rini wrote:


CC: Andi Kleen <[EMAIL PROTECTED]>
This adds a call to notify_die() in the "no context" portion of
do_page_fault() as someone on the chain might care and want to do a fixup.

---

 linux-2.6.13-trini/arch/x86_64/mm/fault.c |4 
 1 files changed, 4 insertions(+)

diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c
--- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook	2005-08-29 11:09:13.0 -0700
+++ linux-2.6.13-trini/arch/x86_64/mm/fault.c	2005-08-29 11:09:13.0 -0700
@@ -514,6 +514,10 @@ no_context:
 	if (is_errata93(regs, address))
 		return; 
 
+	if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14,
+				SIGSEGV) == NOTIFY_STOP)
+		return;
+
 	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to
 	 * terminate things with extreme prejudice.


Please use a more descriptive text than "no context".  This bit of info 
SHOULD be available to the gdb/kgdb user and should indicate why kgdb 
was entered.  It thus should be something like "bad kernel address" or 
"illegal kernel address".



"no context" is the label we're in, in the code.  What it's actually
used for is "hey, we (== kgdb) tried to read/write a very very bogus
addr, time to longjmp".  If it's not true that kgdb is at fault then we
drop to the debugger anyhow, and the user can see where they came from.

No.  What the user sees is the offending code (i.e. prior to the trap to
page_fault), NOT how kgdb happened to be called.  The "no_context" is IN
the _context_ of page_fault, but that is lost by the time you get to
kgdb and ask to see _why_ (via, hint, hint: "p kgdb_info").
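
To make the point concrete, here is a toy notifier chain in userspace C. It is only a sketch and not the kernel's notify_die() implementation: the types, constants and the toy_debugger_reason variable are invented for illustration. It shows that the reason string is the one piece of "why" that travels with the event to whoever is hooked on the chain.

```c
#include <stddef.h>

#define TOY_NOTIFY_DONE 0
#define TOY_NOTIFY_STOP 1

/* What a handler gets to see: the reason text travels with the event. */
struct toy_die_args {
	const char *reason;  /* e.g. "bad kernel address" */
	int trapnr;
};

typedef int (*toy_handler_t)(const struct toy_die_args *args);

/* Last reason recorded by the toy debugger hook. */
static const char *toy_debugger_reason;

static int toy_debugger_hook(const struct toy_die_args *args)
{
	toy_debugger_reason = args->reason;
	return TOY_NOTIFY_STOP;   /* the debugger handled it, stop the chain */
}

/* Walk the chain until some handler asks to stop. */
static int toy_notify_die(toy_handler_t *chain, size_t n,
			  const char *reason, int trapnr)
{
	struct toy_die_args args = { reason, trapnr };
	size_t i;

	for (i = 0; i < n; i++)
		if (chain[i](&args) == TOY_NOTIFY_STOP)
			return TOY_NOTIFY_STOP;
	return TOY_NOTIFY_DONE;
}
```

Whatever text the caller passes is all the hook can later report, so a string such as "bad kernel address" tells the debugger user far more than the name of a code label.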


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch 1/3] x86_64: Add a notify_die() call to the "no context" part of do_page_fault()

2005-08-30 Thread George Anzinger

Tom Rini wrote:

CC: Andi Kleen <[EMAIL PROTECTED]>
This adds a call to notify_die() in the "no context" portion of
do_page_fault() as someone on the chain might care and want to do a fixup.

---

 linux-2.6.13-trini/arch/x86_64/mm/fault.c |4 
 1 files changed, 4 insertions(+)

diff -puN arch/x86_64/mm/fault.c~x86_64-no_context_hook arch/x86_64/mm/fault.c
--- linux-2.6.13/arch/x86_64/mm/fault.c~x86_64-no_context_hook	2005-08-29 11:09:13.0 -0700
+++ linux-2.6.13-trini/arch/x86_64/mm/fault.c	2005-08-29 11:09:13.0 -0700
@@ -514,6 +514,10 @@ no_context:
 	if (is_errata93(regs, address))
 		return; 
 
+	if (notify_die(DIE_PAGE_FAULT, "no context", regs, error_code, 14,
+				SIGSEGV) == NOTIFY_STOP)
+		return;
+
 	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to
 	 * terminate things with extreme prejudice.


Please use a more descriptive text than "no context".  This bit of info 
SHOULD be available to the gdb/kgdb user and should indicate why kgdb 
was entered.  It thus should be something like "bad kernel address" or 
"illegal kernel address".



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: when or where can the case occur in "linux kernel development " about "kernel preemption"?

2005-08-29 Thread George Anzinger

linux-os (Dick Johnson) wrote:

On Sat, 27 Aug 2005, Sat. wrote:



2005/8/27, Christopher Friesen <[EMAIL PROTECTED]>:


Sat. wrote:


the case about kernel preemption is as follows:

the book said "when a process that has a higher priority than the
currently running process is awakened".

but I can't think of when such a case can occur; could you give me an example?


There may be others, but one common case is when a hardware interrupt
causes the higher priority process to become runnable.  Some examples of
this would be a network packet arriving, or the expiry of a hardware timer.

Chris



unfortunately, I cannot agree with you; normally, when the kernel
runs in interrupt context, schedule() should not be invoked
-- my view.

then, could anyone give me a definite example about the network case above,
or anything else, to illuminate this?

thanks!

--




Sat.



Schedule is never executed from an interrupt, BUT, there may be
kernel threads or even user tasks that are sleeping, waiting
to be awakened when some preliminary interrupt processing has
occurred. The interrupt code may execute one of the wake-up calls
which will cause the target to be put into the run queue as soon
as possible.

Actually, this is not completely true.  The kernel sets a flag while
handling interrupts that says it is within an interrupt.  This flag is
cleared on the way out of the interrupt but prior to the return from
interrupt (rfi) instruction.  Between clearing this flag and the rfi,
a check is made to see if the kernel is preemptable and, if so, whether
preemption is desired (i.e. something more important should run NOW).
If both of these are true, schedule is called to do the context switch.
So, schedule IS called from within the interrupt, but NOT within the area
the kernel flags as being in an interrupt, which is a subset of the
actual interrupt.
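
The sequence can be modelled in a few lines of userspace C. This is a sketch of the ordering only, not kernel code; the structure, its fields and the toy_ function names are made up for illustration.

```c
#include <stdbool.h>

/* Toy per-CPU state; the field names are made up for this sketch. */
struct toy_cpu {
	bool in_interrupt;     /* flag set while the handler body runs */
	int  preempt_count;    /* > 0 means preemption is disabled */
	bool need_resched;     /* set when a more important task woke up */
	int  schedule_calls;   /* how often the toy scheduler ran */
	bool scheduled_in_irq; /* true if it ever ran with the flag set */
};

static void toy_schedule(struct toy_cpu *cpu)
{
	cpu->schedule_calls++;
	if (cpu->in_interrupt)
		cpu->scheduled_in_irq = true;
	cpu->need_resched = false;
}

/* The handler body may wake a task, but never calls the scheduler. */
static void toy_handler_body(struct toy_cpu *cpu, bool wakes_higher_prio)
{
	if (wakes_higher_prio)
		cpu->need_resched = true;
}

static void toy_interrupt(struct toy_cpu *cpu, bool wakes_higher_prio)
{
	cpu->in_interrupt = true;
	toy_handler_body(cpu, wakes_higher_prio);
	cpu->in_interrupt = false;    /* flag cleared on the way out */

	/* Window between clearing the flag and returning from interrupt. */
	if (cpu->preempt_count == 0 && cpu->need_resched)
		toy_schedule(cpu);
	/* the return-from-interrupt instruction would follow here */
}
```

A network packet or timer expiry that wakes a higher-priority task corresponds to calling toy_interrupt() with wakes_higher_prio set: the scheduler runs once, after the flag is cleared. If preempt_count is non-zero, the request stays pending in need_resched instead.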

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

Wilkerson, Bryan P wrote:


George Anzinger [mailto:[EMAIL PROTECTED] wrote:


Well, I checked, it is "int $3".  Why then the panic?  If you try the
boot with kgdb (i.e. wait) and then do:
(gdb) disass gdb_interrupt
What do you find at +75?



Below is the console from the session.  It is interesting that gdb is not
able to access the memory.  I let it continue, then broke in with ctrl-c
later in the boot cycle and tried disass again with the same result.

Feel free to flog me if this is stupid, but I have just one EM64T machine
(test) and I'm using a regular P4 machine as dev.  I build the test
kernel on the EM64T machine and then copy the updated sources, object
files, and images via NFS to the dev machine.  I believe I read in the
kgdb doc that it was possible to use two different architecture machines
for test and dev, although there wasn't any information about how to do
it.  This is probably the source of the OS/ABI warning.  I can probably
get the mothership to send me another EM64T machine if need be.


What you need is a cross development environment.  Not having that, your
gdb is likely not aware of how to talk to the hardware you are using.
The cross development environment should cost a whole lot less than
another machine.


George
--


vincent:/home/bwilkers/proj/linux-2.6.13-rc4-mm1 # gdb vmlinux
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i386:x86-64 settings.

Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) target remote /dev/ttyS0
Remote debugging using /dev/ttyS0
0x80503b50 in ?? ()
warning: no shared library support for this OS / ABI
(gdb) disass gdb_interrupt
Dump of assembler code for function gdb_interrupt:
0x80247009 :   Cannot access memory at address 0x80247009
(gdb) c
Continuing.
Bootdata ok (command line is root=/dev/sda2 kgdb console=kgdb)
Linux version 2.6.13-rc4-mm1-perfmon-em64t ([EMAIL PROTECTED]) (gcc version
3.3.5 20050117 (prerelease) (SUSE Linux)) #43 SMP Sat Aug 27 15:56:14
MDT 2005
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - 3fe2f800 (usable)
 BIOS-e820: 3fe2f800 - 3fe3f832 (ACPI NVS)
 BIOS-e820: 3ff1 - 3ff3 (reserved)
 BIOS-e820: 3ff3 - 3ff4 (ACPI data)
 BIOS-e820: 3ff4 - 3fff (ACPI NVS)
 BIOS-e820: 3fff - 4000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fed13000 - fed1a000 (reserved)
 BIOS-e820: fed1c000 - feda (reserved)
ACPI: PM-Timer IO Port: 0x408
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)





--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

George Anzinger wrote:

Wilkerson, Bryan P wrote:


Thank you Tom and George for the tips on using kgdb with
2.6.13-rc4-mm1.
I almost have it working but kgdb seems to have a few issues.  I can get
it running from the dev machine using the kgdb and console=kgdb boot
options on the test kernel.  The kernel waits as it should and when I
attach with "target remote /dev/ttyS0" and I can continue the boot but
eventually it gets to a point in the boot where it frees unused kernel
memory successfully and then a warning, "unable to open an initial
console",  followed by, "Kernel panic - not syncing: Attempted to kill
init!"

Removing the console=kgdb boot option and the machine boots all the way
to run level 5.   I tried to break into kgdb at this point using the 
$echo -e "\003" > /dev/ttyS0

from the dev machine but the test kernel panics at gdb_interrupt+75 when
it receives anything on the serial port.  Hmmm...

I'm wondering if I'm maybe just the first to try this on EM64T (kernel
builds in the arch/x86_64 tree).   



Possibly:).  Since the serial port seems to work (i.e. the first test 
above), the fault seems to be in handling the int3.  Is int3 the right 
instruction for this machine?  If not you would make the change in 
kgdb.h.  I think that is the only place it is defined.


Well, I checked, it is "int $3".  Why then the panic?  If you try the
boot with kgdb (i.e. wait) and then do:

(gdb) disass gdb_interrupt
What do you find at +75?






--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Need better is_better_time_interpolator() algorithm

2005-08-26 Thread George Anzinger

Christoph Lameter wrote:

On Fri, 26 Aug 2005, Alex Williamson wrote:



  Would we ever want to favor a frequency shifting timer over anything
else in the system?  If it was noticeable perhaps we'd just need a
callback to re-evaluate the frequency and rescan for the best timer.  If
it happens without notice, a flag that statically assigns it the lowest
priority will do.  Or maybe if the driver factored the frequency
shifting into the drift it would make the timer undesirable without
resorting to flags.  Thanks,



Timers are usually constant. AFAIK Frequency shifts only occur through 
power management. In that case we usually have some notifiers running 
before the change. These notifiers need to switch to a different time 
source if the timer frequency will be shifting or the timer will become 
unavailable.


If there is a notifier, I presume we can track it.  We might want to
refine things so as to not hit too big a bump when the shift occurs,
but I think it is doable.  The desirability of doing it, I think,
depends on the availability of something better.  The access time of the
TSC is "really" enticing.  Even so, I think a _good_ clock would not
depend on the long-term accuracy of something as fast as the TSC.  Vendors
are even modulating these to reduce RFI, but still, because of its
speed, it makes the best interpolator for the jiffy-to-jiffy times.
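
A minimal sketch of that idea in userspace C, under stated assumptions: a slow but trusted clock supplies the time at each tick, and a fast cycle counter is used only to fill in the time since the last tick. The structure and the toy_ functions are hypothetical, including the frequency-change hook that a notifier might call.

```c
#include <stdint.h>

/* Toy interpolator state; names and layout are assumptions for the sketch. */
struct toy_interp {
	uint64_t base_ns;        /* time at the last tick, from the slow clock */
	uint64_t cycles_at_tick; /* fast counter value sampled at that tick */
	uint64_t freq_hz;        /* current frequency of the fast counter */
};

/*
 * Time now = tick time + cycles elapsed since the tick, scaled to ns.
 * The cycle delta is assumed small (one tick's worth), so the
 * multiplication does not overflow.
 */
static uint64_t toy_now_ns(const struct toy_interp *ti, uint64_t cycles_now)
{
	uint64_t delta = cycles_now - ti->cycles_at_tick;

	return ti->base_ns + delta * 1000000000ULL / ti->freq_hz;
}

/* Called at every tick: re-anchor so counter error cannot accumulate. */
static void toy_tick(struct toy_interp *ti, uint64_t tick_ns,
		     uint64_t cycles_now)
{
	ti->base_ns = tick_ns;
	ti->cycles_at_tick = cycles_now;
}

/* Called from a (hypothetical) frequency-change notifier. */
static void toy_freq_change(struct toy_interp *ti, uint64_t cycles_now,
			    uint64_t new_freq_hz)
{
	/* Fold elapsed time into the base at the old rate, then switch. */
	ti->base_ns = toy_now_ns(ti, cycles_now);
	ti->cycles_at_tick = cycles_now;
	ti->freq_hz = new_freq_hz;
}
```

Re-anchoring at every tick is what keeps the long-term accuracy independent of the fast counter, and folding the elapsed time into the base before switching rates is one way to avoid a bump when the frequency shifts.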


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdb on EM64T

2005-08-26 Thread George Anzinger

Wilkerson, Bryan P wrote:

Thank you Tom and George for the tips on using kgdb with
2.6.13-rc4-mm1.


I almost have it working but kgdb seems to have a few issues.  I can get
it running from the dev machine using the kgdb and console=kgdb boot
options on the test kernel.  The kernel waits as it should and when I
attach with "target remote /dev/ttyS0" and I can continue the boot but
eventually it gets to a point in the boot where it frees unused kernel
memory successfully and then a warning, "unable to open an initial
console",  followed by, "Kernel panic - not syncing: Attempted to kill
init!"

Removing the console=kgdb boot option and the machine boots all the way
to run level 5.   I tried to break into kgdb at this point using the 
	$echo -e "\003" > /dev/ttyS0

from the dev machine but the test kernel panics at gdb_interrupt+75 when
it receives anything on the serial port.  Hmmm...

I'm wondering if I'm maybe just the first to try this on EM64T (kernel
builds in the arch/x86_64 tree).   


Possibly:).  Since the serial port seems to work (i.e. the first test 
above), the fault seems to be in handling the int3.  Is int3 the right 
instruction for this machine?  If not you would make the change in 
kgdb.h.  I think that is the only place it is defined.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Need better is_better_time_interpolator() algorithm

2005-08-26 Thread George Anzinger

Alex Williamson wrote:

On Fri, 2005-08-26 at 08:39 -0700, Christoph Lameter wrote:


I think a priority is something useful for the interpolators. Some of 
the decisions about which time sources to use also have criteria different 
from drift/latency/jitter/cpu. F.e. timers may not survive various 
power-saving configurations. Thus I would think that we need a priority 
plus some flags.


Some of the criteria for choosing a time source may be:



Hi Christoph,

   I sent another followup to this thread with a patch containing a
fairly crude algorithm that I think better explains my starting point.
I'm sure the weighting and scaling factors need work, but I think many
of the criteria you describe will favor the right clock.


1. If a system boots up with a single cpu then there is no question that 
the ITC/TSC should be used because of the fast access.


We need to factor in frequency shifting here, especially if it happens
without notice.
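
One way to picture "a priority plus some flags" is the sketch below. It is not the time interpolator code under discussion; the structure, the flag names and toy_pick() are invented for illustration. A source carrying a flag the caller cannot tolerate is skipped regardless of its priority.

```c
#include <stddef.h>

#define TOY_FREQ_SHIFTS 0x1  /* frequency may change, e.g. power management */
#define TOY_MAY_STOP    0x2  /* source may become unavailable */

struct toy_source {
	const char *name;
	int priority;        /* higher is better */
	unsigned int flags;
};

/* Pick the highest-priority source that has none of the excluded flags. */
static const struct toy_source *
toy_pick(const struct toy_source *src, size_t n, unsigned int exclude)
{
	const struct toy_source *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (src[i].flags & exclude)
			continue;
		if (!best || src[i].priority > best->priority)
			best = &src[i];
	}
	return best;
}
```

With a fast source flagged TOY_FREQ_SHIFTS and a slower unflagged one, passing exclude = 0 selects the fast source, while passing TOY_FREQ_SHIFTS falls back to the slower one.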



~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

2005-08-25 Thread George Anzinger

John McCutchan wrote:

On Thu, 2005-08-25 at 11:54 -0700, George Anzinger wrote:


Robert Love wrote:


On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:



On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:

~
I think the best thing is to take idr into user space and emulate the 
problem usage.  To this end, from the log it appears that you _might_ be 
moving between 0, 1 and 2 entries increasing the number each time.  It 
also appears that the failure happens here:

add 1023
add 1024
find 1024  or is it the remove that fails?  It also looks like 1024 got 
allocated twice.  Am I reading the log correctly?



You are reading the log correctly. There are two bugs. One is that if we
pass X to idr_get_new_above, it can return X again (doesn't ever seem to
return < X). The other problem is that the find fails on 1024 (and 2048
if we skip 1024).


That IS strange.  1024 is on a "level" boundary, but the next level is 
2**15, not 2**11.  I will take a look.





So, is it correct to assume that the tree is empty save these two at 
this time?  I am just trying to figure out what the test program needs 
to do.



Yes that is the exact scenario. Only 2 id's are used at any given time,
and once we hit 1024 things break. This doesn't happen when the tree is
not empty.

Thanks for looking at this!


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Inotify problem [was Re: 2.6.13-rc6-mm1]

2005-08-25 Thread George Anzinger

Robert Love wrote:

On Thu, 2005-08-25 at 09:33 -0400, John McCutchan wrote:


On Thu, 2005-08-25 at 22:07 +1200, Reuben Farrelly wrote:


~

dovecot: Aug 25 19:31:26 Warning: IMAP(gilly): removing wd 1022 from inotify fd 4
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1023
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1024 from inotify fd 4
dovecot: Aug 25 19:31:27 Error: IMAP(gilly): inotify_rm_watch() failed: Invalid argument

dovecot: Aug 25 19:31:27 Warning: IMAP(gilly): removing wd 1023 from inotify fd 4
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024
dovecot: Aug 25 19:31:28 Warning: IMAP(gilly): inotify_add_watch returned 1024

Note the incrementing wd value even though we are removing them as we go..



What kernel are you running? The wd's should ALWAYS be incrementing, you
should never get the same wd as you did before. From your log, you are
getting the same wd (after you inotify_rm_watch it). I can reproduce
this bug on 2.6.13-rc7.

idr_get_new_above 


isn't returning something above.

Also, the idr layer seems to be breaking when we pass in 1024. I can
reproduce that on my 2.6.13-rc7 system as well.



This is using latest CVS of dovecot code and with 2.6.12-rc6-mm(1|2) kernel.

Robert, John, what do you think?   Is this possibly related to the oops seen 
in the log that I reported earlier?  (Which is still showing up 2-3 times per 
day, btw)


There is definitely something broken here.



Jim, George-

We are seeing a problem in the idr layer.  If we do idr_find(1024) when,
say, a low valued idr, like, zero, is unallocated, NULL is returned.


I think the best thing is to take idr into user space and emulate the 
problem usage.  To this end, from the log it appears that you _might_ be 
moving between 0, 1 and 2 entries increasing the number each time.  It 
also appears that the failure happens here:

add 1023
add 1024
find 1024  or is it the remove that fails?  It also looks like 1024 got 
allocated twice.  Am I reading the log correctly?


So, is it correct to assume that the tree is empty save these two at 
this time?  I am just trying to figure out what the test program needs 
to do.
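
A user-space test along these lines might look like the sketch below.  It
is Python rather than the kernel's C, and ModelIdr is a hypothetical
reference model, not the kernel idr code.  It only states the behaviour
the thread expects (each new id is above the last one handed out, and a
find succeeds for every live id) and replays the add/add/remove pattern
from the log across the 1024 and 2048 boundaries.  A real test would
drive lib/idr.c built for user space through the same sequence.

```python
class ModelIdr:
    """Hypothetical reference model of the expected idr behaviour."""

    def __init__(self):
        self.live = {}  # id -> stand-in for the stored pointer

    def get_new_above(self, start, ptr):
        # Expected contract: hand out the lowest free id that is >= start.
        new_id = start
        while new_id in self.live:
            new_id += 1
        self.live[new_id] = ptr
        return new_id

    def find(self, wanted):
        return self.live.get(wanted)

    def remove(self, wanted):
        del self.live[wanted]


def replay_inotify_pattern(idr, limit):
    """Keep 0, 1 or 2 ids live while the ids climb, as in the dovecot log."""
    last = -1
    live = []
    handed_out = []
    while last < limit:
        # The caller always asks for an id above the last one it received.
        new_id = idr.get_new_above(last + 1, ("watch", last + 1))
        assert new_id > last, f"id {new_id} is not above {last}"
        assert idr.find(new_id) is not None, f"find({new_id}) failed"
        handed_out.append(new_id)
        live.append(new_id)
        last = new_id
        if len(live) == 2:
            # Remove both entries, newest first, so the tree is empty again.
            for old in reversed(live):
                assert idr.find(old) is not None, f"find({old}) failed"
                idr.remove(old)
            live.clear()
    return handed_out
```

Running replay_inotify_pattern(ModelIdr(), 2100) walks the ids past both
1024 and 2048 with the tree otherwise empty, which is the scenario
described above.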




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] NTP ntp-helper functions

2005-08-25 Thread George Anzinger

john stultz wrote:

Andrew, All,

This patch cleans up a commonly repeated set of changes to the NTP
state variables by adding two helper inline functions:

ntp_clear(): Clears the ntp state variables


How many places is this called in any given arch?  I ask because it 
_may_ save space if it is NOT inlined.  I don't think it is ever in a 
critical code path...



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 16:46 -0700, George Anzinger wrote:


john stultz wrote:


On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:



Roman Zippel wrote:



Hi,

On Tue, 23 Aug 2005, john stultz wrote:





I'm assuming gettimeofday()/clock_gettime() looks something like:
xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2M ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns
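
The timeline above can be replayed with a few lines of Python.  This is
only a sketch of the example as given (2 cycles per tick, HZ=1000, a
-500ppm adjustment taken as -500 ns on a 1M ns tick); the Clock class and
its method names are made up for illustration and are not kernel code.

```python
TICK_NS = 1_000_000                         # HZ=1000
CYCLES_PER_TICK = 2                         # the "2 cyc per tick" clock
NS_PER_CYCLE = TICK_NS // CYCLES_PER_TICK


class Clock:
    def __init__(self):
        self.xtime = 0
        self.last_tick_cycle = 0

    def gettimeofday(self, cycle):
        # xtime + offset_ns, with offset_ns derived from the cycle counter
        offset_ns = (cycle - self.last_tick_cycle) * NS_PER_CYCLE
        return self.xtime + offset_ns

    def timer_interrupt(self, cycle, ntp_adj_ns=0):
        # xtime += tick_length + ntp_adj; offset_ns = 0
        self.xtime += TICK_NS + ntp_adj_ns
        self.last_tick_cycle = cycle


def run_example():
    clock = Clock()
    readings = [clock.gettimeofday(c) for c in (0, 1, 2)]
    clock.timer_interrupt(2)
    readings += [clock.gettimeofday(c) for c in (2, 3, 4)]
    clock.timer_interrupt(4, ntp_adj_ns=-500)   # -500ppm of one tick
    readings.append(clock.gettimeofday(4))
    return readings
```

The last two readings come out as 2,000,000 ns followed by 1,999,500 ns,
i.e. the clock steps back by 500 ns across the second interrupt.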



At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrupt happens xtime will be the same as the gettimeofday a ns earlier.



Yes, clearly a forward knowledge of the NTP adjustment is necessary for
gettimeofday(), because after the NTP adjustment has been accumulated
into xtime, there's nothing left for gettimeofday to adjust (it's already
been applied). :)



Likewise, gettimeofday() needs to know when to stop applying the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)
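
One way to read the two-term offset is sketched below in Python.  It
assumes the adjustments are scale factors expressed in ppm (that reading
is an assumption, as are the function and parameter names), and it splits
the interval since the last tick at the end of the ntp period so that the
first adjustment is applied only to the part that falls inside it.

```python
def scale(ns, ppm):
    # Apply a ppm correction using integer arithmetic only.
    return ns + ns * ppm // 1_000_000


def adjusted_offset_ns(last_tick_ns, now_ns, ntp_end_ns, adj1_ppm, adj2_ppm=0):
    """Offset since the last tick, corrected piecewise.

    All times are raw (uncorrected) nanoseconds.  adj1_ppm covers the
    span up to ntp_end_ns; adj2_ppm (0 if there is no further
    correction) covers whatever lies after it.
    """
    split = min(max(ntp_end_ns, last_tick_ns), now_ns)
    first = split - last_tick_ns      # still inside the ntp period
    second = now_ns - split           # after the ntp period ended
    return scale(first, adj1_ppm) + scale(second, adj2_ppm)
```

With a tick that is 1.5 ticks late, adjusted_offset_ns(0, 2_500_000,
1_000_000, -500) corrects only the first 1M ns and leaves the remaining
1.5M ns alone.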



Well, in my example, the ntp_adjustment is a fixed nanosecond offset, so
it would be added to the nanosecond offset from the last tick (which is
how the current code works). If you are doing scaling (as you have in
the equation above), then the problem goes away, since you can apply the
adjustment consistently through any interval.


Until the end of the correction time...



I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a "* ntp_adj2" is 
needed.  



Ok, so you're forcing gettimeofday to be interval aware, so it's applying
different fixed NTP adjustments to different chunks of the current
interval. The issue of course, if you're using fixed adjustments, is
that you have to have n ntp adjustments for n intervals, or you have to
apply the same ntp adjustment to multiple intervals. 


Uh, are you saying that one ntpd call can set up several different 
adjustments?  I was assuming that any given call would set up either a 
fixed adjustment for ever or a fixed adjustment to be applied for a 
fixed number of ticks (or until so much correcting was done, which, in 
the end is the same thing at this point in the code).


If ntpd has to come back to change the adjustment, I am assuming that 
some kernel action can be taken at that time to sync the xtime clock and 
the gettimeofday reading of it.  I.e. we would only have to keep track 
of one adjustment with a possible pre specified end time.




I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to, the timer interrupt at 2 and this is the 
same edge that gettimeofday is to use to start applying the correction.



If you argue that we only need two adjustments, why not argue for only
one? You're saying have one adjustment that you apply for the first
tick's worth of time, and a second adjustment that you apply for the
following N ticks' worth of time in the interval. Why the odd base
case? 


Correct me if I am wrong here, but I am assuming that ntpd can ask for 
an adjustment of X amou

Re: Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger

john stultz wrote:

On Wed, 2005-08-24 at 17:24 -0700, George Anzinger wrote:

CLOCK_TICK_RATE	is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.



Err, the Cyclone does not generate interrupts. So this issue does not
affect those systems.

As for the HPET, it sets its own interrupt frequency based off of
KERNEL_TICK_USEC (which you're right, isn't quite what is used in the
jiffies conversions).  Would it be easier to just adjust that value to
use ACTHZ or CLOCK_TICK_RATE?


If you want to take that approach you would want the HPET to interrupt 
every TICK_NSEC nanoseconds, that being what xtime is pushed by each tick.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Incorrect CLOCK_TICK_RATE in 2.6 kernel

2005-08-24 Thread George Anzinger
CLOCK_TICK_RATE	is used by the kernel to compute LATCH, TICK_NSEC and 
tick_nsec.  This latter is used to update xtime each tick.  TICK_NSEC is 
then used to compute (at compile time) the conversion constants needed 
to convert to/from jiffies from/to timespec and timeval (and others).


The problem is that, if the timer being used is either Cyclone or HPET, 
the wrong CLOCK_TICK_RATE is used.  This means that systems using these 
interrupt sources will be doing a) incorrect update of xtime and b) 
incorrect conversion of jiffies.  Since these two values will track each 
other this will not be seen by simple gettimeofday(); 
sleep();gettimeofday() tests, but will be seen as a system clock drift 
(without NTP) or with NTP, a somewhat high drift rate (to the point of 
losing sync at HZ=1000).
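
For a feel of the numbers, here is the derivation in Python.  The
formulas follow my recollection of the 2.6-era LATCH, ACTHZ, TICK_NSEC
and SH_DIV macros in include/linux/timex.h and include/linux/jiffies.h,
so treat them as an assumption to be checked against the tree.  The
1193182 Hz value is the x86 PIT rate, and the "exactly 1 ms" interrupt
source in the drift calculation is purely hypothetical.

```python
def sh_div(nom, den, lsh):
    """Shifted, rounded division, modelled on the kernel's SH_DIV macro."""
    return ((nom // den) << lsh) + (((nom % den) << lsh) + den // 2) // den


def tick_constants(clock_tick_rate, hz):
    latch = (clock_tick_rate + hz // 2) // hz        # timer counts per tick
    acthz = sh_div(clock_tick_rate, latch, 8)        # actual HZ, scaled by 2**8
    tick_nsec = sh_div(1_000_000 * 1000, acthz, 8)   # ns added to xtime per tick
    return latch, acthz, tick_nsec


def drift_ppm(tick_nsec, real_tick_ns):
    """Clock error in ppm if ticks really arrive every real_tick_ns."""
    return (tick_nsec - real_tick_ns) * 1_000_000 // real_tick_ns
```

tick_constants(1193182, 1000) gives a TICK_NSEC of 999848.  If the
interrupt source in use actually fired every 1,000,000 ns while xtime was
still pushed by that PIT-derived value, drift_ppm(999848, 1_000_000)
shows the clock running 152 ppm slow.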


The fact that the user/system chooses the clock to use at boot time and 
can change the clock after boot means that it is not possible to pin 
down CLOCK_TICK_RATE at compile time.  However, since the computation of 
TICK_NSEC and the conversion constants is rather involved it is clear 
that we REALLY do want to compute these at compile time.


The suggested solution is to a) set up a structure with the default 
(clock of choice at config time) conversion constants in it at compile 
time.  Then b) at clock init time, populate the structure with the 
proper constants for the given clock.  These can be computed at compile 
time, but from the correct  CLOCK_TICK_RATE for the given clock. 
Switching to a fall back clock would also require an update of this 
structure.
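
In outline, the idea might look like the Python sketch below.  Every name
here (TickConstants, CLOCK_TABLE, TimeConversion, the "exact_1ms" clock)
is invented for illustration; the table stands in for constants the
compiler would pre-compute per clock, and select_clock() stands in for
clock init or a switch to a fall back clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickConstants:
    tick_nsec: int  # ns that xtime is pushed by each tick

    def jiffies_to_nsec(self, jiffies):
        return jiffies * self.tick_nsec

    def nsec_to_jiffies(self, nsec):
        # Round up so a request never expires early.
        return -(-nsec // self.tick_nsec)


# Stand-in for the compile-time constants, one entry per clock.
CLOCK_TABLE = {
    "pit": TickConstants(999_848),          # PIT-derived value at HZ=1000
    "exact_1ms": TickConstants(1_000_000),  # hypothetical exact 1 ms source
}


class TimeConversion:
    def __init__(self, default="pit"):
        # a) the default (config time) clock's constants
        self.active = CLOCK_TABLE[default]

    def select_clock(self, name):
        # b) at clock init, or on fall back, swap in the right constants
        self.active = CLOCK_TABLE[name]
```

All conversions go through the active entry, so switching clocks costs a
pointer update rather than a recomputation.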


Comments?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: kgdbwait in 2.6.13-rc4-mm1?

2005-08-24 Thread George Anzinger

Wilkerson, Bryan P wrote:

Is there an equivalent kernel boot option for kgdbwait in
2.6.13-rc4-mm1?  I grep'd the kernel source but didn't find kgdbwait.

Is there any documentation other than the source for the flavor of KGDB
that is included in the akpm kernel patch?   


The patch has some documentation at Documentation/i386/kgdb/* as well as 
a couple of gdb macros...


The wait option is "gdb".  This has been in flux so, to be absolutely 
sure, look at include/asm-i386/bugs.h

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-24 Thread George Anzinger

john stultz wrote:

On Tue, 2005-08-23 at 17:29 -0700, George Anzinger wrote:


Roman Zippel wrote:


Hi,

On Tue, 23 Aug 2005, john stultz wrote:




I'm assuming gettimeofday()/clock_gettime() looks something like:
 xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.




If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 



Not quite. The issue that I'm trying to describe is that if we
inconsistently calculate time intervals in gettimeofday and the timer
interrupt, we have the possibility for time inconsistencies.

The trivial example using the current code would be something like:

Again with my 2 cyc per tick clock, HZ=1000.

gettimeofday():
xtime + offset_ns

timer_interrupt:
xtime += tick_length + ntp_adj
offset_ns = 0

0:  gettimeofday:  0 + 0 = 0 ns
1:  gettimeofday:  0 + 500k ns = 500k ns
2:  gettimeofday:  0 + 1M ns = 1M ns
2:  timer_interrupt:  
2:  gettimeofday:  1M ns + 0 ns = 1M ns

3:  gettimeofday:  1M ns + 500k ns = 1.5M ns
4:  gettimeofday:  1M ns + 1M ns = 2M ns
4:  timer_interrupt (using -500ppm adjustment)
4:  gettimeofday:  1,999,500 ns + 0 ns = 1,999,500 ns

At point 4 you are introducing a NEW ntp adjustment.  This, I submit, 
needs to actually have been introduced to the system prior to the 
interrupt at point 2 with the first xtime change at point 4.  However, 
gettimeofday() should be aware of it from the interrupt at point 2 and 
be doing corrections from that time forward.  Thus when the point 4 
interrupt happens xtime will be the same as the gettimeofday a ns earlier.


Likewise, gettimeofday() needs to know when to stop applying the correction 
so that if a tick is late, it will apply the correction only for those 
times that it was needed.  This could be done by figuring the offset 
thusly:


offset = (offset from last tick to end of ntp period * ntp_adj1) + 
(offset from end of ntp period to now)


I suppose it is possible that the latter part of the offset is also 
under a different ntp correction which would mean a "* ntp_adj2" is 
needed.  I would argue that only two terms are needed here regardless of 
how late a tick is.  This is because, I would expect the ntp system call 
to sync the two clocks.  This means in your example, the ntp call would 
have been made at, or prior to the timer interrupt at 2 and this is the 
same edge that gettimeofday is to use to start applying the correction.







It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, methinks there is only one such change that can be 
present at any given time.



Well, in many arches gettimeofday() works around the above issue by
capping the offset_ns value as such:


I think this may have been done with only usec gettimeofday.  Now that 
we have clock_gettime() returning nsec, we need to be a bit more careful.


gettimeofday:
xtime + min(offset_ns, tick_len + ntp_adj)

The problem with this is that when we have lost or late ticks, or if we
are using dynamic ticks you have granularity problems.
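
The granularity problem is easy to see with numbers.  The Python sketch
below applies the capped formula to a tick that never arrives; the
function names and the sample values (1M ns tick, -500 ns adjustment)
are chosen for illustration only.

```python
def capped_gettimeofday(xtime_ns, offset_ns, tick_len_ns, ntp_adj_ns):
    # xtime + min(offset_ns, tick_len + ntp_adj)
    return xtime_ns + min(offset_ns, tick_len_ns + ntp_adj_ns)


def readings_with_late_tick(offsets_ns):
    """Successive readings while the next timer interrupt is overdue."""
    return [capped_gettimeofday(0, off, 1_000_000, -500) for off in offsets_ns]
```

For offsets of 0.5M, 1M, 1.5M and 2M ns the readings are 500000, 999500,
999500, 999500: time never runs backwards across the interrupt, but it
stands still for as long as the tick is late.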



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-23 Thread George Anzinger

Roman Zippel wrote:

Hi,

On Tue, 23 Aug 2005, john stultz wrote:



I'm assuming gettimeofday()/clock_gettime() looks something like:
  xtime + (get_cycles()-last_update)*(mult+ntp_adj)>>shift



Where did you get the ntp_adj from? It's not in my example.
gettimeofday() was in the previous mail: "xtime + (cycle_offset * mult +
error) >> shift". The difference between system time and reference 
time is really important. gettimeofday() returns the system time, NTP 
controls the reference time and these two are synchronized regularly.

I didn't see that anywhere in your example.
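
For readers unfamiliar with the mult/shift form, a small Python sketch
follows.  The helper names are hypothetical, the 1 MHz counter is a
made-up example, and "error" is treated as an opaque correction in the
same shifted units, which is an assumption about the quoted formula.
Note also that in C (and Python) "+" binds tighter than ">>", so the
sketch parenthesizes the shift explicitly, which is presumably what is
intended.

```python
def clocksource_mult(freq_hz, shift):
    """mult such that (cycles * mult) >> shift is about cycles * 1e9 / freq_hz."""
    return ((1_000_000_000 << shift) + freq_hz // 2) // freq_hz


def gettimeofday_ns(xtime_ns, cycle_offset, mult, shift, error=0):
    # xtime + ((cycle_offset * mult + error) >> shift)
    return xtime_ns + ((cycle_offset * mult + error) >> shift)
```

With a 1 MHz counter and shift=10, mult comes out as 1024000, so 250
cycles since the last update add 250,000 ns to xtime.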


John,
If I read your example right, the problem is when the NTP adjustment 
changes while the two clocks are out of sync (because of a late tick). 
It would appear that gettimeofday would need to know that the NTP 
adjustment is changing  (and to what).  It would also appear that this 
is known by the ntp code and could be made available to gettimeofday. 
If it is changing due to an NTP call, that system call, itself, 
should/must force synchronization.  So the only case gettimeofday needs 
to worry/know about is that an adjustment is to change at time X to 
value Y.  Also, methinks there is only one such change that can be 
present at any given time.


Hope this helps...
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 3/3] Add disk hotswap support to libata RESEND #2

2005-08-23 Thread George Anzinger

Jim Ramsay wrote:

On 8/23/05, Jim Ramsay <[EMAIL PROTECTED]> wrote:


Then I must have found an undocumented feature!  I've applied this set
of patches to a 2.6.11 kernel (with few problems) and ran into a bunch
of "scheduling while atomic" errors when hotplugging a drive, culprit
being probably scsi_sysfs.c where scsi_remove_device locks a mutex, or
perhaps when it then calls class_device_unregister, which does a
'down_write'.



After further debugging, it appears that the problem is the debounce
timer in libata-core.c.

Timers appear to operate in an atomic context, so timers should not be
allowed to call scsi_remove_device, which eventually schedules.

Any suggestions on the best way to fix this?


Workqueue, perhaps.
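
The shape of that fix, as a user-space analogy in Python (not kernel
code): the timer callback only queues the job and returns, and a worker
that is allowed to block does the removal.  In the kernel the queueing
step would be something like schedule_work() on a work item; all names
in the sketch are invented.

```python
import queue
import threading


def demo():
    work_queue = queue.Queue()
    removed = []

    def remove_device(name):
        # Stand-in for scsi_remove_device(): may sleep, so it must not
        # run in the timer callback itself.
        removed.append(name)

    def debounce_timer_fired(name):
        # "Atomic" side: hand the job off and return immediately.
        work_queue.put(lambda: remove_device(name))

    def worker():
        # Process context: free to block while doing the real work.
        while True:
            job = work_queue.get()
            if job is None:
                break
            job()

    thread = threading.Thread(target=worker)
    thread.start()
    debounce_timer_fired("sda")
    work_queue.put(None)   # tell the worker to stop
    thread.join()
    return removed
```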




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-22 Thread George Anzinger
   return(ret);
 }
 






--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:
~


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsec) units, so this
effect will go away. But this is a waste of time until (1.) is solved.

George ???


Could I (we) see what you have in mind?




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 2.6.13-rc6-rt9] PI aware dynamic priority adjustment

2005-08-20 Thread George Anzinger

Thomas Gleixner wrote:

George,

On Fri, 2005-08-19 at 17:19 -0700, George Anzinger wrote:


2. Drift of cyclic timers (armed by set_timer()):

Due to rounding errors and the drift adjustment code, the fixed
increment which is precalculated when the timer is set up and added on
rearm, I see creeping deviation from the timeline. 


I have a patch lined up to base the rearm on human (nsec) units, so this
effect will go away. But this is a waste of time until (1.) is solved.

George ???


Could I (we) see what you have in mind?



Nothing which applies clean at the moment and I have no access to the
box where the patch floats around.

It's simply explained.

Current code:

set_timer()
calc interval->jiffies / interval->arch_cycles;
based on it.interval

rearm()
timer->expires += interval->jiffies;
timer->arch_cycle_expires += interval->arch_cycles;
normalize(timer);

Patched code:

set_timer()
	timer.interval = it.interval; 
	timer.next_expire = it.value; 
	both stored as timespec


rearm()
next_expire += interval;
calc timer->expires/arch_cycle_expires;

So on each rearm we eliminate the rounding errors and take the drift
adjustment into account.
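
The difference between the two schemes can be shown with a toy model in
Python.  It covers the rounding part only (the drift adjustment is left
out), and CYCLE_NS = 7 is a hypothetical arch-cycle length picked so the
rounding error is visible; the function names are not from the patch.

```python
CYCLE_NS = 7  # hypothetical length of one arch cycle, in ns


def ns_to_cycles(ns):
    return -(-ns // CYCLE_NS)  # round up to whole cycles


def expiries_current(first_ns, interval_ns, count):
    """Current code: add a precalculated, rounded increment on each rearm."""
    interval_cycles = ns_to_cycles(interval_ns)
    expires = ns_to_cycles(first_ns)
    out = []
    for _ in range(count):
        out.append(expires * CYCLE_NS)
        expires += interval_cycles
    return out


def expiries_patched(first_ns, interval_ns, count):
    """Patched code: keep next_expire in ns and convert on every rearm."""
    next_expire_ns = first_ns
    out = []
    for _ in range(count):
        out.append(ns_to_cycles(next_expire_ns) * CYCLE_NS)
        next_expire_ns += interval_ns
    return out
```

For a 1000 ns interval the current scheme gains 1 ns per period (1000 ns
rounds up to 143 cycles, or 1001 ns), while the patched scheme is never
more than one cycle past the ideal expiry.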

It adds some calculation overhead to each rearm, but 

I think the standard was written to eliminate the need for this.  The 
notion is that we have a resolution which we use in the calculations so 
while there may be drift WRT his request, there should be no drift WRT 
the requested value rounded up to the next resolution.


Still, if we can't keep that resolution in arch_cycles...

On another issue along this line, I have been thinking of changing the 
x86 TSC arch cycle size to 1ns.  (NOT the resolution, the units for the 
arch cycle.)  The reason to do this is to correctly track changes in cpu 
frequency.  As it is today, we would need to track down and update all 
pending HR timers whenever the frequency changed.  By using a common 
unit all we need to do is change the conversion constants (well, I guess 
they would not be constants any more :).  I REALLY don't want to do this 
as it does add conversion overhead, but I can not think of another clean 
way to track TSC frequency changes.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: Latency with Real-Time Preemption with 2.6.12

2005-08-18 Thread George Anzinger

Steven Rostedt wrote:

On Wed, 2005-08-17 at 19:38 -0700, Sundar Narayanaswamy wrote:


Hi,
I am trying to experiment using 2.6.12 kernel with the realtime-preempt 
V0.7.51-38 patch to determine the kernel preemption latencies with the 
CONFIG_PREEMPT_RT mode. The test program I wrote does the following on

a thread with highest priority (99) and SCHED_FIFO policy to simulate
a real time thread.

t1 = gettimeofday
nanosleep(for 3 ms)
t2 = gettimeofday

I was expecting to see the difference t2-t1 to be close to 3 ms. However, 
the smallest difference I see is 4 milliseconds under no system load, 
and the difference is as high as 25 milliseconds under moderate to 
heavy system load (mostly performing disk I/O).



That version of Ingo's patch does not implement High-Resolution Timers.
Thomas worked on merging this into the latest RT patch.  Without
high-res timers, the best you may ever get is 4ms. This is because
nanosleep is to guarantee _at_least_ 3 ms.  So you have the following
situation:

0   1   2   3   4 (ms)
+---+---+---+---+--->
   ^         ^
   |         |
 Start here  0+3 = 3  here we have the response

If we look at this in smaller units than ms, we started on 0.8ms and
responded at 3.2ms where we have 3.2 - 0.8 = 2.4 which is less than 3ms.
So since Ingo's patch doesn't increase the resolution of the timers from
a jiffy (which is currently 1ms) Linux is forced to add one more than
you need.
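
The arithmetic can be checked with a short Python model.  It assumes a
1 ms jiffy and that a sleeper wakes exactly on a jiffy edge with no
further latency; the function name and the microsecond units are chosen
for the sketch only.

```python
JIFFY_US = 1000  # one jiffy = 1 ms, expressed in microseconds


def jiffy_sleep_wakeup_us(start_us, request_us, extra_jiffy=True):
    """Time of the jiffy edge at which a jiffy-resolution sleep ends."""
    ticks = -(-request_us // JIFFY_US)      # round the request up to jiffies
    if extra_jiffy:
        ticks += 1                          # the "one more than you need"
    current_jiffy = start_us // JIFFY_US    # last edge already passed
    return (current_jiffy + ticks) * JIFFY_US
```

Starting at 0.8 ms with a 3 ms request, waking at jiffy 3 would give only
2.2 ms of sleep in this model, so the extra jiffy is added and the wakeup
lands at 4 ms (3.2 ms of sleep).  A caller that starts right on an edge
sees the full 4 ms.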


Based on the articles and the mails I read on this list, I understand that 
worst case latencies of 1 ms (or less) should be possible using the RT 
Preemption patch, but I am unable to get anything less than 4 milliseconds 
even with sleep times smaller than 3 ms. I am running the tests on a SBC 
with a 1.4G Pentium M, 512M RAM, 1GB compact flash (using IDE). 

I believe I have the high resolution timer working correctly, because if I 
comment out the sleep line above t2-t1 is consistently 0 or 1 microsecond.



I don't think you have the high res timer working, since there is no
high res timer in that kernel.


Following earlier discussions (in July) in this list, I tried to set kernel 
configuration parameters like CONFIG_LATENCY_TRACE to get tracing/debug 
information, but I didn't find these parameters in my .config file.


I appreciate your suggestions/insights into the situation and steps that I 
should try to get more debug/tracing information that might help to understand 
the cause of high latency.



It's not a high latency.  It's doing exactly as it is supposed to, since
the nanosleep doesn't have high-res support (in that kernel).  If you
really want to measure latency, you need to add a device or something
and see what the response time is from an interrupt going off to the time a
thread is woken to respond to it.  Now with Ingo's patch that is really fast.


Another way to do it is to set up a repeating timer.  You _must_ read 
back the timer to get the repeat time it is really using, and then 
measure how well it does giving signals at these repeat times.  FAR FAR 
more than three lines of code...



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Multiple virtual address mapping for the same code on IA-64 linux kernel.

2005-08-18 Thread George Anzinger

David S. Miller wrote:

From: Anton Blanchard <[EMAIL PROTECTED]>
Date: Fri, 19 Aug 2005 04:29:55 +1000



Calling itanium the "fastest 64bit processor at any given clock frequency"
on lkml is likewise inflammatory :)



I totally agree.


Since the itanium off loads a lot of its instruction stream decisions on 
to the compiler(s), where other processors just do it, one might argue 
that you can not even characterize the itanium without bundling in the 
compilers...


Not to say that is wrong, but just to make it clear that saying the 
itanium is fast is like saying that a Cummins diesel is fast without 
saying what sort of car/truck it is mounted in.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-17 Thread George Anzinger

Nishanth Aravamudan wrote:
~
IMNSHO we should not get too parental with kernel only interfaces. 
Adding 1 is easy enough for the caller and even easier to explain in the 
instructions (i.e. this call sleeps for X jiffy edges).  This allows 
the caller to do more if needed and, should he ever just want to sync to 
the next jiffy, he does not have to deal with backing out that +1.



I don't want to be too parental either, but I also am trying to avoid
code duplication. Lots of drivers basically do something like
poll_event() does (or could do with some changes), i.e. looping a
constant amount multiple times, checking something every so often. The
patch was just a thought, though. I will keep evaluating drivers and see
if it's a useful interface to have eventually.

I guess I'm just concerned with making an unintuitive interface. As was
brought up at OLS, drivers are a major source of bugs/buggy code. The
simpler, more useful we can make interfaces, the better, I think. I'm
not claiming you disagree, I just want to make my own motives clear.
While fixing up the schedule_timeout() comment would make it clear what
schedule_timeout() achieves, I'm not sure how useful such an interface
is, if every caller adds 1 :) I need to mull it over, though... Lots to
consider. I also, of course, want to stay flexible for the reasons you
mention (letting the driver adjust the timeout as they expect to).


I would leave the +1 alone and put in the correct documentation.  This 
way _more_ folks will be made aware of the mid-jiffie issue.  Far too 
often we see (and let in) patches that mess up user interfaces 
around this issue.  The recent changes to itimer come to mind...



~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] KGDB for Real-Time Preemption systems

2005-08-17 Thread George Anzinger

Ingo Molnar wrote:

* George Anzinger  wrote:



I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and 
friends so that you can "bt" through them.  Apply in this order:

Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch


This is, more or less, the same kgdb that is in Andrew's mm tree 
changed to fix the RT issues.



great. For the time being i wont add it to the -RT tree (because KGDB is 
not destined for upstream merging it seems), but it sure is a useful 
development/debugging add-on.


I agree on not adding it.  Tom Rini is working on a version that Andrew 
seems inclined to merge.  When that happens I will most likely put 
together enhancements to it to bring it up to what this one does. 
Meanwhile I am trying to capture some of Tom's changes in this one. 
Also, it is MUCH easier for me to maintain as a separate patch.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [RFC - 0/9] Generic timekeeping subsystem (v. B5)

2005-08-17 Thread George Anzinger

Roman Zippel wrote:


~

The thing that worries me about this function is that it does 
everything in usec.  We are using nsec in xtime now and I wonder if it 
would not be more accurate to do the math in nsecs.  Even tick size 
(tick_nsec) does not translate well to usec, it currently being 999849 
nsecs.


George

---

 kernel/time.c  |3 ++-
 kernel/timer.c |   53 +
 2 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time.c
===
--- linux-2.6.orig/kernel/time.c2005-07-13 03:18:04.0 +0200
+++ linux-2.6/kernel/time.c 2005-08-16 01:37:20.0 +0200
@@ -366,8 +366,9 @@ int do_adjtimex(struct timex *txc)
} /* txc->modes & ADJ_OFFSET */
if (txc->modes & ADJ_TICK) {
tick_usec = txc->tick;
-   tick_nsec = TICK_USEC_TO_NSEC(tick_usec);
}
+   if (txc->modes & (ADJ_FREQUENCY|ADJ_OFFSET|ADJ_TICK))
+   time_recalc();
} /* txc->modes */
 leave: if ((time_status & (STA_UNSYNC|STA_CLOCKERR)) != 0
|| ((time_status & (STA_PPSFREQ|STA_PPSTIME)) != 0
Index: linux-2.6/kernel/timer.c
===
--- linux-2.6.orig/kernel/timer.c   2005-07-13 03:18:04.0 +0200
+++ linux-2.6/kernel/timer.c2005-08-16 23:10:53.0 +0200
@@ -559,6 +559,7 @@ found:
  */
 unsigned long tick_usec = TICK_USEC;   /* USER_HZ period (usec) */
 unsigned long tick_nsec = TICK_NSEC;   /* ACTHZ period (nsec) */
+unsigned long tick_nsec2 = TICK_NSEC;
 
 /* 
  * The current time 
@@ -569,6 +570,7 @@ unsigned long tick_nsec = TICK_NSEC;		/*

  * the usual normalization.
  */
 struct timespec xtime __attribute__ ((aligned (16)));
+struct timespec xtime2 __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 
 EXPORT_SYMBOL(xtime);

@@ -596,6 +598,33 @@ static long time_adj;  /* tick adjust (
 long time_reftime; /* time at last adjustment (s)  */
 long time_adjust;
 long time_next_adjust;
+static long time_adj2, time_adj2_cur, time_freq_adj2, time_freq_phase2, time_phase2;
+
+void time_recalc(void)
+{
+   long f, t;
+   tick_nsec = TICK_USEC_TO_NSEC(tick_usec);


This leaves bits on the floor.  Is it not possible to do this whole 
calculation in nanoseconds?  Currently, for example, tick_nsec is 999849...

+
+   t = time_freq >> (SHIFT_USEC + 8);
+   if (t) {
+   time_freq -= t << (SHIFT_USEC + 8);
+   t *= 1000 << 8;
+   }
+   f = time_freq * 125;
+   t += tick_usec * USER_HZ * 1000 + (f >> (SHIFT_USEC - 3));
+   f &= (1 << (SHIFT_USEC - 3)) - 1;
+   tick_nsec2 = t / HZ;
+   f += (t % HZ) << (SHIFT_USEC - 3);
+   f <<= 5;
+   time_adj2 = f / HZ;
+   time_freq_adj2 = f % HZ;
+
+   printk("tr: %ld.%09ld(%ld,%ld,%ld,%ld) - %ld.%09ld(%ld,%ld,%ld)\n",
+   xtime.tv_sec, xtime.tv_nsec,
+   tick_nsec, time_freq, time_offset, time_next_adjust,
+   xtime2.tv_sec, xtime2.tv_nsec,
+   tick_nsec2, time_adj2, time_freq_adj2);
+}
 
 /*

  * this routine handles the overflow of the microsecond field
@@ -739,6 +768,16 @@ static void second_overflow(void)
 #endif
 }
 
+static void second_overflow2(void)

+{
+   time_adj2_cur = time_adj2;
+   time_freq_phase2 += time_freq_adj2;
+   if (time_freq_phase2 > HZ) {
+   time_freq_phase2 -= HZ;
+   time_adj2_cur++;
+   }
+}
+
 /* in the NTP reference this is called "hardclock()" */
 static void update_wall_time_one_tick(void)
 {
@@ -786,6 +825,20 @@ static void update_wall_time_one_tick(vo
time_adjust = time_next_adjust;
time_next_adjust = 0;
}
+
+   delta_nsec = tick_nsec2;
+   time_phase2 += time_adj2_cur;
+   if (time_phase2 >= (1 << (SHIFT_USEC + 2))) {
+   long ltemp = time_phase2 >> (SHIFT_USEC + 2);
+   time_phase2 -= ltemp << (SHIFT_USEC + 2);
+   delta_nsec += ltemp;
+   }
+   xtime2.tv_nsec += delta_nsec;
+   if (xtime2.tv_nsec >= NSEC_PER_SEC) {
+   xtime2.tv_nsec -= NSEC_PER_SEC;
+   xtime2.tv_sec++;
+   second_overflow2();
+   }
 }
 
 /*




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

[patch] KGDB for Real-Time Preemption systems

2005-08-16 Thread George Anzinger

I have put a version of KGDB for x86 RT kernels here:
http://source.mvista.com/~ganzinger/

The common_kgdb_cfi_ stuff creates debug records for entry.S and 
friends so that you can "bt" through them.  Apply in this order:

Ingo's patch
kgdb-ga-rt.patch
common_kgdb_cfi_annotations.patch

This is, more or less, the same kgdb that is in Andrew's mm tree changed 
to fix the RT issues.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-16 Thread George Anzinger

Nishanth Aravamudan wrote:

On 04.08.2005 [09:45:55 -0700], George Anzinger wrote:

Uh... PLEASE tell me you are NOT changing timespec_to_jiffies() (and 
timeval_to_jiffies()) to add 1.  This is NOT the right thing to do.  For 
repeating times (see setitimer code) we need the actual time as we KNOW 
where the jiffies edge is in the repeating case.  The +1 is needed ONLY 
for the initial time, not the repeating time.



See:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112208357906156&w=2



I followed that thread, George, but I think it's a different case with
schedule_timeout() [maybe this indicates drivers/other users should
maybe be using itimers, but I'll get to that in a sec].


I think I misunderstood back then :).



With schedule_timeout(), we are just given a relative jiffies value. We
have no context as to which task is requesting the delay, per se,
meaning we don't (can't) know from the interface whether this is the
first delay in a sequence, or a brand new one, without changing all
users to have some sort of control structure. The callers of
schedule_timeout() don't even get a pointer to the timer added
internally.

So, adding 1 to all sleeps seems like it might be reasonable, as looping
sleeps probably need to use a different interface. I had worked a bit
ago on something like poll_event() with the kernel-janitors group, which
would abstract out the repeated sleeps. Basically wait_event() without
wait-queues... Maybe we could make such an interface just use itimers?
I've attached my old patch for poll_event(), just for reference.


I think not.  itimers is really pointed at a particular system call and 
has resources in the task structure to do it.  These would be hard to 
share...


My point, I guess, is that in the schedule_timeout() case, we don't know
where the jiffies edge is, as we either expire or receive a wait-queue
event/signal, we never mod_timer() the internal timer... So we have to
assume that we need to sleep the request. But maybe Roman's idea of
sleeping a certain number of jiffy edges is sufficient. I am not yet
convinced driver authors want/need such an interface, though, still
thinking it over.


IMNSHO we should not get too parental with kernel only interfaces. 
Adding 1 is easy enough for the caller and even easier to explain in the 
instructions (i.e. this call sleeps for X jiffies edges).  This allows 
the caller to do more if needed and, should he ever just want to sync to 
the next jiffie he does not have to deal with backing out that +1.




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features

2005-08-15 Thread George Anzinger

Ingo Molnar wrote:

* Ingo Molnar <[EMAIL PROTECTED]> wrote:



* George Anzinger  wrote:



Ingo, all

I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. 
Someone put code in the NMI path to modify the preempt count which, 
as often as not, will generate a PREEMPT_DEBUG message as there is no 
telling what state the preempt count is in on an NMI interrupt.  I have 
sent the attached patch to Andrew on this, but meanwhile, if you want 
RT, SMP, PREEMPT_DEBUG you will be much better off with this.


ah - thanks, applied. Might explain some of the recent SMP weirdnesses 
i'm seeing. Attributed them to the HRT patch ;-)



i'm still seeing weird crashes under SMP, which go away if i disable 
CONFIG_HIGH_RES_TIMERS. (this after i fixed a couple of other SMP bugs 
in the HRT code) It happens sometime during the bootup, after enabling 
the network but before users can log in. There's no good debug info, 
just a hang that comes from all CPUs trying to get some debug info out 
but crashing deeply.


I haven't looked at this new code all that closely as yet.  One thing I 
did notice is that there is an assumption that the "timer being 
delivered" flag can be shared between LR timers and HR timers.  I 
suspect this is wrong, as the delivery code is in separate threads (I 
assume).  This could lead to del_timer_sync missing a timer.


In the prior patch we just ignored the del_timer_sync issue for HR 
timers (code I plan to do soon).  This WAS taken care of in earlier 
kernels by a reuse of one of the list link fields, but Andrew convinced 
me that this was _not_ good.


So, my guess: a nanosleep for an RT task (I think you said these are 
promoted to HR) is completing and overwriting the delivery-in-progress 
flag for an LR timer which just happens to have a del_timer_sync going 
on at the same time.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] eliminte NMI entry/ exit code

2005-08-13 Thread George Anzinger

Zachary Amsden wrote:

George Anzinger wrote:


Nick Piggin wrote:


George Anzinger wrote:

The NMI entry and exit code fiddles with bits in the preempt count.  
If an NMI happens while some other code is doing the same, bits will 
be lost.  This patch removes this modify code from the NMI path till 
we can come up with something better.




Humour me for a minute here...
NMI restores preempt_count back to its old value upon exit, right?
So what does a race case look like?




Normal code             NMI
fetch preempt_count
add                <-   interrupt here: add and store,
                        then subtract and store, darn!
store preempt_count

Ok, no problem.

The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
reasonable counts fail because of the rather undefined state when NMI 
picks up the word.  The failure is on the NMI side... 




So NMI changing the preempt count and restoring it in the middle of an RMW 
is not the problem.  Thus I don't understand what the issue is.  NMI 
must undo all side effects.  Does the PREEMPT_DEBUG code check the count 
somewhere within the NMI handler?  If so, shouldn't the proper fix be to 
make that code aware that it could be running inside of an NMI and/or 
ensure that code is not called from within the NMI handler?


Yes that is the problem.  The sanity check in PREEMPT_DEBUG fails when 
called from the NMI handler.





--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread George Anzinger

Nick Piggin wrote:

George Anzinger wrote:

The NMI entry and exit code fiddles with bits in the preempt count.  
If an NMI happens while some other code is doing the same, bits will 
be lost.  This patch removes this modify code from the NMI path till 
we can come up with something better.




Humour me for a minute here...
NMI restores preempt_count back to its old value upon exit, right?
So what does a race case look like?


Normal code             NMI
fetch preempt_count
add                <-   interrupt here: add and store,
                        then subtract and store, darn!
store preempt_count

Ok, no problem.

The problem is in the RT code when PREEMPT_DEBUG is on.  The tests for 
reasonable counts fail because of the rather undefined state when NMI 
picks up the word.  The failure is on the NMI side...




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.53-01, High Resolution Timers & RCU-tasklist features

2005-08-12 Thread George Anzinger

Ingo, all

I, silly person that I am, configured an RT, SMP, PREEMPT_DEBUG system. 
Someone put code in the NMI path to modify the preempt count which, 
as often as not, will generate a PREEMPT_DEBUG message as there is no 
telling what state the preempt count is in on an NMI interrupt.  I have 
sent the attached patch to Andrew on this, but meanwhile, if you want 
RT, SMP, PREEMPT_DEBUG you will be much better off with this.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 

Description:

Modifying a word from NMI code runs the very real risk of losing
either the new or the old bits.  Remember, we cannot prevent an
NMI from arriving ANYWHERE, in particular between the read and the
write of a read-modify-write sequence.

This patch removes the update of the preempt count from the NMI
path.

Signed-off-by: George Anzinger

 hardirq.h |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)  barrier()
 #endif
-
-#define nmi_enter()            irq_enter()
-#define nmi_exit()             sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()            /*  irq_enter()  */
+#define nmi_exit()             /*  sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)


[PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread George Anzinger
The NMI entry and exit code fiddles with bits in the preempt count.  If 
an NMI happens while some other code is doing the same, bits will be 
lost.  This patch removes this modify code from the NMI path till we can 
come up with something better.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 

Description:

Modifying a word from NMI code runs the very real risk of losing
either the new or the old bits.  Remember, we cannot prevent an
NMI from arriving ANYWHERE, in particular between the read and the
write of a read-modify-write sequence.

This patch removes the update of the preempt count from the NMI
path.

Signed-off-by: George Anzinger

 hardirq.h |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/include/linux/hardirq.h
===
--- linux-2.6.13-rc.orig/include/linux/hardirq.h
+++ linux-2.6.13-rc/include/linux/hardirq.h
@@ -98,9 +98,12 @@ extern void synchronize_irq(unsigned int
 #else
 # define synchronize_irq(irq)  barrier()
 #endif
-
-#define nmi_enter()            irq_enter()
-#define nmi_exit()             sub_preempt_count(HARDIRQ_OFFSET)
+/*
+ * Re think these.  NMI _must_not_ share data words with non-nmi code
+ * Meanwhile, just do a no-op.
+ */
+#define nmi_enter()            /*  irq_enter()  */
+#define nmi_exit()             /*  sub_preempt_count(HARDIRQ_OFFSET) */
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING
 static inline void account_user_vtime(struct task_struct *tsk)


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-12 Thread George Anzinger

Bill Davidsen wrote:

George Anzinger wrote:


Srivatsa Vaddagiri wrote:


On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:

IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.





George,
Can't TSC (or equivalent) serve as a backup while PIT is disabled,
especially considering that we disable PIT only for short duration in 
practice (few seconds maybe) _and_ that we don't have HRT support yet?


I think it really depends on what you want.  If you really want to 
keep good time, the only rock in town is the one connected to the PIT 
(and the pmtimer).  The problem is, if you want the jiffie edge to be 
stable, there is just no way to reprogram the PIT to get it back to 
where it was.


In an old version of HRT I did a trick of loading a short count (based 
on reading the TSC or pmtimer) and then put the LATCH count on top of 
it.  In a correctly performing PIT, it should count down the short 
count, interrupt, load the long count and continue from there.  Aside 
from the machines that had BAD PITs (they reset on the load instead of 
the expiry of the current count) there were other problems that, in 
the end, caused loss of time (too fast, too slow, take your pick).  I 
also found PITs that signaled that they had loaded the count (they set 
a status bit) prior to actually loading it.  All in all, I find the 
PIT is just an ugly beast to try to program.  On the other hand, if 
you want regular interrupts at some fixed period, it will do this 
forever (give or take an epoch or two;) without touching anything 
after the initial program set up.


In the end, I concluded that, for the community kernel, it is really 
best to just mask the irq line and let the PIT run.  Then you 
use the TSC or pmtimer to figure the gross loss of interrupts and 
let the PIT interrupt again define the jiffie edge.  If you have 
other, more pressing, concerns I suppose you can program the PIT, but 
don't expect your wall clock to be as stable as it is now.


What are the portability and scaling issues if it were done this way? It 
clearly looks practical on x86 uni, but if we want per-CPU non-tick, I'm 
less sure how it would work.


I am not sure how much is involved.  For VST I disabled the tick 
generated NMI watchdog interrupt on a per cpu basis but stopped the PIT 
tick only when all cpus were idle.  The next step would be to mess with 
the interrupt steering logic to keep the tick away from idle cpus.  I 
did not get into this level in my work, being mainly interested in 
embedded systems.


But when you go to non-x86 hardware, is there always going to be another 
source of wakeup available if the PIT is blocked instead of reset? I 
have to go back and look at how SPARC hardware works, I don't remember 
enough to be useful.


Most (all) other archs don't have PITs.  The x86 sucks big time when it 
comes to time keeping hardware.  The most common hardware is a counter 
that runs forever (much as the TSC but FIXED in frequency).  Interrupts 
are generated either by comparing a register to this or using companion 
counters that just count down to zero.  In either case you don't lose 
time because you can always precisely set up an interrupt.  To sleep, 
then, you just set your sleep time in the normal time base interrupt 
counter.  At the end, you know exactly what to set to get back to the 
regular tick.


These other platforms make VST and High Res Timers so easy...
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3

2005-08-10 Thread George Anzinger

Tony Lindgren wrote:
~

Do you have a patch around for improving next_timer_interrupt()?

Well, sort of.  The code in the VST patch does the right thing.  Problem 
is it does a bit more than the timer.c code.  You can find that code on 
the HRT site CVS.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-10 Thread George Anzinger

Srivatsa Vaddagiri wrote:

On Tue, Aug 09, 2005 at 12:36:58PM -0700, George Anzinger wrote:

IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.



George,
Can't TSC (or equivalent) serve as a backup while PIT is disabled,
especially considering that we disable PIT only for short duration 
in practice (few seconds maybe) _and_ that we don't have HRT support yet?


I think it really depends on what you want.  If you really want to keep 
good time, the only rock in town is the one connected to the PIT (and 
the pmtimer).  The problem is, if you want the jiffie edge to be stable, 
there is just no way to reprogram the PIT to get it back to where it was.


In an old version of HRT I did a trick of loading a short count (based 
on reading the TSC or pmtimer) and then put the LATCH count on top of 
it.  In a correctly performing PIT, it should count down the short 
count, interrupt, load the long count and continue from there.  Aside 
from the machines that had BAD PITs (they reset on the load instead of 
the expiry of the current count) there were other problems that, in the 
end, caused loss of time (too fast, too slow, take your pick).  I also 
found PITs that signaled that they had loaded the count (they set a 
status bit) prior to actually loading it.  All in all, I find the PIT is 
just an ugly beast to try to program.  On the other hand, if you want 
regular interrupts at some fixed period, it will do this forever (give 
or take an epoch or two;) without touching anything after the initial 
program set up.


In the end, I concluded that, for the community kernel, it is really 
best to just mask the irq line and let the PIT run.  Then you use 
the TSC or pmtimer to figure the gross loss of interrupts and let the 
PIT interrupt again define the jiffie edge.  If you have other, more 
pressing, concerns I suppose you can program the PIT, but don't expect 
your wall clock to be as stable as it is now.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 3

2005-08-09 Thread George Anzinger

Tony Lindgren wrote:

* Srivatsa Vaddagiri <[EMAIL PROTECTED]> [050805 05:37]:


On Wed, Aug 03, 2005 at 06:05:28AM +, Con Kolivas wrote:

This is the dynamic ticks patch for i386 as written by Tony Lindgren 
<[EMAIL PROTECTED]> and Tuukka Tikkanen <[EMAIL PROTECTED]>. 
Patch for 2.6.13-rc5


There were a couple of things that I wanted to change so here is an updated 
version. This code should have stabilised enough for general testing now.


Con,
I have been looking at some of the requirement of tickless idle CPUs in
core kernel areas like scheduler and RCU. Basically, both power management and 
virtualization benefit if idle CPUs can cut off useless timer ticks. Especially 
from a virtualization standpoint, I think it makes sense that we enable this 
feature on a per-CPU basis i.e let individual CPUs cut off their ticks as and 
when they become idle. The benefit of this is more visible in platforms that 
host lot of (SMP) VMs on the same machine. Most of the time, these VMs may be 
partially idle (some CPUs in it are idle, some not) and it is good that we 
quiesce the timer ticks on the partial set of idle CPUs. Both S390 and Xen ports
of Linux kernel have this ability today (S390 has it in mainline already and 
Xen has it out of tree).



Good point, and it would be nice to have it resolved for systems that support
idling individual CPUs. The current setup was done because when I was tinkering
with the amd76x_pm patch a while back, I noticed that idling the cpu
disconnects all cpus from the bus. (As far as I remember)

So this may need to be configured depending on the system.



From this viewpoint, I think the current implementation of dynamic tick
falls short of this requirement. It cuts off the timer ticks only when 
all CPUs go idle.


Apart from this observation, I have some others about the current dynamic tick
patch:

- All CPUs seem to cut off the same number of ticks (dyn_tick->skip). Isn't
 this wrong, considering that the timer list is per-CPU? This will cause
 some timers to be serviced much later than usual.



Yes if it's done on per-CPU basis. In the current setup the first interrupt
will kick the system off the dyn-tick state and the timers get checked again.



- The fact that dyn_tick_state is global and accessed from all CPUs
 is probably a scalability concern, especially if we allow the ticks
 to be cut off on per-CPU basis.




From an idling-devices point of view, we still need some global variable, I
believe. How else would you be able to tell all devices that the whole
system does not have any timers for the next 2 seconds?



- Again, when we allow this on a per-CPU basis, subsystems like
 RCU need to know the partial set of idle CPUs. RCU already does
 that through nohz_cpu_mask (which will need to replace dyn_cpu_map).



Sounds like that could work for dyn-tick too.


- Looking at dyn_tick_timer_interrupt, would it be nice if we avoid calling 
 do_timer_interrupt so many times and instead advance jiffies by
 (skipped_ticks - 1) and then call do_timer_interrupt once? I think
 VST does it that way.



In the long run we would do the calculations in usecs and just emulate
jiffies from the hw timer. But yes, optimizing updating the time would be
great.



- dyn_tick->max_skip = 0xffffff / apic_timer_val;
From my reading of Intel docs, APIC_TMICT is 32-bit. So why does the
 above calculation take only 24 bits into account? What am I missing here?



Hmm, could be a bug here, needs to be checked. Maybe 32-bit APIC timer is
optional support, or maybe I accidentally pulled the optional 24-bit support
from the ACPI PM timer.

But in any case on P4 systems the APIC timer is not the bottleneck as
stopping or reprogramming PIT also kills APIC. (This does not happen on P3
systems). So the bottleneck most likely is the length of PIT.


I can take a shot at addressing these concerns in dynamic_tick patch, but it 
seems to me that VST has already addressed all these to a big extent. Had you 
considered VST before? The biggest bottleneck I see in VST going mainline is 
its dependency on HRT patch but IMO it should be possible to write a small patch
to support VST w/o HRT. 


George, what do you think?



HRT + VST depend on APIC only, and does not use next_timer_interrupt().


I convinced myself that the next_timer... code in timer.c misses timers 
(i.e. gives the wrong answer).  I did this (after wondering due to 
performance) by scanning the whole timer list after I had the 
next_timer... answer and finding a better answer, not always, but 
sometimes.  That code does not address the cascade list correctly.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386 No-Idle-Hz aka Dynamic-Ticks 5

2005-08-09 Thread George Anzinger

Srivatsa Vaddagiri wrote:

On Sun, Aug 07, 2005 at 03:12:21PM +1000, Con Kolivas wrote:

Respin of the dynamic ticks patch for i386 by Tony Lindgren and Tuukka Tikkanen 
with further code cleanups. Are we there yet?



Con,
I am afraid that until SMP correctness is resolved, this is not
in a position to go in (unless you want to enable it only for UP, which
I think should not be our target). I am working on making this work 
correctly on SMP systems. Hopefully I will post a patch soon.


Another observation I have made regarding the dynamic tick patch is that the 
PIT is being reprogrammed whenever the CPUs are coming out of sleep state 
(because of an interrupt, say). This can happen at any arbitrary time, not 
necessarily on jiffy boundaries. As a result, there will be an offset between 
when jiffy interrupts will now occur vs when they would have originally 
occurred had the PIT never been stopped. Not sure if having this offset is 
good, but at least one necessary change that I foresee is zeroing 
delay_at_last_interrupt when disabling dynamic tick.  For that matter, it may 
be easier to disable the PIT timer by just masking PIT interrupts (instead of 
changing its mode).


IMNOHO, this is the ONLY way to keep proper time.  As soon as you 
reprogram the PIT you have lost track of the time.


My VST patch just masks the interrupt.
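
To make the offset concrete, here is a minimal user-space sketch (not
kernel code).  It assumes a hypothetical 1000 us tick and models time as a
plain microsecond counter; the function names are invented for
illustration.  Masking keeps ticks on the original grid, while restarting
the PIT at an arbitrary moment shifts the grid by the time already elapsed
in the current tick.

```c
#include <stdint.h>

/* Assumed tick period in microseconds (illustrative only). */
#define TICK_US 1000u

/*
 * PIT left free-running and merely unmasked at now_us: the next tick
 * falls on the original grid (a multiple of TICK_US).
 */
static uint64_t next_tick_masked(uint64_t now_us)
{
    return (now_us / TICK_US + 1) * TICK_US;
}

/*
 * PIT reprogrammed (restarted) at now_us: the countdown starts afresh,
 * so every later tick is displaced from the original grid.
 */
static uint64_t next_tick_reprogrammed(uint64_t now_us)
{
    return now_us + TICK_US;
}

/* Phase shift between the two grids, in microseconds. */
static uint64_t grid_offset(uint64_t now_us)
{
    return now_us % TICK_US;
}
```

Waking at t = 12345 us, the masked variant ticks next at 13000 us, the
reprogrammed one at 13345 us, and that 345 us shift stays in every later
tick.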
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-05 Thread George Anzinger

Roland McGrath wrote:
There are other concerns.  Let me see if I understand this.  A thread 
(other than the leader) can exec and we then need to change the 
real_timer to wake the new task which will NOT be using the same task 
struct.



That's correct.  de_thread will turn the thread calling exec into the new
leader and kill off all the other threads, including the old leader.  The
exec'ing thread's existing task_struct is reassigned to the PID of the
original leader.


My looking at the code shows that the thread leader can exit and then 
stays around as a zombie until the last thread in the group exits.



That is correct.



If an alarm comes during this wait I suspect it will wake this zombie and
cause problems.



You are mistaken.  The signal code handles process signals sent when the
leader is a zombie.  The group leader sticks around with the PID that
matches the TGID, until there are no live threads with its TGID.  That is
how process-wide kill can still work.


Yes, I see, traced through the signal delivery.  So Linus' patch as well
as the reversion of Ingo's will fix all of this.  Right?


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-05 Thread George Anzinger

Gerd Knorr wrote:

On Thu, Aug 04, 2005 at 03:02:51PM -0700, Andrew Morton wrote:


Roland McGrath <[EMAIL PROTECTED]> wrote:


That's wrong.  It has to be done only by the last thread in the group to go.
Just revert Ingo's change.



OK..

+++ 25-akpm/kernel/exit.c   Thu Aug  4 15:01:06 2005
@@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
-   if (group_dead)
+   if (group_dead) {
+   del_timer_sync(&tsk->signal->real_timer);
acct_process(code);
+   }
+++ 25-akpm/kernel/posix-timers.c   Thu Aug  4 15:01:06 2005
@@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
-   del_timer_sync(&sig->real_timer);



That one fixes it for me.


There are other concerns.  Let me see if I understand this.  A thread 
(other than the leader) can exec and we then need to change the 
real_timer to wake the new task which will NOT be using the same task 
struct.


My looking at the code shows that the thread leader can exit and then
stays around as a zombie until the last thread in the group exits.  If an
alarm comes during this wait I suspect it will wake this zombie and cause
problems.  So, don't we need to also change real_timer's task when the
exiting task is the real_timer wake up task, assigning it to some other 
member of the group?  Note, I don't say just if it is the group leader...


Then when we finally release the signal structure, we can "del" the timer.

Did I miss something here?



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-04 Thread George Anzinger

Andrew Morton wrote:

Roland McGrath <[EMAIL PROTECTED]> wrote:


That's wrong.  It has to be done only by the last thread in the group to go.
Just revert Ingo's change.


Hm... I was looking at 2.6.10 to figure it out.  This looks more correct.




OK..

--- 25/kernel/exit.c~revert-timer-exit-cleanup  Thu Aug  4 15:00:55 2005
+++ 25-akpm/kernel/exit.c   Thu Aug  4 15:01:06 2005
@@ -829,8 +829,10 @@ fastcall NORET_TYPE void do_exit(long co
acct_update_integrals(tsk);
update_mem_hiwater(tsk);
group_dead = atomic_dec_and_test(&tsk->signal->live);
-   if (group_dead)
+   if (group_dead) {
+   del_timer_sync(&tsk->signal->real_timer);
acct_process(code);
+   }
exit_mm(tsk);
 
 	exit_sem(tsk);

diff -puN kernel/posix-timers.c~revert-timer-exit-cleanup kernel/posix-timers.c
--- 25/kernel/posix-timers.c~revert-timer-exit-cleanup  Thu Aug  4 15:00:55 2005
+++ 25-akpm/kernel/posix-timers.c   Thu Aug  4 15:01:06 2005
@@ -1166,7 +1166,6 @@ void exit_itimers(struct signal_struct *
tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
itimer_delete(tmr);
}
-   del_timer_sync(&sig->real_timer);
 }
 
 /*

_




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[PATCH] Re: 2.6.12: itimer_real timers don't survive execve() any more

2005-08-04 Thread George Anzinger

Gerd Knorr wrote:

  Hi,

Somewhere between 2.6.11 and 2.6.12 the regression in $subject
was added to the linux kernel.  Testcase below.


Yep.  The itimer changes got a bit carried away.  Here is a fix.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 
Description:

The changes to itimer of late (after 2.6.11) cause itimers not
to survive the exec* calls.  Standard says they should.  

Signed-off-by: George Anzinger

 exit.c |1 +
 posix-timers.c |4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)


Index: linux-2.6.13-rc/kernel/exit.c
===
--- linux-2.6.13-rc.orig/kernel/exit.c
+++ linux-2.6.13-rc/kernel/exit.c
@@ -794,6 +794,7 @@ fastcall NORET_TYPE void do_exit(long co
}
 
tsk->flags |= PF_EXITING;
+   del_timer_sync(&tsk->signal->real_timer);
 
/*
 * Make sure we don't try to process any timer firings
Index: linux-2.6.13-rc/kernel/posix-timers.c
===
--- linux-2.6.13-rc.orig/kernel/posix-timers.c
+++ linux-2.6.13-rc/kernel/posix-timers.c
@@ -1183,10 +1183,10 @@ void exit_itimers(struct signal_struct *
struct k_itimer *tmr;
 
while (!list_empty(&sig->posix_timers)) {
-   tmr = list_entry(sig->posix_timers.next, struct k_itimer, list);
+   tmr = list_entry(sig->posix_timers.next,
+struct k_itimer, list);
itimer_delete(tmr);
}
-   del_timer_sync(&sig->real_timer);
 }
 
 /*


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-04 Thread George Anzinger

Nishanth Aravamudan wrote:
~

Sorry, I forgot that sys_nanosleep() also always adds 1 to the request
(to account for this same issue, I believe, as POSIX demands no early
return from nanosleep() calls). There are some other locations where
similar

+ (t.tv_sec || t.tv_nsec)


This is not the same as "always add 1".  We don't do it this way just to 
have fun with C.  If you change schedule_timeout() to add the 1, 
nanosleep() will need to do things differently to get the same behavior. 
 (And, YES users do pass in zero sleep times.)
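
A minimal sketch of the difference, for illustration only.  The struct,
the helper names and the 1000 Hz tick rate are assumptions made for this
example; the conversion is a simplified stand-in for the kernel's
round-up timespec-to-jiffies conversion, not a copy of it.

```c
/* Assumed tick rate and derived constants (illustrative values). */
#define SKETCH_HZ      1000L
#define NS_PER_SEC     1000000000L
#define NS_PER_TICK    (NS_PER_SEC / SKETCH_HZ)

/* Hypothetical request type standing in for struct timespec. */
struct sketch_timespec {
    long tv_sec;
    long tv_nsec;
};

/* Simplified conversion: round any partial tick up to a whole one. */
static unsigned long ts_to_jiffies(const struct sketch_timespec *t)
{
    return (unsigned long)t->tv_sec * SKETCH_HZ
         + (unsigned long)((t->tv_nsec + NS_PER_TICK - 1) / NS_PER_TICK);
}

/* The nanosleep form: the extra jiffy is added only to a non-zero request. */
static unsigned long expire_conditional(const struct sketch_timespec *t)
{
    return ts_to_jiffies(t) + (t->tv_sec || t->tv_nsec);
}

/* "Always add 1": a zero request now costs a full jiffy as well. */
static unsigned long expire_always(const struct sketch_timespec *t)
{
    return ts_to_jiffies(t) + 1;
}
```

For a 1 ms request both forms give 2 jiffies.  For a zero request the
conditional form gives 0 (return at once) while the unconditional form
gives 1, which is the behavior change nanosleep() would have to
compensate for.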



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [UPDATE PATCH] push rounding up of relative request to schedule_timeout()

2005-08-04 Thread George Anzinger
e(unsigned int msecs)
 {
-   unsigned long timeout = msecs_to_jiffies(msecs) + 1;
+   unsigned long timeout = msecs_to_jiffies(msecs);
 
 	while (timeout && !signal_pending(current)) {

set_current_state(TASK_INTERRUPTIBLE);




--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-02 Thread George Anzinger

Keith Owens wrote:
On Tue, 02 Aug 2005 18:12:27 -0700, 
George Anzinger  wrote:



How about something like:
if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > 
MAGIC)



current points to the current struct task, regs points to the kernel
stack.  Those two data areas can be completely separate, as they are on
i386.  Also i386 uses a separate kernel stack for interrupts.


Actually I must mean the thread_info and not current.  i386 only uses a
separate stack if you use 4K stacks.  I think others use separate
interrupt stacks, however :(.  Also, on thinking on it, I think some
archs don't call the registers pt_regs either.  Oh, well, it was a
thought...


Waiting for its brother... :)
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-02 Thread George Anzinger

Steven Rostedt wrote:

On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:


Couldn't you just do some math off current->timestamp to see how long
the task has been running? This per arch stuff seems a bit invasive..



The thing is, I'm tracking how long the task is running in the kernel
without doing a schedule.  That's actually easy, but I don't want to
count when the task is in userspace. The per-arch is only updating so
that we don't count user space, otherwise the count could be in the
task_struct.  If there is an arch-independent way to tell if a task is
running in user-space or kernel when an interrupt goes off then I would
use it.  The per arch is actually easy, and I would write it, but I
don't have the hardware now to test it.  I could at least do PPC and
MIPS since I'm quite familiar with both, but I don't currently have a
cross compiler to compile it.

I understand your point, I would really prefer an arch independent
solution, but the timestamp from current just wont cut it.  Have another
idea, I'm all open for it.


How about something like:
if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > 
MAGIC)

The idea is that an interrupt from user space will be the ONLY thing on 
the stack while an interrupt from the kernel will have kernel stack 
under it.  Current is the bottom end of the kernel stack and regs + 
sizeof(pt_regs) is where the interrupt context started.  Assumptions a) 
stack grows down, b) no switch stack at interrupt.
MAGIC is some small number.  For x86 user it is actually zero, don't
know about others but the saved context should be the first thing on the
stack so a minimum frame size should do.
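
The test can be modelled in user space with plain addresses.  This is a
sketch under the two assumptions above (stack grows down, no stack switch
at interrupt) plus a few invented ones: an 8 KB stack area, byte
arithmetic throughout, and a made-up register frame whose size is
arbitrary.  None of the names below are kernel definitions.

```c
#include <stdint.h>
#include <stdbool.h>

#define THREAD_SIZE 8192u   /* assumed size of the per-thread stack area */
#define MAGIC       0u      /* slack allowed above the saved frame */

/* Hypothetical stand-in for the saved register frame. */
struct fake_regs {
    long r[17];
};

/*
 * stack_base is the lowest address of the stack area, regs the address
 * of the saved frame.  If anything lies between the end of the frame and
 * the top of the area, the interrupt arrived while already in the kernel.
 */
static bool interrupted_kernel(uintptr_t stack_base, uintptr_t regs)
{
    uintptr_t stack_top = stack_base + THREAD_SIZE;
    uintptr_t frame_end = regs + sizeof(struct fake_regs);

    return stack_top - frame_end > MAGIC;
}
```

A frame sitting flush against the top of the area reports "from user
space"; the same frame 256 bytes lower reports "from kernel".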



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[PATCH] Re: [PATCH] NMI watch dog notify patch

2005-08-02 Thread George Anzinger
It seems that the subject patch generates a warning (missed it on the 
compile).  Here is a patch to eliminate the warning.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger
Type: Defect Fix 

Description:
This patch eliminates the warning generated in die_nmi() when
calling notify_die() by adding "const" to notify_die()'s
definition.

Signed-off-by: George Anzinger 

Index: linux-2.6.13-rc/include/asm-i386/kdebug.h
===
--- linux-2.6.13-rc.orig/include/asm-i386/kdebug.h
+++ linux-2.6.13-rc/include/asm-i386/kdebug.h
@@ -41,7 +41,7 @@ enum die_val {
DIE_PAGE_FAULT,
 };
 
-static inline int notify_die(enum die_val val,char *str,struct pt_regs 
*regs,long err,int trap, int sig)
+static inline int notify_die(enum die_val val, const char *str,struct pt_regs 
*regs,long err,int trap, int sig)
 {
struct die_args args = { .regs=regs, .str=str, .err=err, 
.trapnr=trap,.signr=sig };
return notifier_call_chain(&i386die_chain, val, &args);


Re: Clock resolution / RT preemption

2005-08-01 Thread George Anzinger

greg wrote:

Hi folks,

I'm looking for a timer resolution lower than 1 ms (and monotonic clock
rate) destined to be used with some network code running on x86
platforms. Would you please provide me with information about how to
get/implement this.


AFAIK, there's a "high resolution timer" patch hanging around, but
there's not much information with regard to portability (specific
hardware requirements ?), scalability, integration with RT patches.
I understand the POSIX 1003.1b Clocks and Timers system calls are not
fully available within the linux kernel (and libc ?), am I right on that ?


On the HRT web site (see signature) there is a CVS repository.  In there 
is a special version for the RT kernel.  As to porting it to other 
archs, have a look at the include/linux/hrtimer.h file.  It has (or 
should have) all you need to know.  Please pass back any port you do.


One more question : I believe Ingo's preemption patch runs
timers/interrupt handlers within kernel threads, how should I assign
specific priority to address my goals without compromising system
stability ?


Carefully :)

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] NMI watch dog notify patch

2005-08-01 Thread George Anzinger

Keith Owens wrote:
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger  wrote:



This patch adds a notify to die_nmi to announce that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

void die_nmi (struct pt_regs *regs, const char *msg)
{
+	if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
+		   0, 0, SIGINT) == NOTIFY_STOP)

+   return;
+
spin_lock(&nmi_print_lock);
/*
* We are in trouble anyway, lets at least try



Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.


Excellent idea!


Also your patch adds a trailing whitespace on the call to notify_die().


Fixed.

This should do it.
-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:

This patch adds a notify to die_nmi to announce that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This gives debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that
the time out may be due to debugger activities such as single
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, "NMI Watchdog detected LOCKUP");
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 0, SIGINT) ==
+   NOTIFY_STOP)
+   return;
+
spin_lock(&nmi_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread George Anzinger

Andrew Morton wrote:

Keith Owens <[EMAIL PROTECTED]> wrote:

>I had thought that too, but it does not allow recovery (i.e. let's reset
>the watchdog and try again).

die_nmi() returns to nmi_watchdog_tick(), nmi_watchdog_tick does the
reset and continues.  Patch below.

>Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
>trap, only from the watchdog.  Also, there is a notify in the path to 
>the other nmi stuff.


I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls, nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
+   if (alert_counter[cpu] == 5*nmi_hz) {
die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   alert_counter[cpu] = 0;
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;



That all makes sense - let's go that way?


Looks good to me.  Trimmed a bit more fat too.  Here is the complete patch.

-

-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:

This patch adds a notify to die_nmi to announce that the system
is about to be taken down.  If the notify is handled with a
NOTIFY_STOP return, the system is given a new lease on life.

We also change the nmi watchdog to carry on if die_nmi returns.

This gives debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that
the time out may be due to debugger activities such as single
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c   |5 -
 traps.c |4 
 2 files changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -495,8 +495,11 @@ void nmi_watchdog_tick (struct pt_regs *
 */
alert_counter[cpu]++;
if (alert_counter[cpu] == 5*nmi_hz)
+   /*
+* die_nmi will return ONLY if NOTIFY_STOP happens..
+*/
die_nmi(regs, "NMI Watchdog detected LOCKUP");
-   } else {
+
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
}
Index: linux-2.6.13-rc/arch/i386/kernel/traps.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/traps.c
+++ linux-2.6.13-rc/arch/i386/kernel/traps.c
@@ -555,6 +555,10 @@ static DEFINE_SPINLOCK(nmi_print_lock);
 
 void die_nmi (struct pt_regs *regs, const char *msg)
 {
+   if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
+  0, 0, SIGINT) == NOTIFY_STOP)
+   return;
+
spin_lock(&nmi_print_lock);
/*
* We are in trouble anyway, lets at least try


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Keith Owens wrote:
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger  wrote:


I have been doing some work on kgdb to pull a few of its "fingers" out of
various places in the kernel.  This is the final location where we have
a kgdb intercept not covered by a notify.



I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.

I had thought that too, but it does not allow recovery (i.e. let's reset
the watchdog and try again).


Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
trap, only from the watchdog.  Also, there is a notify in the path to 
the other nmi stuff.
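
For readers unfamiliar with the notify mechanism under discussion, here
is a stripped-down user-space model.  It is only a sketch: the struct,
the function names and the numeric values of NOTIFY_DONE and NOTIFY_STOP
are stand-ins invented for this example, not the kernel's definitions.
It shows the one property the patch relies on: a listener that returns
NOTIFY_STOP ends the walk, and the caller can then skip the fatal path.

```c
#include <stddef.h>

/* Stand-in return codes; the kernel's actual values differ. */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

/* Hypothetical, simplified notifier node (not the kernel's notifier_block). */
struct notifier {
    int (*call)(const char *reason);
    struct notifier *next;
};

/* Walk the chain; stop as soon as a handler claims the event. */
static int call_chain(const struct notifier *head, const char *reason)
{
    int ret = NOTIFY_DONE;

    for (const struct notifier *n = head; n != NULL; n = n->next) {
        ret = n->call(reason);
        if (ret == NOTIFY_STOP)
            break;
    }
    return ret;
}

static int took_system_down;    /* set when the fatal path runs */

/* Shape of a die_nmi() that gives listeners the first chance. */
static void sketch_die_nmi(const struct notifier *chain, const char *msg)
{
    if (call_chain(chain, msg) == NOTIFY_STOP)
        return;                 /* a listener (e.g. a debugger) took over */
    took_system_down = 1;       /* stand-in for the oops/panic path */
}

/* Example listeners. */
static int ignores_event(const char *reason) { (void)reason; return NOTIFY_DONE; }
static int debugger_stop(const char *reason) { (void)reason; return NOTIFY_STOP; }
```

With only ignores_event on the chain the fatal path runs; putting
debugger_stop in front of it makes sketch_die_nmi() return to its caller,
which is why the watchdog tick code has to cope with die_nmi() returning.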


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew Morton wrote:

George Anzinger  wrote:


This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.



It looks sensible, but as there aren't actually any in-kernel uses for this
I'd have thought it would be better for it to live out-of-tree?


I should just bundle it with the kgdb patch then?
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[PATCH] NMI watch dog notify patch

2005-07-28 Thread George Anzinger

Andrew,
I have been doing some work on kgdb to pull a few of its "fingers" out of
various places in the kernel.  This is the final location where we have
a kgdb intercept not covered by a notify.


On a related issue, I feel very queasy with sending nmi interrupts and
non-nmi events to the same notify code.  Would you be open to a patch to
create a separate notify list for nmi events?



-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Enhancement 
Description:
This patch adds a notify to the nmi watchdog to notify that
the system is about to be taken down by the watchdog.  If the
notify is handled with a NOTIFY_STOP return, the system is
given a new lease on life.

This gives debug code a chance to a) catch watchdog timeouts and
b) possibly allow the system to continue, realizing that 
the time out may be due to debugger activities such as single 
stepping which is usually done with "other" cpus held.

Signed-off-by: George Anzinger

 nmi.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

Index: linux-2.6.13-rc/arch/i386/kernel/nmi.c
===
--- linux-2.6.13-rc.orig/arch/i386/kernel/nmi.c
+++ linux-2.6.13-rc/arch/i386/kernel/nmi.c
@@ -26,11 +26,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "mach_traps.h"
 
@@ -494,8 +496,15 @@ void nmi_watchdog_tick (struct pt_regs *
 * wait a few IRQs (5 seconds) before doing the oops ...
 */
alert_counter[cpu]++;
-   if (alert_counter[cpu] == 5*nmi_hz)
-   die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   if (alert_counter[cpu] == 5*nmi_hz) {
+   if (notify_die(DIE_NMIWATCHDOG, "nmi_ipi_watchdog", 
+  regs, 0, 0, SIGINT) == NOTIFY_STOP) {
+   last_irq_sums[cpu] = sum;
+   alert_counter[cpu] = 0;
+   } else {
+   die_nmi(regs, "NMI Watchdog detected LOCKUP");
+   }
+   }
} else {
last_irq_sums[cpu] = sum;
alert_counter[cpu] = 0;
@@ -555,7 +564,7 @@ int proc_unknown_nmi_panic(ctl_table *ta
return -EBUSY;
} else {
set_nmi_callback(unknown_nmi_panic_callback);
-   }
+   } 
} else {
release_lapic_nmi();
unset_nmi_callback();


[PATCH] fix normalize problem in posix timers.

2005-07-28 Thread George Anzinger
We found this (after a customer complained) and it is in the kernel.org 
kernel.  Seems that for CLOCK_MONOTONIC absolute timers and 
clock_nanosleep calls both the request time and wall_to_monotonic are 
subtracted prior to the normalize resulting in an overflow in the 
existing normalize test.  This causes the result to be shifted ~4 
seconds ahead instead of ~2 seconds back in time.  Patch is attached.
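
The overflow is easy to reproduce outside the kernel.  The sketch below
assumes a 32-bit long (as on i386) and models it with int32_t, using an
explicit wrapping subtraction so the example avoids undefined behavior in
C; struct ts32 and the function names are invented for the illustration.
normalize_old() follows the shape of the loops being removed, and
normalize_new() follows the compare-before-subtract shape of
set_normalized_timespec().

```c
#include <stdint.h>

#define NSEC_PER_SEC INT32_C(1000000000)

/* Hypothetical 32-bit timespec, modelling a 32-bit long. */
struct ts32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/*
 * Subtraction that wraps modulo 2^32 like the hardware does.  The cast
 * back to int32_t is implementation-defined but modular on common compilers.
 */
static int32_t wrap_sub(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}

/* Old shape: the first test computes tv_nsec - NSEC_PER_SEC, which can wrap. */
static struct ts32 normalize_old(struct ts32 t)
{
    while (wrap_sub(t.tv_nsec, NSEC_PER_SEC) >= 0) {
        t.tv_nsec = wrap_sub(t.tv_nsec, NSEC_PER_SEC);
        t.tv_sec++;
    }
    while (t.tv_nsec < 0) {
        t.tv_nsec += NSEC_PER_SEC;
        t.tv_sec--;
    }
    return t;
}

/* New shape: compare first, subtract only when it is safe. */
static struct ts32 normalize_new(int32_t sec, int32_t nsec)
{
    while (nsec >= NSEC_PER_SEC) {
        nsec -= NSEC_PER_SEC;
        ++sec;
    }
    while (nsec < 0) {
        nsec += NSEC_PER_SEC;
        --sec;
    }
    return (struct ts32){ sec, nsec };
}
```

Starting from tv_sec = 10, tv_nsec = -1500000000, the old loops end at
{12, 794967296} while the new code gives the expected {8, 500000000}, a
difference of about 4.3 seconds.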

-
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger 
Type: Defect Fix 
Description:
The normalize code in posix-timers.c fails when the tv_nsec 
member is ~1.2 seconds negative.  This can happen on absolute
timers (and clock_nanosleeps) requested on CLOCK_MONOTONIC
(both the request time and wall_to_monotonic are subtracted
resulting in the possibility of a number close to -2 seconds.)

This fix uses the set_normalized_timespec() (which does not 
have an overflow problem) to fix the problem and as a side
effect makes the code cleaner.

Signed-off-by: George Anzinger 

 posix-timers.c |   17 +++--
 1 files changed, 3 insertions(+), 14 deletions(-)

Index: linux-2.6.13-rc/kernel/posix-timers.c
===
--- linux-2.6.13-rc.orig/kernel/posix-timers.c
+++ linux-2.6.13-rc/kernel/posix-timers.c
@@ -915,21 +915,10 @@ static int adjust_abs_time(struct k_cloc
jiffies_64_f = get_jiffies_64();
}
/*
-* Take away now to get delta
+* Take away now to get delta and normalize
 */
-   oc.tv_sec -= now.tv_sec;
-   oc.tv_nsec -= now.tv_nsec;
-   /*
-* Normalize...
-*/
-   while ((oc.tv_nsec - NSEC_PER_SEC) >= 0) {
-   oc.tv_nsec -= NSEC_PER_SEC;
-   oc.tv_sec++;
-   }
-   while ((oc.tv_nsec) < 0) {
-   oc.tv_nsec += NSEC_PER_SEC;
-   oc.tv_sec--;
-   }
+   set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec,
+   oc.tv_nsec - now.tv_nsec);
}else{
jiffies_64_f = get_jiffies_64();
}


Re: [PATCH] Re: itimer oddness in 2.6.12

2005-07-26 Thread George Anzinger

Andrew Morton wrote:

George Anzinger  wrote:


+   while (time_before_eq(p->signal->real_timer.expires, jiffies))
+   p->signal->real_timer.expires += inc;



It gives me the creeps when I see timer code doing this, and it seems to be
done relatively frequently.

Surely it can be calculated arithmetically?  If not, are you really sure
that it is not exploitable by malicious code?


Hm.. the system only falls into a loop here if the system is loaded to 
the point where we are a jiffie or more late.  The prior code just did 
the "+=" and called add_timer, possibly with a time in the past.  I 
suspect that way of doing this would never catch up if the user asked 
for a one jiffie repeat time.  Also, this is faster than the div, mpy if 
you are not late (or even if you are several jiffies late).


A possible alternative might be:
p->signal->real_timer.expires += inc; 
if (time_before_eq(p->signal->real_timer.expires, jiffies))
		p->signal->real_timer.expires += ((jiffies - 
p->signal->real_timer.expires + inc -1) / inc) * inc;


Both a div and a mpy in there.  I really think the "while" is ok, but if 
you prefer...	
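
For comparison, here are the two approaches as a stand-alone sketch in
user-space C.  The function names are invented; before_eq() is written in
the spirit of time_before_eq() and relies on the usual signed-difference
idiom, and inc is assumed to be non-zero.  Both functions return the
first expiry on the expires + k*inc grid that is strictly after now.

```c
/* Wraparound-safe "a <= b" for tick counters (signed-difference idiom). */
static int before_eq(unsigned long a, unsigned long b)
{
    return (long)(b - a) >= 0;
}

/* Loop form: one addition per missed interval. */
static unsigned long catch_up_loop(unsigned long expires,
                                   unsigned long now, unsigned long inc)
{
    while (before_eq(expires, now))
        expires += inc;
    return expires;
}

/* Arithmetic form: one divide and one multiply, whatever the lateness. */
static unsigned long catch_up_arith(unsigned long expires,
                                    unsigned long now, unsigned long inc)
{
    if (before_eq(expires, now))
        expires += ((now - expires) / inc + 1) * inc;
    return expires;
}
```

The sketch rounds with d / inc + 1 rather than (d + inc - 1) / inc so
that the result is always later than now, as the loop's is; with the
latter form, a lateness that is an exact multiple of inc would leave the
expiry equal to now.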


The last time you questioned this sort of thing was in the code to 
correct an absolute timer.  In that case we were adjusting after a clock 
set and, yes, it was possibly exploitable (assuming you could set the 
clock).  Here we don't have that possibility, i.e. we only get into the 
loop if the system is late.

-

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


[PATCH] Re: itimer oddness in 2.6.12

2005-07-22 Thread George Anzinger

Tom Marshall wrote:

On Fri, Jul 22, 2005 at 08:21:25PM +0100, Paulo Marques wrote:


Tom Marshall wrote:


The patch to fix "setitimer timer expires too early" is causing issues for
the Helix server.  We have a timer process that updates the server's
timestamp on an itimer and it expects the signal to be delivered at roughly
the interval retrieved from getitimer.  This is very consistent on every
platform, including Linux up to 2.6.11, but breaks on 2.6.12.  On 2.6.12,
setting the itimer to 10ms and retrieving the actual interval from getitimer
reports 10.998ms, but the timer interrupts are consistently delivered at
roughly 11.998ms.


Unfortunately, this is not so clear cut as it seems :(


Oops!  That patch is wrong.  The +1 should be applied to the initial
interval _only_.  We KNOW when the repeating intervals start (i.e. at
the jiffie edge) and don't need to adjust them.  The patch, however,
incorrectly rolls them all into one.  The attached patch should fix the
problem.  Warning: it compiles and boots, but I have not tested it.
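
In outline, the intended arithmetic looks like the sketch below.  The
function names are invented for illustration and this is not the patch
itself: the first expiry gets the extra jiffy because the request arrives
somewhere inside the current jiffy, while each repeat is stepped from the
previous expiry by the requested interval alone.

```c
/* First expiry: +1 because the position inside the current jiffy is unknown. */
static unsigned long first_expiry(unsigned long now, unsigned long value)
{
    return now + value + 1;
}

/* Repeat expiry: the timer fired on a jiffy edge, so no +1 is needed. */
static unsigned long next_expiry(unsigned long prev_expires,
                                 unsigned long interval)
{
    return prev_expires + interval;
}
```

With a 10-jiffy request made at jiffy 1000, the first expiry is 1011 and
the following ones are 1021, 1031, ... so the period seen by the
application is 10 jiffies rather than 11.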


George
--



Yes, I am sure that it is not a simple problem.  I am not a kernel developer
but I imagine that issues such as NTP adjustments would complicate this
issue.  I must also admit that I am not intimately familiar with the POSIX
spec regarding itimers.

Our current code does a setitimer followed by getitimer, then uses the
actual interval retrieved by getitimer to set a global timer delta.  On each
timer signal, it updates the notion of the current time by the timer delta. 
As mentioned, this works on every other platform (Solaris, BSD, HPUX, AIX,
DGUX, IRIX, Tru64, and Linux up to 2.6.11) but breaks on 2.6.12.




This is not an insurmountable problem for userspace.  It can be easily
solved by using gettimeofday in the timer interrupt instead of adding the
delta to the current time blindly.  No big deal.  I just wanted to point
this issue out and ensure that (1) it was a known issue, and (2) it is the
direction that the Linux kernel intends to take.  If so, no big deal and we
can modify the timer code to take that into account.



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. 
Type: Defect Fix
Disposition: 
Description:

This changes setitimer as follows:
1. The repeating timer is figured using the requested time 
(not +1 as we know where we are in the jiffie).
2. The tests for interval too large are left to the time_val to jiffie code.


Signed-off-by: George Anzinger 
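 itimer.c |   37 ++++++++++++++++---------------------
 1 files changed, 16 insertions(+), 21 deletions(-)

Index: linux-2.6.13-rc/kernel/itimer.c
===================================================================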
--- linux-2.6.13-rc.orig/kernel/itimer.c
+++ linux-2.6.13-rc/kernel/itimer.c
@@ -112,28 +112,11 @@ asmlinkage long sys_getitimer(int which,
 	return error;
 }
 
-/*
- * Called with P->sighand->siglock held and P->signal->real_timer inactive.
- * If interval is nonzero, arm the timer for interval ticks from now.
- */
-static inline void it_real_arm(struct task_struct *p, unsigned long interval)
-{
-	p->signal->it_real_value = interval; /* XXX unnecessary field?? */
-	if (interval == 0)
-		return;
-	if (interval > (unsigned long) LONG_MAX)
-		interval = LONG_MAX;
-	/* the "+ 1" below makes sure that the timer doesn't go off before
-	 * the interval requested. This could happen if
-	 * time requested % (usecs per jiffy) is more than the usecs left
-	 * in the current jiffy */
-	p->signal->real_timer.expires = jiffies + interval + 1;
-	add_timer(&p->signal->real_timer);
-}
 
 void it_real_fn(unsigned long __data)
 {
 	struct task_struct * p = (struct task_struct *) __data;
+	unsigned long inc = p->signal->it_real_incr;
 
 	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p);
 
@@ -141,14 +124,23 @@ void it_real_fn(unsigned long __data)
 	 * Now restart the timer if necessary.  We don't need any locking
 	 * here because do_setitimer makes sure we have finished running
 	 * before it touches anything.
+	 * Note, we KNOW we are (or should be) at a jiffie edge here so
+	 * we don't need the +1 stuff.  Also, we want to use the prior
+	 * expire value so as to not "slip" a jiffie if we are late.
+	 * Deal with requesting a time prior to "now" here rather than
+	 * in add_timer.
 	 */
-	it_real_arm(p, p->signal->it_real_incr);
+	if (!inc)
+		return;
+	while (time_before_eq(p->signal->real_timer.expires, jiffies))
+		p->signal->real_timer.expires += inc;
+	add_timer(&p->signal->real_timer);
 }
 
 int do_setitimer(int which, struct itimerval *v

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-13 Thread George Anzinger

Con Kolivas wrote:

On Thu, 14 Jul 2005 05:10, Linus Torvalds wrote:


On Wed, 13 Jul 2005, Vojtech Pavlik wrote:


No, but 1/1000Hz = 1000000ns, while 1/864Hz = 1157407.407ns. If you have
a counter that counts the ticks in nanoseconds (xtime ...), the first
will be exact, the second will be accumulating an error.


It's not even that we have a counter like that, it's the simple fact that
we have a standard interface to user space that is based on milli-, micro-
and nanoseconds.

(For "poll()", "struct timeval" and "struct timespec" respectively).

It's totally pointless saying that we can do 864 Hz "exactly", when the
fact is that all the timeouts we ever get from user space aren't in that
format. So the only thing that matters is how close to a millisecond we
can get, not how close to some random number.



That may be the case but when I've measured the actual delay of schedule 
timeout when using nanosleep from userspace, the average at 1000Hz was 1.4ms 
+/- 1.5 sd . When we're expecting a sleep of "up to 1ms" we're getting 50% 
longer than the longest expected. Purely mathematically the accuracy of 
changing HZ from 1000 -> 864 will not bring with it any significant change to 
the accuracy. This can easily be measured as well to confirm. 

Using schedule timeout as an argument against it doesn't hold for me. 
Vojtech's comment of :


"No, but 1/1000Hz = 1000000ns, while 1/864Hz = 1157407.407ns. If you have a 
counter that counts the ticks in nanoseconds (xtime ...), the first will be 
exact, the second will be accumulating an error." 


is probably the most valid argument against such a funky number. 


No, that doesn't hold water either.  We already jigger jiffie to be _close_ to 
1/HZ and closer still to what we can get from the PIT as its true period (for 
example, today the jiffie is 999849 nanoseconds) and this too is only accurate 
to the nanosecond.  Here are the jiffie values for several HZ values using the 
formulas in the code which use the TICK_RATE as given by the hardware.  Note the 
error here is the difference between an asked for repeating timer of 1 second 
and what the system clock on the same system says, NOT what real time is in 
either case, just relative between the two.  In other words, if you set up an 
itimer to signal every second and looked at the long term drift between the 
signals it gives and the system clock you would see the itimer drifting by 
~914ppm (with HZ = 846).


  HZ  TICK RATE  jiffie(ns)  second(ns)  error (ppbillion)
 100    1193182    10000000  1000000000                  0
 200    1193182     5000098  1000019600              19600
 250    1193182     4000250  1000062500              62500
 500    1193182     1999688  1001843688            1843688
1000    1193182      999848  1000847848             847848
 846    1193182     1181717  1000914299             914299


Cheers,
Con


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-13 Thread George Anzinger

Lee Revell wrote:

On Wed, 2005-07-13 at 14:16 -0700, Chris Wedgwood wrote:


Both can be detected from your .config and we could set HZ as needed
there and everyone else could avoid this surely?




Does anyone object to setting HZ at boot?  I suspect nothing else will
make everyone happy.

This will really mess up the jiffie_to_* and *_to_jiffie conversions.  They rely 
in a rather large way on the compiler doing all the heavy lifting.  If HZ is a 
variable we introduce a LOT of runtime overhead here.  (Try make kernel/itimer.i 
and look for jiffies_to_t* and friends.)
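
A much simplified illustration of the difference, not the kernel's actual
conversion code: MODEL_HZ and both function names below are made up for this
sketch, and the constant version is only exact when MODEL_HZ divides 1000.

```c
/* Simplified model; the real kernel helpers are considerably more involved. */
#define MODEL_HZ 250	/* assumed compile-time tick rate */

/* With the tick rate a constant, 1000 / MODEL_HZ folds to a literal at
 * compile time and only one add and one divide by a constant remain. */
static unsigned long msecs_to_ticks_const(unsigned long ms)
{
	return (ms + (1000 / MODEL_HZ) - 1) / (1000 / MODEL_HZ);
}

/* With the tick rate a runtime variable, the multiply and the divide have
 * to be carried out on every call. */
static unsigned long msecs_to_ticks_runtime(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}
```

Both round a 10ms request up to 3 ticks at 250 Hz; the point is only where
the arithmetic gets done.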



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-13 Thread George Anzinger

Linus Torvalds wrote:


On Wed, 13 Jul 2005, Vojtech Pavlik wrote:


No, but 1/1000Hz = 1000000ns, while 1/864Hz = 1157407.407ns. If you have
a counter that counts the ticks in nanoseconds (xtime ...), the first
will be exact, the second will be accumulating an error.



It's not even that we have a counter like that, it's the simple fact that
we have a standard interface to user space that is based on milli-, micro-
and nanoseconds.

(For "poll()", "struct timeval" and "struct timespec" respectively).

It's totally pointless saying that we can do 864 Hz "exactly", when the
fact is that all the timeouts we ever get from user space aren't in that 
format. So the only thing that matters is how close to a millisecond we 
can get, not how close to some random number.


So we do a lot of conversions from "struct timeval" to "jiffies", and if
you don't take the error in that conversion into account, then you're
ignoring what is likely a _bigger_ error.

Long-term time drift is a known issue, and is unavoidable since you don't 
even know the exact frequency of the crystal, since that is not only not 
that exact in the first place, it depends on temperature etc. So long-term 
time drift is something that we inevitably have to use things like NTP to 
handle, if you want an exact clock.


And in short-term things, the timeval/jiffie conversion is likely to be a 
_bigger_ issue than the crystal frequency conversion.


So we should aim for a HZ value that makes it easy to convert to and from
the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all
good values for that reason. 864 is not.


Uh, WAIT A NANOSECOND!  Look at what we are doing today in that department.  The 
key is not the ability to convert based on the value of HZ but on the implied 
value of jiffie given CLOCK_TICK_RATE.  Today the value we use for jiffie is 
999849 nanoseconds which is what the given CLOCK_TICK_RATE and HZ end up getting 
from the PIT.


By the time the user comes along we have TICK_NSEC and the current conversion 
routines which are not exactly simple but they are correct.


--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-12 Thread George Anzinger

Con Kolivas wrote:

On Tue, 12 Jul 2005 22:39, Con Kolivas wrote:


On Tue, 12 Jul 2005 22:10, Vojtech Pavlik wrote:


The PIT crystal runs at 14.3181818 MHz (CGA dotclock, found on ISA, ...)
and is divided by 12 to get PIT tick rate

14.3181818 MHz / 12 = 1193182 Hz


Yes, but the current code uses 1193180.  Wonder why that is...



The reality is that the crystal is usually off by 50-100 ppm from the
standard value, depending on temperature.

   HZ   ticks/jiffie     1 second   error (ppm)
-----------------------------------------------
  100          11932  1.000015238          15.2
  200           5966  1.000015238          15.2
  250           4773  1.000057143          57.1
  300           3977  0.999931429         -68.6
  333           3583  0.999964114         -35.9
  500           2386  0.999847619        -152.4
 1000           1193  0.999847619        -152.4
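
The table can be reproduced from the crystal frequency quoted above.  This is
a sketch with made-up names (PIT_HZ, pit_latch(), second_error_ppm()); it
only assumes that the divisor is the nearest whole number of PIT ticks per
jiffie.

```c
/* PIT input clock in Hz: 14.3181818 MHz / 12, as quoted above. */
#define PIT_HZ (14318181.8 / 12.0)

/* Nearest whole number of PIT ticks per jiffie for a requested HZ. */
static long pit_latch(long hz)
{
	return (long)(PIT_HZ / hz + 0.5);
}

/* Error, in ppm, of one "second" built from hz jiffies of pit_latch(hz)
 * PIT ticks each.  Negative means the second comes out short. */
static double second_error_ppm(long hz)
{
	double actual = (double)pit_latch(hz) * hz / PIT_HZ;

	return (actual - 1.0) * 1e6;
}
```

pit_latch(100) gives 11932 and an error of about +15.2 ppm, while
pit_latch(1000) gives 1193 and about -152.4 ppm, matching the rows above.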


If we are following the standard and trying to set up a timer, the 1 second time 
MUST be >= 1 second.  Thus the values for 300 and above in this table don't fly. 
 If we are trying to keep system time, well we do just fine at that by using 
the actual value of the jiffie (NOT 1/HZ) when we update time (one of the 
reasons for going to nanoseconds in xtime).  The observable thing the user sees 
is best seen by setting up an itimer to repeat every second.  Then you will see 
the drift AND it will be against the system clock which itself is quite accurate 
(the 50-100ppm you mention), even without ntp.  And the error really is in the 
range of 848ppm for HZ=1000 BECAUSE we need to follow the standard.  You can 
easily see this with the current 2.6 kernel.  We even have a bug report on it:


http://bugzilla.kernel.org/show_bug.cgi?id=3289
~
--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-11 Thread George Anzinger

Martin J. Bligh wrote:

Lots of people have switched from 2.4 to 2.6 (100 Hz to 1000 Hz) with no impact in
stability, AFAIK. (I only remember some weird warning about HZ with debian woody's
ps).



Yes, that's called "progress" so no one complained.  Going back is
called a "regression".  People don't like those as much.



That's a very subjective viewpoint. Realize that this is a balancing
act between latency and overhead ... and you're firmly only looking
at one side of the argument, instead of taking a compromise in the
middle ...

If I start arguing for 100HZ on the grounds that it's much more efficient,
will that make 250/300 look much better to you? ;-)


I would like to interject an additional data point, and I will NOT be subjective. 
 The nature of the PIT is that it can _hit_ some frequencies better than 
others.  We have had complaints about repeating timers not keeping good time. 
These are not jitter issues, but drift issues.  The standard says we may not 
return early from a timer so any timer will either be on time or late.  The 
amount of lateness depends very much on the HZ value.  Here is what the values 
are for the standard CLOCK_TICK_RATE:


  HZ  TICK RATE  jiffie(ns)  second(ns)  error (ppbillion)
 100    1193180    10000000  1000000000                  0
 200    1193180     5000098  1000019600              19600
 250    1193180     4000250  1000062500              62500
 500    1193180     1999703  1001851203            1851203
1000    1193180      999848  1000847848             847848

The jiffie values here are exactly what the kernel uses and are based on the 
best one can do with the PIT hardware.


I am not suggesting any given default HZ, but rather an augmentation of the 
help text that goes with it.  For those who want timers to repeat at one second 
(or multiples thereof) this is useful info.


For your enjoyment I have attached the program used to print this.  It allows you 
to try additional values...



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


#include <stdio.h>
#include <stdlib.h>

/* These macros mirror the kernel's jiffie arithmetic so it can be tried
 * from userspace. */
#define NSEC_PER_SEC  1000000000
//#define CLOCK_TICK_RATE /*1 */ 1193180
#define LATCH(CLOCK_TICK_RATE,HZ)  ((CLOCK_TICK_RATE + HZ/2) / HZ)
#define SH_DIV(NOM,DEN,LSH) (	((NOM / DEN) << LSH)			\
			 + (((NOM % DEN) << LSH) + DEN / 2) / DEN)
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH(CLOCK_TICK_RATE,HZ), 8))
#define TICK_NSEC (SH_DIV (1000000UL * 1000, ACTHZ, 8))


struct {
	int hz;
	int clocktickrate;
} vals[] = {{100, 1193180}, {200, 1193180}, {250, 1193180}, {500, 1193180}, {1000, 1193180},{0,0}};

void do_it(int hz,int tickrate)
{
	int HZ = hz;
	int CLOCK_TICK_RATE = tickrate;
	int tick_nsec = TICK_NSEC;
	int ticks_per_sec = NSEC_PER_SEC/tick_nsec;
	int sec_size = ticks_per_sec * tick_nsec;
	int one_sec_p;
	int err;

	if (sec_size < NSEC_PER_SEC)
		sec_size += tick_nsec;
	one_sec_p = sec_size;
	err = one_sec_p - NSEC_PER_SEC;
	printf( "%4d\t%8d\t%8d\t%10d\t%8d\n",hz, tickrate, tick_nsec, 
		one_sec_p, err);
}
	
void bail(void)
{
	printf("run as: as [hz [clock_tick_rate]]\n");
	exit(1);
}

int main(int argc, char** argv)
{
	int i = 0;
	int phz = 0;
	int pcr = vals[0].clocktickrate;

	if (argc > 1) { 
		phz = atoi(argv[1]);
		if (!phz)
			bail();
	}
	if (argc > 2) {
		pcr = atoi(argv[2]);
		if (!pcr)
			bail();
	}

	printf("HZ  \tTICK RATE\tjiffie(ns)\tsecond(ns)\t error (ppbillion)\n");
	while(vals[i].hz) {
		do_it(vals[i].hz, vals[i].clocktickrate);
		i++;
	}
	if (phz)
		do_it(phz, pcr);
}


Re: Build TAGS problem with O=

2005-07-06 Thread George Anzinger

Cleaned up to be a standard "-p1" patch.  Made the comments more concise.


 make O=/dir TAGS

 fails with:

   MAKE   TAGS
 find: security/selinux/include: No such file or directory
 find: include: No such file or directory
 find: include/asm-i386: No such file or directory
 find: include/asm-generic: No such file or directory


 The problem is in this line:
 ifeq ($(KBUILD_OUTPUT),)

KBUILD_OUTPUT is not defined (ever) after make reruns itself.  This line is used 
in the TAGS, tags, and cscope makes.


Here is a fix:

Signed-off-by:  George Anzinger  

--- linux-2.6.12-org/Makefile   2005-07-01 14:37:44.000000000 -0700
+++ linux-2.6.13-rc/Makefile    2005-07-05 19:45:00.588314304 -0700
@@ -1149,7 +1149,7 @@
 #(which is the most common case IMHO) to avoid unneeded clutter in the big tags file.
 #Adding $(srctree) adds about 20M on i386 to the size of the output file!

-ifeq ($(KBUILD_OUTPUT),)
+ifeq ($(src),$(obj))
 __srctree =
 else
 __srctree = $(srctree)/



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/



Re: Build TAGS problem with O=

2005-07-05 Thread George Anzinger

George Anzinger wrote:

If you try:
make O=/usr/src/ver/2.6.13-rc/obj/ -j5 LOCALVERSION=_2.6.13-rc TAGS ARCH=i386


it fails with:
  MAKE   TAGS
find: security/selinux/include: No such file or directory
find: include: No such file or directory
find: include/asm-i386: No such file or directory
find: include/asm-generic: No such file or directory


The problem seems to be this bit of the topdir Makefile:


#We want __srctree to totally vanish out when KBUILD_OUTPUT is not set
#(which is the most common case IMHO) to avoid unneeded clutter in the big tags file.
#Adding $(srctree) adds about 20M on i386 to the size of the output file!

ifeq ($(KBUILD_OUTPUT),)
__srctree =
else
__srctree = $(srctree)/
endif

It would appear that the "ifeq ($(KBUILD_OUTPUT),)" is doing the wrong 
thing.  I am not a make expert, but I have had a lot of BAD experience 
trying to use this construct.  Anyone up to proposing a fix?


The problem appears to be that KBUILD_OUTPUT is NOT defined after make reruns 
itself.  Here is a fix:


Signed-off-by:  George Anzinger  

--- /usr/src/linux-2.6.12-org/Makefile  2005-07-01 14:37:44.000000000 -0700
+++ /usr/src/linux-2.6.13-rc/Makefile   2005-07-05 19:45:00.588314304 -0700
@@ -1149,7 +1149,7 @@
 #(which is the most common case IMHO) to avoid unneeded clutter in the big tags file.
 #Adding $(srctree) adds about 20M on i386 to the size of the output file!

-ifeq ($(KBUILD_OUTPUT),)
+ifeq ($(src),$(obj))
 __srctree =
 else
 __srctree = $(srctree)/



--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Maintainers list update: linux-net -> netdev

2005-04-13 Thread George Anzinger
Horms wrote:
On Tue, Apr 12, 2005 at 12:14:56PM -0700, George Anzinger wrote:
Horms wrote:
Use netdev as the mailing list contact instead of the mostly dead
linux-net list.
~
PHRAM MTD DRIVER
@@ -1795,7 +1795,7 @@
POSIX CLOCKS and TIMERS
P:  George Anzinger
M:  george@mvista.com
-L: linux-net@vger.kernel.org
+L: netdev@oss.sgi.com
S:  Supported
I don't really know about the rest of them, but I think this should be:
L: linux-kernel@vger.kernel.org
Leastwise that is where I look...

Yes, I was wondering about that one. Here is a patch that
adds to my previous patch. Trivial to say the least. 
I can re-diff the whole thing if that is more convenient.
Looks good to me.

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Maintainers list update: linux-net -> netdev

2005-04-12 Thread George Anzinger
Horms wrote:
On Sat, Apr 09, 2005 at 03:52:05PM +0200, Jörn Engel wrote:
On Fri, 8 April 2005 22:16:07 +0200, Pavel Machek wrote:
More importantly, it is still listed as "the list" for network
drivers...
NETWORK DEVICE DRIVERS
P:  Andrew Morton
M:  [EMAIL PROTECTED]
P:  Jeff Garzik
M:  [EMAIL PROTECTED]
L:  linux-net@vger.kernel.org
S:  Maintained
Maybe one of the two maintainers might want to change that? ;)

Use netdev as the mailing list contact instead of the mostly dead
linux-net list.
~
 PHRAM MTD DRIVER
@@ -1795,7 +1795,7 @@
 POSIX CLOCKS and TIMERS
 P:	George Anzinger
 M:	george@mvista.com
-L:	linux-net@vger.kernel.org
+L:	netdev@oss.sgi.com
 S:	Supported
 
I don't really know about the rest of them, but I think this should be:
L: linux-kernel@vger.kernel.org
Leastwise that is where I look...
~
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] clean up FIXME in do_timer_interrupt-lock fix

2005-03-19 Thread George Anzinger
Andrew Morton wrote:
George Anzinger  wrote:
Did you pick this up?  First sent on 3-11.

I did, although now looking at it I have issues.

I was not happy with the locking on this.  Two changes:
1) Turn off irq while setting the clock.
2) Call the timer code only through the timer interface 
   (set a short timer to do it from the ntp call).

I wanted the calls to sync_cmos_clock() to be made in a consistent environment. 
 This was not true when calling it directly from the NTP call code.  The change 
means that sync_cmos_clock() is ALWAYS called from run_timers(), i.e. as a timer 
call back function.
I would consider this to be an inadequate description :(

Signed-off-by: George Anzinger 
 time.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
Index: linux-2.6.12-rc/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.12-rc.orig/arch/i386/kernel/time.c
+++ linux-2.6.12-rc/arch/i386/kernel/time.c
@@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock(&rtc_lock);
+	spin_lock_irq(&rtc_lock);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock(&rtc_lock);
+	spin_unlock_irq(&rtc_lock);
 
 	return retval;
 }

If the comment is correct, and this code is called with local irq's
disabled then this patch should be using spin_lock_irqsave()
With the change below, it is always called from the timer call back code which, 
I believe, is always called with irq on.  Looks like I missed the comment :(

@@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon
 }
 void notify_arch_cmos_timer(void)
 {
-	sync_cmos_clock(0);
+	mod_timer(&sync_cmos_timer, jiffies + 1);
 }
 static long clock_cmos_diff, sleep_start;
 

Your description says what this does, but it doesn't say why it was done?
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] clean up FIXME in do_timer_interrupt-lock fix

2005-03-19 Thread George Anzinger
Did you pick this up?  First sent on 3-11.
Andrew Morton wrote:
Lee Revell <[EMAIL PROTECTED]> wrote:
On Thu, 2005-03-10 at 00:42 -0800, George Anzinger wrote:
This patch changes the update of the cmos clock to be timer driven
rather than poll driven by the timer interrupt function.  If the clock
is not being synced to an outside source the timer is removed and thus
system overhead is nil in that case.  The update frequency is still ~11
minutes and missing the update window still causes a retry in 60
seconds.
No replies yet.  Are there any objections to this patch?

Nope.  I think it's neat.  I queued it up.
I had a nightmare about ntp coming in at the "wrong" time resulting in a
deadlock.  Attached locking changes will make me sleep better :)
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc.
Type: Defect Fix 
Disposition: Pending
Description:

I was not happy with the locking on this.  Two changes:
1) Turn off irq while setting the clock.
2) Call the timer code only through the timer interface 
   (set a short timer to do it from the ntp call).

Signed-off-by: George Anzinger 

 time.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.12-rc/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.12-rc.orig/arch/i386/kernel/time.c
+++ linux-2.6.12-rc/arch/i386/kernel/time.c
@@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock(&rtc_lock);
+	spin_lock_irq(&rtc_lock);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock(&rtc_lock);
+	spin_unlock_irq(&rtc_lock);
 
 	return retval;
 }
@@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon
 }
 void notify_arch_cmos_timer(void)
 {
-	sync_cmos_clock(0);
+	mod_timer(&sync_cmos_timer, jiffies + 1);
 }
 static long clock_cmos_diff, sleep_start;
 



Re: [PATCH 2.6] fix POSIX timers expire before their scheduled time

2005-03-16 Thread George Anzinger
Liu, Hong wrote:
POSIX says: POSIX timers should not expire before their scheduled time.
Due to the timer started between jiffies, there are cases that the timer
will expire before its scheduled time.
This patch ensures timers will not expire early.
--- a/kernel/posix-timers.c 2005-03-10 15:46:27.329333664 +0800
+++ b/kernel/posix-timers.c 2005-03-10 15:50:11.884196136 +0800
@@ -957,7 +957,8 @@
 			       &expire_64, &(timr->wall_to_prev))) {
 		return -EINVAL;
 	}
-	timr->it_timer.expires = (unsigned long)expire_64;
+	timr->it_timer.expires = (unsigned long)expire_64 + 1;
 	tstojiffie(&new_setting->it_interval, clock->res, &expire_64);
 	timr->it_incr = (unsigned long)expire_64;
Has this happened??  The following code (in adjust_abs_time()) is supposed to 
prevent this sort of thing:

	if (oc.tv_sec | oc.tv_nsec) {
		oc.tv_nsec += clock->res;
		timespec_norm(&oc);
	}
Also, we run rather extensive tests for this sort of thing.
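
The effect of that padding can be sketched in userspace as below.  This is a
model only, under the assumption that the later conversion to ticks rounds
up; timeout_ticks() is a made-up name and not part of posix-timers.c.

```c
#include <stdint.h>

/* Ticks to wait for a relative timeout of ns, given a tick of res_ns.
 * A non-zero request is padded by one resolution unit before being
 * rounded up, so it cannot expire early even when it is started in the
 * middle of a tick. */
static int64_t timeout_ticks(int64_t ns, int64_t res_ns)
{
	if (ns == 0)
		return 0;	/* zero stays zero: no padding */
	ns += res_ns;
	return (ns + res_ns - 1) / res_ns;
}
```

With a 1ms resolution, a 10ms request becomes 11 ticks and a 1ns request
becomes 2 ticks, while a zero request is left alone.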
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: tvtime audio vs pcHDTV-3000 card and pvHDTV-1.6 software

2005-03-16 Thread George Anzinger
Heavens, no need to clean the tree at all.  Just add "-X <file>" to your diff.  I 
have attached what I use for <file>.  It is likely overkill, but should do...

-g
Gene Heskett wrote:
Greetings;
I've spent a goodly part of the last 3 hours rebooting, to find out 
where this audio control function died, and I think now I can point 
an accusatory finger at the 2.6.11.2 patch with some degree of 
certainty.

The scenario goes like this:
reboot to 2.6.11-rc5, everything works flawlessly except the 1394 
stuff, that kernel didn't have it built in yet.

reboot to 2.6.11+bk-ieee1394.patch  everything works flawlessly
reboot to 2.6.11.1+bk-ieee1394.patch everything works flawlessly
reboot to 2.6.11.2+bk-ieee1394.patch tvtime has no volume control, and 
the sound gets very very tinny about 1 second after it starts

This scenario continues up to and including 2.6.11.4.
So now my next question is, how do I clean up those src trees so that 
a diff actually outputs only the src code differences, thereby 
allowing a simple diff -urN (or whatever is the recommended command 
line to do a recursive diff on the whole maryann) to disclose the 
real diffs.  In other words, is a simple 'make clean' sufficient?

I got the impression from a comment that was made, that quite a body 
of work was actually done, in the i2c area, that somehow does not 
show in the changelog, nor in that simple little 10 line patch that 
was 2.6.11.2.  And how that little patch could be responsible for 
breaking this boggles what tiny little minuscule piece of a mind I 
have left at this point.

If that's the case, then how did it get into my src code tree since the 
exact same 2.6.11.tar.gz was used as the base for applying each of 
the incrementals to each of the src trees I now have sitting 
in /usr/src?  Good question that...

Unforch, the 2.6.11 plain tree has not, in this case been built yet as 
it got accidentally nuked by a misfire of my 'buildit26' script, which 
normally moves a base version tree out of the way before it unpacks a 
fresh copy, and then renames that tree to be the current version and 
then restores the base tree to its original name.

That's not the one I want to use as the 'gold standard' anyway. 
2.6.11.1 works, and 2.6.11.2 doesn't.  So at this point, 2.6.11.1 is 
the 'gold standard'.

But, both the 2.6.11.1 and the 2.6.11.2 trees are as built, and the
diff I got was far larger than forgetting to apply the 
bk-ieee1394.patch to one of them would account for.  Many tens of 
kilobytes in fact.

Please throw me a bone here folks.
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
*.o
*.i
.*
*.*~
*~
*.rej
*.orig
*.orig.*
#*
*#
*.ver
ETAGS
TAGS
tags
*.map
*.s
*.a
*X
*Y

*.*X
*.*Y
SCCS
CVS
*.*,*
dwarf2-defs.h
kconfig
configs.c
defconfig
mkdep
split-include
tkparse
vmlinux
consolemap_deftbl.c
tkparse.c
classlist.h
crc32table.h
devlist.h
config
autoconf.h
compile.h
version.h
kconfig.tk
soundmodem
defkeymap.c
patest
asm
boot
conmakehash
gen-devlist
modversions.h
elfconfig.h
asm_offsets.h
*.old
cscope.*
*.so
gen_crc32table
docproc
fixdep
kallsyms
mk_elfconfig
modpost
pnmtologo
initramfs_data.*
gen_init_cpio



Re: [topic change] jiffies as a time value

2005-03-15 Thread George Anzinger
john stultz wrote:
On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote:
john stultz wrote:
On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
+   /* finally, update legacy time values */
+   write_seqlock_irqsave(&xtime_lock, x_flags);
+   xtime = ns2timespec(system_time + wall_time_offset);
+   wall_to_monotonic = ns2timespec(wall_time_offset);
+   wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
+   wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
+   /* XXX - should jiffies be updated here? */
Excellent question. 
Indeed.  Currently jiffies is used as both an interrupt counter and a
time unit, and I'm trying to make it just the former. If I emulate it then
it stops functioning as a interrupt counter, and if I don't then I'll
probably break assumptions about jiffies being a time unit. So I'm not
sure which is the easiest path to go until all the users of jiffies are
audited for intent. 
Really?  Who counts interrupts???  The timer code treats jiffies as a unit of 
time.  You will need to rewrite that to make it otherwise.  

Ug. I'm thin on time this week, so I was hoping to save this discussion
for later, but I guess we can get into it now.
Well, assuming timer interrupts actually occur HZ times a second, yes
one could (and in current practice, one does) implicitly interpret jiffies
as being a valid notion of time.  However with SMIs, bad drivers that
disable interrupts for too long, and virtualization the reality is that
that assumption doesn't hold. 

We do have the lost-ticks compensation code that tries to help this, but
that conflicts with some virtualization implementations. Suspend/resume
tries to compensate jiffies for ticks missed over time suspended, but
I'm not sure how accurate it really is (additionally, looking at it now,
it assumes jiffies is only 32bits).
Add to that the whole "jiffies doesn't really increment at HZ, but at
ACTHZ" confusion, or bad drivers that assume HZ=100, and we get a fair amount
of trouble stemming from folks using jiffies as a time value.  Because
in reality, it is just an interrupt counter.
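A minimal sketch of the HZ=100 pitfall described above. The helpers ms_to_ticks() and ticks_to_ms() are hypothetical names used only for illustration (they are not kernel functions), and the arithmetic uses nominal HZ only, ignoring the HZ vs. ACTHZ difference:

```c
#include <assert.h>

/* Hypothetical helper: convert milliseconds to timer ticks for a given HZ,
 * rounding up so that a delay is never shorter than requested. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* Hypothetical helper: how long a hard-coded tick count nominally lasts,
 * in milliseconds, at a given HZ. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks * 1000 / hz;
}
```

A driver that hard-codes "1 tick" expecting 10 ms gets ticks_to_ms(1, 1000), i.e. 1 ms, on a HZ=1000 kernel, whereas asking for ms_to_ticks(10, hz) gives the intended delay at either setting.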
Well, currently, in x86 systems it causes wall clock to advance a very well 
defined amount.  That it is not exactly 1/HZ is something we need to live with...
So now, if new timeofday code emulates jiffies, we have to decide if it
emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies
possibly jittering from it being incremented every tick and then set to
the proper time when the timekeeping code runs. 
I think you're overlooking timers.  We have a given resolution for timers and some 
code, at least, expects timers to run with that resolution.  This REQUIRES 
interrupts at resolution frequency.  We can argue about what that interrupt 
event is called (currently a jiffies interrupt) and disparage the fact that 
hardware can not give us "nice" numbers for the resolution, but we do need the 
interrupts.  That there are bad places in the code where interrupts are delayed 
is not really important in this discussion.  For what it's worth, the RT patch 
Ingo is working on is getting latencies down in the 10s of microseconds region.

We also need, IMNSHO, to recognize that, at least with some hardware, that 
interrupt IS in fact the clock and is the only reasonable way we have of reading 
it.  This is true, for example, on the x86.  The TSC we use as a fill in for 
between interrupts is not stable in the long term and should only be used to 
interpolate over 1 to 10 ticks or so.
I'm not sure which is the best way to go, but it sounds like emulating
it is probably the easiest. I just deferred the question with a comment
until now because it's not completely obvious. Any suggestions on the
above questions? (I'm guessing the answers are: use ACTHZ, and the jitter
won't hurt that bad.)


But then you have 
another problem.  To correctly function, timers need to expire on time (hey, how 
about that) not some time later.  To do this we need an interrupt source.  To 
this point in time, the jiffies interrupt has been the indication that one or 
more timer may have expired.  While we don't need to "count" the interrupts, we 
DO need them to expire the timers AND they need to be on time.

Well, something Nish Aravamudan has been working on is converting the
common users of jiffies (drivers) to start using human time units. These
very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can
then be accurately changed to jiffies (or possibly some other time unit)
internally. It would even be possible for soft-timers to expire based
upon the actual high-res time value, rather than the low-res
tick-counter (which is something else Nish has been playing with). When that
occurs we can easily start doing other interesting things that I believe
you've already been working on in your HRT code, such as changing the
timer interrupt frequency dynamically, or

Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-15 Thread George Anzinger
john stultz wrote:
On Mon, 2005-03-14 at 21:37 -0800, Christoph Lameter wrote:
Note that similarities exist between the posix clock and the time sources.
Will all time sources be exportable as posix clocks?

At this point I'm not familiar enough with the posix clocks interface to
say, although it's probably outside the scope of the initial timeofday
rework.
I do think we need to consider the needs of that subsystem.  Clock wise, it 
makes a monotonic and a real time clock available to the user.  The real time 
clock is just a timespec version of the timeval gettimeofday clock.  At the 
current time, the monotonic clock is the real time clock plus wall_to_monotonic. 
 All that is rather simple and straightforward, and I don't recommend adding 
any other clocks unless there is a real need.

The interesting thing is that the posix timers are based on the posix clocks 
which are based on wall_clock, and the jiffies clock which is what runs the 
timers.  In order to make sense of timer requests it is necessary to, 
atomically, grab all three clocks (i.e. wall_clock aka gettimeofday, 
wall_to_monotonic, and jiffies with the jiffies offset).  The code can then 
figure out when a timer needs to expire in jiffies time in order to expire at a 
given wall or monotonic time.  Currently the xtime_lock sequence lock is used to 
do this.
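
The "grab all three clocks atomically" requirement can be illustrated with a user-space sequence-lock sketch. This is not the kernel's seqlock implementation: the struct and function names are hypothetical, a single writer is assumed, and every field is a C11 atomic with the default (sequentially consistent) ordering, which is a conservative choice made for simplicity rather than speed:

```c
#include <stdatomic.h>
#include <stdint.h>

/* A consistent copy of the three values a timer request needs. */
struct clock_snapshot {
	int64_t  wall_ns;
	int64_t  wall_to_mono_ns;
	uint64_t jiffies;
};

/* Hypothetical shared state; seq is odd while the writer is mid-update. */
struct clocks {
	atomic_uint      seq;
	_Atomic int64_t  wall_ns;
	_Atomic int64_t  wall_to_mono_ns;
	_Atomic uint64_t jiffies;
};

/* Single writer: bump seq to odd, update all fields, bump seq to even. */
static void clocks_write(struct clocks *c, int64_t wall, int64_t w2m,
			 uint64_t jiffies)
{
	unsigned int s = atomic_load(&c->seq);

	atomic_store(&c->seq, s + 1);
	atomic_store(&c->wall_ns, wall);
	atomic_store(&c->wall_to_mono_ns, w2m);
	atomic_store(&c->jiffies, jiffies);
	atomic_store(&c->seq, s + 2);
}

/* Reader: retry until seq was even and unchanged across the whole read. */
static struct clock_snapshot clocks_read(struct clocks *c)
{
	struct clock_snapshot snap;
	unsigned int before, after;

	do {
		before = atomic_load(&c->seq);
		snap.wall_ns = atomic_load(&c->wall_ns);
		snap.wall_to_mono_ns = atomic_load(&c->wall_to_mono_ns);
		snap.jiffies = atomic_load(&c->jiffies);
		after = atomic_load(&c->seq);
	} while ((before & 1) || before != after);

	return snap;
}
```

The reader never blocks the writer; it simply retries if an update raced with it, which is the property that makes this pattern attractive for time values that are read far more often than they are written.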

Another issue that posix timers brings forward is the need to know when the 
clock is set.  This is needed to cause timers that were requested to expire at 
some absolute wall_time to do so even if time is set while they are running.  A 
word on how this is done is in order...

Since the processing of a clock set by the posix timers code may, in fact, allow 
the time to be set more than once before the affected timers are adjusted (or 
rather, to avoid the locking rat's nest that not allowing this would cause), the 
wall_to_monotonic value is exploited.  In particular, a clock setting changes 
this value by the exact amount that time was adjusted.  So, each posix timer 
carries the value of wall_to_monotonic that was in use when the timer was 
started.  The clock_was_set code uses this to compute the clock movement and 
thus the adjustment needed to make the timer expire at the right time.
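
A minimal sketch of that bookkeeping, assuming (as stated earlier in this message) that monotonic time equals wall time plus wall_to_monotonic. The struct and function names are hypothetical and times are plain nanosecond counts rather than timespecs; this is not the posix-timers code itself:

```c
#include <stdint.h>

/* Hypothetical absolute (wall-clock) timer. */
struct abs_timer {
	int64_t expires_mono_ns; /* expiry on the monotonic/timer base */
	int64_t w2m_at_arm_ns;   /* wall_to_monotonic when armed or last adjusted */
};

/* Arm a timer for an absolute wall time, remembering wall_to_monotonic. */
static void abs_timer_arm(struct abs_timer *t, int64_t wall_expiry_ns,
			  int64_t w2m_now_ns)
{
	t->expires_mono_ns = wall_expiry_ns + w2m_now_ns;
	t->w2m_at_arm_ns = w2m_now_ns;
}

/* Called after the clock was set, possibly several times in a row.
 * The difference in wall_to_monotonic is the total clock movement,
 * so one adjustment covers a whole series of settings.
 * Returns the adjustment applied to the expiry. */
static int64_t abs_timer_clock_was_set(struct abs_timer *t, int64_t w2m_now_ns)
{
	int64_t delta = w2m_now_ns - t->w2m_at_arm_ns;

	t->expires_mono_ns += delta;
	t->w2m_at_arm_ns = w2m_now_ns;
	return delta;
}
```

For example, with wall_to_monotonic at -1000 a timer for wall time 5000 expires at monotonic 4000; if the clock is then set forward twice by 300 (wall_to_monotonic becomes -1600), a single call moves the expiry to 3400, which is again wall time 5000.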

What this translates to in the new code is a) the need for a way to atomically 
get all the key times (wall, monotonic, jiffies) and b) access to a value that 
will allow it to compute the amount of time a clock set, or a series of clock 
settings, changed time by.  Of course, it also needs the clock_was_set() notify 
call.
Do you have a link that might explain the posix clocks spec and its
intent?
Well, there is my signature :)  Really, on the high-res-timers project site you 
want to download the support patch.  In there, among other things, is a set of 
man pages on posix clocks & timers.  The patch applies to any kernel and just 
adds a new set of directories off of Documentation.
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: spin_lock error in arch/i386/kernel/time.c on APM resume

2005-03-15 Thread George Anzinger
Pavel Machek wrote:
Hi!

I agree.  Still in all that follows, no one has addressed the apparent 
race described above.  The reason the system reported the errors that 
started this thread is that the APM restore code was trying to read the 
cmos clock (I assume to set the xtime clock) WHILE the timer interrupt 
code was trying to set the cmos clock from xtime.  In other words, it is 
destroying the time it is trying to read.  I repeat "Possibly the APM 
code should change time_status to STA_UNSYNC on the way into the sleep."  
I am not sure how ntp is supposed to react to the resume but I suspect 
that the system time is rather out of sync...

It needs to work without NTP, too. You don't get NTP on a plane (etc.)
where suspend is most useful.
We have CMOS clock, it should be possible to get time from there
without resorting to NTP..
Eh... sure, but... the bug was reported because the system was attempting 
to update the cmos clock (which it does every ~11 min.) during APM exit.   
It does this IF AND ONLY IF the system is synced to an external source as 
indicated by the STA_UNSYNC bit being cleared in time_status.  Now, I 
don't know what or how APM and NTP are supposed to play together, but I 
suspect that on entry to APM time is no longer synced, thus my comment.

As to your comment, the bug would never have shown its ugly face if the 
system wasn't using NTP.

Uh, ok, you are right. We should set time to STA_UNSYNC so that we do
not write back to CMOS during/shortly after resume. I did not realize
what STA_UNSYNC means. Perhaps you have a patch to do that somewhere?
;-)
Zwane has convinced me that the real problem is doing the right thing (tm) in 
the APM code, i.e. not allowing the timer interrupt until after reading the cmos 
clock, for which he has a patch tendered.

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: spin_lock error in arch/i386/kernel/time.c on APM resume

2005-03-14 Thread George Anzinger
Pavel Machek wrote:
Hi!

And more... That this occurs implies we are attempting to update the cmos
clock on resume, which seems wrong.  One would presume that the time is wrong
at this time and we are about to save that wrong time.  Possibly the APM code
should change time_status to STA_UNSYNC on the way into the sleep (or whatever
it is called).  Who should we ping with this?

timer_resume, which appears to be the problem, wants to calculate the amount 
of time that was spent suspended, also your unconditional irq enable in 
get_cmos_time breaks the atomicity of device_power_up and would deadlock 
in sections of code which call get_time_diff() with xtime_lock held. I 
sent a patch subject "APM: fix interrupts enabled in device_power_up" 
which should address this.
I agree.  Still in all that follows, no one has addressed the apparent race 
described above.  The reason the system reported the errors that started 
this thread is that the APM restore code was trying to read the cmos clock 
(I assume to set the xtime clock) WHILE the timer interrupt code was 
trying to set the cmos clock from xtime.  In other words, it is destroying 
the time it is trying to read.  I repeat "Possibly the APM code should 
change time_status to STA_UNSYNC on the way into the sleep."  I am not sure 
how ntp is supposed to react to the resume but I suspect that the system 
time is rather out of sync...

It needs to work without NTP, too. You don't get NTP on a plane (etc.)
where suspend is most useful.
We have CMOS clock, it should be possible to get time from there
without resorting to NTP..
Eh... sure, but... the bug was reported because the system was attempting to 
update the cmos clock (which it does every ~11 min.) during APM exit.   It does 
this IF AND ONLY IF the system is synced to an external source as indicated by 
the STA_UNSYNC bit being cleared in time_status.  Now, I don't know what or 
how APM and NTP are supposed to play together, but I suspect that on entry to 
APM time is no longer synced, thus my comment.

As to your comment, the bug would never have shown its ugly face if the system 
wasn't using NTP.

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread George Anzinger
john stultz wrote:
On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
~


+   /* finally, update legacy time values */
+   write_seqlock_irqsave(&xtime_lock, x_flags);
+   xtime = ns2timespec(system_time + wall_time_offset);
+   wall_to_monotonic = ns2timespec(wall_time_offset);
+   wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
+   wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
+   /* XXX - should jiffies be updated here? */
Excellent question. 

Indeed.  Currently jiffies is used as both an interrupt counter and a
time unit, and I'm trying to make it just the former. If I emulate it then
it stops functioning as an interrupt counter, and if I don't then I'll
probably break assumptions about jiffies being a time unit. So I'm not
sure which is the easiest path to go until all the users of jiffies are
audited for intent. 
Really?  Who counts interrupts???  The timer code treats jiffies as a unit of 
time.  You will need to rewrite that to make it otherwise.  But then you have 
another problem.  To correctly function, timers need to expire on time (hey, how 
about that) not some time later.  To do this we need an interrupt source.  To 
this point in time, the jiffies interrupt has been the indication that one or 
more timer may have expired.  While we don't need to "count" the interrupts, we 
DO need them to expire the timers AND they need to be on time.

~
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: spin_lock error in arch/i386/kernel/time.c on APM resume

2005-03-12 Thread George Anzinger
Zwane Mwaikambo wrote:
On Sat, 12 Mar 2005, George Anzinger wrote:

I agree.  Still in all that follows, no one has addressed the apparent race
described above.  The reason the system reported the errors that started this
thread is that the APM restore code was trying to read the cmos clock (I
assume to set the xtime clock) WHILE the timer interrupt code was trying to
set the cmos clock from xtime.

Doesn't my reply explain the actual problem? The code path being;
Sorry, I just didn't look at the apm code.  My bad.
-g
arch/i386/kernel/apm.c
suspend()
	write_seqlock_irq(xtime_lock)
	...
	write_sequnlock_irq(xtime_lock)

	device_power_up()
		timer_resume()
			get_cmos_time();
S
So this covers the problem that the reporter reported, so yes it's setting 
xtime but we shouldn't be taking interrupts in the first place, so i 
posted the patch to cover that. APM was clearly violating PM resume 
procedures.

Thanks,
    Zwane
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: spin_lock error in arch/i386/kernel/time.c on APM resume

2005-03-12 Thread George Anzinger
Zwane Mwaikambo wrote:
On Sat, 12 Mar 2005, George Anzinger wrote:

Looks like we need the irq on the read clock also.  This is true both before
and  after the prior cmos_time changes.
The attached replaces the patch I sent yesterday.
For those wanting to fix the kernel without those patches, all that is needed
is the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks.
And more... That this occurs implies we are attempting to update the cmos
clock on resume, which seems wrong.  One would presume that the time is wrong
at this time and we are about to save that wrong time.  Possibly the APM code
should change time_status to STA_UNSYNC on the way into the sleep (or whatever
it is called).  Who should we ping with this?

timer_resume, which appears to be the problem, wants to calculate the amount 
of time that was spent suspended, also your unconditional irq enable in 
get_cmos_time breaks the atomicity of device_power_up and would deadlock 
in sections of code which call get_time_diff() with xtime_lock held. I 
sent a patch subject "APM: fix interrupts enabled in device_power_up" 
which should address this.
I agree.  Still in all that follows, no one has addressed the apparent race 
described above.  The reason the system reported the errors that started this 
thread is that the APM restore code was trying to read the cmos clock (I assume 
to set the xtime clock) WHILE the timer interrupt code was trying to set the 
cmos clock from xtime.  In other words, it is destroying the time it is trying 
to read.  I repeat "Possibly the APM code should change time_status to 
STA_UNSYNC on the way into the sleep."  I am not sure how ntp is supposed to 
react to the resume but I suspect that the system time is rather out of sync...
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



Re: spin_lock error in arch/i386/kernel/time.c on APM resume

2005-03-12 Thread George Anzinger
J. Bruce Fields wrote:
On APM resume this morning on my Thinkpad X31, I got a "spin_lock is
already locked" error; see below.  This doesn't happen on every resume,
though it's happened before.  The kernel is 2.6.11 plus a bunch of
(hopefully unrelated...) NFS patches.
Any ideas?
Yesterday's nightmare, today's bug :(
Looks like we need the irq on the read clock also.  This is true both before and 
 after the prior cmos_time changes.

Andrew,
The attached replaces the patch I sent yesterday.
For those wanting to fix the kernel without those patches, all that is needed 
is the chunk that applies, i.e. the _irq on the get_cmos_time() spinlocks.

And more... That this occurs implies we are attempting to update the cmos clock 
on resume, which seems wrong.  One would presume that the time is wrong at this 
time and we are about to save that wrong time.  Possibly the APM code should 
change time_status to STA_UNSYNC on the way into the sleep (or whatever it is 
called).  Who should we ping with this?
~
Mar 12 07:07:31 puzzle kernel: PCI: Setting latency timer of device :00:1f.5 to 64
Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:179: spin_lock(arch/i386/kernel/time.c:c0603c28) already locked by arch/i386/kernel/time.c/309
Mar 12 07:07:31 puzzle kernel: arch/i386/kernel/time.c:316: spin_unlock(arch/i386/kernel/time.c:c0603c28) not locked
~
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc.
Type: Defect Fix 
Disposition: Pending
Description:

I was not happy with the locking on this.  Two changes:
1) Turn off irq while setting the clock.
2) Call the timer code only through the timer interface 
   (set a short timer to do it from the ntp call).

Signed-off-by: George Anzinger 

 time.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.12-rc/arch/i386/kernel/time.c
===
--- linux-2.6.12-rc.orig/arch/i386/kernel/time.c
+++ linux-2.6.12-rc/arch/i386/kernel/time.c
@@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock(&rtc_lock);
+	spin_lock_irq(&rtc_lock);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock(&rtc_lock);
+	spin_unlock_irq(&rtc_lock);
 
 	return retval;
 }
@@ -282,14 +282,14 @@ unsigned long get_cmos_time(void)
 {
 	unsigned long retval;
 
-	spin_lock(&rtc_lock);
+	spin_lock_irq(&rtc_lock);
 
 	if (efi_enabled)
 		retval = efi_get_time();
 	else
 		retval = mach_get_cmos_time();
 
-	spin_unlock(&rtc_lock);
+	spin_unlock_irq(&rtc_lock);
 
 	return retval;
 }
@@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon
 }
 void notify_arch_cmos_timer(void)
 {
-	sync_cmos_clock(0);
+	mod_timer(&sync_cmos_timer, jiffies + 1);
 }
 static long clock_cmos_diff, sleep_start;
 
 


Re: [PATCH] clean up FIXME in do_timer_interrupt

2005-03-11 Thread George Anzinger
Andrew Morton wrote:
Lee Revell <[EMAIL PROTECTED]> wrote:
On Thu, 2005-03-10 at 00:42 -0800, George Anzinger wrote:
This patch changes the update of the cmos clock to be timer driven
rather than poll driven by the timer interrupt function.  If the clock
is not being synced to an outside source the timer is removed and thus
system overhead is nil in that case.  The update frequency is still ~11
minutes and missing the update window still causes a retry in 60
seconds.
No replies yet.  Are there any objections to this patch?

Nope.  I think it's neat.  I queued it up.
I had a nightmare about ntp coming in at the "wrong" time resulting in a 
deadlock.  Attached locking changes will make me sleep better :)

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc.
Type: Defect Fix 
Disposition: Pending
Description:

I was not happy with the locking on this.  Two changes:
1) Turn off irq while setting the clock.
2) Call the timer code only through the timer interface 
   (set a short timer to do it from the ntp call).

Signed-off-by: George Anzinger 

 time.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6.12-rc/arch/i386/kernel/time.c
===
--- linux-2.6.12-rc.orig/arch/i386/kernel/time.c
+++ linux-2.6.12-rc/arch/i386/kernel/time.c
@@ -176,12 +176,12 @@ static int set_rtc_mmss(unsigned long no
 	int retval;
 
 	/* gets recalled with irq locally disabled */
-	spin_lock(&rtc_lock);
+	spin_lock_irq(&rtc_lock);
 	if (efi_enabled)
 		retval = efi_set_rtc_mmss(nowtime);
 	else
 		retval = mach_set_rtc_mmss(nowtime);
-	spin_unlock(&rtc_lock);
+	spin_unlock_irq(&rtc_lock);
 
 	return retval;
 }
@@ -338,7 +338,7 @@ static void sync_cmos_clock(unsigned lon
 }
 void notify_arch_cmos_timer(void)
 {
-	sync_cmos_clock(0);
+	mod_timer(&sync_cmos_timer, jiffies + 1);
 }
 static long clock_cmos_diff, sleep_start;
 
 


Re: [PATCH] more reliable system timer for SC1100 CPU

2005-03-11 Thread George Anzinger
Ted Phelps wrote:
First, procedure...  patches should be *.patch and not compressed.  If too long 
they need to be broken up.  Lately, folks have said they should be inline in the 
email text, but watch out for your mailer doing UGLY things with white space.

Hello,
The attached patch is an attempt to work around the buggy timestamp
counter on the NatSemi SC1100 CPU by using the on-board 27MHz
high-resolution timer as an alternative time source.  It should,
in theory, work with any of the SCx200 CPUs as well, though I have
been unable to test this.  I have tested it fairly thoroughly with NTP
on an SC1100 and it seems to behave sanely.
That said, there are three things about it that I'm not entirely
comfortable with:
(1) The high-resolution timer is driven by a separate crystal than the
CPU's timer interrupt, and on the SC1100 I have access to, it's
consistently slower.  I've found that it is necessary to
periodically *decrement* the jiffies_64 counter in mark_offset in
order to make gettimeofday produce anything reasonable.  In
practice jiffies_64 is incremented again in do_timer before
anything else reads it, so the net effect is minimal.
I don't think this is what you're seeing.  As I read the code, if an interrupt 
gets delayed and the next one is not, you will determine that you should 
decrement jiffies.  Interrupts DO get delayed.  This counter is only being used 
to cover the jiffie to jiffie time.  I suspect that any systemic errors such as 
different rocks are not really important (but drift needs to be accounted for, 
see below).

The better thing to do here is to figure some arbitrary start time when a 
jiffies edge is "close" to the actual interrupt time and use the counter time 
at that time as the "base" time.  Each jiffie you then bump this by the counts 
per jiffie.  (By the way, this should be calculated using TICK_NSEC (nsecs per 
tick) and NOT HZ.  TICK_NSEC accounts for the fact that the PIT does not produce 
exactly 1/HZ ticks.)
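
To make the "not exactly 1/HZ" point concrete, here is a small sketch. It assumes the i386 PIT input clock is 1193182 Hz and that the divisor is chosen by rounding to the nearest integer; the names pit_latch() and pit_tick_nsec() are hypothetical, and the result is truncated to whole nanoseconds, so the kernel's own TICK_NSEC may differ in the last digit:

```c
#include <assert.h>
#include <stdint.h>

#define PIT_TICK_RATE 1193182ULL /* assumed PIT input clock, in Hz */

/* Divisor programmed into the PIT for a requested HZ (round to nearest). */
static uint64_t pit_latch(uint64_t hz)
{
	assert(hz > 0);
	return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Actual tick length in nanoseconds that this divisor produces (truncated). */
static uint64_t pit_tick_nsec(uint64_t hz)
{
	return pit_latch(hz) * 1000000000ULL / PIT_TICK_RATE;
}
```

Under these assumptions a HZ=1000 tick is 999847 ns rather than 1000000 ns, roughly 150 ppm short, which is why counts per jiffy derived from HZ alone would slowly walk away from the interrupt.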

In addition to this, to account for drift, I have been using code that, on each 
interrupt, checks if it is early (i.e. base + ticks_per_jiffy > now) and, if so, 
adjusts base to make it on time.  If it is late, I keep the minimum amount it is 
late for several ticks and then adjust 
base to make it on time.  This ends up making small changes in "base" to account 
for any drift.  It also ends up ignoring occasional late times caused by normal 
interrupt latency.  If it is late by over a tick, jiffies is adjusted for the 
lost tick.  (All this code is in the high-res-timers patch, see signature.)
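
One possible reading of the scheme described above, as a stand-alone sketch. This is not the high-res-timers code: the struct, the function name and the LATE_WINDOW value are hypothetical, and "now" stands for whatever free-running counter is being sampled at each timer interrupt:

```c
#include <stdint.h>

#define LATE_WINDOW 4 /* hypothetical: late ticks to observe before correcting */

struct tick_base {
	uint64_t base;            /* counter value taken as the last tick edge */
	uint64_t ticks_per_jiffy; /* counter counts per jiffy */
	uint64_t min_late;        /* smallest lateness seen in the current window */
	unsigned int late_seen;   /* late ticks seen in the current window */
};

/* Call once per timer interrupt with the current counter value.
 * Returns the number of whole ticks that were lost (normally 0). */
static unsigned int tick_base_update(struct tick_base *tb, uint64_t now)
{
	uint64_t due = tb->base + tb->ticks_per_jiffy;
	uint64_t late;
	unsigned int lost = 0;

	if (now <= due) {
		/* Early or on time: trust the interrupt and re-base on it. */
		tb->base = now;
		tb->late_seen = 0;
		return 0;
	}

	/* Late: step over any whole ticks that were missed. */
	late = now - due;
	while (late >= tb->ticks_per_jiffy) {
		due += tb->ticks_per_jiffy;
		late -= tb->ticks_per_jiffy;
		lost++;
	}
	tb->base = due;

	/* Remember the smallest lateness; after several late ticks, treat
	 * that minimum as drift and fold it into base. Larger values are
	 * assumed to be ordinary interrupt latency and are ignored. */
	if (tb->late_seen == 0 || late < tb->min_late)
		tb->min_late = late;
	if (++tb->late_seen == LATE_WINDOW) {
		tb->base += tb->min_late;
		tb->late_seen = 0;
	}
	return lost;
}
```

The caller would add the returned value to jiffies to account for lost ticks; base itself only ever moves by whole jiffies, by the re-base on an early interrupt, or by the small minimum-lateness correction.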

Do note this assumes (and IMHO rightly so) that the PIT is the system time gold 
standard.

George
(2) The 27MHz timer is accessed via the PCI bus, which is not
available when the system clock is initialized.  To work around
this, I've written the init function to always fail so that
loops_per_jiffy is computed using another timer (the TSC in my
case).  Once the high-resolution timer is accessible, the kernel
will switch to using it to compute gettimeofday and the monotonic
clock, but still use the original timer's delay function.  This
is somewhat kludgy, but I can't see a cleaner way.
(3) The timer depends on CONFIG_SCx200, which appears later in the
configuration hierarchy to the timers, and in an entirely
different part.  For now I've kept its Kconfig with the other
timers, but I'm not entirely happy with this choice.

The patch is against linux-2.6.11-mm2 as it relies on the
'determine-scx200-cb-address-at-run-time.patch' patch which has not
made it into the mainline.

Please CC me if you reply as I'm not subscribed to LKML.
Cheers,
-Ted
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] clean up FIXME in do_timer_interrupt

2005-03-10 Thread George Anzinger
Ok, here is a patch.  See what you think.  This patch assumes that Lee's patch 
has been merged (although it eliminates all of it).

George
George Anzinger wrote:
Lee Revell wrote:
On Fri, 2005-03-04 at 12:58 -0800, George Anzinger wrote:
Lee Revell wrote:
On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote:
The thing that brought this code to my attention is that with PREEMPT_RT
this happens to be the longest non-preemptible code path in the kernel.
On my 1.3 Ghz machine set_rtc_mmss takes about 50 usecs, combined with
the rest of timer irq we end up disabling preemption for about 90 usecs.
Unfortunately I don't have the trace anymore.

Anyway the upshot is if we hung this off a timer it looks like we would
improve the worst case latency with PREEMPT_RT by almost 50%.  Unless
there is some reason it has to be done synchronously of course.

Well, it does have to be done at the right time WRT the second, but I 
suspect we can hit that as well with a timer as it is hit now.  Also, 
if we are _really_ off the mark, this can be deferred till the next 
second.


Do you have a patch?

Not at the moment, but I will work one up.
Andrew merged my trivial patch to clean up the logic, but a real fix
would be better.
Lee

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Source: MontaVista Software, Inc. George Anzinger george@mvista.com
Type:  Enhancement 
Disposition: pending
Description:

This patch changes the update of the cmos clock to be timer driven
rather than poll driven by the timer interrupt function.  If the clock
is not being synced to an outside source the timer is removed and thus
system overhead is nil in that case.  The update frequency is still ~11
minutes and missing the update window still causes a retry in 60
seconds.

signed off by George Anzinger george@mvista.com

 arch/i386/kernel/time.c |   67 +---
 kernel/time.c   |9 ++
 2 files changed, 56 insertions(+), 20 deletions(-)

Index: linux-2.6.12-rc/arch/i386/kernel/time.c
===
--- linux-2.6.12-rc.orig/arch/i386/kernel/time.c
+++ linux-2.6.12-rc/arch/i386/kernel/time.c
@@ -186,8 +186,6 @@ static int set_rtc_mmss(unsigned long no
 	return retval;
 }
 
-/* last time the cmos clock got updated */
-static long last_rtc_update;
 
 int timer_ack;
 
@@ -239,24 +237,6 @@ static inline void do_timer_interrupt(in
 
 	do_timer_interrupt_hook(regs);
 
-	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 */
-	if ((time_status & STA_UNSYNC) == 0 &&
-	    xtime.tv_sec > last_rtc_update + 660 &&
-	    (xtime.tv_nsec / 1000)
-			>= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
-	    (xtime.tv_nsec / 1000)
-			<= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
-		last_rtc_update = xtime.tv_sec;
-		if (efi_enabled) {
-			if (efi_set_rtc_mmss(xtime.tv_sec))
-				last_rtc_update -= 600;
-		} else if (set_rtc_mmss(xtime.tv_sec))
-			last_rtc_update -= 600;
-	}
 
 	if (MCA_bus) {
 		/* The PS/2 uses level-triggered interrupts.  You can't
@@ -313,7 +293,54 @@ unsigned long get_cmos_time(void)
 
 	return retval;
 }
+static void sync_cmos_clock(unsigned long dummy);
 
+static struct timer_list sync_cmos_timer =
+	TIMER_INITIALIZER(sync_cmos_clock, 0, 0);
+
+static void sync_cmos_clock(unsigned long dummy)
+{
+	struct timeval now, next;
+	int fail = 1;
+	/*
+	 * If we have an externally synchronized Linux clock, then update
+	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
+	 * called as close as possible to 500 ms before the new second starts.
+	 * This code is run on a timer.  If the clock is set, that timer
+	 * may not expire at the correct time.  Thus, we adjust...
+	 */
+	if ((time_status & STA_UNSYNC) != 0)
+		/*
+		 * Not synced, exit, do not restart a timer (if one is
+		 * running, let it run out).
+		 */
+		return;
+
+	do_gettimeofday(&now);
+	if (now.tv_usec >= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
+	    now.tv_usec <= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
+		fail = set_rtc_mmss(now.tv_sec);
+	}
+	next.tv_usec = USEC_AFTER - now.tv_usec;
+	if (next.tv_usec <= 0)
+		next.tv_usec += USEC_PER_SEC;
+	if (!fail) {
+		next.tv_sec = 659;
+  

Re: [PATCH] clean up FIXME in do_timer_interrupt

2005-03-08 Thread George Anzinger
Lee Revell wrote:
On Fri, 2005-03-04 at 12:58 -0800, George Anzinger wrote:
Lee Revell wrote:
On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote:
The thing that brought this code to my attention is that with PREEMPT_RT
this happens to be the longest non-preemptible code path in the kernel.
On my 1.3 Ghz machine set_rtc_mmss takes about 50 usecs, combined with
the rest of timer irq we end up disabling preemption for about 90 usecs.
Unfortunately I don't have the trace anymore.
Anyway the upshot is if we hung this off a timer it looks like we would
improve the worst case latency with PREEMPT_RT by almost 50%.  Unless
there is some reason it has to be done synchronously of course.
Well, it does have to be done at the right time WRT the second, but I suspect we can 
hit that as well with a timer as it is hit now.  Also, if we are _really_ off 
the mark, this can be deferred till the next second.


Do you have a patch?
Not at the moment, but I will work one up.
Andrew merged my trivial patch to clean up the logic, but a real fix
would be better.
Lee
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] clean up FIXME in do_timer_interrupt

2005-03-04 Thread George Anzinger
Lee Revell wrote:
On Fri, 2005-03-04 at 02:28 -0800, George Anzinger wrote:
Lee Revell wrote:
On Thu, 2005-03-03 at 16:45 -0800, Andrew Morton wrote:

If efi_enabled is true and efi_set_rtc_mmss(xtime.tv_sec) returns zero, the
new code will run set_rtc_mmss(xtime.tv_sec) whereas the old code won't.

Argh, I should know better than to send patches before having coffee.
Here's a new patch.  Still ugly, but might be a worthwhile cleanup.
Let's ask the obvious question: Why isn't this update hung on a timer?  It seems 
silly to check this 6000 times per update.  I am sure we can sync a timer to the 
same degree we do timer interrupts, so there _must_ be some other reason.  Right?


Thanks George, I knew there was an obvious question here, I just didn't
know what it was ;-).
The thing that brought this code to my attention is that with PREEMPT_RT
this happens to be the longest non-preemptible code path in the kernel.
On my 1.3 GHz machine set_rtc_mmss takes about 50 usecs; combined with
the rest of the timer irq we end up disabling preemption for about 90 usecs.
Unfortunately I don't have the trace anymore.
Anyway the upshot is if we hung this off a timer it looks like we would
improve the worst case latency with PREEMPT_RT by almost 50%.  Unless
there is some reason it has to be done synchronously of course.
Well, it does have to be done at the right time WRT the second, but I suspect we
can hit that as well with a timer as it is hit now.  Also, if we are _really_ off
the mark, this can be deferred till the next second.
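
For illustration only, here is a minimal userspace sketch of the arithmetic a timer-driven update would need: given the current microsecond offset within the second, how long to wait until a chosen offset comes around again. The function name and parameters are hypothetical, and nothing here is taken from an actual patch.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

/*
 * Microseconds to wait until the wall clock next reaches target_usec
 * within a second. If we are exactly on (or past) the mark, the wait
 * rolls over to the following second.
 */
long usec_until_offset(long now_usec, long target_usec)
{
    long delay;

    assert(now_usec >= 0 && now_usec < USEC_PER_SEC);
    assert(target_usec >= 0 && target_usec < USEC_PER_SEC);

    delay = target_usec - now_usec;
    if (delay <= 0)
        delay += USEC_PER_SEC;
    return delay;
}
```

A timer armed with this delay would land on the chosen offset, and a missed slot simply defers the update by one second.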

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, deactivate() scheduling issue

2005-03-04 Thread George Anzinger
Eugeny S. Mints wrote:
Esben Nielsen wrote:
As I read the code the driver task (A) should _not_ be removed from the
runqueue. It has to be woken up to call schedule_timeout() so that it gets
back on the runqueue after 10 ms. If it is taken out of the runqueue at
line 76 it will stay off the runqueue forever in the TASK_UNINTERRUPTIBLE
state!
Exactly. This is definitely a bug in the driver code - a developer just
didn't care about proper use of set_current_state(). The driver works
only because, as you have described, it is his good fortune that the
scheduler doesn't remove a task in a non-TASK_RUNNING state from the run
queue.
And my main question was - does everybody think it's ok to have a task in a
non-TASK_RUNNING state on the run queue? My current feeling is that this
should not be allowed.
This is the normal and specified way to handle this sort of thing.  There is a 
race issue that coding in this way avoids.  The coding sequence is:
a) set the task state to some state other than TASK_RUNNING.
b) set up whatever will trigger the wake up.  This may be several things, for
example, an interrupt from some device OR a timeout.
c) call schedule() to wait.

The race is getting to the schedule call before the wake up happens.  If, for 
some reason, the wake up condition happens prior to the schedule call, it will 
set the task state back to TASK_RUNNING so that when the schedule() call is made 
the scheduler will just return which is the right thing (tm) to do as the 
condition being waited on has happened.  We also note that disabling interrupts 
or preemption will NOT avoid the race unless you disable interrupts on ALL cpus, 
which is a VERY expensive cross cpu call.
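
The a), b), c) sequence and the race it closes can be sketched with a small single-threaded model. This is a toy under stated assumptions, not kernel code: the model_* helpers and wait_sequence_sleeps() are hypothetical names, and the "scheduler" here only records whether the task would have blocked.

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct task {
    enum task_state state;
    int slept;          /* 1 if the model scheduler actually blocked the task */
};

/* Step a): mark the task as about to sleep. It keeps running until step c). */
static void model_set_state(struct task *t, enum task_state s)
{
    t->state = s;
}

/* The wake-up side (interrupt, timeout, another task): make it runnable. */
static void model_wake_up(struct task *t)
{
    t->state = TASK_RUNNING;
}

/* Step c): block only if nobody has woken the task since step a). */
static void model_schedule(struct task *t)
{
    assert(t != NULL);
    t->slept = (t->state != TASK_RUNNING);
}

/* Runs steps a) to c); wake_early says whether the event beats schedule(). */
int wait_sequence_sleeps(int wake_early)
{
    struct task t = { TASK_RUNNING, 0 };

    model_set_state(&t, TASK_UNINTERRUPTIBLE);  /* a) */
    if (wake_early)                             /* b) event wins the race */
        model_wake_up(&t);
    model_schedule(&t);                         /* c) */
    return t.slept;
}
```

When the event arrives early the model returns without sleeping, which mirrors the point above: the wake up is never lost because the state was set before the event could fire.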

As I read the use of PREEMPT_ACTIVE, it is there to test whether this
rescheduling is voluntary or forced (a preemption). If it is forced the
task shall of course not go off the runqueue but stay there to run again
when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
preempt_schedule() and preempt_schedule_irq(). On the other hand if the
task itself has called schedule() or schedule_timeout() it has to go out
of the runqueue and wait for some event to wake it up.
You are right - it works perfectly - but not for my test case. I believe a
task in a non-TASK_RUNNING state should be removed from the run queue by the
first (any - voluntary or forced) execution of schedule() which
detects that the task state is not TASK_RUNNING.
This would cause the task to lose control prior to its setting up the needed 
wakeup events.

Yes there will be tasks in a state other than TASK_RUNNING on the runqueue.
The "bug" as I see it is in the scheduler interface: There is no way to
set the task state and call schedule() or schedule_timeout() atomically.
Therefore you can be preempted while the state is not TASK_RUNNING.
Exactly. IMO this interface is weird and needs rework. I don't understand
the reason for setting the task state before the schedule_timeout() call
rather than inside it, right before the schedule(). The desired task state
could be passed as a parameter.
You are assuming that the task ONLY wants to do a timeout.  Most of the time the 
timeout indicates an error condition.   The timeout bounds the wait for what is 
really desired, i.e. a device interrupt, some other task signaling, or some such.

Surely this is covered in the various driver writing guides...
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] clean up FIXME in do_timer_interrupt

2005-03-04 Thread George Anzinger
Lee Revell wrote:
On Thu, 2005-03-03 at 16:45 -0800, Andrew Morton wrote:
If efi_enabled is true and efi_set_rtc_mmss(xtime.tv_sec) returns zero, the
new code will run set_rtc_mmss(xtime.tv_sec) whereas the old code won't.

Argh, I should know better than to send patches before having coffee.
Here's a new patch.  Still ugly, but might be a worthwhile cleanup.
Let's ask the obvious question: Why isn't this update hung on a timer?  It seems 
silly to check this 6000 times per update.  I am sure we can sync a timer to the 
same degree we do timer interrupts, so there _must_ be some other reason.  Right?

George
Lee
--- linux-2.6.11-rc4-V0.7.39-02/arch/i386/kernel/time.c	2005-02-14 18:10:49.0 -0500
+++ linux-2.6.11-rc4/arch/i386/kernel/time.c	2005-03-03 20:15:39.0 -0500
@@ -254,16 +254,12 @@
 			>= USEC_AFTER - ((unsigned) TICK_SIZE) / 2 &&
 	(xtime.tv_nsec / 1000)
 			<= USEC_BEFORE + ((unsigned) TICK_SIZE) / 2) {
-		/* horrible...FIXME */
+	last_rtc_update = xtime.tv_sec;
 		if (efi_enabled) {
-	 		if (efi_set_rtc_mmss(xtime.tv_sec) == 0)
-last_rtc_update = xtime.tv_sec;
-			else
-last_rtc_update = xtime.tv_sec - 600;
-		} else if (set_rtc_mmss(xtime.tv_sec) == 0)
-			last_rtc_update = xtime.tv_sec;
-		else
-			last_rtc_update = xtime.tv_sec - 600; /* do it again in 60 s */
+		if (efi_set_rtc_mmss(xtime.tv_sec))
+			last_rtc_update -= 600;
+		} else if (set_rtc_mmss(xtime.tv_sec))
+			last_rtc_update -= 600;
 	}
 
 	if (MCA_bus) {
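
One way to check that a cleanup like the hunk above preserves behavior is to model both versions in userspace and compare them over every input combination. The sketch below does that; the function names are hypothetical, and the return codes of efi_set_rtc_mmss() and set_rtc_mmss() are passed in as plain integers (the real code only calls the one on the branch it takes).

```c
/* Old logic from the '-' lines: four explicit assignments. */
long old_last_rtc_update(int efi_enabled, int efi_rc, int rtc_rc, long now)
{
    long last;

    if (efi_enabled) {
        if (efi_rc == 0)
            last = now;
        else
            last = now - 600;
    } else if (rtc_rc == 0)
        last = now;
    else
        last = now - 600;
    return last;
}

/* New logic from the '+' lines: assume success, back off on failure. */
long new_last_rtc_update(int efi_enabled, int efi_rc, int rtc_rc, long now)
{
    long last = now;

    if (efi_enabled) {
        if (efi_rc)
            last -= 600;
    } else if (rtc_rc)
        last -= 600;
    return last;
}

/* Returns 1 if both versions agree for every combination of inputs. */
int rtc_logic_equivalent(long now)
{
    for (int efi = 0; efi <= 1; efi++)
        for (int efi_rc = 0; efi_rc <= 1; efi_rc++)
            for (int rtc_rc = 0; rtc_rc <= 1; rtc_rc++)
                if (old_last_rtc_update(efi, efi_rc, rtc_rc, now) !=
                    new_last_rtc_update(efi, efi_rc, rtc_rc, now))
                    return 0;
    return 1;
}
```

In particular, with efi_enabled set and a zero return from the EFI call, neither version falls through to the non-EFI path.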

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: realtime patch

2005-02-24 Thread George Anzinger
Fabian Fenaut wrote:
shabanip wrote on 25.02.2005 00:37:
where can I find realtime patches for kernel 2.6?

http://sourceforge.net/projects/realtime-lsm/ ?
What??  NO, they are here:
  http://redhat.com/~mingo/realtime-preempt/
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: Needed faster implementation of do_gettimeofday()

2005-02-22 Thread George Anzinger
Puneet Kaushik wrote:
Hello Parag and George,
Thanks for immediate reply.
The main problem is I am working on an SMP system. I have written a small
program that just calls gettimeofday() one billion times. I have
run it with the time utility and it takes almost twice as long on SMP as on
UP.

With kernel 2.6.10 on UP
real    4m5.495s
user    1m17.088s
sys     2m48.046s
With kernel 2.6.10 on SMP
real    6m24.485s
user    1m43.723s
sys     4m30.749s
And the fact is this SMP machine is faster and has more memory than the
UP one. On SMP systems it takes a spinlock every time it gets called,
synchronizes both the processors, and unlocks them. That's all I know
about it.
On 2.6 the lock is a r/w sequence lock.  The machines are not synchronized or 
locked, but some of the sequence lock instructions around the locking are 
"locked".  I find it hard to believe that this would double the time, however.

Ah..., now I remember.  On SMP x86 boxen, the accounting/ run_timer interrupt 
comes from the lapic timer.  This is triggered at a 1/HZ rate and means that 
there is an additional time keeping interrupt.  Actually, over the box, you get 
(N+1)/HZ interrupts where N is the number of cpus.  Assuming that the PIT and 
the lapic interrupt take about the same amount of time and that the PIT 
interrupt is evenly distributed on the CPUs, the interrupt contention should go 
from 1 to 1.5.  This alone would take your 4.084 min UP time to 6.125 min on an 
SMP box (that is amazingly close to what you are seeing if you ask me).

Again, I recommend my HRT patch.  There the accounting interrupt is generated by 
an "all-but-self" IPI.  This is generated by the PIT interrupt code which also 
does the accounting on the cpu handling the PIT interrupt.  Result: total time 
keeping interrupts N/HZ where N is the number of CPUs.
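
The interrupt counting above can be written out as a small calculation. This is only a sketch of the back-of-envelope estimate, assuming a two-CPU box and that run time scales with per-CPU time keeping interrupts; the function names are hypothetical.

```c
#include <assert.h>

/* Stock 2.6 SMP: one lapic tick per CPU plus one PIT tick shared by all CPUs. */
double stock_irqs_per_cpu_per_tick(int ncpus)
{
    assert(ncpus >= 1);
    return 1.0 + 1.0 / ncpus;
}

/* HRT patch as described: the PIT tick fans out by IPI, so one per CPU. */
double hrt_irqs_per_cpu_per_tick(int ncpus)
{
    assert(ncpus >= 1);
    return 1.0;
}

/* Scale a measured UP run time by the per-CPU interrupt load. */
double scaled_runtime(double up_time, double irqs_per_cpu)
{
    return up_time * irqs_per_cpu;
}
```

With two CPUs the stock figure is 1.5 interrupts per CPU per tick, which is where the 1.5 scaling factor comes from; under the HRT scheme the factor stays at 1.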


George I am just working on your suggestion, let me know if it will work
for SMPs.
See above.  Should solve your problem.
If there is some good implementation for SMP, please let me know.
Thanks,
- Puneet

On Tue, 2005-02-22 at 08:36, George Anzinger wrote:
Parag Warudkar wrote:
On Sunday 20 February 2005 05:58 am, [EMAIL PROTECTED] wrote:

985913   8.6083  vmlinux        mark_offset_tsc
584473   5.1032  libc-2.3.2.so  getc

What makes you think mark_offset_tsc is slow? Do you have any comparative 
numbers?  It might just be that the workload you are throwing at it justifies 
it. (For e.g. if your workload does a zillion system calls, system_call will 
show up as a hot spot in oprofile - doesn't necessarily mean it is slow - 
it's just overused.) Can you post the relevant code?
He really is right.  Mark offset is reading the PIT counter and that is not only 
rather dumb but dog slow.

A suggestion, try the high res timers patch.  Even if you don't use the timers 
the mark offset there is MUCH faster.  It does not read the PIT.

The difference is where we assume the jiffie bump is in time.  If we assume it 
is at the point that the PIT interrupts, well then the only way to get to that 
is to read the PIT.  If, on the other hand, we assume it is at the time after 
the interrupt where we mark offset, we can observe the "best" time for this 
event based on the TSC and avoid reading the PIT.

Try the HRT patch (see signature below) and see if it doesn't do better.
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: Needed faster implementation of do_gettimeofday()

2005-02-21 Thread George Anzinger
Parag Warudkar wrote:
On Sunday 20 February 2005 05:58 am, [EMAIL PROTECTED] wrote:
985913   8.6083  vmlinux        mark_offset_tsc
584473   5.1032  libc-2.3.2.so  getc

What makes you think mark_offset_tsc is slow? Do you have any comparative 
numbers?  It might just be that the workload you are throwing at it justifies 
it. (For e.g. if your workload does a zillion system calls, system_call will 
show up as a hot spot in oprofile - doesn't necessarily mean it is slow - 
it's just overused.) Can you post the relevant code?
He really is right.  Mark offset is reading the PIT counter and that is not only 
rather dumb but dog slow.

A suggestion, try the high res timers patch.  Even if you don't use the timers 
the mark offset there is MUCH faster.  It does not read the PIT.

The difference is where we assume the jiffie bump is in time.  If we assume it 
is at the point that the PIT interrupts, well then the only way to get to that 
is to read the PIT.  If, on the other hand, we assume it is at the time after 
the interrupt where we mark offset, we can observe the "best" time for this 
event based on the TSC and avoid reading the PIT.

Try the HRT patch (see signature below) and see if it doesn't do better.
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: queue_work from interrupt Real time preemption 2.6.11-rc2-RT-V0.7.37-03

2005-02-16 Thread George Anzinger
David S. Miller wrote:
On Wed, 16 Feb 2005 06:16:45 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:

Maybe the networking
stack would break if we allowed the TIMER softirq (thread) to preempt
the NET softirq (threads) (and vice versa)?

The major assumption is that softirq's run indivisibly per-cpu.
Otherwise the per-cpu queues of RX and TX packet work would
get corrupted.
For what it's worth, a short while ago I put together a workqueue package to a) 
allow easy priority setting for work queues and b) change either softirq, 
tasklet or bh code to use workqueues.  This was done mostly with CPP macros and 
a few conversion routines.  I then converted the network code to use this 
package simply by adding a key include to a couple of files.  The result worked 
on UP but ended up hanging the network code on SMP.  Everything else still 
worked, but not the net stuff.  I never ran down the problem as the "boss" was 
not interested in SMP...

George
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-02-10 Thread George Anzinger
Sven Dietrich wrote:
Hi George,
you may want to use this for reference.
This patch adds a config option to allow you to select whether timer IRQ runs 
in thread or not.
I'm not totally happy with the #ifdefs, but it may make switching back and forth easier.
Thanks, but...
You are addressing a different problem than I.  I want to code the VST patch to 
work in a system with or without the RT patch (it is easy to work with the RT 
option on or off).  The problem is setting up the spin locks it needs.  My 
solution assumes that RAW_SPIN_LOCK_UNLOCKED will not be defined unless the RT 
patch is applied.

As to your patch, in most archs the timer interrupt does accounting which 
requires input on just who was interrupted on the interrupt.  This is lost when 
threading the timer IRQ.  I think it was problems of this sort that caused Ingo 
to back away...

George
PS
By the way, your mailer (Microsoft Outlook) set up your attachment in such a 
way that my mailer would not inline it.  You might want to look into this.
Sven

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
George Anzinger
Sent: Thursday, February 10, 2005 12:21 PM
To: Ingo Molnar
Cc: William Weston; linux-kernel@vger.kernel.org
Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

If I want to write a patch that will work with or without the 
RT patch applied 
is the following enough?

#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-02-10 Thread George Anzinger
I am seeing:
kernel/built-in.o(.text+0x4974): In function `copy_mm':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/fork.c:493: undefined 
reference to `__spin_is_locked'
kernel/built-in.o(.text+0x9f5a): In function `next_thread':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/kernel/exit.c:877: undefined 
reference to `__raw_rwlock_is_locked'
net/built-in.o(.text+0x1258): In function `__sock_create':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/socket.c:175: undefined 
reference to `__spin_is_locked'
net/built-in.o(.text+0x16b54): In function `dev_deactivate':
/usr/src/cvs/mvl-kernel-26/makena/linux-2.6.10/net/sched/sch_generic.c:594: 
undefined reference to `__spin_is_locked'
make[1]: *** [vmlinux] Error 1
make: *** [bzImage] Error 2

Possibly from:
#define __raw_spin_is_locked(x)  (*(volatile signed char *)(&(x)->lock) <= 0)
#define __raw_spin_unlock_wait(x) \
do { barrier(); } while(__spin_is_locked(x))
in asm/spinlock.h
should that be __raw_spin_is_locked(x) instead?
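
If the answer is yes, the pair of macros would look roughly like the userspace model below. This is a sketch only: the model_ prefix is there to stay clear of reserved identifiers, the lock type is a stand-in, and the convention that 1 means unlocked is read off the `<= 0` test quoted above.

```c
/* Userspace stand-in for the lock word; 1 means unlocked, <= 0 means locked. */
typedef struct { volatile signed char lock; } model_raw_spinlock_t;

#define model_raw_spin_is_locked(x) \
    (*(volatile signed char *)(&(x)->lock) <= 0)

/* Compiler barrier (GCC/Clang syntax). */
#define model_barrier() __asm__ __volatile__("" ::: "memory")

/* The wait macro names the raw test, so no other *_is_locked symbol is needed. */
#define model_raw_spin_unlock_wait(x) \
    do { model_barrier(); } while (model_raw_spin_is_locked(x))

/* Returns 1 once the lock is observed free; spins while it is held. */
int wait_until_unlocked(model_raw_spinlock_t *l)
{
    model_raw_spin_unlock_wait(l);
    return !model_raw_spin_is_locked(l);
}
```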
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01

2005-02-10 Thread George Anzinger
If I want to write a patch that will work with or without the RT patch applied 
is the following enough?

#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif
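
As a self-contained illustration of how such a shim behaves, here is a userspace sketch with stand-in definitions for a kernel without the RT patch. Note that the typedef names the existing type first (`typedef spinlock_t raw_spinlock_t;`). The stand-in spinlock_t, its initializer, and the vst_lock names are assumptions made for the example, not the kernel's definitions.

```c
/* Stand-ins for a kernel WITHOUT the RT patch (userspace, hypothetical). */
typedef struct { volatile signed char lock; } spinlock_t;
#define SPIN_LOCK_UNLOCKED { 1 }

/* Compatibility shim: only takes effect when the RT names are missing. */
#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif

/* Code written against the RT names now builds either way. */
static raw_spinlock_t vst_lock = RAW_SPIN_LOCK_UNLOCKED;

int vst_lock_is_unlocked(void)
{
    return vst_lock.lock > 0;
}
```

The shim relies on RAW_SPIN_LOCK_UNLOCKED being a preprocessor macro whenever the RT patch is present, since `#ifndef` cannot see typedefs.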
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] Dynamic tick, version 050127-1

2005-02-07 Thread George Anzinger
Pavel Machek wrote:
Hi!

I do have CONFIG_X86_PM_TIMER enabled, but it seems my board does not
have such piece of hardware:
[EMAIL PROTECTED]:/usr/src/linux-mm$ dmesg | grep -i "time\|tick\|apic"
PCI: Setting latency timer of device :00:11.5 to 64
[EMAIL PROTECTED]:/usr/src/linux-mm$ 
If you are sure that machine supports ACPI, maybe this is your problem
(from the POSIX high res timer patch):
 If you enable the ACPI pm timer and it cannot be found, it is
 possible that your BIOS is not producing the ACPI table or
 that your machine does not support ACPI.  In the former case,
 see "Default ACPI pm timer address".  If the timer is not
 found the boot will fail when trying to calibrate the 'delay'
 loop.

Well, but how do I get the address? I'll try looking at BIOS
options...
Pavel
In my machine, if I turned off the PM code (in the BIOS) (or possibly turning on 
the ACPI, again in the BIOS) it did produce the address.  Booting then would put 
that address in the dmesg file.  You can then change the BIOS back to what it 
was and use the address found in the dmesg file.
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/



Re: High resolution timers and BH processing on -RT

2005-01-28 Thread George Anzinger
Ingo Molnar wrote:
* Thomas Gleixner <[EMAIL PROTECTED]> wrote:

or is it that we have a 'group' of normal timers expiring, which, if
they happen to occur _just_ prior a HRT event will generate a larger
delay?
Yep. The timers expire at random times. So it's likely to have short
sequences of timer interrupts going off. This needs reprogramming of
the PIT and processing of the expired timers.
If you can use a machine that has a local apic we can leave the PIT out of it. 
Really this is MUCH preferred.  If your box has a LAPIC, make sure it is not 
disabled by your config setup.

Leaving the PIT out of it, the structure is that HRT timers are put in the 
normal timer list and, when they expire, are moved to a HRT list which only 
contains timers that will expire prior to the next jiffie.  This list is managed 
by interrupt, ideally from the LAPIC, or the PIT if need be.  Aside from the PIT
reprogramming (once per HRT timer plus once to get back to the 1/HZ period),
there can be delays in getting the timer out of the normal timer list.  The main
thing here is that the list MUST be processed as close to the jiffie edge as
possible as any timers due shortly after the jiffie edge will be shadowed by
this regardless of the HRT interrupt.  Of course, if an expired timer is
presented to the HRT code by the normal timer expire code, it is expired
immediately.
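
The two-stage structure can be pictured as splitting each expiry into a jiffie part, handled by the normal timer list, and a remainder inside that jiffie, handled by the short HRT list. The sketch below shows only that split; the struct, the function names and the nanosecond units are assumptions for illustration and do not reflect the HRT patch's actual data layout.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000ULL

struct hrt_expiry {
    unsigned long long jiffy;   /* tick at which the normal timer list fires */
    unsigned long long sub_ns;  /* remaining offset inside that tick */
};

/* Split an absolute expiry (ns since boot) into a jiffy and a sub-jiffy part. */
struct hrt_expiry hrt_split(unsigned long long expires_ns, unsigned int hz)
{
    struct hrt_expiry e;
    unsigned long long ns_per_jiffy;

    assert(hz > 0 && NSEC_PER_SEC % hz == 0);
    ns_per_jiffy = NSEC_PER_SEC / hz;

    e.jiffy = expires_ns / ns_per_jiffy;
    e.sub_ns = expires_ns % ns_per_jiffy;
    return e;
}

/* A timer moves to the short HRT list once its jiffy has been reached. */
int hrt_on_short_list(struct hrt_expiry e, unsigned long long now_jiffy)
{
    return e.jiffy <= now_jiffy;
}
```

A timer with a small sub_ns value is the case that suffers most if the normal list is processed late, since its whole remaining delay is shorter than the processing lag.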

A quick comment here on the current RT code.  It looks to me like there is a 
race in timer delivery.  It looks like the softirq is "raised" by the PIT 
interrupt code and the jiffie is bumped by the timer thread.  If the softirq 
gets to run prior to the PIT interrupt thread we could end up in the run_timer 
list code with a stale jiffie value and do nothing.  This would delay normal 
timers for a jiffie and HRT timers for some time less than a jiffie, depending 
on when they were really due.

I think we should move the raising of the timer softirq to the PIT interrupt 
thread after we release the xtime_lock.
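
The ordering problem can be shown with a toy model: a timer due at the next jiffie fires only if the list is run after the jiffie has been bumped. The names below are hypothetical, and the model ignores jiffies wraparound, which real kernel code has to handle.

```c
#include <assert.h>

/* Count the timers due at or before `jiffies`, as a run_timer pass would see it. */
int timers_due(const unsigned long *expires, int n, unsigned long jiffies)
{
    int due = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (expires[i] <= jiffies)
            due++;
    return due;
}

/* softirq_first != 0 models the softirq running before the jiffie is bumped. */
int timers_fired_this_tick(const unsigned long *expires, int n,
                           unsigned long jiffies, int softirq_first)
{
    if (softirq_first)
        return timers_due(expires, n, jiffies);     /* stale value */
    return timers_due(expires, n, jiffies + 1);     /* bumped value */
}
```

Raising the softirq only after the bump corresponds to always taking the second path.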

i dont really like the static splitup of 'normal' vs. 'HRT' timers -
there might in fact be separate priority requirements between HRT timers
too.
Yes, and high priority tasks might want low res timers...
i think one possible solution would be to introduce some notion of
'timer priority', and to expire each timer priority level in a separate
timer expiry thread. Priority 0 (lowest) would be expired in ksoftirqd,
and there would be 3 separate threads for say priorities 1-3. Or
something like this. Potentially exposed to user-space as well, via new
APIs. Hm?
To push this even further: in theory timers could inherit the priority
of the task that starts them, and they would be expired in that priority
order - but this probably needs a pretty clever (and most likely
complex) data-structure ...
A long time ago in another land, I did such a system.  The timer priority was 
taken from the calling task.  At that time (and now, till convinced otherwise) I 
thought it a _good thing_ to expire timers in order, regardless of their 
priority, so all timers pending delivery were delivered at the priority of the 
highest priority timer in the "batch".  The basic idea was that the interrupt 
code pulled expired timers from the timer list and pushed them into the pending 
list.  In the process it found the highest priority timer in the list.  The 
timer delivery thread was then run at that priority.  This thread adjusted its 
priority downward as needed, but in all cases the timers were delivered in 
strict time order.
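
A minimal sketch of that batching rule, assuming larger numbers mean higher priority and using hypothetical names throughout: sort the pending timers by expiry so delivery is in strict time order, and start the delivery thread at the highest priority found in the batch.

```c
#include <stdlib.h>

struct pending_timer {
    long expires;   /* expiry time, arbitrary units */
    int  prio;      /* priority taken from the arming task; larger is higher */
};

static int by_expiry(const void *a, const void *b)
{
    const struct pending_timer *x = a, *y = b;

    return (x->expires > y->expires) - (x->expires < y->expires);
}

/*
 * Put the batch into strict time order and return the priority the
 * delivery thread should start at: that of the highest-priority timer.
 * Returns -1 for an empty batch (priorities are assumed non-negative).
 */
int prepare_batch(struct pending_timer *batch, size_t n)
{
    int top = -1;

    qsort(batch, n, sizeof(*batch), by_expiry);
    for (size_t i = 0; i < n; i++)
        if (batch[i].prio > top)
            top = batch[i].prio;
    return top;
}
```

The delivery thread would then walk the sorted batch and lower its own priority as the high priority entries are dealt with.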

Since then, as now, the timer delivery usually just _notified_ a task of a 
pending signal, the low priority timers did not really hold up things for long.
Once the high priority timer was delivered and the thread either finished or 
dropped its priority, the waiting task (having been wakened by the signal 
delivery) could switch in.

The primary thing needed for this is a simple and quick way to switch a task's 
priority, both from outside and from the task itself.

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH] to fix xtime lock for in the RT kernel patch

2005-01-27 Thread George Anzinger
George Anzinger wrote:
Ingo Molnar wrote:
* George Anzinger  wrote:

What I am suggesting is splitting the mark code so that it would only
grab the offset (current TSC in most systems) during interrupt
processing.  Applying this would be done later in the thread.  Since
it is not applying the offset, the xtime_lock would not need to be
taken.

ok, you are right, and this would be fine with me. Wanna take a shot at
it? I've uploaded the -03 patch which is my most current tree. (with the
do_timer() moving done already.) I've reviewed the TSC offset codepath
again and i'm not sure where i got the 10 usecs from ... it's a pretty
cheap codepath that can be done in the direct interrupt just fine.
Tomorrow, uh, later today.  Need some sleep now...
Ingo, I have been looking at the code being proposed by John Stultz.  It looks 
like it handles all the issues I am talking about here.  I think it would be 
best to leave the RT patch as it is WRT this issue and work on getting John's 
patch ready for prime time as any work I would do here will just get tossed when 
his patch hits the street.

Meanwhile, I will (already have) get HRT working on RT and make that available 
in the next few days.

--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


Re: [PATCH 4/7] posix-timers: CPU clock support for POSIX timers

2005-01-25 Thread George Anzinger
.clock = TIMER_OFF;
+   timr->it.mmtimer.expires = 0;
spin_unlock_irqrestore(&t->lock, irqflags);
}
return 0;
@@ -558,7 +558,7 @@ static int sgi_timer_del(struct k_itimer
 static void sgi_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
 {
-   if (timr->it_timer.magic == TIMER_OFF) {
+   if (timr->it.mmtimer.clock == TIMER_OFF) {
cur_setting->it_interval.tv_nsec = 0;
cur_setting->it_interval.tv_sec = 0;
cur_setting->it_value.tv_nsec = 0;
@@ -566,8 +566,8 @@ static void sgi_timer_get(struct k_itime
return;
}
-   ns_to_timespec(cur_setting->it_interval, timr->it_incr * sgi_clock_period);
-   ns_to_timespec(cur_setting->it_value, (timr->it_timer.expires - rtc_time())* sgi_clock_period);
+   ns_to_timespec(cur_setting->it_interval, timr->it.mmtimer.incr * sgi_clock_period);
+   ns_to_timespec(cur_setting->it_value, (timr->it.mmtimer.expires - rtc_time())* sgi_clock_period);
return;
 }
@@ -640,19 +640,19 @@ retry:
base[i].timer = timr;
base[i].cpu = smp_processor_id();
-   timr->it_timer.magic = i;
-   timr->it_timer.data = nodeid;
-   timr->it_incr = period;
-   timr->it_timer.expires = when;
+   timr->it.mmtimer.clock = i;
+   timr->it.mmtimer.node = nodeid;
+   timr->it.mmtimer.incr = period;
+   timr->it.mmtimer.expires = when;
if (period == 0) {
if (mmtimer_setup(i, when)) {
mmtimer_disable_int(-1, i);
posix_timer_event(timr, 0);
-   timr->it_timer.expires = 0;
+   timr->it.mmtimer.expires = 0;
}
} else {
-   timr->it_timer.expires -= period;
+   timr->it.mmtimer.expires -= period;
if (reschedule_periodic_timer(base+i))
    err = -EINVAL;
}
--
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/

