Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:
> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
>
>>> Does the patch below help?
>> Spectacularly no! With this patch the "glitch1" script with multiple 
>> scrolling windows has all xterms and glxgears stop totally dead for 
>> ~200ms once per second. I didn't properly test anything else after 
>> that.
>
> Bill, could you try the patch below - does it fix the automount problem, 
> without introducing new problems?
>


Okay, as noted off-list, after I exported the xtime_seconds it now 
builds and works. However, there are a *lot* of "section mismatches" 
which are not reassuring.


Boots, runs, glitch1 test runs reasonably smoothly. automount has not 
used significant CPU yet, but I don't know what triggers it: even without 
the patch, the bad behavior did not show up immediately. Still, it looks 
very hopeful.


Warnings attached to save you the trouble...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Script started on Thu 19 Jul 2007 05:29:08 PM EDT
Common profile 1.13 lastmod 2006-01-04 22:43:25-05
No common directory available
Session time 17:29:08 on 07/19/07
posidon:davidsen> time nice -10 make -j4 -s; sleep 2; exit
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CHK include/linux/version.h
  Building modules, stage 2.
WARNING: vmlinux(.text+0xc1001183): Section mismatch: reference to 
.init.text:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux(.text+0xc1213fb4): Section mismatch: reference to .init.text: 
(between 'rest_init' and 'kthreadd_setup')
WARNING: vmlinux(.text+0xc1218786): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1218792): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc121879e): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc12187aa): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1214071): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'alloc_node_mem_map' and 
'zone_wait_table_init')
WARNING: vmlinux(.text+0xc1214117): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'zone_wait_table_init' and 'schedule')
WARNING: vmlinux(.text+0xc10fbaae): Section mismatch: reference to 
.init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta')
WARNING: vmlinux(.text+0xc1218eda): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
Root device is (253, 0)
Setup is 11240 bytes (padded to 11264 bytes).
System is 1915 kB
Kernel: arch/i386/boot/bzImage is ready  (#3)

real	4m11.024s
user	2m5.121s
sys	0m30.952s
exit

Script done on Thu 19 Jul 2007 05:33:35 PM EDT

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Bill Davidsen wrote:
> >Ingo Molnar wrote:
> >>* Bill Davidsen <[EMAIL PROTECTED]> wrote:
> >>
> Does the patch below help?
> >
> >Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> >I recreate it.
> 
> Applied to 2.6.22-git9, building now.

ok, that's fine too.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19


Bill Davidsen wrote:
> Ingo Molnar wrote:
>> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
>>
>>>> Does the patch below help?
>
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.
>
Applied to 2.6.22-git9, building now.

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> >* Bill Davidsen <[EMAIL PROTECTED]> wrote:
> >
> >>>Does the patch below help?
> 
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.

the patch below is merged against 2.6.22.1-cfs-v19 - does it solve the 
autofs problem (without any other bad side-effects)?

Ingo

--->
Subject: time: introduce xtime_seconds
From: Ingo Molnar <[EMAIL PROTECTED]>

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and for 
kernel-internal use. The variable is kept up to date atomically and is 
monotonically increased every time some time interface constructs an 
xtime-like time result that overflows the seconds value (it is also 
updated from the timer interrupt).

this way high-resolution time results update their seconds component at 
the same time as sys_time() does:

 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]

instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000
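The difference between the two runs above can be stated as a checkable property: once any interface has reported second S, no later sys_time() reading should report an earlier second. The sketch below is not the test program that produced these numbers (that program is not shown in this thread); it assumes the readings have already been reduced to whole seconds, and the `struct reading` type and `count_incoherent()` helper are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one reading, reduced to whole seconds. */
struct reading {
	bool from_sys_time;	/* true: sys_time(), false: gettimeofday() */
	long sec;		/* seconds value the call returned */
};

/*
 * Count sys_time() readings that report an earlier second than some
 * reading (from either interface) taken before them. Zero means the
 * sequence is coherent in the sense of the changelog above.
 */
static size_t count_incoherent(const struct reading *r, size_t n)
{
	long newest = LONG_MIN;	/* highest second seen so far */
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].from_sys_time && r[i].sec < newest)
			bad++;
		if (r[i].sec > newest)
			newest = r[i].sec;
	}
	return bad;
}
```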

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/time.h      |   13 +++--
 kernel/time.c             |   25 ++---
 kernel/time/timekeeping.c |   26 +++---
 3 files changed, 40 insertions(+), 24 deletions(-)

Index: linux-cfs-2.6.22.q/include/linux/time.h
===
--- linux-cfs-2.6.22.q.orig/include/linux/time.h
+++ linux-cfs-2.6.22.q/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);
 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);
+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+	if (unlikely((long)(new_xtime_seconds - xtime_seconds) > 0))
+		__update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-	return xtime.tv_sec;
+	return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME		(current_kernel_time())
-#define CURRENT_TIME_SEC	((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC	((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
Index: linux-cfs-2.6.22.q/kernel/time.c
===
--- linux-cfs-2.6.22.q.orig/kernel/time.c
+++ linux-cfs-2.6.22.q/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
 	/*
-	 * We read xtime.tv_sec atomically - it's updated
-	 * atomically by update_wall_time(), so no need to
-	 * even read-lock the xtime seqlock:
+	 * We read xtime_seconds atomically - it's updated
+	 * atomically by update_xtime_seconds():
 	 */
-	time_t i = xtime.tv_sec;
+	time_t i = xtime_seconds;
 
 	smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
 	do {
 		seq = read_seqbegin(&xtime_lock);
-		
+
 		now = xtime;
 	} while (read_seqretry(&xtime_lock, seq));
 
-	return now; 
+	return now;
 }
 
 EXPORT_SYMBOL(current_kernel_time);
@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
 	tv->tv_sec = sec;
 	tv->tv_usec = usec;
 
-	/*
-	 * Make sure xtime.tv_sec [returned by sys_time()] always
-	 * follows the gettimeofday() result precisely. This
-	 * condition is extremely unlikely, it can hit at most
-	 * once per second:
-	 */
-	if (unlikely(xtime.tv_sec != tv->tv_sec)) {
-		unsigned long flags;
-
-		write_seqlock_irqsave(&xtime_lock, flags);
-		update_wall_time();
-		write_sequnlock_irqrestore(&xtime_lock, flags);
-	}
+	update_xtime_seconds(sec);
 }
 
 EXPORT_SYMBOL(do_gettimeofday);
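
The read/update split this patch introduces (readers such as sys_time() and get_seconds() do a single load, while update_xtime_seconds() only ever moves the counter forward) can be modelled in userspace C11. This is a minimal sketch under stated assumptions: the names `xtime_seconds_sketch`, `seconds_after()`, `read_seconds()` and `update_seconds()` are hypothetical, and the compare-and-swap loop is one possible lock-free way to keep the counter monotonic. The body of the kernel's `__update_xtime_seconds()` lives in the timekeeping.c hunk, which is cut off in this archive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's xtime_seconds. */
static _Atomic unsigned long xtime_seconds_sketch;

/*
 * Wraparound-safe "a is later than b": the unsigned difference is
 * reinterpreted as signed, as in update_xtime_seconds(). Relies on the
 * two's-complement conversion that GCC and Clang provide.
 */
static bool seconds_after(unsigned long a, unsigned long b)
{
	return (long)(a - b) > 0;
}

/* Readers need only one atomic load, no seqlock. */
static unsigned long read_seconds(void)
{
	/* acquire pairs with the release in update_seconds() */
	return atomic_load_explicit(&xtime_seconds_sketch, memory_order_acquire);
}

/*
 * Move the counter forward to new_seconds, never backward. The cheap
 * comparison runs on every call; the CAS only runs when a caller is the
 * first to observe a new second.
 */
static void update_seconds(unsigned long new_seconds)
{
	unsigned long cur = atomic_load_explicit(&xtime_seconds_sketch,
						 memory_order_relaxed);

	while (seconds_after(new_seconds, cur)) {
		/* On failure, cur is reloaded and the test re-evaluated. */
		if (atomic_compare_exchange_weak_explicit(&xtime_seconds_sketch,
							  &cur, new_seconds,
							  memory_order_release,
							  memory_order_relaxed))
			break;
	}
}
```

The comparison stays correct across an unsigned wrap as long as the two values are less than LONG_MAX apart, and a stale caller (one that computed an older second) simply leaves the counter alone.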

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> >* Bill Davidsen <[EMAIL PROTECTED]> wrote:
> >
> >>>Does the patch below help?
> 
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.

hm, it's against recent -git.

don't waste your time on 2.6.21.6-cfsv19, it will likely not apply - give 
me a few minutes to create a patch for you against 2.6.22.1-cfsv19, ok?

Ingo

Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:
> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
>
>>> Does the patch below help?

Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as I 
recreate it.

>> Spectacularly no! With this patch the "glitch1" script with multiple 
>> scrolling windows has all xterms and glxgears stop totally dead for 
>> ~200ms once per second. I didn't properly test anything else after 
>> that.
>
> Bill, could you try the patch below - does it fix the automount problem, 
> without introducing new problems?
>
> Ingo


Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> > Does the patch below help?
>
> Spectacularly no! With this patch the "glitch1" script with multiple 
> scrolling windows has all xterms and glxgears stop totally dead for 
> ~200ms once per second. I didn't properly test anything else after 
> that.

Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?

Ingo

--->
Subject: time: introduce xtime_seconds
From: Ingo Molnar <[EMAIL PROTECTED]>

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and for 
kernel-internal use. The variable is kept up to date atomically and is 
monotonically increased every time some time interface constructs an 
xtime-like time result that overflows the seconds value (it is also 
updated from the timer interrupt).

this way high-resolution time results update their seconds component at 
the same time as sys_time() does:

 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]

instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/time.h      |   13 +++--
 kernel/time.c             |   25 ++---
 kernel/time/timekeeping.c |   28 
 3 files changed, 41 insertions(+), 25 deletions(-)

Index: linux/include/linux/time.h
===
--- linux.orig/include/linux/time.h
+++ linux/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);
 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);
+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+	if (unlikely((long)(new_xtime_seconds - xtime_seconds) > 0))
+		__update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-	return xtime.tv_sec;
+	return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME		(current_kernel_time())
-#define CURRENT_TIME_SEC	((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC	((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
Index: linux/kernel/time.c
===
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
 	/*
-	 * We read xtime.tv_sec atomically - it's updated
-	 * atomically by update_wall_time(), so no need to
-	 * even read-lock the xtime seqlock:
+	 * We read xtime_seconds atomically - it's updated
+	 * atomically by update_xtime_seconds():
 	 */
-	time_t i = xtime.tv_sec;
+	time_t i = xtime_seconds;
 
 	smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
 	do {
 		seq = read_seqbegin(&xtime_lock);
-		
+
 		now = xtime;
 	} while (read_seqretry(&xtime_lock, seq));
 
-	return now; 
+	return now;
 }
 
 EXPORT_SYMBOL(current_kernel_time);
@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
 	tv->tv_sec = sec;
 	tv->tv_usec = usec;
 
-	/*
-	 * Make sure xtime.tv_sec [returned by sys_time()] always
-	 * follows the gettimeofday() result precisely. This
-	 * condition is extremely unlikely, it can hit at most
-	 * once per second:
-	 */
-	if (unlikely(xtime.tv_sec != tv->tv_sec)) {
-		unsigned long flags;
-
-		write_seqlock_irqsave(&xtime_lock, flags);
-		update_wall_time();
-		write_sequnlock_irqrestore(&xtime_lock, flags);
-	}
+	update_xtime_seconds(sec);
 }
 EXPORT_SYMBOL(do_gettimeofday);
 
Index:

Re: [patch] CFS scheduler, -v19


* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > ah! It passes in a low-res time source into a high-res time 
> > interface (pthread_cond_timedwait()). Could you change the 
> > time(NULL) + 1 to time(NULL) + 2, or change it to:
> > 
> > gettimeofday(&wait, NULL);
> > wait.tv_sec++;
> 
> This is wrong. It's wrong for two reasons:
> 
>  - it really shouldn't be needed. I don't think "time()" has to be 
>    *exactly* in sync, but I don't think it can be off by a third of a 
>    second or whatever (as the "30% CPU load" would seem to imply)
> 
>  - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
>    timespec.

ah, i didn't notice that automount mixed up timespec with timeval! That 
is nasty: the tv_nsec field (which, as pthread_cond_timedwait() sees it, 
actually holds the tv_usec value from gettimeofday()) must stay cleared - 
or rather, to avoid bugs of this type, a timespec variable should be 
used for all this.
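The timeval/timespec mix-up can be shown in a few lines of userspace C. This is a sketch, not automount's actual code (which is not quoted in this thread), and the helper names `deadline_from_timeval()` and `deadline_in_one_second()` are hypothetical. The first converts a `struct timeval` into the `struct timespec` that `pthread_cond_timedwait()` expects by scaling microseconds to nanoseconds; the second builds the absolute deadline directly with POSIX `clock_gettime(CLOCK_REALTIME, ...)`, the clock a condition variable uses by default.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Correct timeval -> timespec conversion: tv_usec counts microseconds,
 * tv_nsec counts nanoseconds. Handing a timeval to an interface that
 * expects a timespec leaves the microsecond count in tv_nsec instead.
 */
static struct timespec deadline_from_timeval(const struct timeval *tv)
{
	struct timespec ts;

	assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);
	ts.tv_sec = tv->tv_sec;
	ts.tv_nsec = (long)tv->tv_usec * 1000L;
	return ts;
}

/*
 * Absolute deadline one second from now, built as a timespec from the
 * start. CLOCK_REALTIME is pthread_cond_timedwait()'s default clock.
 */
static struct timespec deadline_in_one_second(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 1;
	return ts;
}
```

Either result can be passed as the `abstime` argument of `pthread_cond_timedwait()`; building the deadline as a timespec in the first place avoids the conversion step entirely.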

> So if it actually makes a difference, it makes a difference for the 
> *wrong* reason: the time is still totally nonsensical in the tv_nsec 
> field (because it actually got filled in with msecs!), but now the 
> tv_sec field is in sync, so it hides the bug.
> 
> Anyway, hopefully the patch below might help. But we probably should make 
> this whole thing a much more generic routine (ie we have our internal 
> "getnstimeofday()" that still is missing the second-overflow logic, and 
> that is quite possibly the one that triggers the "30% off" behaviour).

yeah, i'll generalize it, but our internal getnstimeofday() used on most 
architectures is using __get_realtime_clock_ns(), and the patch you 
attached already adds the second-overflow logic to it.

there are two versions of getnstimeofday(), a TIME_INTERPOLATION one and 
a !TIME_INTERPOLATION one. TIME_INTERPOLATION is only used on ia64 at 
the moment - and that one indeed does not have the second overflow 
logic.

> Ingo, I'd suggest:
>  - get rid of "timespec_add_ns()", or at least make it return a value 
>    for when it overflows.
>  - make all the people who overflow into tv_sec call a "fix_up_seconds()" 
>    thing that does the xtime overflow handling.

ok, i'll do something clean.

Ingo
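
Linus's first suggestion in the message above, an add-nanoseconds helper that tells its caller when it carried into `tv_sec`, might look roughly like the following userspace sketch. The kernel's real `timespec_add_ns()` is not reproduced here; `timespec_add_ns_carry()` and `advance()` are hypothetical names, and `fix_up_seconds()` borrows its name from the suggestion but is only a stub that records the newest second it was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for the xtime overflow handling. */
static time_t last_fixed_up_second;

static void fix_up_seconds(time_t sec)
{
	/* A real handler would advance the seconds counter; this stub
	 * only remembers the newest second it was told about. */
	if (sec > last_fixed_up_second)
		last_fixed_up_second = sec;
}

/*
 * Add ns nanoseconds to *ts, keeping tv_nsec in [0, NSEC_PER_SEC).
 * Returns true if tv_sec changed, so the caller cannot silently skip
 * the seconds-overflow handling.
 */
static bool timespec_add_ns_carry(struct timespec *ts, uint64_t ns)
{
	time_t old_sec = ts->tv_sec;
	uint64_t total;

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
	total = (uint64_t)ts->tv_nsec + ns;
	ts->tv_sec += (time_t)(total / NSEC_PER_SEC);
	ts->tv_nsec = (long)(total % NSEC_PER_SEC);
	return ts->tv_sec != old_sec;
}

/* The calling pattern from the suggestion: overflowers report it. */
static void advance(struct timespec *ts, uint64_t ns)
{
	if (timespec_add_ns_carry(ts, ns))
		fix_up_seconds(ts->tv_sec);
}
```

Returning the carry keeps the overflow decision in one place, so every caller that crosses a second boundary goes through the same hook instead of each one re-deriving it.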

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> > Does the patch below help?
>
> Spectacularly no! With this patch the "glitch1" script with multiple 
> scrolling windows has all xterms and glxgears stop totally dead for 
> ~200ms once per second. I didn't properly test anything else after 
> that. Since the automount issue doesn't seem to start until something 
> kicks it off, I didn't see it but that doesn't mean it's fixed.

thanks. Andrew also just reported that it broke his laptop and i'm 
working on a proper version.

Ingo

Re: [patch] CFS scheduler, -v19


* Bill Davidsen [EMAIL PROTECTED] wrote:

  Does the patch below help?

 Spectacularly no! With this patch the glitch1 script with multiple 
 scrolling windows has all xterms and glxgears stop totally dead for 
 ~200ms once per second. I didn't properly test anything else after 
 that. Since the automount issue doesn't seem to start until something 
 kicks it off, I didn't see it but that doesn't mean it's fixed.

thanks. Andrew also just reported that it broke his laptop and i'm 
working on a proper version.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19


* Linus Torvalds [EMAIL PROTECTED] wrote:

  ah! It passes in a low-res time source into a high-res time 
  interface (pthread_cond_timedwait()). Could you change the 
  time(NULL) + 1 to time(NULL) + 2, or change it to:
  
  gettimeofday(wait, NULL);
  wait.tv_sec++;
 
 This is wrong. It's wrong for two reasons:
 
  - it really shouldn't be needed. I don't think time() has to be 
*exactly* in sync, but I don't think it can be off by a third of a 
second or whatever (as the 30% CPU load would seem to imply)
 
  - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
timespec.

ah, i didnt notice that automount mixed up timespec with timeval! That 
is nasty and the tv_nsec field (which really is ts_usec to 
pthread_cond_timewait()) must stay cleared - or rather, to avoid bugs of 
this type, a timespec variable should be used for all this.

 So if it actually makes a difference, it makes a difference for the 
 *wrong* reason: the time is still totally nonsensical in the tv_nsec 
 field (because it actually got filled in with msecs!), but now the 
 tv_sec field is in sync, so it hides the bug.
 
 Anyway, hopefully the patch below might help. But we probably should make 
 this whole thing a much more generic routine (ie we have our internal 
 getnstimeofday() that still is missing the second-overflow logic, and 
 that is quite possibly the one that triggers the 30% off behaviour).

yeah, i'll generalize it, but our internal getnstimeofday() used on most 
architectures is using __get_realtime_clock_ns(), and the patch you 
attached already adds the second-overflow logic to it.

there are two versions of getnstimeofday(), a TIME_INTERPOLATION one and 
a !TIME_INTERPOLATION one. TIME_INTERPOLATION is only used on ia64 at 
the moment - and that one indeed does not have the second overflow 
logic.

 Ingo, I'd suggest:
  - ger rid of timespec_add_ns(), or at least make it return a return 
value for when it overflows.
  - make all the people who overflow into tv_sec call a fix_up_seconds() 
thing that does the xtime overflow handling.

ok, i'll do something clean.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19


* Bill Davidsen [EMAIL PROTECTED] wrote:

  Does the patch below help?

 Spectacularly no! With this patch the glitch1 script with multiple 
 scrolling windows has all xterms and glxgears stop totally dead for 
 ~200ms once per second. I didn't properly test anything else after 
 that.

Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?

Ingo

---
Subject: time: introduce xtime_seconds
From: Ingo Molnar [EMAIL PROTECTED]

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and kernel-internal 
use. This variable is kept uptodate atomically, and it's monotically 
increased, every time some time interface constructs an xtime-alike time 
result that overflows the seconds value. (it's updated from the timer 
interrupt as well)

this way high-resolution time results update their seconds component at 
the same time sys_time() does it:

 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]

instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 include/linux/time.h  |   13 +++--
 kernel/time.c |   25 ++---
 kernel/time/timekeeping.c |   28 
 3 files changed, 41 insertions(+), 25 deletions(-)

Index: linux/include/linux/time.h
===
--- linux.orig/include/linux/time.h
+++ linux/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);
 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);
+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+   if (unlikely((long)(new_xtime_seconds - xtime_seconds)  0))
+   __update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-   return xtime.tv_sec;
+   return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME   (current_kernel_time())
-#define CURRENT_TIME_SEC   ((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC   ((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
Index: linux/kernel/time.c
===
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
/*
-* We read xtime.tv_sec atomically - it's updated
-* atomically by update_wall_time(), so no need to
-* even read-lock the xtime seqlock:
+* We read xtime_seconds atomically - it's updated
+* atomically by update_xtime_seconds():
 */
-   time_t i = xtime.tv_sec;
+   time_t i = xtime_seconds;
 
smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
do {
seq = read_seqbegin(xtime_lock);
-   
+
now = xtime;
} while (read_seqretry(xtime_lock, seq));
 
-   return now; 
+   return now;
 }
 
 EXPORT_SYMBOL(current_kernel_time);
@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
tv-tv_sec = sec;
tv-tv_usec = usec;
 
-   /*
-* Make sure xtime.tv_sec [returned by sys_time()] always
-* follows the gettimeofday() result precisely. This
-* condition is extremely unlikely, it can hit at most
-* once per second:
-*/
-   if (unlikely(xtime.tv_sec != tv-tv_sec)) {
-   unsigned long flags;
-
-   write_seqlock_irqsave(xtime_lock, flags);
-   update_wall_time();
-   write_sequnlock_irqrestore(xtime_lock, flags);
-   }
+   update_xtime_seconds(sec);
 }
 EXPORT_SYMBOL(do_gettimeofday);
 
Index:

Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:

* Bill Davidsen [EMAIL PROTECTED] wrote:


Does the patch below help?


Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as I 
recreate it.


Spectacularly no! With this patch the glitch1 script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after 
that.


Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?


Ingo

---
Subject: time: introduce xtime_seconds
From: Ingo Molnar [EMAIL PROTECTED]

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and kernel-internal 
use. This variable is kept uptodate atomically, and it's monotically 
increased, every time some time interface constructs an xtime-alike time 
result that overflows the seconds value. (it's updated from the timer 
interrupt as well)


this way high-resolution time results update their seconds component at 
the same time sys_time() does it:


 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]


instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 include/linux/time.h  |   13 +++--
 kernel/time.c |   25 ++---
 kernel/time/timekeeping.c |   28 
 3 files changed, 41 insertions(+), 25 deletions(-)

Index: linux/include/linux/time.h
===
--- linux.orig/include/linux/time.h
+++ linux/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);

 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);

+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+   if (unlikely((long)(new_xtime_seconds - xtime_seconds)  0))
+   __update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-   return xtime.tv_sec;
+   return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME		(current_kernel_time())

-#define CURRENT_TIME_SEC   ((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC   ((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);

 extern int do_settimeofday(struct timespec *tv);
Index: linux/kernel/time.c
===
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
/*
-* We read xtime.tv_sec atomically - it's updated
-* atomically by update_wall_time(), so no need to
-* even read-lock the xtime seqlock:
+* We read xtime_seconds atomically - it's updated
+* atomically by update_xtime_seconds():
 */
-   time_t i = xtime.tv_sec;
+   time_t i = xtime_seconds;
 
 	smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
 	do {
seq = read_seqbegin(&xtime_lock);
-   
+
now = xtime;
} while (read_seqretry(&xtime_lock, seq));
 
-	return now; 
+	return now;
 }
 
 EXPORT_SYMBOL(current_kernel_time);
@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
tv->tv_sec = sec;
tv->tv_usec = usec;
 
-	/*
-* Make sure xtime.tv_sec [returned by sys_time()] always
-* follows the gettimeofday() result precisely. This
-* condition is extremely unlikely, it can hit at most
-* once per second:
-*/
-   if (unlikely(xtime.tv_sec != tv->tv_sec)) {
-   unsigned long flags;
-
-   write_seqlock_irqsave(&xtime_lock, flags);
-   update_wall_time();
-   write_sequnlock_irqrestore(&xtime_lock, flags);
-   }
+

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
> 
> Does the patch below help?
> 
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.

hm, it's against recent -git.

dont waste your time on 2.6.21.6-cfsv19, it will likely not apply - give 
me a few minutes to create a patch for you against 2.6.22.1-cfsv19, ok?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
> 
> Does the patch below help?
> 
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.

the patch below is merged against 2.6.22.1-cfs-v19 - does it solve the 
autofs problem (without any other bad side-effects)?

Ingo

---
Subject: time: introduce xtime_seconds
From: Ingo Molnar <[EMAIL PROTECTED]>

introduce the xtime_seconds optimization. This is a read-mostly 
low-resolution time source available to sys_time() and kernel-internal 
use. This variable is kept up to date atomically, and it's monotonically 
increased, every time some time interface constructs an xtime-alike time 
result that overflows the seconds value. (it's updated from the timer 
interrupt as well)

this way high-resolution time results update their seconds component at 
the same time sys_time() does it:

 118485883289000
 11848588320
 118485883292000
 11848588320
 118485883296000
 11848588320
 118485883299000
 11848588320
 118485883303000
 11848588330
 118485883306000
 11848588330
 118485883309000
 11848588330

 [ these are nsec time results from alternating calls to sys_time() and 
   sys_gettimeofday(), recorded at the seconds boundary. ]

instead of the previous (non-coherent) behavior:

 118484895087000
 11848489500
 11848489509
 11848489500
 118484895094000
 11848489500
 118484895097000
 11848489500
 118484895101000
 11848489500
 118484895105000
 11848489500
 118484895108000
 11848489500
 118484895111000
 11848489500
 118484895115000
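A user-space check in the spirit of the measurement above could alternate gettimeofday() and time() calls and count how often time() reports an older second than gettimeofday() had already returned. This is only a sketch: count_time_lag() is a hypothetical helper, not the program used to produce the numbers above.

```c
#include <sys/time.h>
#include <time.h>

/*
 * Alternate gettimeofday() and time(NULL) calls and count samples where
 * time() reports an older second than gettimeofday() had already
 * returned. A nonzero count means the two interfaces were not coherent
 * at a seconds boundary during the run.
 */
static long count_time_lag(long iterations)
{
    long lagging = 0;

    for (long i = 0; i < iterations; i++) {
        struct timeval tv;

        gettimeofday(&tv, NULL);      /* high-resolution read first */
        time_t lowres = time(NULL);   /* then the low-resolution read */
        if (lowres < tv.tv_sec)
            lagging++;
    }
    return lagging;
}
```

A coherent kernel should report zero. On many current systems time() may be served from a coarse, tick-updated clock, so a small nonzero count on a modern kernel is a hint to investigate rather than proof of this particular bug.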

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/time.h  |   13 +++--
 kernel/time.c |   25 ++---
 kernel/time/timekeeping.c |   26 +++---
 3 files changed, 40 insertions(+), 24 deletions(-)

Index: linux-cfs-2.6.22.q/include/linux/time.h
===
--- linux-cfs-2.6.22.q.orig/include/linux/time.h
+++ linux-cfs-2.6.22.q/include/linux/time.h
@@ -91,19 +91,28 @@ static inline struct timespec timespec_s
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock __attribute__((weak));
+extern unsigned long xtime_seconds;
 
 extern unsigned long read_persistent_clock(void);
 void timekeeping_init(void);
 
+extern void __update_xtime_seconds(unsigned long new_xtime_seconds);
+
+static inline void update_xtime_seconds(unsigned long new_xtime_seconds)
+{
+   if (unlikely((long)(new_xtime_seconds - xtime_seconds) > 0))
+   __update_xtime_seconds(new_xtime_seconds);
+}
+
 static inline unsigned long get_seconds(void)
 {
-   return xtime.tv_sec;
+   return xtime_seconds;
 }
 
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME   (current_kernel_time())
-#define CURRENT_TIME_SEC   ((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME_SEC   ((struct timespec) { xtime_seconds, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
Index: linux-cfs-2.6.22.q/kernel/time.c
===
--- linux-cfs-2.6.22.q.orig/kernel/time.c
+++ linux-cfs-2.6.22.q/kernel/time.c
@@ -58,11 +58,10 @@ EXPORT_SYMBOL(sys_tz);
 asmlinkage long sys_time(time_t __user * tloc)
 {
/*
-* We read xtime.tv_sec atomically - it's updated
-* atomically by update_wall_time(), so no need to
-* even read-lock the xtime seqlock:
+* We read xtime_seconds atomically - it's updated
+* atomically by update_xtime_seconds():
 */
-   time_t i = xtime.tv_sec;
+   time_t i = xtime_seconds;
 
smp_rmb(); /* sys_time() results are coherent */
 
@@ -226,11 +225,11 @@ inline struct timespec current_kernel_ti
 
do {
seq = read_seqbegin(&xtime_lock);
-   
+
now = xtime;
} while (read_seqretry(&xtime_lock, seq));
 
-   return now; 
+   return now;
 }
 
 EXPORT_SYMBOL(current_kernel_time);
@@ -377,19 +376,7 @@ void do_gettimeofday (struct timeval *tv
tv->tv_sec = sec;
tv->tv_usec = usec;
 
-   /*
-* Make sure xtime.tv_sec [returned by sys_time()] always
-* follows the gettimeofday() result precisely. This
-* condition is extremely unlikely, it can hit at most
-* once per second:
-*/
-   if (unlikely(xtime.tv_sec != tv->tv_sec)) {
-   unsigned long flags;
-
-   write_seqlock_irqsave(&xtime_lock, flags);
-   update_wall_time();
-   write_sequnlock_irqrestore(&xtime_lock, flags);
-   }
+   update_xtime_seconds(sec);
 }
 
 EXPORT_SYMBOL(do_gettimeofday);
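
The fast path in update_xtime_seconds() only calls the out-of-line update when the new value is ahead of the cached one, and it uses a signed difference so the test keeps working if the unsigned counter ever wraps. A minimal user-space sketch of that pattern follows. It assumes C11 atomics (compare-and-swap) in place of whatever locking the kernel's __update_xtime_seconds() uses, since its body is not shown here; seconds_src, seconds_after(), advance_seconds() and read_seconds() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for xtime_seconds: a read-mostly seconds counter. */
static _Atomic unsigned long seconds_src;

/*
 * Same test as update_xtime_seconds(): subtract in unsigned arithmetic
 * (well defined on wrap), then reinterpret as signed. The conversion is
 * implementation-defined in ISO C but modular on GCC and Clang, which
 * is what the kernel relies on.
 */
static bool seconds_after(unsigned long candidate, unsigned long cur)
{
    return (long)(candidate - cur) > 0;
}

/*
 * Move seconds_src forward to 'candidate', never backwards. If the CAS
 * fails (another updater raced us, or a spurious weak-CAS failure), 'cur'
 * is reloaded and the loop re-checks whether we are still ahead.
 */
static void advance_seconds(unsigned long candidate)
{
    unsigned long cur = atomic_load(&seconds_src);

    while (seconds_after(candidate, cur)) {
        if (atomic_compare_exchange_weak(&seconds_src, &cur, candidate))
            break;
    }
}

/* Cheap reader, analogous to get_seconds() after the patch. */
static unsigned long read_seconds(void)
{
    return atomic_load(&seconds_src);
}
```

Because a stale candidate is simply dropped, concurrent callers can only move the counter forward, which is the monotonic property the changelog describes.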

Re: [patch] CFS scheduler, -v19


Bill Davidsen wrote:

Ingo Molnar wrote:

* Bill Davidsen <[EMAIL PROTECTED]> wrote:


Does the patch below help?


Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
I recreate it.


Applied to 2.6.22-git9, building now.

Re: [patch] CFS scheduler, -v19


* Bill Davidsen <[EMAIL PROTECTED]> wrote:

> Bill Davidsen wrote:
> Ingo Molnar wrote:
> * Bill Davidsen <[EMAIL PROTECTED]> wrote:
> 
> Does the patch below help?
> 
> Doesn't seem to apply against 2.6.22.1, I'm trying 2.6.22.6 as soon as 
> I recreate it.
> 
> Applied to 2.6.22-git9, building now.

ok, that's fine too.

Ingo

Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:

* Bill Davidsen <[EMAIL PROTECTED]> wrote:


Does the patch below help?
Spectacularly no! With this patch the "glitch1" script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after 
that.


Bill, could you try the patch below - does it fix the automount problem, 
without introducing new problems?


Okay, as noted off-list, after I exported the xtime_seconds it now 
builds and works. However, there are a *lot* of "section mismatches" 
which are not reassuring.


Boots, runs, glitch1 test runs reasonably smoothly. automount has not 
used significant CPU yet, but I don't know what triggers it, the bad 
behavior did not happen immediately without the patch. However, it looks 
very hopeful.


Warnings attached to save you the trouble...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
Script started on Thu 19 Jul 2007 05:29:08 PM EDT
Common profile 1.13 lastmod 2006-01-04 22:43:25-05
No common directory available
Session time 17:29:08 on 07/19/07
posidon:davidsen> time nice -10 make -j4 -s; sleep 2; exit
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CHK include/linux/compile.h
  UPD include/linux/compile.h
  CHK include/linux/version.h
  Building modules, stage 2.
WARNING: vmlinux(.text+0xc1001183): Section mismatch: reference to 
.init.text:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux(.text+0xc1213fb4): Section mismatch: reference to .init.text: 
(between 'rest_init' and 'kthreadd_setup')
WARNING: vmlinux(.text+0xc1218786): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1218792): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc121879e): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc12187aa): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
WARNING: vmlinux(.text+0xc1214071): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'alloc_node_mem_map' and 
'zone_wait_table_init')
WARNING: vmlinux(.text+0xc1214117): Section mismatch: reference to 
.init.text:__alloc_bootmem_node (between 'zone_wait_table_init' and 'schedule')
WARNING: vmlinux(.text+0xc10fbaae): Section mismatch: reference to 
.init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta')
WARNING: vmlinux(.text+0xc1218eda): Section mismatch: reference to .init.text: 
(between 'iret_exc' and '_etext')
Root device is (253, 0)
Setup is 11240 bytes (padded to 11264 bytes).
System is 1915 kB
Kernel: arch/i386/boot/bzImage is ready  (#3)

real	4m11.024s
user	2m5.121s
sys	0m30.952s
exit

Script done on Thu 19 Jul 2007 05:33:35 PM EDT

Re: [patch] CFS scheduler, -v19


Linus Torvalds wrote:

On Tue, 17 Jul 2007, Ingo Molnar wrote:
  

* Ian Kent <[EMAIL PROTECTED]> wrote:


In several places I have code similar to:

wait.tv_sec = time(NULL) + 1;
wait.tv_nsec = 0;
  


Ok, that definitely should work.

Does the patch below help?

  
Spectacularly no! With this patch the "glitch1" script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after that. 
Since the automount issue doesn't seem to start until something kicks it 
off, I didn't see it but that doesn't mean it's fixed.
ah! It passes in a low-res time source into a high-res time interface 
(pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
time(NULL) + 2, or change it to:


gettimeofday(&wait, NULL);
wait.tv_sec++;



This is wrong. It's wrong for two reasons:

 - it really shouldn't be needed. I don't think "time()" has to be 
   *exactly* in sync, but I don't think it can be off by a third of a 
   second or whatever (as the "30% CPU load" would seem to imply)


 - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
   timespec.


So if it actually makes a difference, it makes a difference for the 
*wrong* reason: the time is still totally nonsensical in the tv_nsec field 
(because it actually got filled in with msecs!), but now the tv_sec field 
is in sync, so it hides the bug.


Anyway, hopefully the patch below might help. But we probably should make 
this whole thing a much more generic routine (ie we have our internal 
"getnstimeofday()" that still is missing the second-overflow logic, and 
that is quite possibly the one that triggers the "30% off" behaviour).


  

Hope that info helps.


Ingo, I'd suggest:
 - get rid of "timespec_add_ns()", or at least make it return a return 
   value for when it overflows.
 - make all the people who overflow into tv_sec call a "fix_up_seconds()" 
   thing that does the xtime overflow handling.


Linus

Re: [patch] CFS scheduler, -v19

On Wed, 2007-07-18 at 09:03 -0700, Linus Torvalds wrote:
> 
> On Tue, 17 Jul 2007, Ingo Molnar wrote:
> > 
> > * Ian Kent <[EMAIL PROTECTED]> wrote:
> > > 
> > > In several places I have code similar to:
> > > 
> > > wait.tv_sec = time(NULL) + 1;
> > > wait.tv_nsec = 0;
> 
> Ok, that definitely should work.
> 
> Does the patch below help?
> 
> > ah! It passes in a low-res time source into a high-res time interface 
> > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> > time(NULL) + 2, or change it to:
> > 
> > gettimeofday(&wait, NULL);
> > wait.tv_sec++;
> 
> This is wrong. It's wrong for two reasons:
> 
>  - it really shouldn't be needed. I don't think "time()" has to be 
>    *exactly* in sync, but I don't think it can be off by a third of a 
>    second or whatever (as the "30% CPU load" would seem to imply)
> 
>  - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
>    timespec.
> 
> So if it actually makes a difference, it makes a difference for the 
> *wrong* reason: the time is still totally nonsensical in the tv_nsec field 
> (because it actually got filled in with msecs!), but now the tv_sec field 
> is in sync, so it hides the bug.

Oh ya .. I thought it wouldn't hurt to add the fraction of the current
second for correctness and actually put things like:

gettimeofday(&now, NULL);
wait.tv_sec = now.tv_sec + 1;
wait.tv_nsec = now.tv_usec * 1000;

in autofs.
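
The corrected form above can be wrapped in a small helper. This sketch assumes the condition variable uses the default CLOCK_REALTIME, which is also what gettimeofday() reports; deadline_from_now() is a hypothetical name. The important detail is the unit mismatch Linus pointed out: tv_usec is in microseconds, so it must be multiplied by 1000 before it goes into tv_nsec. Since tv_usec is always below 1000000, the result stays below one billion and no carry into tv_sec is needed.

```c
#include <sys/time.h>
#include <time.h>

#define NSEC_PER_USEC 1000L

/*
 * Fill *ts with an absolute deadline 'secs' seconds from now, for use
 * with pthread_cond_timedwait() on a CLOCK_REALTIME condition variable.
 */
static void deadline_from_now(struct timespec *ts, time_t secs)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    ts->tv_sec = now.tv_sec + secs;
    /* timeval carries microseconds, timespec carries nanoseconds */
    ts->tv_nsec = (long)now.tv_usec * NSEC_PER_USEC;
}
```

Where clock_gettime() is available, calling clock_gettime(CLOCK_REALTIME, ts) and then adding secs to ts->tv_sec gives the same deadline without the conversion step.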

Ian

Re: [patch] CFS scheduler, -v19

On Wed, 18 Jul 2007, Ingo Molnar wrote:
> 
> Linus, Thomas, what do you think, should we keep the time.c change? 

No, not if it's off by the second field. That 30% CPU usage indicates that 
there's some nasty bug there somewhere, and that's just not worth it.

If time() cannot get the second field right, it's bogus. I'm ok with us 
not *guaranteeing* monotonicity of the second field when you compare 
gettimeofday() with time(), but the 30% thing implies that it's much worse 
than that, and that "time()" will likely report the previous second (when 
compared to hrtimers) roughly a quarter of the time.

And that isn't acceptable. 

So either it should be fixed, or reverted.

Linus

Re: [patch] CFS scheduler, -v19



On Tue, 17 Jul 2007, Ingo Molnar wrote:
> 
> * Ian Kent <[EMAIL PROTECTED]> wrote:
> > 
> > In several places I have code similar to:
> > 
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;

Ok, that definitely should work.

Does the patch below help?

> ah! It passes in a low-res time source into a high-res time interface 
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;

This is wrong. It's wrong for two reasons:

 - it really shouldn't be needed. I don't think "time()" has to be 
   *exactly* in sync, but I don't think it can be off by a third of a 
   second or whatever (as the "30% CPU load" would seem to imply)

 - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
   timespec.

So if it actually makes a difference, it makes a difference for the 
*wrong* reason: the time is still totally nonsensical in the tv_nsec field 
(because it actually got filled in with msecs!), but now the tv_sec field 
is in sync, so it hides the bug.

Anyway, hopefully the patch below might help. But we probably should make 
this whole thing a much more generic routine (ie we have our internal 
"getnstimeofday()" that still is missing the second-overflow logic, and 
that is quite possibly the one that triggers the "30% off" behaviour).

Ingo, I'd suggest:
 - get rid of "timespec_add_ns()", or at least make it return a return 
   value for when it overflows.
 - make all the people who overflow into tv_sec call a "fix_up_seconds()" 
   thing that does the xtime overflow handling.
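
The first suggestion might look roughly like the following: a timespec addition that reports how many seconds the nanosecond carry pushed into tv_sec, so the caller knows when a seconds-level fix-up (the proposed fix_up_seconds()) has to run. This is a user-space sketch; timespec_add_ns_carry() is a hypothetical name modelled on the carry loop of the kernel's timespec_add_ns(), not an existing kernel interface.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Add 'ns' nanoseconds to a normalised timespec (0 <= tv_nsec < 1e9).
 * Returns the number of whole seconds carried into tv_sec; a nonzero
 * return means the seconds field changed and a fix-up should run.
 */
static unsigned int timespec_add_ns_carry(struct timespec *ts, uint64_t ns)
{
    unsigned int carried = 0;

    ns += (uint64_t)ts->tv_nsec;
    while (ns >= (uint64_t)NSEC_PER_SEC) {   /* usually 0 or 1 iterations */
        ns -= (uint64_t)NSEC_PER_SEC;
        carried++;
    }
    ts->tv_sec += carried;
    ts->tv_nsec = (long)ns;
    return carried;
}
```

A caller would then test the return value and only take the (hypothetical) fix_up_seconds() slow path when it is nonzero, which keeps the common case as cheap as the existing timespec_add_ns().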

Linus

---
Subject: time: make sure sys_gettimeofday() and sys_time() are in sync
From: Ingo Molnar <[EMAIL PROTECTED]>

make sure sys_gettimeofday() and sys_time() results are coherent.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/time/timekeeping.c |   13 +
 1 file changed, 13 insertions(+)

Index: linux/kernel/time/timekeeping.c
===
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -92,6 +92,19 @@ static inline void __get_realtime_clock_
} while (read_seqretry(&xtime_lock, seq));
 
timespec_add_ns(ts, nsecs);
+   /*
+* Make sure xtime.tv_sec [returned by sys_time()] always
+* follows the gettimeofday() result precisely. This
+* condition is extremely unlikely, it can hit at most
+* once per second:
+*/
+   if (unlikely(xtime.tv_sec != ts->tv_sec)) {
+   unsigned long flags;
+
+   write_seqlock_irqsave(&xtime_lock, flags);
+   update_wall_time();
+   write_sequnlock_irqrestore(&xtime_lock, flags);
+   }
 }
 
 /**

Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:

* Ian Kent <[EMAIL PROTECTED]> wrote:

  
ah! It passes in a low-res time source into a high-res time 
interface (pthread_cond_timedwait()). Could you change the 
time(NULL) + 1 to time(NULL) + 2, or change it to:


gettimeofday(&wait, NULL);
wait.tv_sec++;

does this solve the spinning?

Yes, adding in the offset within the current second appears to resolve 
the issue. Thanks Ingo.



i'm wondering how widespread this is. If automount is the only app 
doing this then _maybe_ we could get away with it by changing 
automount?

I don't think the change is unreasonable since I wasn't using an 
accurate time in the condition wait, so that's a coding mistake on my 
part which I will fix.



thanks Ian for taking care of this and for fixing it!

Linus, Thomas, what do you think, should we keep the time.c change? 
Automount is one app affected so far, and it's a borderline case: the 
increased (30%) CPU usage is annoying, but it does not prevent the 
system from working per se, and an upgrade to a fixed/enhanced automount 
version resolves it.


The temptation of using a really (and trivially) scalable low-resolution 
time-source (which is _easily_ vsyscall-able, on any platform) for DBMS 
use is really large, to me at least. Should i perhaps add a boot/config 
option that enables/disables this optimization, to allow distros finer
grained control about this? And we've also got to wait whether there's
any other app affected.
  
Allow it to be selected by the "features" so that admins can evaluate 
the implications without a reboot?  That would be a convenient interface 
if you could provide it.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

Re: [patch] CFS scheduler, -v19

2007-07-18 Thread Ingo Molnar

* Ian Kent <[EMAIL PROTECTED]> wrote:

> > > ah! It passes in a low-res time source into a high-res time 
> > > interface (pthread_cond_timedwait()). Could you change the 
> > > time(NULL) + 1 to time(NULL) + 2, or change it to:
> > >
> > >   gettimeofday(&wait, NULL);
> > >   wait.tv_sec++;
> > >
> > > does this solve the spinning?
> 
> Yes, adding in the offset within the current second appears to resolve 
> the issue. Thanks Ingo.
> 
> > > i'm wondering how widespread this is. If automount is the only app 
> > > doing this then _maybe_ we could get away with it by changing 
> > > automount?
> 
> I don't think the change is unreasonable since I wasn't using an 
> accurate time in the condition wait, so that's a coding mistake on my 
> part which I will fix.

thanks Ian for taking care of this and for fixing it!

Linus, Thomas, what do you think, should we keep the time.c change? 
Automount is one app affected so far, and it's a borderline case: the 
increased (30%) CPU usage is annoying, but it does not prevent the 
system from working per se, and an upgrade to a fixed/enhanced automount 
version resolves it.

The temptation of using a really (and trivially) scalable low-resolution 
time-source (which is _easily_ vsyscall-able, on any platform) for DBMS 
use is really large, to me at least. Should i perhaps add a boot/config 
option that enables/disables this optimization, to allow distros finer
grained control about this? And we've also got to wait whether there's
any other app affected.

Ingo

Re: [patch] CFS scheduler, -v19

On Tue, 2007-07-17 at 21:24 -0400, Bill Davidsen wrote:
> Ingo Molnar wrote:
> > * Ian Kent <[EMAIL PROTECTED]> wrote:
> >
> >   
> >>> ah! It passes in a low-res time source into a high-res time interface 
> >>> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> >>> time(NULL) + 2, or change it to:
> >>>
> >>>   gettimeofday(&wait, NULL);
> >>>   wait.tv_sec++;
> >>>   
> >> OK, I'm with you, hi-res timer.
> >> But even so, how is the time in the past after adding a second.
> >>
> >> Is it because I'm not setting tv_nsec when it's close to a second 
> >> boundary, and hence your recommendation above?
> >> 
> >
> > yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
> > 1 second resolution timesource. I dont yet understand the failure mode 
> > though that results in that looping and in the 30% CPU time use - do you 
> > understand it perhaps? (and automount is still functional while this is 
> > happening, correct?)
> >   
> 
> Can't say, I have automount running because I get it by default, but I 
> have nothing using it on my test machine. Why is it looping so fast when 
> there are no mount points defined? If the config changes there's no 
> requirement to notice right away, is there?

There are two threads where this mistake is made.

One is used to trigger expire events for all automounted filesystems
which happen all the time since I need to run the expire to check if
anything is mounted and whether it needs to be umounted. The alarm
handler sleeps on a condition until the alarm list is not empty and then
sleeps on a condition until the next alarm in the list expires or an
alarm is added to the list, in which case it then checks the list again.
Since the autofs timeout granularity is one second this is a problem and
will be fixed. This isn't the source of the problem that's been
reported.

The second is the state queue handler which runs tasks such as expires,
map re-reads, shutdowns etc. for all automounted filesystems. While the
check interval could be longer it causes autofs to be sluggish in
situations such as shutdowns where there are a largish number of mounts
present and I need to cancel such things as expires and the like. It's
possible I could improve this but, in fact, once the timespec is set
correctly as Ingo suggests it works fine and uses very little resource.
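
The handler pattern described above, sleeping on a condition until work arrives or a deadline passes, is normally written as a pthread_cond_timedwait() loop with the absolute deadline computed once from a full-resolution clock and the predicate re-checked after every wakeup. A minimal sketch, assuming a single hypothetical pending flag protected by the mutex and a nonnegative timeout; wait_for_work() is not autofs code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Wait until *pending becomes nonzero or 'timeout_ms' milliseconds pass.
 * The caller must hold 'mutex'. Returns 0 if work is pending, ETIMEDOUT
 * if the deadline passed first, or another pthread error code.
 */
static int wait_for_work(pthread_cond_t *cond, pthread_mutex_t *mutex,
                         const int *pending, long timeout_ms)
{
    struct timespec deadline;
    int status = 0;

    /* Absolute deadline, including the sub-second part, computed once. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Re-check the predicate after every wakeup, spurious or not. */
    while (!*pending) {
        status = pthread_cond_timedwait(cond, mutex, &deadline);
        if (status != 0)
            break;
    }
    return *pending ? 0 : status;
}
```

Because the deadline is fixed before the loop, a spurious wakeup cannot extend the wait, and because it keeps the current sub-second offset it avoids the time(NULL) + 1 problem, where dropping that offset makes the effective timeout anywhere from zero to one second.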

Ian

RE: [patch] CFS scheduler, -v19

On Tue, 2007-07-17 at 14:16 -0700, David Schwartz wrote:
> > * Ian Kent <[EMAIL PROTECTED]> wrote:
> >
> > > Yes it does and I have two reported bugs so far.
> > >
> > > In several places I have code similar to:
> > >
> > > wait.tv_sec = time(NULL) + 1;
> > > wait.tv_nsec = 0;
> > >
> > > signaled = 0;
> > > while (!signaled) {
> > > status = pthread_cond_timedwait(&cond, &mutex, &wait);
> > >if (status) {
> > >  if (status == ETIMEDOUT)
> > >   break;
> > >  fatal(status);
> > >   }
> > > }
> >
> > ah! It passes in a low-res time source into a high-res time interface
> > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to
> > time(NULL) + 2, or change it to:
> >
> > gettimeofday(&wait, NULL);
> > wait.tv_sec++;
> >
> > does this solve the spinning?

Yes, adding in the offset within the current second appears to resolve
the issue. Thanks Ingo.

> >
> > i'm wondering how widespread this is. If automount is the only app doing
> > this then _maybe_ we could get away with it by changing automount?

I don't think the change is unreasonable since I wasn't using an
accurate time in the condition wait, so that's a coding mistake on my
part which I will fix.

Ian


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [patch] CFS scheduler, -v19

On Tue, 2007-07-17 at 14:16 -0700, David Schwartz wrote:
  * Ian Kent [EMAIL PROTECTED] wrote:
 
   Yes it does and I have two reported bugs so far.
  
   In several places I have code similar to:
  
   wait.tv_sec = time(NULL) + 1;
   wait.tv_nsec = 0;
  
   signaled = 0;
   while (!signaled) {
   status = pthread_cond_timedwait(cond, mutex, wait);
  if (status) {
if (status == ETIMEDOUT)
 break;
fatal(status);
 }
   }
 
  ah! It passes in a low-res time source into a high-res time interface
  (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to
  time(NULL) + 2, or change it to:
 
  gettimeofday(wait, NULL);
  wait.tv_sec++;
 
  does this solve the spinning?

Yes, adding in the offset within the current second appears to resolve
the issue. Thanks Ingo.

 
  i'm wondering how widespread this is. If automount is the only app doing
  this then _maybe_ we could get away with it by changing automount?

I don't think the change is unreasonable since I wasn't using an
accurate time in the condition wait, so that's a coding mistake on my
part which I will fix.

Ian


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19

On Tue, 2007-07-17 at 21:24 -0400, Bill Davidsen wrote:
 Ingo Molnar wrote:
  * Ian Kent [EMAIL PROTECTED] wrote:
 

  ah! It passes in a low-res time source into a high-res time interface 
  (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
  time(NULL) + 2, or change it to:
 
gettimeofday(wait, NULL);
wait.tv_sec++;

  OK, I'm with you, hi-res timer.
  But even so, how is the time in the past after adding a second.
 
  Is it because I'm not setting tv_nsec when it's close to a second 
  boundary, and hence your recommendation above?
  
 
  yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
  1 second resolution timesource. I dont yet understand the failure mode 
  though that results in that looping and in the 30% CPU time use - do you 
  understand it perhaps? (and automount is still functional while this is 
  happening, correct?)

 
 Can't say, I have automount running because I get it by default, but I 
 have nothing using at on my test machine. Why is it looping so fast when 
 there are no mount points defined? If the config changes there's no 
 requirement to notice right away, is there?

There are two threads where this mistake is made.

One is used to trigger expire events for all automounted filesystems
which happen all the time since I need to run the expire to check if
anything is mounted and whether it needs to be umounted. The alarm
handler sleeps on a condition until the alarm list in not empty and then
sleeps on a condition until the next alarm in the list expires or an
alarm is added to the list, in which case it then checks the list again.
Since the autofs timeout granularity is one second this is a problem and
will be fixed. This isn't the source of the problem that's been
reported.

The second is the state queue handler which runs tasks such as expires,
map re-reads, shutdowns etc. for all automounted filesystems. While the
check interval could be longer it causes autofs to be slugish in
situations such as shutdowns where there are a largish number of mounts
present and I need to cancel such things as expires and the like. It's
possible I could improve this but, in fact, once the timespec is set
correctly as Ingo suggests it works fine and uses very little resource.

Ian

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19

2007-07-18 Thread Ingo Molnar


* Ian Kent [EMAIL PROTECTED] wrote:

   ah! It passes in a low-res time source into a high-res time 
   interface (pthread_cond_timedwait()). Could you change the 
   time(NULL) + 1 to time(NULL) + 2, or change it to:
  
 gettimeofday(wait, NULL);
 wait.tv_sec++;
  
   does this solve the spinning?
 
 Yes, adding in the offset within the current second appears to resolve 
 the issue. Thanks Ingo.
 
   i'm wondering how widespread this is. If automount is the only app 
   doing this then _maybe_ we could get away with it by changing 
   automount?
 
 I don't think the change is unreasonable since I wasn't using an 
 accurate time in the condition wait, so that's a coding mistake on my 
 part which I will fix.

thanks Ian for taking care of this and for fixing it!

Linus, Thomas, what do you think, should we keep the time.c change? 
Automount is one app affected so far, and it's a borderline case: the 
increased (30%) CPU usage is annoying, but it does not prevent the 
system from working per se, and an upgrade to a fixed/enhanced automount 
version resolves it.

The temptation of using a really (and trivially) scalable low-resolution 
time-source (which is _easily_ vsyscall-able, on any platform) for DBMS 
use is really large, to me at least. Should i perhaps add a boot/config 
option that enables/disables this optimization, to allow distros finer
grained control about this? And we've also got to wait whether there's
any other app affected.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] CFS scheduler, -v19


Ingo Molnar wrote:
> * Ian Kent <[EMAIL PROTECTED]> wrote:
>
>>> ah! It passes in a low-res time source into a high-res time 
>>> interface (pthread_cond_timedwait()). Could you change the 
>>> time(NULL) + 1 to time(NULL) + 2, or change it to:
>>>
>>> gettimeofday(&wait, NULL);
>>> wait.tv_sec++;
>>>
>>> does this solve the spinning?
>>
>> Yes, adding in the offset within the current second appears to resolve 
>> the issue. Thanks Ingo.
>>
>>> i'm wondering how widespread this is. If automount is the only app 
>>> doing this then _maybe_ we could get away with it by changing 
>>> automount?
>>
>> I don't think the change is unreasonable since I wasn't using an 
>> accurate time in the condition wait, so that's a coding mistake on my 
>> part which I will fix.
>
> thanks Ian for taking care of this and for fixing it!
>
> Linus, Thomas, what do you think, should we keep the time.c change? 
> Automount is one app affected so far, and it's a borderline case: the 
> increased (30%) CPU usage is annoying, but it does not prevent the 
> system from working per se, and an upgrade to a fixed/enhanced automount 
> version resolves it.
>
> The temptation of using a really (and trivially) scalable low-resolution 
> time-source (which is _easily_ vsyscall-able, on any platform) for DBMS 
> use is really large, to me at least. Should i perhaps add a boot/config 
> option that enables/disables this optimization, to allow distros finer
> grained control about this? And we've also got to wait whether there's
> any other app affected.

Allow it to be selected by the features so that admins can evaluate 
the implications without a reboot?  That would be a convenient interface 
if you could provide it.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: [patch] CFS scheduler, -v19



On Tue, 17 Jul 2007, Ingo Molnar wrote:
> 
> * Ian Kent <[EMAIL PROTECTED]> wrote:
> > 
> > In several places I have code similar to:
> > 
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;

Ok, that definitely should work.

Does the patch below help?

> ah! It passes in a low-res time source into a high-res time interface 
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;

This is wrong. It's wrong for two reasons:

 - it really shouldn't be needed. I don't think time() has to be 
   *exactly* in sync, but I don't think it can be off by a third of a 
   second or whatever (as the 30% CPU load would seem to imply)

 - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
   timespec.

So if it actually makes a difference, it makes a difference for the 
*wrong* reason: the time is still totally nonsensical in the tv_nsec field 
(because it actually got filled in with msecs!), but now the tv_sec field 
is in sync, so it hides the bug.
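Linus's second point is a units mismatch: gettimeofday() fills a struct timeval whose tv_usec counts microseconds, while pthread_cond_timedwait() reads a struct timespec whose tv_nsec counts nanoseconds. A minimal sketch of the difference follows; the helper names are illustrative, not from the thread, and `timeval_misread()` only models what happens when a timespec is handed to gettimeofday() on an ABI where both structs are two longs (an assumption about the platform).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Correct conversion: microseconds * 1000 = nanoseconds. */
static struct timespec timeval_to_timespec(struct timeval tv)
{
    struct timespec ts;

    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    ts.tv_sec = tv.tv_sec;
    ts.tv_nsec = (long)tv.tv_usec * 1000L;
    return ts;
}

/* Models the bug: the microsecond count lands in tv_nsec unscaled,
 * so the sub-second part is 1000x too small (illustration only). */
static struct timespec timeval_misread(struct timeval tv)
{
    struct timespec ts;

    ts.tv_sec = tv.tv_sec;
    ts.tv_nsec = (long)tv.tv_usec;
    return ts;
}
```

Taking the gettimeofday() value from the konqueror trace quoted in this thread, {1184675906, 155179}, the correct conversion yields tv_nsec = 155179000, while the misread yields 155179 ns: only the tv_sec field carries a meaningful value.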

Anyway, hopefully the patch below might help. But we probably should make 
this whole thing a much more generic routine (ie we have our internal 
getnstimeofday() that still is missing the second-overflow logic, and 
that is quite possibly the one that triggers the 30% off behaviour).

Ingo, I'd suggest:
 - get rid of timespec_add_ns(), or at least make it return a return 
   value for when it overflows.
 - make all the people who overflow into tv_sec call a fix_up_seconds() 
   thing that does the xtime overflow handling.
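Linus's first suggestion can be sketched in userspace terms. The name `timespec_add_ns_carry` is hypothetical, and the fix_up_seconds() step he proposes is not shown because the thread does not define it; the carry loop mirrors the subtract-NSEC_PER_SEC loop in the kernel's timespec_add_ns() of this period, but returns how many seconds were carried so a caller can react to a second boundary.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical: like timespec_add_ns(), but returns how many whole
 * seconds were carried into tv_sec, so the caller can tell that a
 * second boundary was crossed and run its own overflow handling. */
static unsigned int timespec_add_ns_carry(struct timespec *ts, uint64_t ns)
{
    unsigned int carried = 0;

    assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
    ns += (uint64_t)ts->tv_nsec;
    while (ns >= (uint64_t)NSEC_PER_SEC) {
        ns -= (uint64_t)NSEC_PER_SEC;
        carried++;
    }
    ts->tv_sec += carried;
    ts->tv_nsec = (long)ns;
    return carried;
}
```

For example, adding 1.5 s to {0, 600000000} gives {2, 100000000} and returns 2; a non-zero return is the point where fix_up_seconds()-style handling would run.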

Linus

---
Subject: time: make sure sys_gettimeofday() and sys_time() are in sync
From: Ingo Molnar [EMAIL PROTECTED]

make sure sys_gettimeofday() and sys_time() results are coherent.

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 kernel/time/timekeeping.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -92,6 +92,19 @@ static inline void __get_realtime_clock_
 	} while (read_seqretry(&xtime_lock, seq));
 
 	timespec_add_ns(ts, nsecs);
+	/*
+	 * Make sure xtime.tv_sec [returned by sys_time()] always
+	 * follows the gettimeofday() result precisely. This
+	 * condition is extremely unlikely, it can hit at most
+	 * once per second:
+	 */
+	if (unlikely(xtime.tv_sec != ts->tv_sec)) {
+		unsigned long flags;
+
+		write_seqlock_irqsave(&xtime_lock, flags);
+		update_wall_time();
+		write_sequnlock_irqrestore(&xtime_lock, flags);
+	}
 }
 
 /**


Re: [patch] CFS scheduler, -v19



On Wed, 18 Jul 2007, Ingo Molnar wrote:
 
 Linus, Thomas, what do you think, should we keep the time.c change? 

No, not if it's off by the second field. That 30% CPU usage indicates that 
there's some nasty bug there somewhere, and that's just not worth it.

If time() cannot get the second field right, it's bogus. I'm ok with us 
not *guaranteeing* monotonicity of the second field when you compare 
gettimeofday() with time(), but the 30% thing implies that it's much worse 
than that, and that time() will likely report the previous second (when 
compared to hrtimers) roughly a quarter of the time.
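One way to look for the behaviour Linus describes from userspace is to call gettimeofday(), then time(), and count how often time() reports an earlier second. This is a rough diagnostic sketch with a made-up function name; the count it returns depends on the kernel, the clocksource and timing, so it measures rather than predicts.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical diagnostic: call gettimeofday(), then time(), and count
 * the samples where time() returned an earlier second than the
 * gettimeofday() call that preceded it. */
static unsigned long count_time_lag(unsigned long iterations)
{
    unsigned long lagging = 0;

    assert(iterations > 0);
    for (unsigned long i = 0; i < iterations; i++) {
        struct timeval tv;
        time_t t;

        gettimeofday(&tv, NULL);
        t = time(NULL);            /* sampled after gettimeofday() */
        if (t < tv.tv_sec)
            lagging++;
    }
    return lagging;
}
```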

And that isn't acceptable. 

So either it should be fixed, or reverted.

Linus

Re: [patch] CFS scheduler, -v19

On Wed, 2007-07-18 at 09:03 -0700, Linus Torvalds wrote:
> 
> On Tue, 17 Jul 2007, Ingo Molnar wrote:
> > 
> > * Ian Kent <[EMAIL PROTECTED]> wrote:
> > > 
> > > In several places I have code similar to:
> > > 
> > > wait.tv_sec = time(NULL) + 1;
> > > wait.tv_nsec = 0;
> 
> Ok, that definitely should work.
> 
> Does the patch below help?
> 
> > ah! It passes in a low-res time source into a high-res time interface 
> > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> > time(NULL) + 2, or change it to:
> > 
> >   gettimeofday(&wait, NULL);
> >   wait.tv_sec++;
> 
> This is wrong. It's wrong for two reasons:
> 
>  - it really shouldn't be needed. I don't think time() has to be 
>    *exactly* in sync, but I don't think it can be off by a third of a 
>    second or whatever (as the 30% CPU load would seem to imply)
> 
>  - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
>    timespec.
> 
> So if it actually makes a difference, it makes a difference for the 
> *wrong* reason: the time is still totally nonsensical in the tv_nsec field 
> (because it actually got filled in with msecs!), but now the tv_sec field 
> is in sync, so it hides the bug.

Oh ya .. I thought it wouldn't hurt to add the fraction of the current
second for correctness and actually put things like:

gettimeofday(&now, NULL);
wait.tv_sec = now.tv_sec + 1;
wait.tv_nsec = now.tv_usec * 1000;

in autofs.
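Ian's corrected deadline can be packaged together with the usual predicate loop around pthread_cond_timedwait(). This is a sketch, not autofs code: `make_deadline`, `wait_for_flag` and the int flag are hypothetical, and it assumes the condition variable uses the default CLOCK_REALTIME clock, which is the clock gettimeofday() reports. clock_gettime(CLOCK_REALTIME, ...) would fill a timespec directly and avoid the conversion altogether.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: absolute deadline `secs` seconds from now,
 * keeping the sub-second part instead of truncating it to zero. */
static struct timespec make_deadline(time_t secs)
{
    struct timeval now;
    struct timespec deadline;

    gettimeofday(&now, NULL);               /* timeval: microseconds */
    deadline.tv_sec = now.tv_sec + secs;
    deadline.tv_nsec = now.tv_usec * 1000L; /* usec -> nsec, < 1e9 */
    assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);
    return deadline;
}

/* Hypothetical waiter: returns 0 once *flag is set, ETIMEDOUT if the
 * deadline passes first, or another pthread error code. */
static int wait_for_flag(pthread_mutex_t *mtx, pthread_cond_t *cond,
                         const int *flag, time_t secs)
{
    struct timespec deadline = make_deadline(secs);
    int status = 0;
    int done;

    pthread_mutex_lock(mtx);
    /* Loop guards against spurious wakeups; the deadline is computed
     * once, so repeated waits never extend the total timeout. */
    while (!*flag && status == 0)
        status = pthread_cond_timedwait(cond, mtx, &deadline);
    done = *flag;                           /* read under the mutex */
    pthread_mutex_unlock(mtx);
    return done ? 0 : status;
}
```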

Ian



Re: [patch] CFS scheduler, -v19


Linus Torvalds wrote:
> On Tue, 17 Jul 2007, Ingo Molnar wrote:
>
>> * Ian Kent <[EMAIL PROTECTED]> wrote:
>>
>>> In several places I have code similar to:
>>>
>>> wait.tv_sec = time(NULL) + 1;
>>> wait.tv_nsec = 0;
>
> Ok, that definitely should work.
>
> Does the patch below help?

Spectacularly no! With this patch the glitch1 script with multiple 
scrolling windows has all xterms and glxgears stop totally dead for 
~200ms once per second. I didn't properly test anything else after that. 
Since the automount issue doesn't seem to start until something kicks it 
off, I didn't see it but that doesn't mean it's fixed.

>> ah! It passes in a low-res time source into a high-res time interface 
>> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
>> time(NULL) + 2, or change it to:
>>
>> gettimeofday(&wait, NULL);
>> wait.tv_sec++;
>
> This is wrong. It's wrong for two reasons:
>
>  - it really shouldn't be needed. I don't think time() has to be 
>    *exactly* in sync, but I don't think it can be off by a third of a 
>    second or whatever (as the 30% CPU load would seem to imply)
>
>  - gettimeofday works on a timeval, pthread_cond_timedwait() works on a 
>    timespec.
>
> So if it actually makes a difference, it makes a difference for the 
> *wrong* reason: the time is still totally nonsensical in the tv_nsec field 
> (because it actually got filled in with msecs!), but now the tv_sec field 
> is in sync, so it hides the bug.
>
> Anyway, hopefully the patch below might help. But we probably should make 
> this whole thing a much more generic routine (ie we have our internal 
> getnstimeofday() that still is missing the second-overflow logic, and 
> that is quite possibly the one that triggers the 30% off behaviour).

Hope that info helps.

> Ingo, I'd suggest:
>  - get rid of timespec_add_ns(), or at least make it return a return 
>    value for when it overflows.
>  - make all the people who overflow into tv_sec call a fix_up_seconds() 
>    thing that does the xtime overflow handling.
>
> Linus
  



Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Bill Davidsen


Ingo Molnar wrote:
> * Ian Kent <[EMAIL PROTECTED]> wrote:
>
>>> ah! It passes in a low-res time source into a high-res time interface 
>>> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
>>> time(NULL) + 2, or change it to:
>>>
>>> gettimeofday(&wait, NULL);
>>> wait.tv_sec++;
>>
>> OK, I'm with you, hi-res timer.
>> But even so, how is the time in the past after adding a second?
>>
>> Is it because I'm not setting tv_nsec when it's close to a second 
>> boundary, and hence your recommendation above?
>
> yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
> 1 second resolution timesource. I don't yet understand the failure mode 
> though that results in that looping and in the 30% CPU time use - do you 
> understand it perhaps? (and automount is still functional while this is 
> happening, correct?)
  


Can't say, I have automount running because I get it by default, but I 
have nothing using it on my test machine. Why is it looping so fast when 
there are no mount points defined? If the config changes there's no 
requirement to notice right away, is there?


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: [patch] CFS scheduler, -v19

> hm, Markus indicated that he tried the v2.6.21.6-cfsv19 patch, and that 
> does not include the time.c change. Markus - does your kernel include 
> the code below? (if yes, please revert it via patch -p1 -R )
Well, the 2.6.22.1-cfs-v19 does include it, but the 2.6.21.6-cfs-v19 
does not have that patch applied.
But both show this problem.

   Markus

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Willy Tarreau

Hi Ingo,

sorry for the long delay, I've spent a week doing non-kernel work.

On Tue, Jul 10, 2007 at 12:39:50AM +0200, Ingo Molnar wrote:
> 
> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
> 
> > > The biggest user-visible change in -v19 is reworked sleeper 
> > > fairness: it's similar in behavior to -v18 but works more 
> > > consistently across nice levels. Fork-happy workloads (like kernel 
> > > builds) should behave better as well. There are also a handful of 
> > > speedups: unsigned math, 32-bit speedups, O(1) task pickup, 
> > > debloating and other micro-optimizations.
> > 
> > Interestingly, I also noticed the possibility of O(1) task pickup when 
> > playing with v18, but did not detect any noticeable improvement with 
> > it. Of course, it depends on the workload and I probably didn't 
> > perform the most relevant tests.
> 
> yeah - it's a small tweak. CFS is O(31) in sleep/wakeup so it's now all 
> a big O(1) family again :)

Yes, that's what I tried to explain to a guy once : what I like about log(N)
algos is that even with N very large, log(N) is always small, and it's
sometimes faster to perform log(N) fast operations than 1 slow operation.
That's also why I don't care about balanced trees : my unbalanced trees may
hold 32 levels for 32 carefully chosen values, while balanced trees will
have 5 levels (worst difference between both). If I can insert and delete
a node 6 times faster, I always win. And quite frankly, I'm not interested
in the 32 entries case in a tree :-)

> > V19 works very well here on 2.6.20.14. I could start 32k busy loops at 
> > nice +19 (I exhausted the 32k pids limit), and could still perform 
> > normal operations. I noticed that 'vmstat' scans all the pid entries 
> > under /proc, which takes ages to collect data before displaying a 
> > line. Obviously, the system sometimes shows some short latencies, but 
> > not much more than what you get from an SSH through a remote DSL 
> > connection.
> 
> great! I did not try to push it this far, yet.

Well, I borrowed two 1GB sticks because I discovered that one of my 512MB
sticks had one defective bit. It was finally an opportunity for me to push
the test this far.

> > Here's a vmstat 1 output :
> > 
> >     r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> > 32437  0  0      0 809724    488   6196    0    0     1     0  135     0 24 72  4
> > 32436  0  0      0 811336    488   6196    0    0     0     0  717     0 78 22  0
> 
> crazy :-)

indeed :-)

> > Amusingly, I started mpg123 during this test and it skipped quite a 
> > bit. After setting all tasks to SCHED_IDLE, it did not skip anymore. 
> > All this seems to behave like one could expect.
> 
> yeah. It behaves better than i expected in fact - 32K tasks is pushing 
> things quite a bit. (we've got a 32K PID limit for example)

Yes, and in fact, I suspect that we still have an O(N) or O(N^2) pid
allocation algo somewhere (I did not look at the code), because forking
was very very slow when reaching those numbers. I'll possibly check this
when I have some spare time, because it reminds me of a trivial source port
ring allocator I wrote a few years ago which was O(1). With 32k pids, it
will only require 64kB RAM for the whole system, and we may even optimize
it to spread CPUs entry points in order to nearly always avoid lock
contention.
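Willy does not show his allocator, so the following is only a sketch of one way a ring of free ids can give O(1) allocation and release. Names and layout are hypothetical, and it has no locking or per-CPU entry points. Storing 32768 ids as 16-bit values takes 32768 × 2 bytes = 64 KiB, which matches his estimate.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IDS 32768u   /* the 32k pid limit mentioned in the thread */

/* Hypothetical ring of free ids: the free ids occupy slots
 * [head, head + nfree) modulo MAX_IDS; tail == (head + nfree) % MAX_IDS. */
struct id_ring {
    uint16_t slot[MAX_IDS];   /* 64 KiB */
    uint32_t head;            /* next free id to hand out */
    uint32_t tail;            /* where the next released id is stored */
    uint32_t nfree;
};

static void id_ring_init(struct id_ring *r)
{
    for (uint32_t i = 0; i < MAX_IDS; i++)
        r->slot[i] = (uint16_t)i;
    r->head = 0;
    r->tail = 0;              /* ring starts full: head + MAX_IDS wraps to 0 */
    r->nfree = MAX_IDS;
}

/* O(1): returns an id in [0, MAX_IDS), or -1 when none are free. */
static int id_ring_alloc(struct id_ring *r)
{
    if (r->nfree == 0)
        return -1;
    int id = r->slot[r->head];
    r->head = (r->head + 1) % MAX_IDS;
    r->nfree--;
    return id;
}

/* O(1): released ids go to the tail, so reuse is FIFO. */
static void id_ring_free(struct id_ring *r, int id)
{
    assert(id >= 0 && (uint32_t)id < MAX_IDS && r->nfree < MAX_IDS);
    r->slot[r->tail] = (uint16_t)id;
    r->tail = (r->tail + 1) % MAX_IDS;
    r->nfree++;
}
```

FIFO reuse also delays recycling a just-freed id for as long as possible, which is usually what one wants for pids.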

> > I also started 30k processes distributed in 130 groups of 234 chained 
> > by pipes in which one byte is passed. I get an average of 8000 in the 
> > run queue. The context switch rate is very low and sometimes even null 
> > in this test, maybe some of them are starving, I really do not know :
> > 
> >     r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
> >  7752  0  1      0 656892    244   4196    0    0     0     0  725     0 16 84  0
> 
> hm, could you profile this? We could have some bottleneck somewhere 
> (likely not in the scheduler) with that many tasks being runnable. [ 
> With CFS you can actually run a profiler under this workload ;-) ]

I will probably try some time later (not this week-end, I have some 2.4 to
work on).

> > In my tree, I have replaced the rbtree with the ebtree we talked 
> > about, but it did not bring any performance boost because, even though 
> > insert() and delete() are faster, the scheduler is already quite good 
> > at avoiding them as much as possible, mostly relying on rb_next() 
> > which has the same cost in both trees. All in all, the only variations 
> > I noticed were caused by cacheline alignment when I tried to reorder 
> > fields in the eb_node. So I will stop my experimentations here since I 
> > don't see any more room for improvement.
> 
> well, just a little bit of improvement would be nice to have too :)

Yes but I prefer to merge it where it really brings something (I'll have a
look at epoll, I noticed epoll_ctl() was 30% slower under 2.6 with an rbtree
than it is under 2.4 with a hash). Then people will tell me "you're

RE: [patch] CFS scheduler, -v19

2007-07-17 Thread David Schwartz


> * Ian Kent <[EMAIL PROTECTED]> wrote:
>
> > Yes it does and I have two reported bugs so far.
> >
> > In several places I have code similar to:
> >
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;
> >
> > signaled = 0;
> > while (!signaled) {
> >     status = pthread_cond_timedwait(, , );
> >     if (status) {
> >         if (status == ETIMEDOUT)
> >             break;
> >         fatal(status);
> >     }
> > }
>
> ah! It passes in a low-res time source into a high-res time interface
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to
> time(NULL) + 2, or change it to:
>
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;
>
> does this solve the spinning?
>
> i'm wondering how widespread this is. If automount is the only app doing
> this then _maybe_ we could get away with it by changing automount?

This code is horribly broken. Don't change the kernel because this code is
broken.

First it adds a second, but then it subtracts up to a second. Just before
the second boundary, this code can burn CPU like crazy, with each wait being
just a few nanoseconds.
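David's arithmetic can be made concrete. With the deadline built as { time(NULL) + 1, 0 }, the real wait is one second minus however far the clock already is into the current second; and if time() returns the previous second (the case Linus describes), the deadline is already in the past. A small sketch of that computation, with a made-up function name and times passed as plain integers rather than live clock reads:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Length in ns of a timed wait whose absolute deadline was built as
 * { seconds + 1, 0 }, where `seconds` is what time() returned and the
 * true clock reads now_sec + now_nsec/1e9 when the wait starts.
 * A result <= 0 means the deadline has already passed. */
static int64_t effective_wait_ns(int64_t seconds,
                                 int64_t now_sec, int64_t now_nsec)
{
    assert(now_nsec >= 0 && now_nsec < NSEC_PER_SEC);
    int64_t deadline = (seconds + 1) * NSEC_PER_SEC;   /* tv_nsec = 0 */
    int64_t now = now_sec * NSEC_PER_SEC + now_nsec;
    return deadline - now;
}
```

effective_wait_ns(100, 100, 0) is a full second; effective_wait_ns(100, 100, 999999990) is David's "few nanoseconds" case (10 ns); effective_wait_ns(99, 100, 250000000) is a deadline already 250 ms in the past, for which pthread_cond_timedwait() returns ETIMEDOUT immediately.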

What is the intent of this code? Is it to wait "up to a second, possibly for
no time at all" or is it to wait "for at least a second"? If the latter, why
are you zeroing the nanosecond count?

DS



Re: [patch] CFS scheduler, -v19


* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> But why does that happen? And why would the scheduler have *anything* 
> to do with this? No idea. Maybe timing. Maybe the time.c changes. 
> Dunno.

hm, Markus indicated that he tried the v2.6.21.6-cfsv19 patch, and that 
does not include the time.c change. Markus - does your kernel include 
the code below? (if yes, please revert it via patch -p1 -R )

Ingo

Index: linux/kernel/time.c
===================================================================
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -57,14 +57,17 @@ EXPORT_SYMBOL(sys_tz);
  */
 asmlinkage long sys_time(time_t __user * tloc)
 {
-	time_t i;
-	struct timeval tv;
+	/*
+	 * We read xtime.tv_sec atomically - it's updated
+	 * atomically by update_wall_time(), so no need to
+	 * even read-lock the xtime seqlock:
+	 */
+	time_t i = xtime.tv_sec;
 
-	do_gettimeofday(&tv);
-	i = tv.tv_sec;
+	smp_rmb(); /* sys_time() results are coherent */
 
 	if (tloc) {
-		if (put_user(i,tloc))
+		if (put_user(i, tloc))
 			i = -EFAULT;
 	}
 	return i;
@@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv
 
 	tv->tv_sec = sec;
 	tv->tv_usec = usec;
+
+	/*
+	 * Make sure xtime.tv_sec [returned by sys_time()] always
+	 * follows the gettimeofday() result precisely. This
+	 * condition is extremely unlikely, it can hit at most
+	 * once per second:
+	 */
+	if (unlikely(xtime.tv_sec != tv->tv_sec)) {
+		unsigned long flags;
+
+		write_seqlock_irqsave(&xtime_lock, flags);
+		update_wall_time();
+		write_sequnlock_irqrestore(&xtime_lock, flags);
+	}
 }
 
 EXPORT_SYMBOL(do_gettimeofday);

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Linus Torvalds

On Tue, 17 Jul 2007, Ingo Molnar wrote:
> 
> i think the problem starts here:
> 
>   11902 1184699865.141939 read(3, "", 32) = 0 <0.07>

Well, it's preceded by a poll() that says that it has a POLLHUP event, so 
that socket would seem to have simply been closed from the other end. 
There's also a huge amount of select() calls showing the same thing 
(except since it's just the input side, you cannot tell that it's 
POLLHUP).

Don't ask me *why*, though. It's preceded by

..
11902 1184699848.615201 read(3, 0x7fffb5b9c8b0, 32) = -1 EAGAIN 
(Resource temporarily unavailable) <0.09>
11902 1184699848.615252 poll([{fd=3, events=POLLIN, revents=POLLIN}], 
1, -1) = 1 <0.009307>
11902 1184699848.624614 read(3, "\1 
\303!\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0b\340T\0\0\0\0\0", 32) = 32 
<0.11>

.. got data ..

11902 1184699848.624710 ioctl(3, FIONREAD, [0]) = 0 <0.09>
11902 1184699848.624762 ioctl(3, FIONREAD, [0]) = 0 <0.48>

.. ok, nothing more..

11902 1184699848.624866 select(10, [3 4 5 7 9], [], [], NULL) = 1 (in 
[3]) <16.495008>
11902 1184699865.119950 ioctl(3, FIONREAD, [0]) = 0 <0.06>

16+ seconds pass, now it's marked as readable, but returns zero bytes of 
data: the other end closed it.

Tons of unnecessary and stupid sequences of:

11902 1184699865.119988 select(10, [3 4 5 7 9], [], [], NULL) = 1 (in 
[3]) <0.07>
11902 1184699865.120031 ioctl(3, FIONREAD, [0]) = 0 <0.05>
..

and then finally:

...
11902 1184699865.141809 poll([{fd=3, events=POLLIN, 
revents=POLLIN|POLLHUP}], 1, 0) = 1 <0.05>
11902 1184699865.141838 ioctl(3, FIONREAD, [0]) = 0 <0.05>
11902 1184699865.141939 read(3, "", 32) = 0 <0.07>

ie now konqueror noticed that it was *really* closed, and read the EOF.
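The sequence Linus reads out of the trace (readable, FIONREAD reports 0 bytes, then poll() shows POLLHUP and read() returns 0) is the normal way end-of-file on a stream shows up. A minimal sketch of detecting it, using a local socketpair as a stand-in for konqueror's fd 3 (an assumption: the trace does not say what the peer is) and a hypothetical `peer_closed` helper:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has closed the stream (readable and read()
 * returns 0, i.e. EOF), 0 if data was read, -1 on error or no event. */
static int peer_closed(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[32];

    assert(fd >= 0);
    if (poll(&pfd, 1, 0) != 1)        /* timeout 0: check current state */
        return -1;
    if (!(pfd.revents & (POLLIN | POLLHUP)))
        return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    if (n == 0)
        return 1;                     /* EOF: the other end closed */
    return n > 0 ? 0 : -1;
}
```

Checking read() for 0 rather than relying on FIONREAD is what distinguishes "nothing buffered yet" from "closed"; both report 0 bytes pending.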

But why does that happen? And why would the scheduler have *anything* to 
do with this? No idea. Maybe timing. Maybe the time.c changes. Dunno.

Linus

Re: [patch] CFS scheduler, -v19

> 9173  1184675906.194424 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0)
> = -1 ENOTTY (Inappropriate ioctl for device) <0.06>
> 9173  1184675906.194463 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0)
> = -1 ENOTTY (Inappropriate ioctl for device) <0.04>
> 
> ? Are those -ENOTTY results normal?
Yes, I see it on any kernel.
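Those ENOTTY results are what a terminal check on a non-terminal descriptor looks like: on glibc, isatty() is implemented with tcgetattr(), which issues the TCGETS ioctl seen in the trace, and it fails with ENOTTY when stdout/stderr are redirected. A small sketch reproducing it on a pipe, assuming Linux; the helper name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns 0 if fd is a terminal, otherwise the errno left by isatty()
 * (typically ENOTTY for pipes, regular files and sockets). */
static int tty_probe_errno(int fd)
{
    assert(fd >= 0);
    errno = 0;
    if (isatty(fd))
        return 0;
    return errno;
}
```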
 
> 9173  1184675906.155015 write(2, "In file kernel/qpixmap_x11.cpp, "..., 56) = 56 <0.06>
> 9173  1184675906.155052 write(2, "QImage::convertDepth: Image is a"..., 44) = 44 <0.04>
> 9173  1184675906.155169 gettimeofday({1184675906, 155179}, NULL) = 0 <0.06>
> 9173  1184675906.155249 write(11, "close(6f1c2f7):about:konqueror\n", 31) = 31 <0.32>
> 
> i think konqueror tried to say something here about an image problem?
Well, yes:
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
konqueror: Fatal IO error: client killed

And no, my 2 GB of RAM is not full:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1077        935          0         22        441
-/+ buffers/cache:        612       1400
Swap:         2070          0       2070

> could you perhaps upload the strace to some webpage so that others can 
> take a look too?
hm, I don't have any webspace...

> it might also be good to add "-s 1000" to the strace command, so that we 
> can see the full messages that konqueror tried to log to some other 
> task, i.e.:
> 
>   strace -s 1000 -ttt -TTT -o trace.log -f 
> 
> and perhaps try to do a 'comparison' trace.normal.log as well, with 
> konqueror having no problems.
I now made some new strace logs:
- konq crash 251K
- Konq without crash on cfs 302K
- konq without crash on non-cfs 248K


   Markus

Re: [patch] CFS scheduler, -v19


* Ian Kent <[EMAIL PROTECTED]> wrote:

> > ah! It passes in a low-res time source into a high-res time interface 
> > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> > time(NULL) + 2, or change it to:
> > 
> > gettimeofday(&wait, NULL);
> > wait.tv_sec++;
> 
> OK, I'm with you, hi-res timer.
> But even so, how is the time in the past after adding a second?
> 
> Is it because I'm not setting tv_nsec when it's close to a second 
> boundary, and hence your recommendation above?

yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
1 second resolution timesource. I don't yet understand the failure mode 
though that results in that looping and in the 30% CPU time use - do you 
understand it perhaps? (and automount is still functional while this is 
happening, correct?)

Ingo

Re: [patch] CFS scheduler, -v19


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> i think it fails here due to some IO error:
> 
>  9173 1184675906.674610 write(2, "konqueror: Fatal IO error: 
>  clien"..., 41) = 41 <0.07>

oh, and i missed the obvious request: could you start konqueror from a 
terminal and see what it prints when it goes down?

Ingo

Re: [patch] CFS scheduler, -v19


* Markus <[EMAIL PROTECTED]> wrote:

> > > Nothing is printed for a disappeared app for me.
> > > 
> > > Is there anything more I can try?
> > 
> > sure - could you start one of those apps via:
> > 
> > strace -ttt -TTT -o trace.log -f 
> > 
> > and wait for it to "disappear"? Then compress the trace.log via 
> > bzip2 -9 (it's probably going to be a really large file) and send me 
> > it?
> private mail, as well (187K)

i think it fails here due to some IO error:

 9173  1184675906.674610 write(2, "konqueror: Fatal IO error: clien"..., 41) = 
41 <0.07>

could this be due to:

9173  1184675906.194424 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0)
= -1 ENOTTY (Inappropriate ioctl for device) <0.06>
9173  1184675906.194463 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0)
= -1 ENOTTY (Inappropriate ioctl for device) <0.04>

? Are those -ENOTTY results normal?

or perhaps the problem started a lot earlier, at:

9173  1184675906.155015 write(2, "In file kernel/qpixmap_x11.cpp, "..., 56) = 
56 <0.06>
9173  1184675906.155052 write(2, "QImage::convertDepth: Image is a"..., 44) = 
44 <0.04>
9173  1184675906.155169 gettimeofday({1184675906, 155179}, NULL) = 0 <0.06>
9173  1184675906.155249 write(11, "close(6f1c2f7):about:konqueror\n", 31) = 31 
<0.32>

i think konqueror tried to say something here about an image problem?

could you perhaps upload the strace to some webpage so that others can 
take a look too?

it might also be good to add "-s 1000" to the strace command, so that we 
can see the full messages that konqueror tried to log to some other 
task, i.e.:

  strace -s 1000 -ttt -TTT -o trace.log -f 

and perhaps try to do a 'comparison' trace.normal.log as well, with 
konqueror having no problems. Also a KDE expert's advice would be useful 
here too i guess ...

Ingo

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Chuck Ebbert

On 07/17/2007 03:45 AM, Ingo Molnar wrote:
> * Ian Kent <[EMAIL PROTECTED]> wrote:
> 
>> Yes it does and I have two reported bugs so far.
>>
>> In several places I have code similar to:
>>
>> wait.tv_sec = time(NULL) + 1;
>> wait.tv_nsec = 0;
>>
>> signaled = 0;
>> while (!signaled) {
>>     status = pthread_cond_timedwait(, , );
>>     if (status) {
>>         if (status == ETIMEDOUT)
>>             break;
>>         fatal(status);
>>     }
>> }
> 
> ah! It passes in a low-res time source into a high-res time interface 
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;
> 
> does this solve the spinning?
> 
> i'm wondering how widespread this is. If automount is the only app doing 
> this then _maybe_ we could get away with it by changing automount?

Odds are there's at least one other app doing that somewhere.

Would reverting the CFS changes to time.c fix this problem?
That optimization just got merged in 2.6.22 mainline...

Re: [patch] CFS scheduler, -v19

> could you please send me the cfs-debug-info output nevertheless?
private mail (4,9K)

> > Nothing is printed for a disappeared app for me.
> > 
> > Is there anything more I can try?
> 
> sure - could you start one of those apps via:
> 
>   strace -ttt -TTT -o trace.log -f 
> 
> and wait for it to "disappear"? Then compress the trace.log via bzip2 -9 
> (it's probably going to be a really large file) and send me it?
private mail, as well (187K)

When attachments are allowed, I can resend them on the list as well (or 
just ask me...)


To answer a private mail: I do not use any kernel module that's not part 
of the official kernel!
And of course nothing proprietary
# cat /proc/sys/kernel/tainted
0

I used gcc-4.1.2 (glibc-2.5-r4) to build the kernels. (It's an amd64 
system, quite stable so far.)

The programs that "disappeared" are mostly graphical, but perhaps only 
because I have not noticed others so far... also [1] might be caused by this...
amarok, kdesktop, whole X, konqueror, konsole but also gtk-apps


   Markus


[1] http://lkml.org/lkml/2007/07/14/64

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Ian Kent

On Tue, 2007-07-17 at 09:45 +0200, Ingo Molnar wrote:
> * Ian Kent <[EMAIL PROTECTED]> wrote:
> 
> > Yes it does and I have two reported bugs so far.
> > 
> > In several places I have code similar to:
> > 
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;
> > 
> > signaled = 0;
> > while (!signaled) {
> >     status = pthread_cond_timedwait(, , );
> >     if (status) {
> >         if (status == ETIMEDOUT)
> >             break;
> >         fatal(status);
> >     }
> > }
> 
> ah! It passes in a low-res time source into a high-res time interface 
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;

OK, I'm with you, hi-res timer.
But even so, how is the time in the past after adding a second?

Is it because I'm not setting tv_nsec when it's close to a second
boundary, and hence your recommendation above?

> 
> does this solve the spinning?

I don't have a system to test this on so I'll try to get one of the
people that logged the problem to test a patch.

> 
> i'm wondering how widespread this is. If automount is the only app doing 
> this then _maybe_ we could get away with it by changing automount?

I'm happy to change automount but that could cause odd version specific
problems for people updating their kernel on an older installed base.

Aaah .. and they'll all blame me!! ;)

Ian



Re: [patch] CFS scheduler, -v19

* Ian Kent <[EMAIL PROTECTED]> wrote:

> Yes it does and I have two reported bugs so far.
> 
> In several places I have code similar to:
> 
> wait.tv_sec = time(NULL) + 1;
> wait.tv_nsec = 0;
> 
> signaled = 0;
> while (!signaled) {
>     status = pthread_cond_timedwait(&cond, &mutex, &wait);
>     if (status) {
>         if (status == ETIMEDOUT)
>             break;
>         fatal(status);
>     }
> }

ah! It passes in a low-res time source into a high-res time interface 
(pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
time(NULL) + 2, or change it to:

  gettimeofday(&wait, NULL);
wait.tv_sec++;

does this solve the spinning?

i'm wondering how widespread this is. If automount is the only app doing 
this then _maybe_ we could get away with it by changing automount?

Ingo
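
A minimal sketch of the change suggested above, assuming the condition variable uses its default clock (CLOCK_REALTIME): read "now" from a high-resolution clock and add the timeout without discarding the sub-second part. It uses clock_gettime() instead of gettimeofday(), because pthread_cond_timedwait() takes a struct timespec, not a struct timeval. deadline_after() is a hypothetical helper, not automount code.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: absolute CLOCK_REALTIME deadline 'sec' seconds plus
 * 'nsec' nanoseconds (0 <= nsec < NSEC_PER_SEC) from now, for use as the
 * abstime argument of pthread_cond_timedwait() on a default-clock condvar. */
static struct timespec deadline_after(time_t sec, long nsec)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);   /* keeps the sub-second part */
    ts.tv_sec += sec;
    ts.tv_nsec += nsec;
    if (ts.tv_nsec >= NSEC_PER_SEC) {     /* carry into whole seconds */
        ts.tv_sec += 1;
        ts.tv_nsec -= NSEC_PER_SEC;
    }
    return ts;
}
```

Computing the deadline once, e.g. `wait = deadline_after(1, 0);` before entering the loop, keeps every retry aimed at the same point in time, so a spurious wakeup neither stretches nor shrinks the total wait.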

Re: [patch] CFS scheduler, -v19


* Markus <[EMAIL PROTECTED]> wrote:

> The dmesg output is not differing in any interesting point (just some 
> numbers, like raid-benchmark, some irqs or usb-numbers...)

could you please send me the cfs-debug-info output nevertheless?

> Nothing is printed for a disappeared app for me.
> 
> Is there anything more I can try?

sure - could you start one of those apps via:

strace -ttt -TTT -o trace.log -f app

and wait for it to "disappear"? Then compress the trace.log via bzip2 -9 
(it's probably going to be a really large file) and send me it?

Ingo

Re: [patch] CFS scheduler, -v19

> could you please send me the cfs-debug-info output nevertheless?
private mail (4,9K)

> > Nothing is printed for a disappeared app for me.
> > 
> > Is there anything more I can try?
> 
> sure - could you start one of those apps via:
> 
>   strace -ttt -TTT -o trace.log -f app
> 
> and wait for it to disappear? Then compress the trace.log via bzip2 -9
> (it's probably going to be a really large file) and send me it?
private mail, as well (187K)

When attachments are allowed, I can resend them on the list as well (or 
just ask me...)


To answer a private mail: I do not use any kernel module that's not part 
of the official kernel!
And of course nothing proprietary:
# cat /proc/sys/kernel/tainted
0

I used gcc-4.1.2 (glibc-2.5-r4) to build the kernels. (It's an amd64 
system, quite stable so far.)

The programs that disappeared are mostly graphical ones, though maybe I 
just haven't noticed others so far... [1] might also be caused by this.
Affected so far: amarok, kdesktop, all of X, konqueror, konsole, but also 
GTK apps.


   Markus


[1] http://lkml.org/lkml/2007/07/14/64

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Chuck Ebbert

On 07/17/2007 03:45 AM, Ingo Molnar wrote:
> * Ian Kent <[EMAIL PROTECTED]> wrote:
> 
> > Yes it does and I have two reported bugs so far.
> > 
> > In several places I have code similar to:
> > 
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;
> > 
> > signaled = 0;
> > while (!signaled) {
> >     status = pthread_cond_timedwait(&cond, &mutex, &wait);
> >     if (status) {
> >         if (status == ETIMEDOUT)
> >             break;
> >         fatal(status);
> >     }
> > }
> 
> ah! It passes in a low-res time source into a high-res time interface 
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;
> 
> does this solve the spinning?
> 
> i'm wondering how widespread this is. If automount is the only app doing 
> this then _maybe_ we could get away with it by changing automount?

Odds are there's at least one other app doing that somewhere.

Would reverting the CFS changes to time.c fix this problem?
That optimization just got merged in 2.6.22 mainline...

Re: [patch] CFS scheduler, -v19


* Markus <[EMAIL PROTECTED]> wrote:

> > > Nothing is printed for a disappeared app for me.
> > > 
> > > Is there anything more I can try?
> > 
> > sure - could you start one of those apps via:
> > 
> >   strace -ttt -TTT -o trace.log -f app
> > 
> > and wait for it to disappear? Then compress the trace.log via 
> > bzip2 -9 (it's probably going to be a really large file) and send me 
> > it?
> private mail, as well (187K)

i think it fails here due to some IO error:

9173  1184675906.674610 write(2, "konqueror: Fatal IO error: clien"..., 41) = 41 0.07

could this be due to:

9173  1184675906.194424 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0) = -1 ENOTTY (Inappropriate ioctl for device) 0.06
9173  1184675906.194463 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0) = -1 ENOTTY (Inappropriate ioctl for device) 0.04

? Are those -ENOTTY results normal?

or perhaps the problem started a lot earlier, at:

9173  1184675906.155015 write(2, "In file kernel/qpixmap_x11.cpp, "..., 56) = 56 0.06
9173  1184675906.155052 write(2, "QImage::convertDepth: Image is a"..., 44) = 44 0.04
9173  1184675906.155169 gettimeofday({1184675906, 155179}, NULL) = 0 0.06
9173  1184675906.155249 write(11, "close(6f1c2f7):about:konqueror\n", 31) = 31 0.32

i think konqueror tried to say something here about an image problem?

could you perhaps upload the strace to some webpage so that others can 
take a look too?

it might also be good to add -s 1000 to the strace command, so that we 
can see the full messages that konqueror tried to log to some other 
task, i.e.:

  strace -s 1000 -ttt -TTT -o trace.log -f app

and perhaps try to do a 'comparison' trace.normal.log as well, with 
konqueror having no problems. Also a KDE expert's advice would be useful 
here too i guess ...

Ingo
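
On the -ENOTTY question above: strace prints "SNDCTL_TMR_TIMEBASE or TCGETS" because the two ioctl names share one number, and TCGETS is what a terminal check such as isatty() or tcgetattr() issues on Linux/glibc. When stdout/stderr point at a file or a pipe instead of a terminal, that query fails with ENOTTY, which by itself is harmless. A small sketch, assuming Linux with glibc; tty_check() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: 0 if fd refers to a terminal, otherwise the errno
 * left by the failed terminal query (ENOTTY for pipes and regular files).
 * On Linux/glibc, tcgetattr() is implemented with the TCGETS ioctl. */
static int tty_check(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) == 0)
        return 0;
    return errno;
}
```

For a pipe, tty_check() returns ENOTTY and isatty() returns 0; run from a terminal, tty_check(STDOUT_FILENO) returns 0.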

Re: [patch] CFS scheduler, -v19


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> i think it fails here due to some IO error:
> 
>  9173 1184675906.674610 write(2, "konqueror: Fatal IO error: clien"..., 41) = 41 0.07

oh, and i missed the obvious request: could you start konqueror from a 
terminal and see what it prints when it goes down?

Ingo

Re: [patch] CFS scheduler, -v19


* Ian Kent <[EMAIL PROTECTED]> wrote:

> > ah! It passes in a low-res time source into a high-res time interface 
> > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> > time(NULL) + 2, or change it to:
> > 
> >   gettimeofday(&wait, NULL);
> >   wait.tv_sec++;
> 
> OK, I'm with you, hi-res timer.
> But even so, how is the time in the past after adding a second?
> 
> Is it because I'm not setting tv_nsec when it's close to a second 
> boundary, and hence your recommendation above?

yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
1 second resolution time source. I don't yet understand the failure mode 
that results in that looping and in the 30% CPU time use, though - do you 
understand it perhaps? (and automount is still functional while this is 
happening, correct?)

Ingo
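
One ingredient of the failure mode asked about above can be shown in isolation: when the absolute deadline handed to pthread_cond_timedwait() has already passed, the call returns ETIMEDOUT instead of blocking, so code that keeps building a stale deadline and trying again never sleeps. A sketch assuming POSIX threads (link with -pthread on older glibc); wait_with_past_deadline() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: wait on a condition variable with an absolute
 * deadline that lies 'ns_ago' nanoseconds in the past. Because the
 * deadline has already passed, pthread_cond_timedwait() returns
 * ETIMEDOUT at once instead of blocking. */
static int wait_with_past_deadline(long ns_ago)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    struct timespec abstime;
    int status;

    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec -= ns_ago / 1000000000L;
    abstime.tv_nsec -= ns_ago % 1000000000L;
    if (abstime.tv_nsec < 0) {            /* borrow from whole seconds */
        abstime.tv_nsec += 1000000000L;
        abstime.tv_sec -= 1;
    }

    pthread_mutex_lock(&mutex);
    status = pthread_cond_timedwait(&cond, &mutex, &abstime);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return status;
}
```

For example, wait_with_past_deadline(130925919L), an offset that merely mirrors the sub-second value in Ian's strace, returns ETIMEDOUT straight away, and calling it in a loop never blocks, which resembles the spinning seen in automount.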

Re: [patch] CFS scheduler, -v19

> 9173  1184675906.194424 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0) = -1 ENOTTY (Inappropriate ioctl for device) 0.06
> 9173  1184675906.194463 ioctl(2, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff341af5c0) = -1 ENOTTY (Inappropriate ioctl for device) 0.04
> 
> ? Are those -ENOTTY results normal?
Yes, I see it on any kernel.
> 
> 9173  1184675906.155015 write(2, "In file kernel/qpixmap_x11.cpp, "..., 56) = 56 0.06
> 9173  1184675906.155052 write(2, "QImage::convertDepth: Image is a"..., 44) = 44 0.04
> 9173  1184675906.155169 gettimeofday({1184675906, 155179}, NULL) = 0 0.06
> 9173  1184675906.155249 write(11, "close(6f1c2f7):about:konqueror\n", 31) = 31 0.32
> 
> i think konqueror tried to say something here about an image problem?
Well, yes:
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
In file kernel/qpixmap_x11.cpp, line 633: Out of memory
QImage::convertDepth: Image is a null image
konqueror: Fatal IO error: client killed

And no, my 2 GB of RAM are not full:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1077        935          0         22        441
-/+ buffers/cache:        612       1400
Swap:         2070          0       2070

> could you perhaps upload the strace to some webpage so that others can 
> take a look too?
hm, I don't have any webspace...

> it might also be good to add -s 1000 to the strace command, so that we 
> can see the full messages that konqueror tried to log to some other 
> task, i.e.:
> 
>   strace -s 1000 -ttt -TTT -o trace.log -f app
> 
> and perhaps try to do a 'comparison' trace.normal.log as well, with 
> konqueror having no problems.
I have now made some new strace logs:
- konq crash 251K
- konq without crash on cfs 302K
- konq without crash on non-cfs 248K


   Markus

Re: [patch] CFS scheduler, -v19


* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> But why does that happen? And why would the scheduler have *anything* 
> to do with this? No idea. Maybe timing. Maybe the time.c changes. 
> Dunno.

hm, Markus indicated that he tried the v2.6.21.6-cfsv19 patch, and that 
does not include the time.c change. Markus - does your kernel include 
the code below? (if yes, please revert it via patch -p1 -R )

Ingo

Index: linux/kernel/time.c
===================================================================
--- linux.orig/kernel/time.c
+++ linux/kernel/time.c
@@ -57,14 +57,17 @@ EXPORT_SYMBOL(sys_tz);
  */
 asmlinkage long sys_time(time_t __user * tloc)
 {
-	time_t i;
-	struct timeval tv;
+	/*
+	 * We read xtime.tv_sec atomically - it's updated
+	 * atomically by update_wall_time(), so no need to
+	 * even read-lock the xtime seqlock:
+	 */
+	time_t i = xtime.tv_sec;
 
-	do_gettimeofday(&tv);
-	i = tv.tv_sec;
+	smp_rmb(); /* sys_time() results are coherent */
 
 	if (tloc) {
-		if (put_user(i,tloc))
+		if (put_user(i, tloc))
 			i = -EFAULT;
 	}
 	return i;
@@ -373,6 +376,20 @@ void do_gettimeofday (struct timeval *tv
 
 	tv->tv_sec = sec;
 	tv->tv_usec = usec;
+
+	/*
+	 * Make sure xtime.tv_sec [returned by sys_time()] always
+	 * follows the gettimeofday() result precisely. This
+	 * condition is extremely unlikely, it can hit at most
+	 * once per second:
+	 */
+	if (unlikely(xtime.tv_sec != tv->tv_sec)) {
+		unsigned long flags;
+
+		write_seqlock_irqsave(&xtime_lock, flags);
+		update_wall_time();
+		write_sequnlock_irqrestore(&xtime_lock, flags);
+	}
 }
 
 EXPORT_SYMBOL(do_gettimeofday);
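
For readers unfamiliar with the locking the patch comment refers to: in kernels of this era do_gettimeofday() reads the clock under the xtime_lock seqlock, while the patched sys_time() reads only xtime.tv_sec, a single word, without it. The read/retry pattern itself can be sketched in user space with C11 atomics, assuming a single writer; this is an illustration, not the kernel's implementation, and all names are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical user-space analogue of a seqlock guarding a (sec, nsec)
 * pair. A single writer is assumed (the kernel serializes writers with a
 * spinlock). The writer makes the sequence odd while updating; readers
 * retry if they saw an odd sequence or if it changed during the read. */
struct seq_time {
    atomic_uint seq;
    atomic_long sec;
    atomic_long nsec;
};

static void seq_time_write(struct seq_time *t, long sec, long nsec)
{
    unsigned s = atomic_load_explicit(&t->seq, memory_order_relaxed);

    atomic_store_explicit(&t->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&t->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&t->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&t->seq, s + 2, memory_order_release); /* even */
}

static void seq_time_read(struct seq_time *t, long *sec, long *nsec)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&t->seq, memory_order_acquire);
        *sec = atomic_load_explicit(&t->sec, memory_order_relaxed);
        *nsec = atomic_load_explicit(&t->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&t->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```

A writer bumps the sequence to an odd value, updates both fields, then bumps it back to even; a reader that raced with it sees an odd or changed sequence and retries, so it never pairs the seconds from one update with the nanoseconds from another.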

RE: [patch] CFS scheduler, -v19

2007-07-17 Thread David Schwartz


> * Ian Kent <[EMAIL PROTECTED]> wrote:
> 
> > Yes it does and I have two reported bugs so far.
> > 
> > In several places I have code similar to:
> > 
> > wait.tv_sec = time(NULL) + 1;
> > wait.tv_nsec = 0;
> > 
> > signaled = 0;
> > while (!signaled) {
> >     status = pthread_cond_timedwait(&cond, &mutex, &wait);
> >     if (status) {
> >         if (status == ETIMEDOUT)
> >             break;
> >         fatal(status);
> >     }
> > }
> 
> ah! It passes in a low-res time source into a high-res time interface
> (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to
> time(NULL) + 2, or change it to:
> 
>   gettimeofday(&wait, NULL);
>   wait.tv_sec++;
> 
> does this solve the spinning?
> 
> i'm wondering how widespread this is. If automount is the only app doing
> this then _maybe_ we could get away with it by changing automount?

This code is horribly broken. Don't change the kernel because this code is
broken.

First it adds a second, but then it subtracts up to a second. Just before
the second boundary, this code can burn CPU like crazy, with each wait being
just a few nanoseconds.

What is the intent of this code? Is it to wait up to a second, possibly for
no time at all, or is it to wait for at least a second? If the latter, why
are you zeroing the nanosecond count?

DS
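
The arithmetic behind that point, as a sketch with hypothetical helper names: with tv_sec = time(NULL) + 1 and tv_nsec = 0, the wait is one second minus the part of the current second that has already elapsed. It shrinks to a few nanoseconds just before a second boundary, and it is negative (already expired) whenever time() lags the high-resolution clock, as in Ian's strace, where time() reported 1184593935 while clock_gettime() reported 1184593936.130925919.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Deadline built the way the quoted automount loop builds it: the
 * whole-second value from time(NULL), plus one, with tv_nsec zeroed. */
static struct timespec coarse_deadline(time_t coarse_now)
{
    struct timespec d;

    d.tv_sec = coarse_now + 1;
    d.tv_nsec = 0;
    return d;
}

/* Nanoseconds from 'now' until 'deadline'; a value <= 0 means the
 * deadline has already expired and a timed wait returns immediately. */
static long long ns_until(struct timespec deadline, struct timespec now)
{
    return (long long)(deadline.tv_sec - now.tv_sec) * NSEC_PER_SEC
           + (deadline.tv_nsec - now.tv_nsec);
}
```

With now = {1000, 999999990}, coarse_deadline(1000) is only 10 ns away; with the values from Ian's strace, ns_until() comes out at -130925919, i.e. the deadline has already passed.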



Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Willy Tarreau

Hi Ingo,

sorry for the long delay, I've spent a week doing non-kernel work.

On Tue, Jul 10, 2007 at 12:39:50AM +0200, Ingo Molnar wrote:
 
 * Willy Tarreau <[EMAIL PROTECTED]> wrote:
 
   The biggest user-visible change in -v19 is reworked sleeper 
   fairness: it's similar in behavior to -v18 but works more 
   consistently across nice levels. Fork-happy workloads (like kernel 
   builds) should behave better as well. There are also a handful of 
   speedups: unsigned math, 32-bit speedups, O(1) task pickup, 
   debloating and other micro-optimizations.
  
  Interestingly, I also noticed the possibility of O(1) task pickup when 
  playing with v18, but did not detect any noticeable improvement with 
  it. Of course, it depends on the workload and I probably didn't 
  perform the most relevant tests.
 
 yeah - it's a small tweak. CFS is O(31) in sleep/wakeup so it's now all 
 a big O(1) family again :)

Yes, that's what I once tried to explain to a guy: what I like about log(N)
algos is that even with N very large, log(N) is always small, and it's
sometimes faster to perform log(N) fast operations than 1 slow operation.
That's also why I don't care about balanced trees: my unbalanced trees may
hold 32 levels for 32 carefully chosen values, while balanced trees will
have 5 levels (the worst-case difference between the two). If I can insert
and delete a node 6 times faster, I always win. And quite frankly, I'm not
interested in the 32-entry case in a tree :-)

  V19 works very well here on 2.6.20.14. I could start 32k busy loops at 
  nice +19 (I exhausted the 32k pids limit), and could still perform 
  normal operations. I noticed that 'vmstat' scans all the pid entries 
  under /proc, which takes ages to collect data before displaying a 
  line. Obviously, the system sometimes shows some short latencies, but 
  not much more than what you get from an SSH session over a remote DSL 
  connection.
 
 great! I did not try to push it this far, yet.

Well, I borrowed two 1GB sticks because I discovered that one of my 512MB
sticks had a defective bit. It was finally an opportunity for me to push
the test this far.

  Here's a vmstat 1 output :
  
      r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
  32437  0  0      0 809724    488   6196    0    0     1     0  135     0 24 72  4
  32436  0  0      0 811336    488   6196    0    0     0     0  717     0 78 22  0
 
 crazy :-)

indeed :-)

  Amusingly, I started mpg123 during this test and it skipped quite a 
  bit. After setting all tasks to SCHED_IDLE, it did not skip anymore. 
  All this seems to behave like one could expect.
 
 yeah. It behaves better than i expected in fact - 32K tasks is pushing 
 things quite a bit. (we've got a 32K PID limit for example)

Yes, and in fact, I suspect that we still have an O(N) or O(N^2) pid
allocation algo somewhere (I did not look at the code), because forking
was very very slow when reaching those numbers. I'll possibly check this
when I have some spare time, because it reminds me of a trivial source port
ring allocator I wrote a few years ago which was O(1). With 32k pids, it
will only require 64kB RAM for the whole system, and we may even optimize
it to spread the CPUs' entry points in order to nearly always avoid lock
contention.
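
Willy does not show his allocator, so the following is only one possible reading of "a ring allocator needing 64kB for 32k pids": a FIFO ring of free ids stored as 16-bit entries (32768 x 2 bytes = 64 KiB), where both allocation and release are O(1). Names are hypothetical, and there is no locking or per-CPU spreading:

```c
#include <stdint.h>

#define NR_IDS 32768   /* 32k ids; 2 bytes each = 64 KiB of ring */

/* Hypothetical O(1) id allocator: a ring (FIFO) of free ids stored as
 * 16-bit values. Allocation pops from the head, release pushes at the tail. */
struct id_ring {
    uint16_t slot[NR_IDS];
    uint32_t head;     /* index of the next id to hand out */
    uint32_t count;    /* number of free ids currently in the ring */
};

static void id_ring_init(struct id_ring *r)
{
    for (uint32_t i = 0; i < NR_IDS; i++)
        r->slot[i] = (uint16_t)i;
    r->head = 0;
    r->count = NR_IDS;
}

/* Returns a free id, or -1 when every id is in use. */
static int id_alloc(struct id_ring *r)
{
    int id;

    if (r->count == 0)
        return -1;
    id = r->slot[r->head];
    r->head = (r->head + 1) % NR_IDS;
    r->count--;
    return id;
}

/* Puts an id back at the tail of the ring. The caller must only release
 * ids it currently holds; there is no double-free detection. */
static void id_free(struct id_ring *r, int id)
{
    r->slot[(r->head + r->count) % NR_IDS] = (uint16_t)id;
    r->count++;
}
```

Freed ids go to the tail of the ring, so a just-released id is the last one to be handed out again; that reuse delay is a design choice of this sketch, not something stated in the mail.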

  I also started 30k processes distributed in 130 groups of 234 chained 
  by pipes in which one byte is passed. I get an average of 8000 in the 
  run queue. The context switch rate is very low and sometimes even null 
  in this test, maybe some of them are starving, I really do not know :
  
      r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
   7752  0  1      0 656892    244   4196    0    0     0     0  725     0 16 84  0
 
 hm, could you profile this? We could have some bottleneck somewhere 
 (likely not in the scheduler) with that many tasks being runnable. [ 
 With CFS you can actually run a profiler under this workload ;-) ]

I'll probably try some time later (not this weekend, I have some 2.4 to
work on).
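
The test program itself isn't shown; a minimal sketch of one group of such a workload, assuming each process simply reads one byte from its predecessor's pipe and writes it to the next one, could look like the following. run_chain() is a hypothetical name; the setup described above used 130 such groups of 234 processes, roughly 30k tasks in total.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of one group of the workload: 'n' child processes
 * chained by pipes, each forwarding a single byte to the next. Returns
 * the byte that comes out of the far end, or -1 on error. Error paths
 * are kept minimal (no cleanup of already-forked children). */
static int run_chain(int n, char token)
{
    int first[2], in_fd, result = -1;
    char c;

    if (pipe(first) < 0)
        return -1;
    in_fd = first[0];                  /* what the next child reads */

    for (int i = 0; i < n; i++) {
        int next[2];
        pid_t pid;

        if (pipe(next) < 0)
            return -1;
        pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                /* child: forward one byte, exit */
            close(first[1]);
            close(next[0]);
            if (read(in_fd, &c, 1) != 1 || write(next[1], &c, 1) != 1)
                _exit(1);
            _exit(0);
        }
        close(in_fd);                  /* parent keeps only the new end */
        close(next[1]);
        in_fd = next[0];
    }

    if (write(first[1], &token, 1) == 1 && read(in_fd, &c, 1) == 1)
        result = (unsigned char)c;
    close(first[1]);
    close(in_fd);
    while (wait(NULL) > 0)             /* reap all children */
        ;
    return result;
}
```

run_chain(234, 'x') builds one chain of the size mentioned above and returns 'x' once the byte has travelled through every process.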

  In my tree, I have replaced the rbtree with the ebtree we talked 
  about, but it did not bring any performance boost because, even though 
  insert() and delete() are faster, the scheduler is already quite good 
  at avoiding them as much as possible, mostly relying on rb_next() 
  which has the same cost in both trees. All in all, the only variations 
  I noticed were caused by cacheline alignment when I tried to reorder 
  fields in the eb_node. So I will stop my experiments here since I 
  don't see any more room for improvement.
 
 well, just a little bit of improvement would be nice to have too :)

Yes, but I prefer to merge it where it really brings something (I'll have a
look at epoll, I noticed epoll_ctl() was 30% slower under 2.6 with an rbtree
than it is under 2.4 with a hash). Then people will tell me "you're completely
dumb, you could have improved it that way!" and then, once it's optimized to
be always faster than the

Re: [patch] CFS scheduler, -v19

> hm, Markus indicated that he tried the v2.6.21.6-cfsv19 patch, and that 
> does not include the time.c change. Markus - does your kernel include 
> the code below? (if yes, please revert it via patch -p1 -R )
Well, the 2.6.22.1-cfs-v19 does include it, but the 2.6.21.6-cfs-v19 
does not have that patch applied.
But both show this problem.

   Markus

Re: [patch] CFS scheduler, -v19

2007-07-17 Thread Bill Davidsen


Ingo Molnar wrote:

> * Ian Kent <[EMAIL PROTECTED]> wrote:
> 
> > > ah! It passes in a low-res time source into a high-res time interface 
> > > (pthread_cond_timedwait()). Could you change the time(NULL) + 1 to 
> > > time(NULL) + 2, or change it to:
> > > 
> > >   gettimeofday(&wait, NULL);
> > >   wait.tv_sec++;
> > 
> > OK, I'm with you, hi-res timer.
> > But even so, how is the time in the past after adding a second?
> > 
> > Is it because I'm not setting tv_nsec when it's close to a second 
> > boundary, and hence your recommendation above?
> 
> yeah, it looks a bit suspicious: you create a +1 second timeout out of a 
> 1 second resolution time source. I don't yet understand the failure mode 
> that results in that looping and in the 30% CPU time use, though - do you 
> understand it perhaps? (and automount is still functional while this is 
> happening, correct?)

Can't say, I have automount running because I get it by default, but I 
have nothing using it on my test machine. Why is it looping so fast when 
there are no mount points defined? If the config changes there's no 
requirement to notice right away, is there?


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Ian Kent

On Mon, 2007-07-16 at 23:55 +0200, Ingo Molnar wrote:
> * Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
> > On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > > 
> > > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> > 
> > Can you confirm whether CFS is involved, i.e. does it spin like that 
> > even without the CFS patch applied?
> 
> hmmm  could you take out the kernel/time.c (sys_time()) changes from 
> the CFS patch, does that solve the automount issue? If yes, could 
> someone take a look at automount and check whether it makes use of 
> time(2) and whether it combines it with finer grained time sources?

Yes it does and I have two reported bugs so far.

In several places I have code similar to:

wait.tv_sec = time(NULL) + 1;
wait.tv_nsec = 0;

signaled = 0;
while (!signaled) {
    status = pthread_cond_timedwait(&cond, &mutex, &wait);
    if (status) {
        if (status == ETIMEDOUT)
            break;
        fatal(status);
    }
}

which leads to automount spinning, with strace output a bit like:

futex(0x80034b60, FUTEX_WAKE, 1)  = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 130925919}) = 0
time(NULL)= 1184593935
futex(0x80034b60, FUTEX_WAKE, 1)  = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131160876}) = 0
time(NULL)= 1184593935
futex(0x80034b60, FUTEX_WAKE, 1)  = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131377080}) = 0
time(NULL)= 1184593935
futex(0x80034b60, FUTEX_WAKE, 1)  = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131593297}) = 0
time(NULL)= 1184593935
futex(0x80034b60, FUTEX_WAKE, 1)  = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131871792}) = 0

There should be something like:

futex(0x557868c4, FUTEX_WAIT, 5321099, {0, 998091311}) = -1 ETIMEDOUT (Connection timed out)

in there I think.

Ian
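
FUTEX_WAIT takes a relative timeout, which is why the expected line shows {0, 998091311}, i.e. the absolute deadline minus the current time. A sketch of that conversion with the values from the strace above suggests why no FUTEX_WAIT appears at all: time(NULL) + 1 gives 1184593936.000000000, which is already behind the clock_gettime() value 1184593936.130925919, so there is nothing left to wait for. abs_to_rel() is a hypothetical helper, not glibc's internal code:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: convert an absolute CLOCK_REALTIME deadline into
 * the relative timeout a FUTEX_WAIT call takes. Returns ETIMEDOUT, with
 * nothing left to wait for, when the deadline is not in the future. */
static int abs_to_rel(const struct timespec *abstime,
                      const struct timespec *now, struct timespec *rel)
{
    rel->tv_sec = abstime->tv_sec - now->tv_sec;
    rel->tv_nsec = abstime->tv_nsec - now->tv_nsec;
    if (rel->tv_nsec < 0) {               /* borrow from whole seconds */
        rel->tv_nsec += 1000000000L;
        rel->tv_sec -= 1;
    }
    if (rel->tv_sec < 0 || (rel->tv_sec == 0 && rel->tv_nsec == 0))
        return ETIMEDOUT;
    return 0;
}
```

With a deadline one full second after that clock_gettime() value, abs_to_rel() instead yields {1, 0} and returns 0, so a FUTEX_WAIT with a 1 s timeout would follow.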



Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Bill Davidsen


Ingo Molnar wrote:

> * Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
> > On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > > 
> > > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> > 
> > Can you confirm whether CFS is involved, i.e. does it spin like that 
> > even without the CFS patch applied?

I will try that, but not until Tuesday night. I've been here too long 
today and have an out-of-state meeting tomorrow. I'll take a look after 
dinner. Note that the latest 2.6.21 with cfs-v19 doesn't have any 
problems of any nature, other than suspend to RAM not working, and I may 
have the config wrong. Runs really well otherwise, but I'll test drive 
2.6.22 w/o the patch.

> hmmm  could you take out the kernel/time.c (sys_time()) changes from 
> the CFS patch, does that solve the automount issue? If yes, could 
> someone take a look at automount and check whether it makes use of 
> time(2) and whether it combines it with finer grained time sources?

Will do.

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: [patch] CFS scheduler, -v19

* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > 
> > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> 
> Can you confirm whether CFS is involved, i.e. does it spin like that 
> even without the CFS patch applied?

hmmm  could you take out the kernel/time.c (sys_time()) changes from 
the CFS patch, does that solve the automount issue? If yes, could 
someone take a look at automount and check whether it makes use of 
time(2) and whether it combines it with finer grained time sources?

Ingo

Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Chuck Ebbert

On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> 
> I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> with FC6. Automount starts taking 30% of CPU (unused at the moment)

Can you confirm whether CFS is involved, i.e. does it spin like that
even without the CFS patch applied?


Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Markus

> > [...] The mouse is smooth, just when one app is being quit (don't 
> > know why...) the mouse will be jerking for a few seconds...
> is the mouse jerky on any app quitting?
No.

> Or is your observation the following: _sometimes_ apps quit 
> unexpectedly (their window just vanishes?), and _at the same time_,
> the mouse becomes jerky as well, for a few seconds?
Exactly.

> the mouse typically only becomes jerky when there's some really high 
> load on the system - anything else would be a kernel bug. A jerky
> mouse on an unloaded system is definitely a sign of some sort of
> kernel bug  (in or outside of the scheduler). An app vanishing
> unexpectedly might mean an OOM-kill - but that would show up in the
> syslog as well.
> Pretty weird.
Well, the system uses about 30% of the CPU (Cool'n'Quiet put it on the 
lowest frequency).

I built a plain 2.6.22.1 and could use it for about 2 hours without 
any problem. Then I applied cfs-v19 for that kernel, rebuilt from 
mrproper with the saved config and booted. After a few minutes the 
first app vanished... more followed over time (I just surfed around 
a bit...)

The dmesg output is not differing in any interesting point (just some 
numbers, like raid-benchmark, some irqs or usb-numbers...)

So it's obviously something within cfs... unfortunately...

> Can you make this regression trigger arbitrarily, so that we could
> debug it better? Apps exiting unexpectedly can be debugged via: 
> 
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/vdso-print-fatal-signals.patch
> 
> you can turn it on via the print-fatal-signals=1 boot option or via:
> 
>   echo 1 > /proc/sys/kernel/print-fatal-signals
> 
> this feature will produce a small dump to the syslog about every app 
> that exits unexpectedly. Note that this might not cover all types of 
> "window suddenly vanishes" regressions.
Nothing is printed for a disappeared app for me.


Is there anything more I can try?


   Markus

Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Ed Tomlinson

On Monday 16 July 2007 05:17, Ingo Molnar wrote:
> 
> * Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> 
> > I run a java application at nice 15.  It's been a background 
> > application here for as long as SD and CFS have been around.  If I 
> > have a compile running at nice 0, with v19 java gets so little cpu 
> > that the wrapper that runs to monitor it is timing out waiting for 
> > it to start.  This is new in v19 - something in v19 is not meshing 
> > well with my mix of applications...
> 
> how much longer did the startup of the java app get relative to say v18?
> 
> to debug this, could you check whether this problem goes away if you use 
> nice 10 (or nice 5) instead of nice 15?

Ingo,

It may take a day or two before I get to test this. I have had to revert to 
2.6.21 - it seems that 22 triggers a stall here (21 can also trigger this, 
but it's harder)...

Thanks
Ed


Re: [patch] CFS scheduler, -v19

* Markus <[EMAIL PROTECTED]> wrote:

> [...] The mouse is smooth, just when one app is being quit (dont know
> why...) the mouse will be jerking for a few seconds...

is the mouse jerky on any app quitting? Or is your observation the
following: _sometimes_ apps quit unexpectedly (their window just
vanishes?), and _at the same time_, the mouse becomes jerky as well, for
a few seconds?

the mouse typically only becomes jerky when there's some really high
load on the system - anything else would be a kernel bug. A jerky mouse
on an unloaded system is definitely a sign of some sort of kernel bug
(in or outside of the scheduler). An app vanishing unexpectedly might
mean an OOM-kill - but that would show up in the syslog as well.
Pretty weird.

Can you make this regression trigger arbitrarily, so that we could debug
it better? Apps exiting unexpectedly can be debugged via:

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/vdso-print-fatal-signals.patch

you can turn it on via the print-fatal-signals=1 boot option or via:

echo 1 > /proc/sys/kernel/print-fatal-signals

this feature will produce a small dump to the syslog about every app
that exits unexpectedly. Note that this might not cover all types of
"window suddenly vanishes" regressions.

Ingo
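
print-fatal-signals reports deaths caused by fatal signals, and as noted above it might not cover every kind of vanishing window: an app that simply calls exit() on its own is not killed by a signal at all. One user-space cross-check is to start a suspect app from a wrapper that records how it ended. A minimal sketch, assuming a POSIX system; the function name run_and_classify is hypothetical and not part of any existing tool:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with argv and report how it ended.
 * Returns the terminating signal number if the child was killed by a
 * signal, 0 if it exited normally (status stored in *exit_code),
 * or -1 if fork() or waitpid() failed. */
int run_and_classify(char *const argv[], int *exit_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "%s killed by signal %d (%s)\n",
                argv[0], sig, strsignal(sig));
        return sig;
    }
    *exit_code = WEXITSTATUS(status);
    fprintf(stderr, "%s exited with status %d\n", argv[0], *exit_code);
    return 0;
}
```

If the wrapper reports a normal exit at the moment a window vanishes, the app left on its own, which would explain why print-fatal-signals stays silent.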

Re: [patch] CFS scheduler, -v19


* Ed Tomlinson <[EMAIL PROTECTED]> wrote:

> I run a java application at nice 15.  It's been a background 
> application here for as long as SD and CFS have been around.  If I 
> have a compile running at nice 0, with v19 java gets so little cpu 
> that the wrapper that runs to monitor it is timing out waiting for 
> it to start.  This is new in v19 - something in v19 is not meshing 
> well with my mix of applications...

how much longer did the startup of the java app get relative to say v18?

to debug this, could you check whether this problem goes away if you use 
nice 10 (or nice 5) instead of nice 15?

Ingo
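
The suggestion to try nice 10 or nice 5 can be put in rough numbers. A minimal sketch, assuming the nice-to-weight table that mainline CFS uses (prio_to_weight, later renamed sched_prio_to_weight; whether the -v19 patch used exactly these values is an assumption) and a single CPU where every competitor is an always-runnable nice-0 task. Real compile jobs sleep on I/O, so these are worst-case figures. Under those assumptions a nice-15 task gets about 3.4% of a CPU against one nice-0 hog, versus about 9.7% at nice 10 and about 24.7% at nice 5:

```c
/* Assumption: mainline CFS nice-to-weight table, indexed by nice + 20.
 * The -v19 patch discussed in this thread may have used other values. */
static const int prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Weight of a task at nice level `nice` (-20..19). */
int nice_weight(int nice)
{
    return prio_to_weight[nice + 20];
}

/* Percent of one CPU that an always-runnable task at `nice` gets while
 * sharing that CPU with `hogs` always-runnable nice-0 tasks, given that
 * CFS divides CPU time in proportion to weight. */
double cpu_share_pct(int nice, int hogs)
{
    double w = nice_weight(nice);
    return 100.0 * w / (w + hogs * nice_weight(0));
}
```

With four nice-0 compile jobs runnable on the same CPU, cpu_share_pct(15, 4) drops below 0.9%, which is the kind of starvation that could make a start-up watchdog time out. On an SMP box the jobs spread across CPUs, so the real share depends on how many of them land on the Java process's CPU.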

Re: [patch] CFS scheduler, -v19


* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> Sending a few seconds of logged /proc/sched_debug will also help get a 
> picture of what's happening, and lovely would be a method to reproduce 
> the problem locally.

also, by running this script:

   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and sending us the file it produces we'll have most of the environmental 
information as well.

Ingo
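
Logging a few seconds of /proc/sched_debug is usually a one-line shell loop; a self-contained C equivalent is sketched below, assuming a POSIX system. The function name log_snapshots and the marker line format are hypothetical. The source path is a parameter because later kernels moved this information from /proc/sched_debug into debugfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: append `count` copies of the file `src` to `dst`,
 * one every `interval_ms` milliseconds, each preceded by a marker line.
 * Returns 0 on success, -1 on any I/O error. */
int log_snapshots(const char *src, const char *dst, int count, long interval_ms)
{
    FILE *out = fopen(dst, "a");
    if (!out)
        return -1;

    for (int i = 0; i < count; i++) {
        FILE *in = fopen(src, "r");
        if (!in) {
            fclose(out);
            return -1;
        }
        fprintf(out, "--- snapshot %d ---\n", i);

        /* /proc files report a size of 0, so copy until EOF, not by size. */
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in);

        if (i + 1 < count) {
            struct timespec ts = { interval_ms / 1000,
                                   (interval_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }

    int err = ferror(out);
    if (fclose(out) != 0)
        err = 1;
    return err ? -1 : 0;
}
```

For example, log_snapshots("/proc/sched_debug", "sched_debug.log", 5, 1000) would capture five snapshots one second apart on the kernels discussed here.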

Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Mike Galbraith

On Sun, 2007-07-15 at 23:11 +0200, Markus wrote:
> > > [1] http://lkml.org/lkml/2007/07/14/60
> > 
> > Hm.  Tasks disappearing isn't your typical process scheduler problem
> > by any means, nor is an idle box exhibiting mouse "lurchiness".  Is
> > there anything unusual in your logs?
> 
> I know that it's not typical, but when my current kernel is stable and 
> shows the same problem with the cfs-patch applied like the 
> git-snapshot, I would say it's a cfs issue.

Yes, from your description, and with the now presented additional
information that the git-snapshot exhibits the same symptoms, it sounds
like cfs _may_ be implicated in some way.  I can't imagine how at the
moment.  In your original report, there are other patches involved,
which are unknown variables.  The git-snapshot contains very many
changes other than cfs as well.  I'd eliminate absolutely all unknowns
as the first step.

> But I can build a plain 2.6.22 without cfs and one with it and compare 
> dmesg output, if that helps.

Yes.  It would definitely be worthwhile to test a virgin stable kernel,
and then add only cfs with identical config.  Dmesg output may not turn
up anything, but eliminating all other variables should either pin the
tail on the donkey (cfs?) or vindicate it, and that's what needs to be
nailed down solidly first.

-Mike


Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Chuck Ebbert

On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> 
> I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> with FC6. Automount starts taking 30% of CPU (unused at the moment)

Can you confirm whether CFS is involved, i.e. does it spin like that
even without the CFS patch applied?


Re: [patch] CFS scheduler, -v19


* Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > 
> > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> 
> Can you confirm whether CFS is involved, i.e. does it spin like that 
> even without the CFS patch applied?

hmmm  could you take out the kernel/time.c (sys_time()) changes from 
the CFS patch, does that solve the automount issue? If yes, could 
someone take a look at automount and check whether it makes use of 
time(2) and whether it combines it with finer grained time sources?

Ingo
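
The question of whether time(2) is combined with a finer-grained clock matters because the two only agree if time() is derived from the same clock state. A minimal probe, assuming a POSIX system, samples time() and clock_gettime(CLOCK_REALTIME) back to back and reports how often, and by how much, time() trails. Because time() is read first, a second boundary between the two calls can legitimately show a lag of 1, and later kernels can serve time() from a coarse clock that trails CLOCK_REALTIME by up to a tick; a larger or persistent lag is the suspicious case. The function name sample_time_lag is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical probe: take `samples` back-to-back readings of time(2)
 * and of the seconds field of clock_gettime(CLOCK_REALTIME). Stores the
 * largest amount by which time() trailed the finer clock in *max_lag and
 * the number of readings where it trailed at all in *lagging.
 * Returns 0 on success, -1 if clock_gettime() fails. */
int sample_time_lag(int samples, long *max_lag, int *lagging)
{
    *max_lag = 0;
    *lagging = 0;
    for (int i = 0; i < samples; i++) {
        /* time() first, so a second boundary between the two calls
         * shows up as a lag of 1 rather than as time() running ahead. */
        time_t coarse = time(NULL);
        struct timespec fine;
        if (clock_gettime(CLOCK_REALTIME, &fine) != 0)
            return -1;

        long lag = (long)(fine.tv_sec - coarse);
        if (lag > 0)
            (*lagging)++;
        if (lag > *max_lag)
            *max_lag = lag;
    }
    return 0;
}
```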

Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Bill Davidsen


Ingo Molnar wrote:
> * Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
> > On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> > 
> > Can you confirm whether CFS is involved, i.e. does it spin like that 
> > even without the CFS patch applied?

I will try that, but not until Tuesday night. I've been here too long 
today and have an out-of-state meeting tomorrow. I'll take a look after 
dinner. Note that the latest 2.6.21 with cfs-v19 doesn't have any 
problems of any nature, other than suspend to RAM not working, and I may 
have the config wrong. Runs really well otherwise, but I'll test drive 
2.6.22 w/o the patch.

> hmmm  could you take out the kernel/time.c (sys_time()) changes from 
> the CFS patch, does that solve the automount issue? If yes, could 
> someone take a look at automount and check whether it makes use of 
> time(2) and whether it combines it with finer grained time sources?

Will do.

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979


Re: [patch] CFS scheduler, -v19

2007-07-16 Thread Ian Kent

On Mon, 2007-07-16 at 23:55 +0200, Ingo Molnar wrote:
> * Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> 
> > On 07/13/2007 05:19 PM, Bill Davidsen wrote:
> > > 
> > > I should really go back to 2.6.21.6, 2.6.22 has many bizarre behaviors
> > > with FC6. Automount starts taking 30% of CPU (unused at the moment)
> > 
> > Can you confirm whether CFS is involved, i.e. does it spin like that 
> > even without the CFS patch applied?
> 
> hmmm  could you take out the kernel/time.c (sys_time()) changes from 
> the CFS patch, does that solve the automount issue? If yes, could 
> someone take a look at automount and check whether it makes use of 
> time(2) and whether it combines it with finer grained time sources?

Yes, it does, and I have two reported bugs so far.

In several places I have code similar to:

	wait.tv_sec = time(NULL) + 1;
	wait.tv_nsec = 0;

	signaled = 0;
	while (!signaled) {
		status = pthread_cond_timedwait(&cond, &mutex, &wait);
		if (status) {
			if (status == ETIMEDOUT)
				break;
			fatal(status);
		}
	}

This leads to automount spinning, with strace output a bit like:

futex(0x80034b60, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 130925919}) = 0
time(NULL) = 1184593935
futex(0x80034b60, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131160876}) = 0
time(NULL) = 1184593935
futex(0x80034b60, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131377080}) = 0
time(NULL) = 1184593935
futex(0x80034b60, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131593297}) = 0
time(NULL) = 1184593935
futex(0x80034b60, FUTEX_WAKE, 1) = 0
clock_gettime(CLOCK_REALTIME, {1184593936, 131871792}) = 0

There should be something like:

futex(0x557868c4, FUTEX_WAIT, 5321099, {0, 998091311}) = -1 ETIMEDOUT (Connection timed out)

in there I think.

Ian
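
The trace values show why no FUTEX_WAIT appears. The deadline is computed as time(NULL) + 1 = 1184593936, but clock_gettime(CLOCK_REALTIME) already reports 1184593936.130925919, so the deadline lies about 131 ms in the past. POSIX specifies that pthread_cond_timedwait() fails with ETIMEDOUT when the absolute time has already passed, which is consistent with the trace showing clock_gettime() but no wait. The minimal sketch below checks that arithmetic and shows one way to avoid mixing clocks: build the deadline from clock_gettime(CLOCK_REALTIME), the clock a condition variable measures against unless pthread_condattr_setclock() selects another. The helper names ns_until and realtime_deadline_ms are hypothetical; this is not automount's actual fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Nanoseconds from `now` until the absolute deadline `dl`
 * (negative if the deadline has already passed). */
long long ns_until(struct timespec dl, struct timespec now)
{
    return (long long)(dl.tv_sec - now.tv_sec) * 1000000000LL
           + (dl.tv_nsec - now.tv_nsec);
}

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now, taken from
 * the same clock that pthread_cond_timedwait() uses by default. */
struct timespec realtime_deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {    /* carry into seconds */
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}
```

Plugging in the first trace line, ns_until((struct timespec){1184593936, 0}, (struct timespec){1184593936, 130925919}) is -130925919, i.e. already expired. In the loop above, wait = realtime_deadline_ms(1000) would replace the two lines that derive the deadline from time(NULL).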



Re: [patch] CFS scheduler, -v19

> > [1] http://lkml.org/lkml/2007/07/14/60
> 
> Hm.  Tasks disappearing isn't your typical process scheduler problem
> by any means, nor is an idle box exhibiting mouse "lurchiness".  Is
> there anything unusual in your logs?

I know that it's not typical, but when my current kernel is stable and 
shows the same problem as the git-snapshot once the cfs-patch is 
applied, I would say it's a cfs issue.
There is nothing in the logs when a program dies; that's why I asked for a 
way to make the kernel more verbose.
But I can build a plain 2.6.22 without cfs and one with it and compare 
dmesg output, if that helps.

   Markus

Re: [patch] CFS scheduler, -v19

2007-07-15 Thread Mike Galbraith

On Sun, 2007-07-15 at 14:53 +0200, Markus wrote:
> [1] http://lkml.org/lkml/2007/07/14/60

Hm.  Tasks disappearing isn't your typical process scheduler problem
by any means, nor is an idle box exhibiting mouse "lurchiness".  Is
there anything unusual in your logs?

-Mike


Re: [patch] CFS scheduler, -v19

> Sending a few seconds of logged /proc/sched_debug will also help get a
> picture of what's happening, and lovely would be a method to reproduce
> the problem locally.

Hi. Is there anything like sched_debug in 2.6.22-git5? I ask because
I have a cfs problem as well [1].

   Markus


[1] http://lkml.org/lkml/2007/07/14/60

Re: [patch] CFS scheduler, -v19