Re: [Xenomai-core] Scheduling while atomic

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:
 Jan Kiszka wrote:
  ...
  [Update] While writing this mail and letting your test run for a while,
  I *did* get a hard lock-up. Hold on, digging deeper...

And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
   c012e564 0022  0246 c30d1a90 c4866ce0 0033 c482
   c482a360 c4866ca0  c48293a4 c48524e1   0002
Call Trace:
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c482>] e100_hw_init+0x3ad/0xa81 [e100]
 [<c48524e1>] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [<c4856946>] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [<c4a09b29>] rt_sem_p+0xa6/0x10a [xeno_native]
 [<c4a03c62>] __rt_sem_p+0x5d/0x66 [xeno_native]
 [<c485b207>] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c010b3ea>] __ipipe_syscall_root+0x53/0xbe
 [<c01029c0>] system_call+0x20/0x41
Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
sig=0, prev=gatekeeper/0[809])
 CPU  PID  PRI  TIMEOUT  STAT      NAME
   0    0   30        0  00500080  ROOT
   0  864   30        0  00300180  task0
   0  865   29        0  00300288  task1
   0  863    1        0  00300082  main
Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
   c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001
   c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80
Call Trace:
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c0266ed5>] schedule+0x3ef/0x5ed
 [<c485a27c>] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [<c485a316>] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [<c010dd8b>] default_wake_function+0x0/0x12
 [<c0124fbc>] kthread+0x68/0x95
 [<c0124f54>] kthread+0x0/0x95
 [<c0100d71>] kernel_thread_helper+0x5/0xb

Any bells already ringing?


Yes; the bad news is that this looks like the same bug as the one you reported
recently, which it seems I only fixed partially. xnshadow_harden() is still not
working properly in certain preemption situations induced by CONFIG_PREEMPT, and
the hardening thread is likely moved back to the Linux runqueue unexpectedly
while transitioning to Xenomai. The good news is that it's a well-identified
issue, at least...
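
To make that race concrete, here is a rough sketch of the relaxed-to-primary
handshake involved; the gk_thread/gk_wait names below are placeholders rather
than the actual nucleus identifiers:

/* Hedged sketch of the hardening handshake, not the actual nucleus source. */
void xnshadow_harden_sketch(void)
{
    /* Hand the current shadow over to the per-CPU gatekeeper. */
    gk_thread = xnshadow_thread(current);
    set_current_state(TASK_INTERRUPTIBLE);
    wake_up_interruptible_sync(&gk_wait);

    /* Race window under CONFIG_PREEMPT: if the task is preempted here,
     * before schedule() takes it off the Linux runqueue, the gatekeeper
     * may start the shadow on the Xenomai side while Linux still
     * considers the task runnable -- producing the "blocked thread
     * rescheduled?!" fatal shown above. */
    schedule();
}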




Will try Gilles' patch now...

Jan

--

Philippe.





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus

Hold on. Just crashed without the file access: please disregard last post.


Jeroen.


[Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
Gilles,


I cannot reproduce those messages after turning nucleus debugging on. Instead, I now either get relatively more failing mutexes or even hard lockups with the test program I sent to you. If the computer didn't crash, dmesg contains 3 Xenomai messages relating to a task being moved to the secondary domain after exception #14. As for when the computer crashes: I have written down the last kernel panic message on paper. Please tell me if you also want the addresses or (part of) the call stack.


I'm still wondering if there's a programming error in the mutex test program. After I sent my previous message, and before I turned nucleus debugging on, I managed (by reducing the sleep times to max. 5.0e4) to fatally crash the computer, while it spewed out countless 'scheduling while atomic' messages. Is the mutex error reproducible?


Tomorrow I'll try the patch.

lostage_handler + e/33a
rthal_apc_handler + 3b/46
lostage_handler + 190/33a
rthal_apc_handler + 3b/46
__ipipe_sync_stage + 2a1/2bc
mark_offset_tsc + c1/456
__ipipe_sync_stage + 2a9/2bc
ipipe_unstall_pipeline_from + 189/194 (might be 181/194)
xnpod_delete_thread + ba1/bc3
mcount + 23/2a
taskexit_event + 4f/6c
__ipipe_dispatch_event + 90/173
do_exit + 10f/604
sys_exit + 8/14
syscall_call + 7/b
next_thread + 0/15
syscall_call + 7/b

<0>Kernel panic - not syncing: Fatal exception in interrupt


Thanks for investigating,

Jeroen.


Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Gilles Chanteperdrix
Jeroen Van den Keybus wrote:
  Gilles,
  
  
  I cannot reproduce those messages after turning nucleus debugging on.
  Instead, I now either get relatively more failing mutexes or even hard
  lockups with the test program I sent to you. If the computer didn't crash,
  dmesg contains 3 Xenomai messages relating to a task being moved to the
  secondary domain after exception #14. As for when the computer crashes: I
  have written down the last kernel panic message on paper. Please tell me if
  you also want the addresses or (part of) the call stack.

Could you try adding a call to mlockall(MCL_CURRENT|MCL_FUTURE)?

Also note that you do not need to protect accesses to a file descriptor
with rt_mutexes. stdio streams are protected with pthread mutexes, and the
pthread mutex functions cause thread migration to secondary mode. And Unix
file descriptors are passed to system calls, which also cause migration to
secondary mode.
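
For reference, a minimal sketch of that suggestion; locking the whole address
space before creating any real-time task prevents page faults from demoting a
primary-mode thread:

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Lock current and future mappings into RAM; without this, a page
     * fault taken by a primary-mode thread forces a migration to
     * secondary mode. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return EXIT_FAILURE;
    }
    /* ... create the Xenomai tasks, mutexes, etc. here ... */
    return EXIT_SUCCESS;
}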

-- 


Gilles Chanteperdrix.



[Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
Hello,


Apparently, the code I shared with Gilles never made it to this forum. Anyway, the issue I'm having here is a real problem, and it might be useful if some of you could try it out or comment on it. I might be making a silly programming error here, but the result is invariably erroneous operation or kernel crashes.


The program creates a file dump.txt and has two independent threads trying to access it and write a one or a zero there. Inside the writing routine, which is used by both threads, a check is made to see whether the access is really locked. In my setup, I get tons of ALERTs popping up with this program, meaning that something is wrong with my use of the mutex. Could anyone please check and see whether a) it is correctly written and b) it fails as well on their machine? That would allow me to focus my efforts either on the Xenomai setup (which I am keeping frozen for the moment, in order to keep a possible bug reproducible) or on my own programming.


A second example is also included, which tries to achieve the same goal with a semaphore (initialized to 1). That seems to work, but under heavy load (tmax = 1.0e7), the kernel crashes.

Kernel: 2.6.15, Adeos: 1.1-03, gcc: 4.0.2, I-pipe tracing enabled

TIA

Jeroen.



/* TEST_MUTEX.C */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <math.h>
#include <values.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/sem.h>

int fd, err; RT_MUTEX m; RT_SEM s; float tmax = 1.0e7;

#define CHECK(arg) check(arg, __LINE__)
/* Report a non-zero return code together with the offending source line. */
int check(int r, int n) { if (r != 0) fprintf(stderr, "L%d: %s.\n", n, strerror(-r)); return r; }

/* Write one character to dump.txt; raise an ALERT if the mutex is not held. */
void output(char c)
{
    static int cnt = 0; int n; char buf[2]; RT_MUTEX_INFO mutexinfo;
    buf[0] = c;
    if (cnt == 80) { buf[1] = '\n'; n = 2; cnt = 0; } else { n = 1; cnt++; }
    CHECK(rt_mutex_inquire(&m, &mutexinfo));
    if (mutexinfo.lockcnt <= 0) {
        RT_TASK_INFO taskinfo; CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (lockcnt=%d) Offending task: %s\n", mutexinfo.lockcnt, taskinfo.name);
    }
    if (write(fd, buf, n) != n) { fprintf(stderr, "File write error.\n"); CHECK(rt_sem_v(&s)); }
}

void task0(void *arg) { CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL)); while (1) {
    CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX)); CHECK(rt_mutex_lock(&m, TM_INFINITE)); output('0'); CHECK(rt_mutex_unlock(&m)); } }
void task1(void *arg) { CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL)); while (1) {
    CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX)); CHECK(rt_mutex_lock(&m, TM_INFINITE)); output('1'); CHECK(rt_mutex_unlock(&m)); } }
void sighandler(int arg) { CHECK(rt_sem_v(&s)); }

int main(int argc, char *argv[])
{
    RT_TASK t, t0, t1;
    if ((fd = open("dump.txt", O_CREAT | O_TRUNC | O_WRONLY, 0644)) < 0) fprintf(stderr, "File open error.\n");
    else {
        if (argc == 2) { tmax = atof(argv[1]); if (tmax == 0.0) tmax = 1.0e7; }
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) printf("mlockall() error.\n");
        CHECK(rt_task_shadow(&t, "main", 1, T_FPU));
        CHECK(rt_timer_start(TM_ONESHOT));
        CHECK(rt_mutex_create(&m, "mutex")); CHECK(rt_sem_create(&s, "sem", 0, S_PRIO));
        signal(SIGINT, sighandler);
        CHECK(rt_task_create(&t0, "task0", 0, 30, T_FPU)); CHECK(rt_task_start(&t0, &task0, NULL));
        CHECK(rt_task_create(&t1, "task1", 0, 29, T_FPU)); CHECK(rt_task_start(&t1, &task1, NULL));
        printf("Running for %.2f seconds.\n", (float)MAXLONG/1.0e9);
        CHECK(rt_sem_p(&s, MAXLONG)); signal(SIGINT, SIG_IGN);
        CHECK(rt_task_delete(&t1)); CHECK(rt_task_delete(&t0));
        CHECK(rt_sem_delete(&s)); CHECK(rt_mutex_delete(&m));
        rt_timer_stop(); close(fd);
    }
    return 0;
}
/*/

/* TEST_SEM.C */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <math.h>
#include <values.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/sem.h>

int fd, err; RT_SEM s, m; float tmax = 1.0e9;

#define CHECK(arg) check(arg, __LINE__)
int check(int r, int n) { if (r != 0) fprintf(stderr, "L%d: %s.\n", n, strerror(-r)); return r; }

/* Write one character to dump.txt; raise an ALERT if the binary semaphore is free. */
void output(char c)
{
    static int cnt = 0; int n; char buf[2]; RT_SEM_INFO seminfo;
    buf[0] = c;
    if (cnt == 80) { buf[1] = '\n'; n = 2; cnt = 0; } else { n = 1; cnt++; }
    CHECK(rt_sem_inquire(&m, &seminfo));
    if (seminfo.count != 0) {
        RT_TASK_INFO taskinfo; CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (count=%ld) Offending task: %s\n", seminfo.count, taskinfo.name);
    }
    if (write(fd, buf, n) != n) { fprintf(stderr, "File write error.\n"); CHECK(rt_sem_v(&s)); }
}

void task0(void *arg) { CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL)); while (1) {
    CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX)); CHECK(rt_sem_p(&m, TM_INFINITE)); output('0'); CHECK(rt_sem_v(&m)); } }
void task1(void *arg) { CHECK(rt_task_set_mode(T_PRIMARY, 0, NULL)); while (1) {
    CHECK(rt_task_sleep((float)rand()*tmax/(float)RAND_MAX)); CHECK(rt_sem_p(&m, TM_INFINITE)); output('1'); CHECK(rt_sem_v(&m)); } }
void sighandler(int arg) {

Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jan Kiszka wrote:
 ...
 [Update] While writing this mail and letting your test run for a while,
 I *did* get a hard lock-up. Hold on, digging deeper...
 

And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
   c012e564 0022  0246 c30d1a90 c4866ce0 0033 c482
   c482a360 c4866ca0  c48293a4 c48524e1   0002
Call Trace:
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c482>] e100_hw_init+0x3ad/0xa81 [e100]
 [<c48524e1>] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [<c4856946>] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [<c4a09b29>] rt_sem_p+0xa6/0x10a [xeno_native]
 [<c4a03c62>] __rt_sem_p+0x5d/0x66 [xeno_native]
 [<c485b207>] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c010b3ea>] __ipipe_syscall_root+0x53/0xbe
 [<c01029c0>] system_call+0x20/0x41
Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
sig=0, prev=gatekeeper/0[809])
 CPU  PID  PRI  TIMEOUT  STAT      NAME
   0    0   30        0  00500080  ROOT
   0  864   30        0  00300180  task0
   0  865   29        0  00300288  task1
   0  863    1        0  00300082  main
Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
   c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001
   c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80
Call Trace:
 [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
 [<c0266ed5>] schedule+0x3ef/0x5ed
 [<c485a27c>] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [<c485a316>] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [<c010dd8b>] default_wake_function+0x0/0x12
 [<c0124fbc>] kthread+0x68/0x95
 [<c0124f54>] kthread+0x0/0x95
 [<c0100d71>] kernel_thread_helper+0x5/0xb

Any bells already ringing?

Will try Gilles' patch now...

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jan Kiszka wrote:
 Jan Kiszka wrote:
 ...
 [Update] While writing this mail and letting your test run for a while,
 I *did* get a hard lock-up. Hold on, digging deeper...

 
 And here are its last words, spoken via serial console:
 
 c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
    c012e564 0022  0246 c30d1a90 c4866ce0 0033 c482
    c482a360 c4866ca0  c48293a4 c48524e1   0002
 Call Trace:
  [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
  [<c482>] e100_hw_init+0x3ad/0xa81 [e100]
  [<c48524e1>] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
  [<c4856946>] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
  [<c4a09b29>] rt_sem_p+0xa6/0x10a [xeno_native]
  [<c4a03c62>] __rt_sem_p+0x5d/0x66 [xeno_native]
  [<c485b207>] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
  [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
  [<c010b3ea>] __ipipe_syscall_root+0x53/0xbe
  [<c01029c0>] system_call+0x20/0x41
 Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
 sig=0, prev=gatekeeper/0[809])
  CPU  PID  PRI  TIMEOUT  STAT      NAME
    0    0   30        0  00500080  ROOT
    0  864   30        0  00300180  task0
    0  865   29        0  00300288  task1
    0  863    1        0  00300082  main
 Timer: oneshot [tickval=1 ns, elapsed=175144731477]

 c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
    c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001
    c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80
 Call Trace:
  [<c012e564>] __ipipe_dispatch_event+0x56/0xdd
  [<c0266ed5>] schedule+0x3ef/0x5ed
  [<c485a27c>] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
  [<c485a316>] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
  [<c010dd8b>] default_wake_function+0x0/0x12
  [<c0124fbc>] kthread+0x68/0x95
  [<c0124f54>] kthread+0x0/0x95
  [<c0100d71>] kernel_thread_helper+0x5/0xb
 
 Any bells already ringing?
 
 Will try Gilles' patch now...
 

Nope, this didn't help.

Ok, this is migration magic. Anyone around who knows this part blindly?
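
For whoever picks this up, a hedged sketch of the gatekeeper side of the
handshake (again with placeholder names, not the real nucleus source) shows
why a still-runnable task is fatal here:

/* Sketch only: the gatekeeper kernel thread completing the migration
 * started by xnshadow_harden(). */
static int gatekeeper_thread_sketch(void *arg)
{
    for (;;) {
        /* Wait until a hardening task hands over its shadow. */
        wait_event_interruptible(gk_wait, gk_thread != NULL);

        /* The hardening task must be off the Linux runqueue by now. If
         * it was preempted before its schedule() call completed, both
         * schedulers briefly own the same thread, and the nucleus
         * flags this as "blocked thread rescheduled?!". */
        xnpod_resume_thread(gk_thread, XNRELAX);
        gk_thread = NULL;
        xnpod_schedule();
    }
    return 0;
}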

Jan





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jan Kiszka
Jeroen Van den Keybus wrote:
  Interesting, when writing to 2 different files, I get the same crashes.
 Will test with only one task/fd.

File ops don't matter for me. I took them out of task0/1, and I still
got the crashes. (BTW, this may explain the difference in the backtrace
you reported privately.)

Jan - now really leaving...





Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Hannes Mayer

Jan Kiszka wrote:
[...]

Do you (or anybody else) have a running 2.0.x installation? If so,
please test that setup as well.


Sure :-)

# uname -r
2.6.13.4-adeos-xenomai
# cat /proc/xenomai/version
2.0
# ./mutex
Running for 2.15 seconds.
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
ALERT: No lock! (lockcnt=0) Offending task: task0
L121: Connection timed out.
# cat dump.txt
101001001010101011000110001[...]

# ./sem
Running for 2.15 seconds.
L119: Connection timed out.
# cat dump.txt
101001muon:/home/xenomai/atomic#

More tests?

Best regards,
Hannes.



Re: [Xenomai-core] Scheduling while atomic

2006-01-18 Thread Jeroen Van den Keybus
Hm.

When I remove the output() from both tasks, all seems fine.

Jeroen.