Philippe Gerum wrote:
> 
> In theory, you might be right. In practice, there is no way to be
> 100% sure that in such a situation the wipe out would succeed;
> _that_ is the problem. We could devise something with a watermark,
> starting to spam the syslog with warning messages after some pressure
> level is reached, but even the wipe out operation would have to
> follow the OOM "strategy", i.e. not necessarily killing the real
> offending application but any random thread which happens to be
> registered in the drop queue, which is terminally useless when
> debugging.
> 
> Again, the situation you encounter is a sign of a patent
> malfunction due to a design issue in your application. We could
> raise the limit so that statistically, the issue would not even
> trigger in your case, like it never triggered for anyone else for the
> last two years. But, for that particular issue occurring in sched.c,
> there is nothing 100% safe and sane we could do to recover from such
> a situation, so let's not pretend that a system works just because it
> does not run into any BUG_ON(). As it is, your application is living
> on the edge, and this is what you need to solve, and for that
> purpose, a BUG_ON() which points you at the problem immediately
> during the testing phase is much better than killing any thread
> around randomly while in production, just for the purpose of "being
> graceful". This would not be graceful, at all.

Well, I still do not understand what you mean when you say the
application is "on the edge". OK, there is a bunch of threads initialized
with a number of message queues, semaphores, condition variables, probably
most things the POSIX interface provides, adding up to a heap usage of
more than 130 KB. I configured a heap size of 512 KB. These threads do
nothing most of the time. According to /proc/xenomai/stat, mode switches
happen only seldom. There are a few timer threads doing clock_nanosleep(),
sending messages to some threads when a timer expires. A maximum of 16
tasks could have some steady activity which leads to context switches
(CSW), but as I said, almost no mode switches (MSW) happen.

I think one might break up the one process / 50 threads model into
something like more processes / fewer threads, but this is existing
software coming from a PSOS background; first the software has to
be ported (to POSIX), and afterwards some redesign will probably happen.
Could you maybe give a more general view of what the application should
not do in the mentioned case, so that a later redesign could take your
recommendations into account?
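
To make the timer-thread pattern mentioned above concrete, here is a
minimal sketch of it. It is not the real application code: the names
timer_cfg, timespec_add_ns, timer_thread and count_tick are hypothetical,
and the notify callback only stands in for the message send (in the
real code it would wrap something like mq_send()). It assumes a period
below one second and plain POSIX calls only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical per-timer configuration; not taken from the real code. */
struct timer_cfg {
    void (*notify)(void *ctx); /* stand-in for "send a message to a thread" */
    void *ctx;                 /* opaque argument passed to notify() */
    long  period_ns;           /* timer period, 0 <= period_ns < 1 s */
    int   ticks;               /* expirations to wait for before returning */
};

/* Advance an absolute deadline by ns nanoseconds (0 <= ns < 1 s). */
void timespec_add_ns(struct timespec *t, long ns)
{
    assert(ns >= 0 && ns < NSEC_PER_SEC);
    t->tv_nsec += ns;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec += 1;
    }
}

/* Example callback: counts expirations instead of sending a message. */
void count_tick(void *ctx)
{
    ++*(int *)ctx;
}

/* Thread body: sleep until each absolute deadline, then notify once. */
void *timer_thread(void *arg)
{
    const struct timer_cfg *cfg = arg;
    struct timespec next;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return (void *)-1;

    for (int i = 0; i < cfg->ticks; i++) {
        timespec_add_ns(&next, cfg->period_ns);
        /* clock_nanosleep() returns the error number itself; an absolute
         * deadline lets us simply retry after a signal (EINTR). */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                               &next, NULL) == EINTR)
            ;
        cfg->notify(cfg->ctx);
    }
    return NULL;
}
```

Such a function would be started with pthread_create(), passing a
pointer to a filled-in timer_cfg. Sleeping on an absolute deadline
(TIMER_ABSTIME) rather than a relative delay keeps the period from
drifting when the thread is woken late.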

BTW, the application runs very nicely so far. We are pleased with
Xenomai's performance. This is the second application we are porting to
Xenomai, and it is the biggest we have. So far only the kernel oops was
somewhat annoying, but at least we now know how to work around it.


Best regards,

Daniel Schnell.


