George Doukas wrote:
> On 1/22/06, Petr Smolík <[EMAIL PROTECTED]> wrote:
>>> George Doukas wrote:
>>> So let's forget LXRT...
>>> I'm trying to use ORTE from kernel space, so the orte_rt.o and
>>> ortemanager_rt.o modules must be loaded.
>>>
>>> Even though I use the parameter rtskb_cache_size=64 when loading
>>> rtnet.o, the system freezes when I try to load ortemanager_rt.o. Why?
>>>
>>> I'm using...
>>> RTAI 3.1 over linux 2.4.26 kernel (adeos patched)
>>> RTnet 0.8.3
>>> ORTE 0.3.1
>>> (I've also tried RTAI 3.1 with RTnet 0.8.2 and ORTE 0.3.0)
>>>
>>> My test system consists of two P4 machines with the above
>>> configuration on a dedicated ethernet network.
>>> On both nodes I issue the following commands:
>>>
>>> insmod /usr/realtime/modules/rtai_hal.o
>>> insmod /usr/realtime/modules/rtai_ksched.o
>>> insmod /usr/realtime/modules/rtai_lxrt.o
>>> insmod /usr/realtime/modules/rtai_sem.o
>>> insmod /usr/realtime/modules/rtai_mbx.o
>>>
>>> rmmod 8139too (or eepro100 for the other node)
>>> insmod /usr/local/rtnet/modules/rtai_rtdm.o
>>> /usr/local/rtnet/sbin/rtnet start
>>>
>>> I've edited the rtnet script so that rtnet.o is loaded with the
>>> appropriate parameter, like this:
>>> insmod $RTNET_MOD/rtnet$MODULE_EXT rtskb_cache_size=64 >/dev/null || exit 1
>>>
>>> (At this point rtping works just fine)
>>> And finally...
>>> insmod /usr/local/orte/lib/modules/2.4.26-adeos/orte/orte_rt.o
>>> insmod /usr/local/orte/lib/modules/2.4.26-adeos/orte/ortemanager_rt.o
>>> THE SYSTEM FREEZES (on any node) !!!
>>  Are you using the loopback interface (rt_loopback.o)?
>>
>>  This is my script for testing ORTE. It was tested a long time ago, so I am
>> not sure that everything is still correct for the current RTAI and RTnet.
>>  insmod rtai_hal.o
>>  insmod rtai_up.o
>>  insmod rtai_sem.o
>>
>>  insmod rtai_rtdm.o
>>  insmod rtnet.o rtskb_cache_size=128
>>  insmod rtcfg.o
>>  insmod rt_loopback.o
>>  insmod 8139too-rt.o
>>  #insmod rtmac.o
>>  #insmod tdma.o
>>  rtifconfig rtlo up 127.0.0.1
>>  #rtifconfig rteth0 up 147.32.86.71
>>
>>  insmod orte_rt.ko
>>  insmod ortemanager_rt.ko
>>
>>  For a first test, no network card interface needs to be running. Try
>> to verify correct behavior with the loopback interface (127.0.0.1).
>>
>>  When the code freezes, try to locate the point of failure. When you insert
>> the ortemanager_rt module, use the "verbosity" parameter to enable some logs:
>>  insmod ortemanager_rt.ko verbosity="ALL.10"
>>  As a first step in locating the failure, disable the call to
>> ORTEDomainStart in ortemanager.c (no thread will then be started). After
>> that, check that ORTEDomainMgrCreate(...) behaves correctly. ...
>>
>>
>>  Regards
>>  Petr
>>
> 
> OK, I've changed my RTAI/RTnet/ORTE startup script. Now I don't load
> the lxrt module, I don't use the rtnet script, and I only set up the
> loopback interface. My script looks like this:
> 
> insmod $RTAI_PATH/modules/rtai_hal.o
> insmod $RTAI_PATH/modules/rtai_up.o
> # insmod $RTAI_PATH/modules/rtai_lxrt.o

Loading both rtai_up and rtai_lxrt is a common pitfall with RTAI.
Having both loaded may seem to work at first, but sooner or later it
will crash your box.
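
A startup script can guard against that pitfall by refusing to continue when both schedulers are present. This is a minimal sketch, assuming the module names in /proc/modules match the .o basenames used in this thread (rtai_up, rtai_lxrt); check_rtai_schedulers is a hypothetical helper, and it checks only the pair mentioned here, not every RTAI scheduler variant.

```shell
#!/bin/sh
# Hypothetical helper: fail if both rtai_up and rtai_lxrt are loaded.
# Assumption: /proc/modules lists these modules under their .o basenames.
#
# check_rtai_schedulers [FILE]
#   FILE defaults to /proc/modules; "-" reads the list from stdin.
#   Prints "ok:<found modules>" on success, or "conflict:<modules>" on
#   stderr and returns 1 if both schedulers are present.
check_rtai_schedulers() {
    src=${1:-/proc/modules}
    found=""
    count=0
    # The first whitespace-separated field of each line is the module name.
    for mod in $(awk '{ print $1 }' "$src"); do
        case $mod in
            rtai_up|rtai_lxrt)
                found="$found $mod"
                count=$((count + 1))
                ;;
        esac
    done
    if [ "$count" -gt 1 ]; then
        echo "conflict:$found" >&2
        return 1
    fi
    echo "ok:$found"
    return 0
}
```

For example, calling `check_rtai_schedulers || exit 1` right after the RTAI insmod lines stops the script before RTnet and ORTE are loaded on top of a conflicting setup.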

> insmod $RTAI_PATH/modules/rtai_sem.o
> insmod $RTAI_PATH/modules/rtai_mbx.o
> insmod $RTAI_PATH/modules/rtai_shm.o
> 
> ifconfig eth0 down
> rmmod eepro100
> insmod $RTNET_PATH/modules/rtai_rtdm.o
> insmod $RTNET_PATH/modules/rtnet.o rtskb_cache_size=128
> insmod $RTNET_PATH/modules/rtcfg.o
> insmod $RTNET_PATH/modules/rt_loopback.o
> rtifconfig rtlo up 127.0.0.1
> 
> insmod $ORTE_PATH/lib/modules/2.4.26-adeos/orte/orte_rt.o
> insmod $ORTE_PATH/lib/modules/2.4.26-adeos/orte/ortemanager_rt.o verbosity="ALL.10"
> 
> ortemanager loads without problems and everything seems to work just
> fine (dmesg shows no errors).
> But when I try to load h_subscriber.o and h_publisher.o (the ORTE
> hello example), the modules never manage to initialize correctly and I
> get the following error:
> 
> RTnet: rtskb allocation from real-time cache failed
> RTnet: rtskb allocation from real-time cache failed

Have you already tried increasing the rtskb_cache_size option?
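
Trying a larger cache means reloading rtnet.o and everything stacked on it. This is a minimal dry-run sketch, assuming the module set and order from the script above (rtcfg, rt_loopback, and the two ORTE modules on top of rtnet, with rtai_rtdm left loaded), modules under $RTNET_PATH/modules with /usr/local/rtnet as the default, and that `rtifconfig rtlo down` is accepted. print_rtnet_reload is a hypothetical helper, and 256 below is only an example size.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that reload the
# RTnet stack with a different rtskb_cache_size.
#
# print_rtnet_reload SIZE
#   SIZE must be a positive integer. Assumes the module layout of the
#   startup script in this thread; adjust names and paths to your setup.
print_rtnet_reload() {
    size=$1
    case $size in
        ''|*[!0-9]*)
            echo "usage: print_rtnet_reload SIZE (positive integer)" >&2
            return 1
            ;;
    esac
    if [ "$size" -eq 0 ]; then
        echo "print_rtnet_reload: SIZE must be greater than 0" >&2
        return 1
    fi
    mods=${RTNET_PATH:-/usr/local/rtnet}/modules

    # Tear down in reverse load order: the ORTE modules sit on top of RTnet.
    echo "rmmod ortemanager_rt"
    echo "rmmod orte_rt"
    echo "rtifconfig rtlo down"
    echo "rmmod rt_loopback"
    echo "rmmod rtcfg"
    echo "rmmod rtnet"

    # Bring the stack back up with the larger real-time rtskb cache.
    echo "insmod $mods/rtnet.o rtskb_cache_size=$size"
    echo "insmod $mods/rtcfg.o"
    echo "insmod $mods/rt_loopback.o"
    echo "rtifconfig rtlo up 127.0.0.1"
}
```

Run `print_rtnet_reload 256`, review the output, and only then pipe it to `sh` as root. The ORTE modules are not re-inserted by this sketch, so load orte_rt.o and ortemanager_rt.o again afterwards.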

> Default Trap Handler: vector 14: Suspend RT task cb3b4040

I guess this is a result of an uncaught error.

> 
> I found that a similar problem was reported by Kai Moritz (also
> CCed) on the rtnet-user mailing list, but no solution was provided
> (Kai, if you found a solution, please help).
> Like Kai, I tracked the error down to the ORTEDomainAppCreate function.
> 
> Here is what I get from dmesg (I've added some comments in the form
> "+++ Comment..."):
> 
> Adeos: Domain RTAI registered.
> RTAI[hal]: 3.1 mounted over Adeos 2.4r16/x86.
> RTAI[hal]: compiled with gcc version 3.3.5.
> RTAI[malloc]: loaded (global heap size=131072 bytes).
> RTAI[sched_up]: loaded.
> RTAI[sched_up]: fpu=yes, timer=periodic.
> RTAI[sched_up]: standard tick=100 hz, CPU freq=3006901000 hz.
> RTAI[sched_up]: timer setup=2010 ns, resched latency=2689 ns.
> ***** WARNING: GLOBAL HEAP NEITHER SHARABLE NOR USABLE FROM USER SPACE
> (use the vmalloc option for RTAI malloc) *****
> 
> +++ At this point RTAI modules finished loading
> 
> eth0: network connection down
> RTDM Version 0.6.0
> 
> *** RTnet 0.8.3 - built on Jan 20 2006 19:55:02 ***
> 
> RTnet: initialising real-time networking
> RTnet: stack-mgr started
> RTDM: registered protocol device 2:2
> RTDM: registered protocol device 17:2
> RTcfg: init real-time configuration distribution protocol
> initializing loopback...
> RTnet: registered rtlo
> 
> +++ At this point RTnet modules finished loading
> 
> 4455.802 | ORTEDomainMgrCreate: start
> 4455.802 | ORTEDomainCreate: orte 0.3.1 compiled: Jan 20 2006,20:00:18
> 4455.802 | ORTEDomainCreate: start
> 4455.802 | ORTEDomainCreate: no active interface card
> 4455.802 | ORTEDomainCreate: bind on port(RecvUnicastMetatraffic): 7400
> 4455.802 | ORTEDomainCreate: bind on port(Send): 2048
> 4455.802 | ORTEDomainCreate: GUID: 0x7f000001,0x00080002,0x000001c1
> 4455.802 | objectEntry: start
> 4455.802 | objectEntry: Host  : 0x7f000001 created
> 4455.802 | objectEntry: App   : 0x00080002 created
> 4455.802 | objectEntry: Object: 0x000001c1 connected to AID
> 4455.802 | objectEntry: Object: 0x000001c1 created
> 4455.802 | objectEntry: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterRefreshTimer: start
> 4455.802 | eventDetach: AID 0x80002
> 4455.802 | eventDetach: finished
> 4455.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4455.802 | htimerUnicastCommon: root updated, wakeup
> 4455.802 | WakeUpSendingThread : start
> 4455.802 | WakeUpSendingThread : send wakeup signal
> 4455.802 | eventAdd: finished
> 4455.802 | CSTWriterRefreshTimer: finished
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x8c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTReaderInit: start
> 4455.802 | CSTReaderInit: 0x7f000001-0x80002-0x7c7
> 4455.802 | CSTReaderInit: finished
> 4455.802 | CSTReaderInit: start
> 4455.802 | CSTReaderInit: 0x7f000001-0x80002-0x1c7
> 4455.802 | CSTReaderInit: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x1c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterRefreshTimer: start
> 4455.802 | eventDetach: AID 0x80002
> 4455.802 | eventDetach: finished
> 4455.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4455.802 | eventAdd: finished
> 4455.802 | CSTWriterRefreshTimer: finished
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x7c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTWriterAddCSChange: cstWriter:0x7f000001-0x80002-0x8c2
> 4455.802 | CSTWriterAddCSChange: sn:0x1
> 4455.802 | CSTWriterAddCSChange: finished
> 4455.802 | ORTEDomainCreate: finished
> 4455.802 | ORTEDomainMgrCreate: finished
> 4455.802 | ORTEAppRecvThread UM: start
> 4455.802 | ORTEAppRecvThread UM: receiving
> 4455.802 | ORTEAppSendThread: start
> 4455.802 | ORTEAppSendThread: sleeping for 72s 0ms
> 4455.802 | ORTEAppSendThread: fired
> 4455.802 | htimerRoot: start
> 4455.802 | htimerRoot: finished
> 4455.802 | ORTEAppSendThread: sleeping for 72s 0ms
> 
> +++ At this point orte_rt.o and ortemanager_rt.o have been loaded.
> +++ Now I try to load h_subscriber.o (using verbosity="ALL.10")
> 
> 4527.802 | ORTEAppSendThread: fired
> 4527.802 | htimerRoot: start
> 4527.802 | htimerRoot: AID-0x80002
> 4527.802 | htimerUnicastCommon: CSTWriterRefreshTimer
> 4527.802 | CSTWriterRefreshTimer: start
> 4527.802 | eventDetach: AID 0x80002
> 4527.802 | htimerUnicastCommon: root updated, wakeup
> 4527.802 | WakeUpSendingThread : start
> 4527.802 | eventDetach: finished
> 4527.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4527.802 | eventAdd: finished
> 4527.802 | CSTWriterRefreshTimer: finished
> 4527.802 | htimerRoot: finished
> 4527.802 | ORTEAppSendThread: sleeping for 0s 0ms
> 4527.802 | ORTEAppSendThread: fired
> 4527.802 | htimerRoot: start
> 4527.802 | htimerRoot: AID-0x80002
> 4527.802 | htimerUnicastCommon: CSTWriterRefreshTimer
> 4527.802 | CSTWriterRefreshTimer: start
> 4527.802 | eventDetach: AID 0x80002
> 4527.802 | htimerUnicastCommon: root updated, wakeup
> 4527.802 | WakeUpSendingThread : start
> 4527.802 | eventDetach: finished
> 4527.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4527.802 | eventAdd: finished
> 4527.802 | CSTWriterRefreshTimer: finished
> 4527.802 | htimerRoot: finished
> 4527.802 | ORTEAppSendThread: sleeping for 72s 0ms
> RTnet: rtskb allocation from real-time cache failed
> RTnet: rtskb allocation from real-time cache failed
> Default Trap Handler: vector 14: Suspend RT task cd6d4040
> 
> Now the h_subscriber module never leaves the initializing state.
> The thread created inside the init_module function of h_subscriber,
> which runs domainInit, is never joined.
> What's going on?
> Let me remind you that I'm using rtai-3.1, rtnet-0.8.3 and orte-0.3.1.
> 
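
To see where initialization stalls in a verbose trace like the one above, it can help to list the trace names that logged "start" without a matching "finished". This is a minimal sketch, assuming every traced function logs "<name>: start" and "<name>: finished" after a "<time> | " prefix, as in the output above; unfinished_trace_calls is a hypothetical helper. Long-running thread loops such as ORTEAppSendThread or WakeUpSendingThread legitimately stay open, so the interesting entries are the other names.

```shell
#!/bin/sh
# Hypothetical helper: list trace names with more "start" than
# "finished" lines, in order of first appearance.
#
# unfinished_trace_calls [FILE]
#   Reads ORTE verbose output from FILE, or from stdin if FILE is omitted.
#   Lines without a "<name>: start" / "<name>: finished" tail are ignored.
unfinished_trace_calls() {
    awk '
        {
            line = $0
            # Drop the "<timestamp> | " prefix if present.
            sub(/^[^|]*[|] /, "", line)
            if (line ~ /: start$/) {
                name = line
                sub(/ *: start$/, "", name)
            } else if (line ~ /: finished$/) {
                name = line
                sub(/ *: finished$/, "", name)
            } else {
                next
            }
            if (!(name in seen)) { seen[name] = 1; order[++n] = name }
            if (line ~ /: start$/) depth[name]++
            else depth[name]--
        }
        END {
            for (i = 1; i <= n; i++)
                if (depth[order[i]] > 0) print order[i]
        }
    ' "${1:--}"
}
```

For example, `dmesg | unfinished_trace_calls` after h_subscriber.o hangs shows which traced calls were entered but never left.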

The hard crashes, the outdated components, and the resource (rtskb)
allocations from RT contexts are good reasons to move ORTE over to a
recent real-time userspace API. The sooner, the better.

Jan
