George Doukas wrote:
> On 1/22/06, Petr Smolík <[EMAIL PROTECTED]> wrote:
>>> George Doukas wrote:
>>> So let's forget LXRT...
>>> I'm trying to use ORTE from kernel space, thus orte_rt.o and
>>> ortemanager_rt.o modules must be loaded.
>>>
>>> Even though I use the parameter rtskb_cache_size=64 when loading
>>> rtnet.o, the system freezes when I try to load ortemanager_rt.o. Why?
>>>
>>> I'm using...
>>> RTAI 3.1 over linux 2.4.26 kernel (adeos patched)
>>> RTnet 0.8.3
>>> ORTE 0.3.1
>>> (I've also tried RTAI 3.1 with RTnet 0.8.2 and ORTE 0.3.0)
>>>
>>> My test system consists of two P4 machines with the above
>>> configuration on a dedicated ethernet network.
>>> On both nodes I issue the following commands:
>>>
>>> insmod /usr/realtime/modules/rtai_hal.o
>>> insmod /usr/realtime/modules/rtai_ksched.o
>>> insmod /usr/realtime/modules/rtai_lxrt.o
>>> insmod /usr/realtime/modules/rtai_sem.o
>>> insmod /usr/realtime/modules/rtai_mbx.o
>>>
>>> rmmod 8139too (or eepro100 for the other node)
>>> insmod /usr/local/rtnet/modules/rtai_rtdm.o
>>> /usr/local/rtnet/sbin/rtnet start
>>>
>>> I've edited rtnet script so that rtnet.o is loaded with the
>>> appropriate parameter like this:
>>> insmod $RTNET_MOD/rtnet$MODULE_EXT rtskb_cache_size=64 >/dev/null || exit 1
>>>
>>> (At this point rtping works just fine)
>>> And finally...
>>> insmod /usr/local/orte/lib/modules/2.4.26-adeos/orte/orte_rt.o
>>> insmod /usr/local/orte/lib/modules/2.4.26-adeos/orte/ortemanager_rt.o
>>> THE SYSTEM FREEZES (on any node) !!!
>> Are you using the loopback interface (rt_loopback.o)?
>>
>> This is my script for testing ORTE. It was tested a long time ago. I am not
>> sure whether it is all still correct against RTAI & RTnet.
>> insmod rtai_hal.o
>> insmod rtai_up.o
>> insmod rtai_sem.o
>>
>> insmod rtai_rtdm.o
>> insmod rtnet.o rtskb_cache_size=128
>> insmod rtnet.o
>> insmod rtcfg.o
>> insmod rt_loopback.o
>> insmod 8139too-rt.o
>> #insmod rtmac.o
>> #insmod tdma.o
>> rtifconfig rtlo up 127.0.0.1
>> #rtifconfig rteth0 up 147.32.86.71
>>
>> insmod orte_rt.ko
>> insmod ortemanager_rt.ko
>>
>> For a first test it is not necessary to have any network interface running.
>> Try to check for correct behavior with the localhost interface (127.0.0.1).
>>
>> When the code freezes, try to locate the point of failure. When you insert
>> the ortemanager_rt module, use the parameter "verbosity" to enable some logs:
>> insmod ortemanager_rt.ko verbosity="ALL.10".
>> The first step in locating it should be to disable the function
>> ORTEDomainStart in ortemanager.c (no thread will be started). After that,
>> check that the function ORTEDomainMgrCreate(...) behaves correctly. ...
>>
>>
>> Regards
>> Petr
>>
>
> OK I've changed my rtai-rtnet-orte startup script! Now I don't load
> lxrt module, I don't make use of rtnet script and I only setup
> loopback interface.... my script looks like this:
>
> insmod $RTAI_PATH/modules/rtai_hal.o
> insmod $RTAI_PATH/modules/rtai_up.o
> # insmod $RTAI_PATH/modules/rtai_lxrt.o
Loading both rtai_up and rtai_lxrt is a common pitfall with RTAI.
Having both seems to work at first but will sooner or later crash your box.

> insmod $RTAI_PATH/modules/rtai_sem.o
> insmod $RTAI_PATH/modules/rtai_mbx.o
> insmod $RTAI_PATH/modules/rtai_shm.o
>
> ifconfig eth0 down
> rmmod eepro100
> insmod $RTNET_PATH/modules/rtai_rtdm.o
> insmod $RTNET_PATH/modules/rtnet.o rtskb_cache_size=128
> insmod $RTNET_PATH/modules/rtcfg.o
> insmod $RTNET_PATH/modules/rt_loopback.o
> rtifconfig rtlo up 127.0.0.1
>
> insmod $ORTE_PATH/lib/modules/2.4.26-adeos/orte/orte_rt.o
> insmod $ORTE_PATH/lib/modules/2.4.26-adeos/orte/ortemanager_rt.o verbosity="ALL.10"
>
> Ortemanager loads without problems and everything seems to work just
> fine (dmesg shows no errors).
> But when I try to load the h_subscriber.o and h_publisher.o (orte
> hello example) the modules never manage to initialize correctly and I
> get the following error:
>
> RTnet: rtskb allocation from real-time cache failed
> RTnet: rtskb allocation from real-time cache failed

Have you already tried increasing the rtskb_cache_size option?

> Default Trap Handler: vector 14: Suspend RT task cb3b4040

I guess this is a result of an uncaught error.

>
> I found out that a similar problem is reported by Kai Moritz (also
> CCed) in rtnet-user mailing list but no solution was provided (Kai if
> you found a solution please help).
> As Kai did, I also tracked down the error to ORTEDomainAppCreate function.
>
> Here is what I get from dmesg (I've added some comments in the form +++
> Comment...):
>
> Adeos: Domain RTAI registered.
> RTAI[hal]: 3.1 mounted over Adeos 2.4r16/x86.
> RTAI[hal]: compiled with gcc version 3.3.5.
> RTAI[malloc]: loaded (global heap size=131072 bytes).
> RTAI[sched_up]: loaded.
> RTAI[sched_up]: fpu=yes, timer=periodic.
> RTAI[sched_up]: standard tick=100 hz, CPU freq=3006901000 hz.
> RTAI[sched_up]: timer setup=2010 ns, resched latency=2689 ns.
> ***** WARNING: GLOBAL HEAP NEITHER SHARABLE NOR USABLE FROM USER SPACE
> (use the vmalloc option for RTAI malloc) *****
>
> +++ At this point RTAI modules finished loading
>
> eth0: network connection down
> RTDM Version 0.6.0
>
> *** RTnet 0.8.3 - built on Jan 20 2006 19:55:02 ***
>
> RTnet: initialising real-time networking
> RTnet: stack-mgr started
> RTDM: registered protocol device 2:2
> RTDM: registered protocol device 17:2
> RTcfg: init real-time configuration distribution protocol
> initializing loopback...
> RTnet: registered rtlo
>
> +++ At this point RTnet modules finished loading
>
> 4455.802 | ORTEDomainMgrCreate: start
> 4455.802 | ORTEDomainCreate: orte 0.3.1 compiled: Jan 20 2006,20:00:18
> 4455.802 | ORTEDomainCreate: start
> 4455.802 | ORTEDomainCreate: no active interface card
> 4455.802 | ORTEDomainCreate: bind on port(RecvUnicastMetatraffic): 7400
> 4455.802 | ORTEDomainCreate: bind on port(Send): 2048
> 4455.802 | ORTEDomainCreate: GUID: 0x7f000001,0x00080002,0x000001c1
> 4455.802 | objectEntry: start
> 4455.802 | objectEntry: Host : 0x7f000001 created
> 4455.802 | objectEntry: App : 0x00080002 created
> 4455.802 | objectEntry: Object: 0x000001c1 connected to AID
> 4455.802 | objectEntry: Object: 0x000001c1 created
> 4455.802 | objectEntry: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterRefreshTimer: start
> 4455.802 | eventDetach: AID 0x80002
> 4455.802 | eventDetach: finished
> 4455.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4455.802 | htimerUnicastCommon: root updated, wakeup
> 4455.802 | WakeUpSendingThread : start
> 4455.802 | WakeUpSendingThread : send wakeup signal
> 4455.802 | eventAdd: finished
> 4455.802 | CSTWriterRefreshTimer: finished
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x8c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTReaderInit: start
> 4455.802 | CSTReaderInit: 0x7f000001-0x80002-0x7c7
> 4455.802 | CSTReaderInit: finished
> 4455.802 | CSTReaderInit: start
> 4455.802 | CSTReaderInit: 0x7f000001-0x80002-0x1c7
> 4455.802 | CSTReaderInit: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x1c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTWriterInit: start
> 4455.802 | CSTWriterRefreshTimer: start
> 4455.802 | eventDetach: AID 0x80002
> 4455.802 | eventDetach: finished
> 4455.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4455.802 | eventAdd: finished
> 4455.802 | CSTWriterRefreshTimer: finished
> 4455.802 | CSTWriterInit: 0x7f000001-0x80002-0x7c2
> 4455.802 | CSTWriterInit: finished
> 4455.802 | CSTWriterAddCSChange: cstWriter:0x7f000001-0x80002-0x8c2
> 4455.802 | CSTWriterAddCSChange: sn:0x1
> 4455.802 | CSTWriterAddCSChange: finished
> 4455.802 | ORTEDomainCreate: finished
> 4455.802 | ORTEDomainMgrCreate: finished
> 4455.802 | ORTEAppRecvThread UM: start
> 4455.802 | ORTEAppRecvThread UM: receiving
> 4455.802 | ORTEAppSendThread: start
> 4455.802 | ORTEAppSendThread: sleeping for 72s 0ms
> 4455.802 | ORTEAppSendThread: fired
> 4455.802 | htimerRoot: start
> 4455.802 | htimerRoot: finished
> 4455.802 | ORTEAppSendThread: sleeping for 72s 0ms
>
> +++ At this point orte_rt.o and ortemanager_rt.o have been loaded....
> +++ And I try to load h_subscriber.o (using verbosity="ALL.10")
>
> 4527.802 | ORTEAppSendThread: fired
> 4527.802 | htimerRoot: start
> 4527.802 | htimerRoot: AID-0x80002
> 4527.802 | htimerUnicastCommon: CSTWriterRefreshTimer
> 4527.802 | CSTWriterRefreshTimer: start
> 4527.802 | eventDetach: AID 0x80002
> 4527.802 | htimerUnicastCommon: root updated, wakeup
> 4527.802 | WakeUpSendingThread : start
> 4527.802 | eventDetach: finished
> 4527.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4527.802 | eventAdd: finished
> 4527.802 | CSTWriterRefreshTimer: finished
> 4527.802 | htimerRoot: finished
> 4527.802 | ORTEAppSendThread: sleeping for 0s 0ms
> 4527.802 | ORTEAppSendThread: fired
> 4527.802 | htimerRoot: start
> 4527.802 | htimerRoot: AID-0x80002
> 4527.802 | htimerUnicastCommon: CSTWriterRefreshTimer
> 4527.802 | CSTWriterRefreshTimer: start
> 4527.802 | eventDetach: AID 0x80002
> 4527.802 | htimerUnicastCommon: root updated, wakeup
> 4527.802 | WakeUpSendingThread : start
> 4527.802 | eventDetach: finished
> 4527.802 | eventAdd: AID 0x80002 CSTWriterRefreshTimer
> 4527.802 | eventAdd: finished
> 4527.802 | CSTWriterRefreshTimer: finished
> 4527.802 | htimerRoot: finished
> 4527.802 | ORTEAppSendThread: sleeping for 72s 0ms
> RTnet: rtskb allocation from real-time cache failed
> RTnet: rtskb allocation from real-time cache failed
> Default Trap Handler: vector 14: Suspend RT task cd6d4040
>
> Now h_subscriber module never leaves initializing state.
> The thread created inside init_module function of h_subscriber that
> runs domainInit is never joined.
> What's going on?
> Let me remind you that I'm using rtai-3.1, rtnet-0.8.3 and orte-0.3.1.
>

The hard crashes, the outdated components, and the resource (rtskb)
allocations from RT contexts are good reasons why we should move ORTE onto
a recent real-time userspace API. The sooner the better.

Jan
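The two failure signatures quoted in this thread can be checked for mechanically after each module load. What follows is only a minimal sketch, assuming the kernel log is available as plain text on stdin (for example from dmesg). The function name check_rt_log and the output field names are hypothetical and chosen for illustration; the matched strings are copied from the log quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RTAI, RTnet or ORTE): count the two
# failure signatures from this thread in kernel log text read on stdin.
check_rt_log() {
    awk '
        # RTnet could not get a buffer from its real-time rtskb cache.
        /rtskb allocation from real-time cache failed/ { rtskb++ }
        # RTAI default trap handler; vector 14 is the x86 page fault.
        /Default Trap Handler: vector 14/ { traps++ }
        END {
            printf "rtskb_failures=%d vector14_traps=%d\n", rtskb, traps
            # Non-zero exit status if either signature was seen.
            exit ((rtskb + traps) > 0 ? 1 : 0)
        }
    '
}
```

Typical use would be something like `dmesg | check_rt_log || echo "RT failure signatures found"` right after loading h_subscriber.o, so a failed load is noticed before the next module goes in.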