Kai Moritz wrote:
>> Kai Moritz wrote:
>>
>>> Hi folks,
>>>
> 
>> Oh, yes, there is something one has to know when trying to run ORTE
>> over RTnet (which is, AFAIK, not yet documented): due to the old-style
>> kernel design of the ORTE port over RTAI/RTnet (mea culpa),
>> initialisation of sockets takes place in real-time context. This means
>> that the required buffers of those sockets need to be obtained from the
>> global real-time rtskb cache - which is empty by default. Increase it by
>> passing e.g. "rtskb_cache_size=64" during insmod rtnet.ko (or patch the
>> rtnet start script). See also README.pools.
> 
> 
> Yeah, thanks! That helps.
> After increasing the size of the rtskb_cache orte and ortemanager are
> loaded without any errors!
> 
> Unfortunately, I stumbled into some new problems right afterwards. I
> wanted to test my ORTE-compilation by inserting the h_publisher_rt.ko
> module from ORTE's hello-world example. But when I load that module, I
> get the following message from RTAI:
> ----------------------
> Default Trap Handler: vector 14: Suspend RT task c5d34040
> ----------------------
> I used Google to search for the string "rtai vector 14" and found a
> mail thread saying that "vector 14" indicates a page fault.

Yep, something went wrong, either in the real-time task or in some RTAI
service called by the task. The latter case typically means the kernel
task broke the system...
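To the vector question: on x86, the number in RTAI's "Default Trap Handler: vector N" message is, as far as I know, the CPU exception vector, and vector 14 is the page fault (#PF). In kernel code that often means an access through a bad pointer (NULL, freed or uninitialised). The lookup below is only an illustrative sketch; x86_trap_name() is a hypothetical helper, not an RTAI function, and it covers just a few vectors as numbered in the Intel/AMD manuals.

```c
#include <stddef.h>

/* Hypothetical helper (not part of RTAI): maps a few x86 CPU exception
 * vectors to their architectural names. Assumes the "vector N" reported
 * by RTAI's default trap handler is the x86 exception number. */
static const char *x86_trap_name(int vector)
{
    switch (vector) {
    case 0:  return "Divide Error (#DE)";
    case 6:  return "Invalid Opcode (#UD)";
    case 8:  return "Double Fault (#DF)";
    case 13: return "General Protection (#GP)";
    case 14: return "Page Fault (#PF)";
    default: return NULL;  /* vector not covered by this sketch */
    }
}
```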

> Then I inserted some simple rt_printk debug statements into the code.
> That way I was able to track the error down to a function called
> ORTEDomainCreate(). This function is rather long, so it's too
> time-consuming to insert more rt_printk debug statements for further
> investigation. (Perhaps I will try kernel debugging for that...)

Merging kgdb or similar tools into the adeos patch will likely be too
much work for finding just this bug. The rt_printk approach is generally
sufficient to track such reproducible faults down.
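A minimal sketch of that checkpoint approach, written as a user-space stand-in so it compiles anywhere: CHECKPOINT, TRACE_PRINT, last_checkpoint and long_init_function are hypothetical names, and in an RTAI module TRACE_PRINT would be defined as rt_printk rather than falling back to printf. The idea is a manual bisection: bracket each major step, note the last checkpoint printed before the trap, then move the checkpoints into the following step and reload the module.

```c
#include <stdio.h>

/* Assumption: in an RTAI kernel module, define TRACE_PRINT as rt_printk
 * before this point. The printf fallback only serves user-space testing. */
#ifndef TRACE_PRINT
#define TRACE_PRINT printf
#endif

/* Hypothetical: number of the last checkpoint that was passed. */
static int last_checkpoint = 0;

/* Record and print a checkpoint together with function name and line. */
#define CHECKPOINT(n)                                        \
    do {                                                     \
        last_checkpoint = (n);                               \
        TRACE_PRINT("%s: checkpoint %d (line %d)\n",         \
                    __func__, (n), __LINE__);                \
    } while (0)

/* Hypothetical stand-in for a long function like ORTEDomainCreate().
 * fail_at_step simulates the faulting step by returning early; a real
 * fault would instead stop right after the last printed checkpoint. */
static int long_init_function(int fail_at_step)
{
    CHECKPOINT(1);              /* e.g. memory allocation */
    if (fail_at_step == 1)
        return -1;
    CHECKPOINT(2);              /* e.g. socket setup */
    if (fail_at_step == 2)
        return -2;
    CHECKPOINT(3);              /* e.g. task creation */
    return 0;
}
```

Because the fault is reproducible, each round narrows the suspect region to the code between two checkpoints, so even a long function shrinks to a few lines after a handful of module reloads.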

> At the beginning of the function some memory is allocated, but that
> does not trigger the RTAI trap.
> 
> Strangely enough, the same function is called without any errors before,
> when the ortemanager_rt.ko module (which initializes the ORTE-System) is
> loaded!
> 
> Is it right that "vector 14" indicates a page fault?
> Which functions can trigger that RTAI trap?
> 

Basically, every function. You will unfortunately have to spend a bit
more time on this. That's also why this stack really belongs in user
space, where debugging and backtracing would become significantly easier.
Again, if someone is interested in working in this field, I can provide an
experimental patch and some instructions on how to make ORTE (almost) run
over Fusion's POSIX skin.

> 
>>
>> Besides these issues with setting up and running ORTE 0.3 over latest
>> RTnet (which is likely easy to fix),
> 
> Since we need ORTE I would like to try to fix the errors stated above.
> But unfortunately I'm absolutely new to RTAI and RTnet. Thus it is very
> hard for me to figure out what is going wrong. I think I need some (or
> rather: much) more time to play around with RTAI and RTnet to fully
> understand the fundamentals...

Working on this will give you more insight over time - hope you are
not short on time. ;)

I hoped Petr could jump in, but I guess he's not listening. Anyway, if
you discover anything in ORTE or RTnet you don't understand, don't
hesitate to post questions here. We will also try to answer ORTE-related
questions, and at the same time more information will get documented via
this list.

Jan
