Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)

2018-04-06 Thread Michael Ruder
Hi Gavin,

> Have you issued an mlockall at the start of your process?  It can also 
> help to explicitly prefault your stack so that you don't get a page 
> fault later if your stack depth grows.

Yes, I am using the mostly unmodified ec_user_example. I only adapted the 
PDO setup for the Omron NX ECAT IO. This example already sets SCHED_FIFO 
(I tried priorities 99 and 79, as 99 should not be used according to the 
PREEMPT docs) and calls mlockall(MCL_CURRENT | MCL_FUTURE) and 
stack_prefault() before the cyclic_task starts.
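
For readers without the example at hand, the three setup steps named above
can be sketched roughly as follows. This is not the ec_user_example source;
the helper names (prefault_stack, lock_memory, set_fifo_priority) and the
8 KiB MAX_SAFE_STACK bound are assumptions made for illustration.

```c
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed upper bound on the stack used by the cyclic task. */
#define MAX_SAFE_STACK (8 * 1024)

/* Write to every byte of a stack buffer so that the pages are faulted in
 * before the cyclic task starts. Returns the number of bytes touched. */
size_t prefault_stack(void)
{
    volatile unsigned char dummy[MAX_SAFE_STACK];
    size_t i;

    for (i = 0; i < MAX_SAFE_STACK; i++)
        dummy[i] = 0;
    return i;
}

/* Lock all current and future pages into RAM. Returns 0 or -errno. */
int lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -errno;
    return 0;
}

/* Switch the calling process to SCHED_FIFO with the given priority.
 * Returns 0 or -errno (for example -EPERM without sufficient privileges). */
int set_fifo_priority(int priority)
{
    struct sched_param param = { .sched_priority = priority };

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return -errno;
    return 0;
}
```

A typical order would be set_fifo_priority(79), then lock_memory(), then
prefault_stack(), all before entering the cyclic loop, so that the touched
stack pages are already covered by the lock.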

I am using igb with ec_generic. I use the most current version of the 
etherlab 1.5.2 branch from the git repos (as far as I saw, it is the only 
branch that commits go to).

I have now found the problem. As ecrt_domain_process is just a wrapper for 
an ioctl, the problem had to be in the kernel, which, however, was 
identical on both systems. The kernel cmdline and the logging setup were 
not, though: a serial console was defined in the cmdline, and kernel 
logging went to the console. After disabling the serial console, the call 
time already dropped to 2 ms. After using dmesg -D to disable kernel 
logging on the console altogether, everything went back to normal...
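
A plausible explanation, offered here as an assumption rather than something
measured in this thread, is that link and state transitions produce kernel
messages, and printing them synchronously on a slow serial console stalls
the ioctl. A program could check for this at startup. The sketch below reads
the console loglevel from /proc/sys/kernel/printk and wraps the syslog(2)
"console off" action, which is the same kernel facility that dmesg's
--console-off option drives. The helper names are hypothetical.

```c
#include <stdio.h>
#include <sys/klog.h>

/* Action code of the kernel's syslog(2) interface; glibc exports no
 * symbolic name for it, so it is defined locally. */
#define SYSLOG_ACTION_CONSOLE_OFF 6

/* Parse the first field of /proc/sys/kernel/printk, which is the current
 * console loglevel. Returns -1 if the text does not start with a number. */
int parse_console_loglevel(const char *text)
{
    int level;

    if (sscanf(text, "%d", &level) != 1)
        return -1;
    return level;
}

/* Read the current console loglevel, or return -1 on any error. */
int read_console_loglevel(void)
{
    char buf[64];
    int level = -1;
    FILE *f = fopen("/proc/sys/kernel/printk", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) != NULL)
        level = parse_console_loglevel(buf);
    fclose(f);
    return level;
}

/* Ask the kernel to stop printing log messages on the console.
 * Needs privileges; returns 0 on success, -1 with errno set otherwise. */
int console_logging_off(void)
{
    return klogctl(SYSLOG_ACTION_CONSOLE_OFF, NULL, 0);
}
```

Only messages whose level is numerically lower than the console loglevel are
printed on the console, so a high value from read_console_loglevel() on a
system with a serial console would be a reason to look closer.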

Thanks for listening ;)
-- 
.  -Michael
___
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users


Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)

2018-04-05 Thread Gavin Lambert
On 5 April 2018 21:19, quoth Michael Ruder:
> In fact, the function that takes the long time is ecrt_domain_process (and
> not ecrt_domain_queue). However, if I no longer call ecrt_domain_queue,
> then ecrt_domain_process does not take long (that is what got me on the
> wrong track yesterday).

If you don't queue the domain datagram then there's nothing to process, so that 
makes sense. 

It's peculiar for ecrt_domain_queue or ecrt_domain_process to take all that 
long, however.  Even with a large network with a complicated PDO layout I've 
never seen these take very long.

Have you issued an mlockall at the start of your process?  It can also help to 
explicitly prefault your stack so that you don't get a page fault later if your 
stack depth grows.

I have had issues in the past where ecrt_master_receive took unexpectedly long; 
the culprit was the e1000e network driver, which during the poll sometimes 
triggered watchdog processing directly on that thread instead of on a 
background thread.  There are some patches in the unofficial patchset 
(http://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme) to 
resolve the cases that were encountered there, among other things.

> This only happens on my Yocto system but not on a Gentoo system with the
> identical kernel (4.14.28-rt23). On the Gentoo, the calls never take long,
> even during transitions.

What network driver and hardware are you using?



Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)

2018-04-05 Thread Michael Ruder
Hello again,

after sleeping on the issue, I did some more tests.

In fact, the function that takes the long time is ecrt_domain_process (and not 
ecrt_domain_queue). However, if I no longer call ecrt_domain_queue, then 
ecrt_domain_process does not take long (that is what got me on the wrong track 
yesterday).

So during startup (link up, master going to OP etc.), one or two calls to 
ecrt_domain_process take about 10 ms instead of the virtually nothing (500 ns) 
they take later on. When I disconnect the network cable from the slave, some 
calls again take long, and the same happens after reconnecting. So it always 
happens during state transitions.

This only happens on my Yocto system but not on a Gentoo system with the 
identical kernel (4.14.28-rt23). On the Gentoo, the calls never take long, even 
during transitions.
 
Thanks a lot for any hint,
Michael

> I successfully got the IgH EtherCAT Master running on an IEI board using
> Gentoo and the 4.14.28-rt23 PREEMPT kernel.
>
> Now I have switched to a self-made, Yocto-based distribution using the same
> kernel (I tried both the kernel built by our Yocto recipe and my own
> "hand-built" kernel, with the exact same result), and I have a very strange
> problem:
>
> The call to ecrt_domain_queue sometimes takes very long (about 10 ms; I
> measured that call separately). During normal operation it takes no
> considerable amount of time, but while the system is coming up (switching
> to OP etc.) or if I disconnect a cable, it starts taking those large
> amounts of time. Then, of course, a 1 ms cycle time fails catastrophically.
>
> On my Gentoo-based system, this behaviour does not occur. It never takes
> too much time, even if I disconnect a cable or reboot a slave.
>
> As it does not seem to be kernel related, what else could it be? The GCC
> versions differ, of course: Yocto uses 7.2.0 while Gentoo is on 6.4.0.
>
> I have already narrowed the issue down to the ec_user_example.
>
> Cyclictest shows no abnormality; max latency is < 25 us on both systems. I
> run the system with isolcpus=3 (and smp_affinity=7 for irq) and the
> cyclictest/ec test program with taskset 8 on the isolated core. So I rule
> out a generic latency issue.
>
> I am really a bit puzzled where to start debugging this strange issue and
> am happy for any hint!
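
For reference, the affinity masks in the quoted setup are plain CPU bit
masks: taskset 8 (binary 1000) selects CPU 3, the core named in isolcpus=3,
and smp_affinity=7 (binary 0111) keeps interrupts on CPUs 0 to 2. A program
can also pin itself instead of relying on taskset. The following is a
minimal sketch under that assumption; taskset_mask_for_cpu and pin_to_cpu
are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for CPU_SET and sched_setaffinity in glibc */
#endif
#include <errno.h>
#include <sched.h>

/* Bit mask, as passed to taskset, that selects a single CPU.
 * Returns 0 if the CPU number does not fit into an unsigned long. */
unsigned long taskset_mask_for_cpu(int cpu)
{
    if (cpu < 0 || cpu >= (int)(8 * sizeof(unsigned long)))
        return 0;
    return 1UL << cpu;
}

/* Pin the calling thread to one CPU. Returns 0 or -errno. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -EINVAL;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) == -1)
        return -errno;
    return 0;
}
```

Calling pin_to_cpu(3) early in main() would have the same effect for that
thread as starting the program with taskset 8.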