Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)
Hi Gavin,

> Have you issued an mlockall at the start of your process? It can also
> help to explicitly prefault your stack so that you don't get a page
> fault later if your stack depth grows.

Yes, I am using the mostly unmodified ec_user_example; I only adapted the PDO setup for the Omron NX ECAT IO. This example already sets SCHED_FIFO (I tried priorities 99 and 79, as 99 should not be used according to the PREEMPT docs) and calls mlockall(MCL_CURRENT | MCL_FUTURE) and stack_prefault() before the cyclic_task starts.

I am using igb with ec_generic. I use the etherlab 1.5.2 branch (the only one that commits go to, as far as I saw) in the most current version from the git repos.

I have now found the problem. As ecrt_domain_process is just a wrapper for an IOCTL, the problem had to be in the kernel, which, however, was identical on both systems. The kernel command line and the logging setup were not, though: there was a serial console defined in the cmdline, and kernel logging was going to the console. After disabling the serial console, the call duration already dropped to 2 ms. After using dmesg -D to disable kernel logging on the console altogether, everything went back to normal...

Thanks for listening ;)

--
-Michael
___
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users
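The real-time preparation mentioned in this message (SCHED_FIFO, mlockall, stack prefault) can be sketched roughly as below. This is a minimal sketch and not the ec_user_example source itself: the names rt_setup and MAX_SAFE_STACK, the 8 KiB prefault size, and the return-value convention are assumptions made for illustration.

```c
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed stack depth to prefault; adjust to the real worst-case depth. */
#define MAX_SAFE_STACK (8 * 1024)

/* Touch every byte of a stack buffer so its pages are mapped before the
 * cyclic task runs. Returns the number of bytes touched. */
static size_t stack_prefault(void)
{
    volatile unsigned char dummy[MAX_SAFE_STACK];
    size_t i;

    for (i = 0; i < MAX_SAFE_STACK; i++)
        dummy[i] = 0;
    return i;
}

/* Hypothetical helper: switch the calling process to SCHED_FIFO, lock all
 * current and future pages in RAM, then prefault the stack.
 * Returns 0 on success, -1 on failure (e.g. missing privileges). */
static int rt_setup(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1) {
        perror("sched_setscheduler");
        return -1;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall");
        return -1;
    }
    stack_prefault();
    return 0;
}
```

A program would call something like rt_setup(79) once, before entering its cyclic loop. As the message shows, this setup alone does not rule out long calls when the delay originates in the kernel, for example from console logging.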
Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)
On 5 April 2018 21:19, quoth Michael Ruder:
> In fact, the function that takes the long time is ecrt_domain_process (and
> not ecrt_domain_queue). However, if I no longer call ecrt_domain_queue,
> then ecrt_domain_process does not take long (that is what got me on the
> wrong track yesterday).

If you don't queue the domain datagram then there's nothing to process, so that makes sense.

It's peculiar for ecrt_domain_queue or ecrt_domain_process to take all that long, however. Even with a large network with a complicated PDO layout I've never seen these take very long.

Have you issued an mlockall at the start of your process? It can also help to explicitly prefault your stack so that you don't get a page fault later if your stack depth grows.

I have had issues in the past where ecrt_master_receive took unexpectedly long; the culprit was the e1000e network driver, which during the poll sometimes triggered watchdog processing directly on that thread instead of on a background thread. There are some patches in the unofficial patchset (http://sourceforge.net/u/uecasm/etherlab-patches/ci/default/tree/#readme) to resolve the cases that were encountered there, among other things.

> This only happens on my Yocto system but not on a Gentoo system with the
> identical kernel (4.14.28-rt23). On the Gentoo, the calls never take long,
> even during transitions.

What network driver and hardware are you using?
Re: [etherlab-users] Call to ecrt_domain_process takes sometimes very long (10ms!)
Hello again,

after sleeping on the issue, I did some more tests. In fact, the function that takes the long time is ecrt_domain_process (and not ecrt_domain_queue). However, if I no longer call ecrt_domain_queue, then ecrt_domain_process does not take long (that is what got me on the wrong track yesterday).

So during startup (link up, master gets to OP etc.), one or two calls to ecrt_domain_process take about 10 ms, instead of virtually nothing (500 ns) later on. When I disconnect the network cable from the slave, some calls again take long, and likewise after reconnecting. So it always happens during state transitions.

This only happens on my Yocto system but not on a Gentoo system with the identical kernel (4.14.28-rt23). On the Gentoo system, the calls never take long, even during transitions.

Thanks a lot for any hint,
Michael

> I successfully got the IgH EtherCAT Master running on an IEI board using
> Gentoo and the 4.14.28-rt23 PREEMPT kernel.
>
> Now, I switched to a self-made, yocto-based distribution, using the same
> kernel (I tried both the kernel built by our yocto recipe and my own
> "hand-built" kernel with the exact same result) and now I have a very
> strange problem:
>
> The call to ecrt_domain_queue sometimes takes very long (about 10 ms, I
> measured that call separately). During normal operation it takes no
> considerable amount of time, but while the system is coming up (switching
> to OP etc.) or if I disconnect a cable it starts taking those large
> amounts of time. Then of course a 1 ms cycle time fails catastrophically.
>
> On my gentoo based system, this behaviour does not occur. It never takes
> too much time, even if I disconnect a cable or reboot a slave.
>
> As it does not seem to be kernel related, what else could it be? GCC
> versions differ of course, yocto using 7.2.0 while gentoo is on 6.4.0.
>
> I narrowed down the issue to the ec_user_example already.
>
> Cyclictest shows no abnormality, max latency is < 25 us on both systems.
> I run the system with isolcpus=3 (and smp_affinity=7 for irq) and the
> cyclictest/ec test program with taskset 8 on the isolated core. So I rule
> out a generic latency issue.
>
> I am really a bit puzzled where to start debugging this strange issue and
> am happy for any hint!
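The three CPU settings quoted above use different notations: isolcpus=3 is a CPU index, while smp_affinity=7 and taskset 8 are hexadecimal bitmasks. The small sketch below (helper names are hypothetical) shows how they relate: mask 8 selects only CPU 3, and mask 7 selects CPUs 0 to 2, so interrupts are kept off the isolated core.

```c
#include <stdio.h>

/* Bitmask with only the given CPU index set (CPU 3 -> 0x8). */
static unsigned long cpu_to_mask(unsigned int cpu)
{
    return 1UL << cpu;
}

/* Nonzero if the CPU index is contained in the mask. */
static int mask_has_cpu(unsigned long mask, unsigned int cpu)
{
    return (int)((mask >> cpu) & 1UL);
}

/* Number of CPUs selected by the mask. */
static unsigned int mask_cpu_count(unsigned long mask)
{
    unsigned int n = 0;

    while (mask != 0) {
        n += (unsigned int)(mask & 1UL);
        mask >>= 1;
    }
    return n;
}

/* Print the CPU indices contained in a mask, e.g. "0x7: 0 1 2". */
static void mask_print(unsigned long mask)
{
    unsigned int cpu;

    printf("0x%lx:", mask);
    for (cpu = 0; cpu < 8 * sizeof(mask); cpu++)
        if (mask_has_cpu(mask, cpu))
            printf(" %u", cpu);
    printf("\n");
}
```

With these helpers, cpu_to_mask(3) yields the value passed to taskset, and checking that the IRQ mask and the task mask share no bits confirms that the real-time task and the interrupts run on disjoint cores.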