Hello,

> You should not have downgraded but rather pulled the latest code from
> the stable-3.0.x branch at git://git.xenomai.org/xenomai-3.git.
> As a general note, please disregard the release tarballs: our release
> cycle is way too slow to make them a sane option, as truckloads of bug
> fixes can pass before a new tarball is issued. Tracking the stable tree
> would get you the latest validated fixes.
Ok, I did not realize; I have now pulled the latest code from the repo,
thanks.

> Any specifics regarding what went wrong would be more helpful.
> Otherwise, nobody may bother and a potential bug would stay.

About the Xenomai 3.0.4 Alchemy skin which did not work: as I said, I
observed (September 2017) that when I used this skin in my code, the
system froze (I had no trace and no control over the serial console).
And when I launched /usr/xenomai/demo/altency, the problem was the same
(I had to reboot using the hard reset button). Today, the demo/altency
test runs well with the latest stable version. Perhaps I did something
wrong when installing the previous one; I was a complete beginner. I can
try again with v3.0.4 if you ask.

>> For now, my point is that I observe some unexpected behaviors when
>> isolating cpu1, and perhaps you can explain some of them to me.
>>
>> I am a bit disappointed by such execution-time variations.
>> How can we explain that?

> A dual kernel system exhibits a permanent conflict between two kernels
> competing for the same hw resources. Considering CPU caches for
> instance, the cachelines a sleeping rt thread was previously using can
> be evicted by a non-rt thread resuming on the same CPU then treading on
> a large amount of physical memory. When the rt thread wakes up
> eventually, it may have to go through a series of cache misses to get
> the I/D caches hot again.

Yes, I understand that very well. That is why I expected the RT thread
to perform better on the isolated CPU than on the non-isolated one. Have
you understood my point that I observe better behavior when the RT
thread is on the same CPU as Linux (cpu0) rather than on the isolated
one (cpu1)? I had the feeling it would be the contrary, and your
explanation supports that expectation.

> This issue may be aggravated by hw specifics: your imx6d is likely
> fitted with a PL3xx outer L2 cache, for which the write-allocate policy
> is enabled by the kernel.
> That policy proved to be responsible for ugly
> latency figures with this cache controller.

>> Can we disable such a policy?

> Maybe, it depends; we used to have some success doing just that with
> early imx6 hw, then keeping it enabled became a requirement later with
> more recent SoCs (e.g. imx6qp) as we noticed that such policy was
> involved in cache coherence in multi-core configs. So YMMV.

Interesting.

> If you want to give WA disabling a try, just pass l2x0_write_allocate=0
> to the kernel cmdline. If your SoC ends up not booting with that
> switch, or dies in mysterious and random ways during runtime, well, it
> is likely the sign that a cache coherence issue is biting and you can't
> hack away with that one.

I did. The SoC boots and there is no improvement. I do not really know
how to check whether this parameter is actually taken into account...

> You may also need to tell Xenomai that only CPU1 should process rt
> workloads (i.e. xenomai.supported_cpus=2). I suspect that serialization
> on a core Xenomai lock from all CPUs where the local TWDs tick
> introduces some jitter. Restricting the set of rt CPUs to CPU1 would
> prevent Xenomai from handling rt timer events on any other CPU, lifting
> any contention of that lock in the same move.

Very interesting idea; that is exactly what I want. But my SoC does not
boot with this cmdline, even when there is no CPU isolation. Whether the
cmdline is "isolcpus=1 xenomai.supported_cpus=2" or just
"xenomai.supported_cpus=2", the boot hangs just after "Starting kernel
..." Any idea why?

Even if it is not what I want, I also tried the "isolcpus=1
xenomai.supported_cpus=1" cmdline. It boots and I can run the
smokey/cpu_affinity test:

$ ./smokey --run=cpu_affinity --verbose=100
.. CPU0 is available
.. CPU1 is online, non-RT
.. control thread binding to non-RT CPU1
.. starting user thread
.... user thread starts on CPU0, ok
..
RTDM test module not available, skipping
cpu_affinity OK

Of course, I am interested in having the contrary (CPU0 is online,
non-RT / CPU1 is available). Any idea why I cannot do this with cpu1?

Note: "isolcpus=0 xenomai.supported_cpus=1" also hangs the boot...

I wonder if there are some hardware limitations. Is it possible that
only cpu0 sees the clock? See my last question below.

Indeed, I continued my investigations and found other surprising
results. Let me call the graphs from my previous message the "beautiful"
one and the "non-beautiful" one. For now, my focus is still on the
execution time of the thread:

Beautiful graph: min execution time 32us / max execution time 65us.
Non-beautiful graph: min execution time 32us / max execution time 82us
(the max goes up to 100us in other tests).

My previous message was based on these observations (isolcpus=1, so
Linux is on core0, and I stress Linux with the dohell script):
- When the 4000Hz thread is on core0 too ==> beautiful graph.
- When the 4000Hz thread is the only one on cpu1 ==> non-beautiful
  graph.

Now I have tried other permutations, and I cannot explain the results
(isolcpus=1, so Linux is on core0, I stress Linux with the dohell
script, and I bind the 4000Hz thread to cpu1, which I am confident is
its proper place):
- When I create a new 4000Hz thread on cpu0 which does the same amount
  of work as the other one ==> beautiful graph (for the cpu1 thread
  execution time).
- When I create a new 4000Hz thread on cpu0 which does nothing ==>
  non-beautiful graph (for the cpu1 thread execution time).
- When I create a 2000Hz thread (<4000Hz) on cpu0 which does the same
  amount of work as the other one ==> non-beautiful graph (for the cpu1
  thread execution time).
- When I create a 5000Hz thread (>4000Hz) on cpu0 which does the same
  amount of work as the other one ==> beautiful graph (for the cpu1
  thread execution time).

Given these observations, I wonder whether the scheduler or the clock
tick is bound to cpu0.
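As a side note, the open questions above (was l2x0_write_allocate=0
actually taken into account? does the local timer ever tick on cpu1?)
can be partially checked from procfs/sysfs. A sketch, assuming standard
Linux paths; the "twd" IRQ name is i.MX6-specific, so the grep patterns
are a guess to adapt, and the sysfs "isolated" attribute may be absent
on older kernels:

```shell
#!/bin/sh
# Sanity checks for the settings discussed above.

# 1) Was l2x0_write_allocate=0 actually passed to the kernel?
echo "--- kernel cmdline ---"
cat /proc/cmdline

# 2) Which CPUs does the kernel consider isolated (isolcpus=)?
echo "--- isolated cpus ---"
cat /sys/devices/system/cpu/isolated 2>/dev/null \
    || echo "(attribute not available on this kernel)"

# 3) Per-CPU interrupt counters: if the local timer (TWD on imx6) only
#    ever ticks on cpu0, its counter in the CPU1 column stays near zero.
echo "--- per-cpu timer interrupts ---"
head -n1 /proc/interrupts
grep -i -e twd -e arch_timer -e 'local timer' /proc/interrupts || true
```

One can also confirm the WA setting from the boot log, e.g. by grepping
dmesg for the L2 cache controller lines ("l2x0" / "L2C").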
And whether they play a role in the responsiveness of the system. By the
way, could that explain why the xenomai.supported_cpus=2 cmdline does
not work?

Is it possible to migrate the clock tick interrupt to cpu1? Is that what
you did in your 2015 patch?
https://xenomai.org/pipermail/xenomai-git/2015-December/006009.html

Regards,
Yann

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
https://xenomai.org/mailman/listinfo/xenomai