Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
Hi Philippe,

On Thu, 14 Jun 2018, Philippe Gerum wrote:

> On 06/12/2018 06:18 PM, Jeff Melvile wrote:
> > Dmitriy (and Philippe),
> >
> > Thanks for looking into this. I'm working with Raman.
> >
> > On Tue, 22 May 2018, Dmitriy Cherkasov wrote:
> >
> >> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> >>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
> >>>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
> >>>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
> >>>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
> >>>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
> >>>>
> >>>> I've been having issues with semaphore latency when threads access
> >>>> semaphores while executing on different cores. When both threads
> >>>> accessing a semaphore execute on the same processor core, the latency
> >>>> between one thread posting a semaphore and another waking up after
> >>>> waiting on it is fairly small. However, as soon as one of the threads
> >>>> is moved to a different core, the latency between a semaphore post
> >>>> from one thread to a waiting thread waking up in response starts to
> >>>> become large enough to affect real time performance. The latencies
> >>>> I've been seeing are on the order of 100's of milliseconds.
> >>>>
> >>>
> >>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> >>> waking up threads on remote CPUs don't flow to the other end properly
> >>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
> >>>
> >>> @Dmitriy: this may be an issue with the range of SGIs available to the
> >>> kernel when a secure firmware is enabled, which may be restricted to
> >>> SGI[0-7].
> >>>
> >>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> >>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> >>> may never be received on the remote end.
> >>>
> >>> Fixing this will require some work in the interrupt pipeline, typically
> >>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> >>> fact, the same issue exists on the ARM side, but since running a secure
> >>> firmware there is uncommon for Xenomai users, this went unnoticed (at
> >>> least not reported yet AFAIR). We need to sync up on this not to
> >>> duplicate work.
> >>>
> >>
> >> I see this on Hikey with the latest ipipe-arm64 tree as well. I can
> >> confirm the reschedule IPI isn't being received although it is sent.
> >> Rearranging the IPIs to move reschedule up a few spots resolves the
> >> issue, so I think this confirms the root cause.
> >
> > Short term - what is the consequence of naively rearranging the IPIs? What
> > else breaks? FWIW secure firmware is not in use. Is your test patch
> > something we can apply to be able to test the multi-core aspects of our
> > software?
> >
> > Let me know if there is anything either of us can do to help. We have
> > kernel development experience but admittedly not quite at this level.
> >
>
> This issue may affect the ARM port in some cases as well, so I took a stab at
> it for ARM64 since the related code is very similar. Could you test that
> patch? TIA,

Thanks for the patch. We ended up applying it on top of a kernel patched
with ipipe-core-4.9.24-arm64-2.patch, manually resolving the conflicts
(contained to smp.c IIRC). Clearly this is a little different from applying
it on top of the ipipe HEAD and generating a fresh patch.

The fix did resolve the high latencies we were seeing in our application
across cores. Thanks again for the fix and let me know if you'd like us to
do any additional testing.

Thanks,
Jeff

>
> commit 765aa7853642b46e1c13fd1f21dfcb9d049f5bfa (HEAD -> wip/arm64-ipi-4.9)
> Author: Philippe Gerum
> Date:   Wed Jun 13 19:16:27 2018 +0200
>
>     arm64/ipipe: multiplex IPIs
>
>     SGI8-15 can be reserved for the exclusive use of the firmware. The
>     ARM64 kernel currently uses six of them (NR_IPI), and the pipeline
>     needs to define three more for conveying out-of-band events
>     (i.e. reschedule, hrtimer and critical IPIs). Therefore we have to
>     multiplex nine inter-processor events over eight SGIs (SGI0-7).
>
>     This patch changes the IPI management in order to multiplex all
>     regular (in-band) IPIs over SGI0, reserving SGI1-3 for out-of-band
>     events.
>
> diff --git a/arch/arm64/include/asm/ipipe.h b/arch/arm64/include/asm/ipipe.h
> index b16f03b508d6..8e756be01906 100644
> --- a/arch/arm64/include/asm/ipipe.h
> +++ b/arch/arm64/include/asm/ipipe.h
> @@ -32,6 +32,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define IPIPE_CORE_RELEASE 4
>
> @@ -165,7 +166,7 @@ static inline void ipipe_unmute_pic(void)
>  void __ipipe_early_core_setup(void);
>  void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
>  void __ipipe_root_localtimer(unsigned int irq, void *cookie);
>
Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
On 06/12/2018 06:18 PM, Jeff Melvile wrote:
> Dmitriy (and Philippe),
>
> Thanks for looking into this. I'm working with Raman.
>
> On Tue, 22 May 2018, Dmitriy Cherkasov wrote:
>
>> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
>>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>>>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>>>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
>>>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
>>>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>>>>
>>>> I've been having issues with semaphore latency when threads access
>>>> semaphores while executing on different cores. When both threads
>>>> accessing a semaphore execute on the same processor core, the latency
>>>> between one thread posting a semaphore and another waking up after
>>>> waiting on it is fairly small. However, as soon as one of the threads
>>>> is moved to a different core, the latency between a semaphore post
>>>> from one thread to a waiting thread waking up in response starts to
>>>> become large enough to affect real time performance. The latencies
>>>> I've been seeing are on the order of 100's of milliseconds.
>>>>
>>>
>>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
>>> waking up threads on remote CPUs don't flow to the other end properly
>>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
>>>
>>> @Dmitriy: this may be an issue with the range of SGIs available to the
>>> kernel when a secure firmware is enabled, which may be restricted to
>>> SGI[0-7].
>>>
>>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
>>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
>>> may never be received on the remote end.
>>>
>>> Fixing this will require some work in the interrupt pipeline, typically
>>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
>>> fact, the same issue exists on the ARM side, but since running a secure
>>> firmware there is uncommon for Xenomai users, this went unnoticed (at
>>> least not reported yet AFAIR). We need to sync up on this not to
>>> duplicate work.
>>>
>>
>> I see this on Hikey with the latest ipipe-arm64 tree as well. I can
>> confirm the reschedule IPI isn't being received although it is sent.
>> Rearranging the IPIs to move reschedule up a few spots resolves the
>> issue, so I think this confirms the root cause.
>
> Short term - what is the consequence of naively rearranging the IPIs? What
> else breaks? FWIW secure firmware is not in use. Is your test patch
> something we can apply to be able to test the multi-core aspects of our
> software?
>
> Let me know if there is anything either of us can do to help. We have
> kernel development experience but admittedly not quite at this level.
>

This issue may affect the ARM port in some cases as well, so I took a stab at
it for ARM64 since the related code is very similar. Could you test that
patch? TIA,

commit 765aa7853642b46e1c13fd1f21dfcb9d049f5bfa (HEAD -> wip/arm64-ipi-4.9)
Author: Philippe Gerum
Date:   Wed Jun 13 19:16:27 2018 +0200

    arm64/ipipe: multiplex IPIs

    SGI8-15 can be reserved for the exclusive use of the firmware. The
    ARM64 kernel currently uses six of them (NR_IPI), and the pipeline
    needs to define three more for conveying out-of-band events
    (i.e. reschedule, hrtimer and critical IPIs). Therefore we have to
    multiplex nine inter-processor events over eight SGIs (SGI0-7).

    This patch changes the IPI management in order to multiplex all
    regular (in-band) IPIs over SGI0, reserving SGI1-3 for out-of-band
    events.

diff --git a/arch/arm64/include/asm/ipipe.h b/arch/arm64/include/asm/ipipe.h
index b16f03b508d6..8e756be01906 100644
--- a/arch/arm64/include/asm/ipipe.h
+++ b/arch/arm64/include/asm/ipipe.h
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include

 #define IPIPE_CORE_RELEASE 4

@@ -165,7 +166,7 @@ static inline void ipipe_unmute_pic(void)
 void __ipipe_early_core_setup(void);
 void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
 void __ipipe_root_localtimer(unsigned int irq, void *cookie);
-void __ipipe_grab_ipi(unsigned svc, struct pt_regs *regs);
+void __ipipe_grab_ipi(unsigned int sgi, struct pt_regs *regs);
 void __ipipe_ipis_alloc(void);
 void __ipipe_ipis_request(void);

diff --git a/arch/arm64/include/asm/ipipe_base.h b/arch/arm64/include/asm/ipipe_base.h
index 867474e1b075..4d8beb560a2f 100644
--- a/arch/arm64/include/asm/ipipe_base.h
+++ b/arch/arm64/include/asm/ipipe_base.h
@@ -31,13 +31,15 @@

 #ifdef CONFIG_SMP

-extern unsigned __ipipe_first_ipi;
-
-#define IPIPE_CRITICAL_IPI __ipipe_first_ipi
-#define IPIPE_HRTIMER_IPI (IPIPE_CRITICAL_IPI + 1)
-#define IPIPE_RESCHEDULE_IPI (IPIPE_CRITICAL_IPI + 2)
-
-#define IPIPE_LAST_IPI IPIPE_RESCHEDULE_IPI
+/*
+ * Out-of-band IPIs are directly mapped to
Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
Dmitriy (and Philippe),

Thanks for looking into this. I'm working with Raman.

On Tue, 22 May 2018, Dmitriy Cherkasov wrote:

> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> > On 05/18/2018 06:24 PM, Singh, Raman wrote:
> >> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
> >> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
> >> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
> >> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
> >>
> >> I've been having issues with semaphore latency when threads access
> >> semaphores while executing on different cores. When both threads
> >> accessing a semaphore execute on the same processor core, the latency
> >> between one thread posting a semaphore and another waking up after
> >> waiting on it is fairly small. However, as soon as one of the threads
> >> is moved to a different core, the latency between a semaphore post
> >> from one thread to a waiting thread waking up in response starts to
> >> become large enough to affect real time performance. The latencies
> >> I've been seeing are on the order of 100's of milliseconds.
> >>
> >
> > Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> > waking up threads on remote CPUs don't flow to the other end properly
> > (ipipe_send_ipi()), which explains the behavior you have been seeing.
> >
> > @Dmitriy: this may be an issue with the range of SGIs available to the
> > kernel when a secure firmware is enabled, which may be restricted to
> > SGI[0-7].
> >
> > For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> > trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> > may never be received on the remote end.
> >
> > Fixing this will require some work in the interrupt pipeline, typically
> > for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> > fact, the same issue exists on the ARM side, but since running a secure
> > firmware there is uncommon for Xenomai users, this went unnoticed (at
> > least not reported yet AFAIR). We need to sync up on this not to
> > duplicate work.
> >
>
> I see this on Hikey with the latest ipipe-arm64 tree as well. I can
> confirm the reschedule IPI isn't being received although it is sent.
> Rearranging the IPIs to move reschedule up a few spots resolves the
> issue, so I think this confirms the root cause.

Short term - what is the consequence of naively rearranging the IPIs? What
else breaks? FWIW secure firmware is not in use. Is your test patch
something we can apply to be able to test the multi-core aspects of our
software?

Let me know if there is anything either of us can do to help. We have
kernel development experience but admittedly not quite at this level.

>
> Philippe, are there architectures that already do this type of
> multiplexing, or does this mechanism need to be designed from scratch?
>
> ___
> Xenomai mailing list
> Xenomai@xenomai.org
> https://xenomai.org/mailman/listinfo/xenomai
>

Thanks,
Jeff
Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
On 05/22/2018 07:06 AM, Dmitriy Cherkasov wrote:
> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
>>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
>>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>>>
>>> I've been having issues with semaphore latency when threads access
>>> semaphores while executing on different cores. When both threads
>>> accessing a semaphore execute on the same processor core, the latency
>>> between one thread posting a semaphore and another waking up after
>>> waiting on it is fairly small. However, as soon as one of the threads
>>> is moved to a different core, the latency between a semaphore post
>>> from one thread to a waiting thread waking up in response starts to
>>> become large enough to affect real time performance. The latencies
>>> I've been seeing are on the order of 100's of milliseconds.
>>>
>>
>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
>> waking up threads on remote CPUs don't flow to the other end properly
>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
>>
>> @Dmitriy: this may be an issue with the range of SGIs available to the
>> kernel when a secure firmware is enabled, which may be restricted to
>> SGI[0-7].
>>
>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
>> may never be received on the remote end.
>>
>> Fixing this will require some work in the interrupt pipeline, typically
>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
>> fact, the same issue exists on the ARM side, but since running a secure
>> firmware there is uncommon for Xenomai users, this went unnoticed (at
>> least not reported yet AFAIR). We need to sync up on this not to
>> duplicate work.
>>
>
> I see this on Hikey with the latest ipipe-arm64 tree as well. I can
> confirm the reschedule IPI isn't being received although it is sent.
> Rearranging the IPIs to move reschedule up a few spots resolves the
> issue, so I think this confirms the root cause.
>
> Philippe, are there architectures that already do this type of
> multiplexing, or does this mechanism need to be designed from scratch?
>

ppc implements a muxed IPI scheme for platforms with interrupt
controllers not providing enough IPI channels (i.e. less than 4). This
is done in the SMP support code, which enables the feature for all ICs
that would require it (CONFIG_PPC_SMP_MUXED_IPI).

We could use a similar approach, except that we may want to multiplex
all of the regular kernel inter-processor messages
(i.e. IPI_WAKEUP..IPI_CPU_BACKTRACE) on a single IPI vector, mapping
I-pipe messages 1:1 onto the remaining IPI vectors for efficiency.

That would leave us with 1 (mux) + 3 (HRTIMER, RESCHED and CRITICAL)
SGIs used.

-- 
Philippe.
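For readers following the thread, a minimal user-space sketch of the muxed scheme outlined above may help. It assumes the layout mentioned in the message (a single SGI carrying every in-band kernel message, with SGI1-3 left to the out-of-band ones). The MSG_* names, raise_sgi(), and the per-CPU arrays are hypothetical stand-ins for illustration; they are not I-pipe or kernel symbols, and this is not the code from the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4

/* Hypothetical in-band messages, all multiplexed over one SGI. */
enum inband_msg {
	MSG_RESCHEDULE,
	MSG_CALL_FUNC,
	MSG_CPU_STOP,
	MSG_TIMER,
	MSG_IRQ_WORK,
	MSG_WAKEUP,
	NR_INBAND_MSGS
};

/* Assumed SGI layout: SGI0 is the mux, SGI1-3 are out-of-band. */
enum sgi { SGI_INBAND_MUX = 0, SGI_OOB_FIRST = 1, SGI_OOB_LAST = 3 };

static atomic_uint pending[NR_CPUS];              /* per-CPU pending bits */
static unsigned int handled[NR_CPUS][NR_INBAND_MSGS];
static unsigned int sgi_raised[NR_CPUS][SGI_OOB_LAST + 1];

/* Stand-in for the interrupt controller write that triggers an SGI. */
static void raise_sgi(int cpu, enum sgi sgi)
{
	sgi_raised[cpu][sgi]++;
}

/* Sender side: record the message, then kick the shared SGI. */
void send_inband_msg(int cpu, enum inband_msg msg)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	assert((unsigned int)msg < NR_INBAND_MSGS);

	atomic_fetch_or(&pending[cpu], 1u << msg);
	raise_sgi(cpu, SGI_INBAND_MUX);
}

/* Receiver side: SGI0 handler drains every message posted so far. */
void handle_mux_sgi(int cpu)
{
	unsigned int bits = atomic_exchange(&pending[cpu], 0);
	unsigned int msg;

	for (msg = 0; msg < NR_INBAND_MSGS; msg++) {
		if (bits & (1u << msg))
			handled[cpu][msg]++;  /* a real handler would dispatch here */
	}
}
```

Calling send_inband_msg(1, MSG_TIMER) followed by handle_mux_sgi(1) marks the timer message as handled on CPU 1 and leaves its pending word empty. The out-of-band events keep dedicated SGIs in this layout, so they would not pay for the extra bitmask lookup.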
Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>>
>> I've been having issues with semaphore latency when threads access
>> semaphores while executing on different cores. When both threads
>> accessing a semaphore execute on the same processor core, the latency
>> between one thread posting a semaphore and another waking up after
>> waiting on it is fairly small. However, as soon as one of the threads
>> is moved to a different core, the latency between a semaphore post
>> from one thread to a waiting thread waking up in response starts to
>> become large enough to affect real time performance. The latencies
>> I've been seeing are on the order of 100's of milliseconds.
>>
>
> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> waking up threads on remote CPUs don't flow to the other end properly
> (ipipe_send_ipi()), which explains the behavior you have been seeing.
>
> @Dmitriy: this may be an issue with the range of SGIs available to the
> kernel when a secure firmware is enabled, which may be restricted to
> SGI[0-7].
>
> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> may never be received on the remote end.
>
> Fixing this will require some work in the interrupt pipeline, typically
> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> fact, the same issue exists on the ARM side, but since running a secure
> firmware there is uncommon for Xenomai users, this went unnoticed (at
> least not reported yet AFAIR). We need to sync up on this not to
> duplicate work.
>

I see this on Hikey with the latest ipipe-arm64 tree as well. I can
confirm the reschedule IPI isn't being received although it is sent.
Rearranging the IPIs to move reschedule up a few spots resolves the
issue, so I think this confirms the root cause.

Philippe, are there architectures that already do this type of
multiplexing, or does this mechanism need to be designed from scratch?
Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency
On 05/18/2018 06:24 PM, Singh, Raman wrote:
> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May
> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ),
> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>
> I've been having issues with semaphore latency when threads access
> semaphores while executing on different cores. When both threads
> accessing a semaphore execute on the same processor core, the latency
> between one thread posting a semaphore and another waking up after
> waiting on it is fairly small. However, as soon as one of the threads
> is moved to a different core, the latency between a semaphore post
> from one thread to a waiting thread waking up in response starts to
> become large enough to affect real time performance. The latencies
> I've been seeing are on the order of 100's of milliseconds.
>

Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
waking up threads on remote CPUs don't flow to the other end properly
(ipipe_send_ipi()), which explains the behavior you have been seeing.

@Dmitriy: this may be an issue with the range of SGIs available to the
kernel when a secure firmware is enabled, which may be restricted to
SGI[0-7].

For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
trigger SGI8 which may be reserved by the ATF in secure mode, therefore
may never be received on the remote end.

Fixing this will require some work in the interrupt pipeline, typically
for multiplexing our IPIs on a single SGI below SGI8. As a matter of
fact, the same issue exists on the ARM side, but since running a secure
firmware there is uncommon for Xenomai users, this went unnoticed (at
least not reported yet AFAIR). We need to sync up on this not to
duplicate work.

-- 
Philippe.
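The symptom reported at the top of this thread (a slow wake-up once the poster and the waiter sit on different cores) could be probed with something like the sketch below. It is an assumption-laden illustration in plain POSIX C, not the reporter's test program: post_to_wake_ns(), struct probe, and pin_to_cpu() are hypothetical names, and how the program would be built against Cobalt is not covered here. Pinning is best effort, so on a machine lacking the requested CPU the threads simply run unpinned.

```c
#define _GNU_SOURCE          /* for CPU_SET and pthread_setaffinity_np */
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdint.h>
#include <time.h>

/* State shared by the posting thread and the waiting thread. */
struct probe {
	sem_t ready;             /* waiter announces it is about to block */
	sem_t sem;               /* semaphore under test */
	int waiter_cpu;
	struct timespec woke;    /* taken by the waiter right after wake-up */
};

/* Best-effort pinning of the calling thread; returns 0 on success. */
static int pin_to_cpu(int cpu)
{
#ifdef CPU_SET
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
#else
	(void)cpu;
	return -1;               /* GNU affinity macros not available */
#endif
}

static void *waiter(void *arg)
{
	struct probe *p = arg;

	(void)pin_to_cpu(p->waiter_cpu);
	sem_post(&p->ready);
	sem_wait(&p->sem);
	clock_gettime(CLOCK_MONOTONIC, &p->woke);
	return NULL;
}

/*
 * Returns the time in nanoseconds between the moment just before
 * sem_post() on poster_cpu and the waiter resuming on waiter_cpu,
 * or -1 on setup failure. The caller stays pinned to poster_cpu.
 */
int64_t post_to_wake_ns(int poster_cpu, int waiter_cpu)
{
	struct probe p = { .waiter_cpu = waiter_cpu };
	struct timespec posted, settle = { 0, 1000000 };   /* 1 ms */
	pthread_t tid;

	if (sem_init(&p.ready, 0, 0) || sem_init(&p.sem, 0, 0))
		return -1;
	(void)pin_to_cpu(poster_cpu);
	if (pthread_create(&tid, NULL, waiter, &p))
		return -1;

	sem_wait(&p.ready);
	nanosleep(&settle, NULL);        /* give the waiter time to block */
	clock_gettime(CLOCK_MONOTONIC, &posted);
	sem_post(&p.sem);
	pthread_join(tid, NULL);

	sem_destroy(&p.sem);
	sem_destroy(&p.ready);
	return (int64_t)(p.woke.tv_sec - posted.tv_sec) * 1000000000LL
		+ (p.woke.tv_nsec - posted.tv_nsec);
}
```

Comparing post_to_wake_ns(0, 0) with post_to_wake_ns(0, 1) over many iterations would show whether the cross-core case stands out, which is the contrast described in the original report. A single sample is noisy, so any real measurement would want a loop and a maximum or histogram.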