Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-06-18 Thread Jeff Melvile


Hi Philippe,

On Thu, 14 Jun 2018, Philippe Gerum wrote:

> On 06/12/2018 06:18 PM, Jeff Melvile wrote:
> > Dmitriy (and Philippe),
> > 
> > Thanks for looking into this. I'm working with Raman.
> > 
> > On Tue, 22 May 2018, Dmitriy Cherkasov wrote:
> > 
> >> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> >>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>  Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>  Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
>  14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
>  Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
> 
>  I've been having issues with semaphore latency when threads access
>  semaphores while executing on different cores. When both threads
>  accessing a semaphore execute on the same processor core, the latency
>  between one thread posting a semaphore and another waking up after
>  waiting on it is fairly small. However, as soon as one of the threads is
>  moved to a different core, the latency between a semaphore post from one
>  thread to a waiting thread waking up in response starts to become large
>  enough to affect real time performance. The latencies I've been seeing
>  are on the order of 100's of milliseconds.
> 
> >>>
> >>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> >>> waking up threads on remote CPUs don't flow to the other end properly
> >>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
> >>>
> >>> @Dmitriy: this may be an issue with the range of SGIs available to the
> >>> kernel when a secure firmware is enabled, which may be restricted to
> >>> SGI[0-7].
> >>>
> >>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> >>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> >>> may never be received on the remote end.
> >>>
> >>> Fixing this will require some work in the interrupt pipeline, typically
> >>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> >>> fact, the same issue exists on the ARM side, but since running a secure
> >>> firmware there is uncommon for Xenomai users, this went unnoticed (at
> >>> least not reported yet AFAIR). We need to sync up on this not to
> >>> duplicate work.
> >>>
> >>
> >> I see this on Hikey with the latest ipipe-arm64 tree as well. I can
> >> confirm the reschedule IPI isn't being received although it is sent.
> >> Rearranging the IPIs to move reschedule up a few spots resolves the
> >> issue, so I think this confirms the root cause.
> > 
> > Short term - what is the consequence of naively rearranging the IPIs? What 
> > else breaks? FWIW secure firmware is not in use. Is your test patch 
> > something we can apply to be able to test the multi-core aspects of our 
> > software?
> > 
> > Let me know if there is anything either of us can do to help. We have 
> > kernel development experience but admittedly not quite at this level.
> > 
> 
> This issue may affect the ARM port in some cases as well, so I took a stab at 
> it for ARM64 since the related code is very similar. Could you test that 
> patch? TIA,

Thanks for the patch. We ended up applying it on top of 
a kernel patched with ipipe-core-4.9.24-arm64-2.patch, manually resolving 
the conflicts (confined to smp.c IIRC). Clearly this is a little different 
from applying it on top of the ipipe HEAD and generating a fresh patch. 

The fix did resolve the high latencies we were seeing in our application 
across cores. Thanks again for the fix and let me know if you'd 
like us to do any additional testing.

Thanks,
Jeff 

> 
> commit 765aa7853642b46e1c13fd1f21dfcb9d049f5bfa (HEAD -> wip/arm64-ipi-4.9)
> Author: Philippe Gerum 
> Date:   Wed Jun 13 19:16:27 2018 +0200
> 
> arm64/ipipe: multiplex IPIs
> 
> SGI8-15 can be reserved for the exclusive use of the firmware. The
> ARM64 kernel currently uses six of them (NR_IPI), and the pipeline
> needs to define three more for conveying out-of-band events
> (i.e. reschedule, hrtimer and critical IPIs). Therefore we have to
> multiplex nine inter-processor events over eight SGIs (SGI0-7).
> 
> This patch changes the IPI management in order to multiplex all
> regular (in-band) IPIs over SGI0, reserving SGI1-3 for out-of-band
> events.
> 
> diff --git a/arch/arm64/include/asm/ipipe.h b/arch/arm64/include/asm/ipipe.h
> index b16f03b508d6..8e756be01906 100644
> --- a/arch/arm64/include/asm/ipipe.h
> +++ b/arch/arm64/include/asm/ipipe.h
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IPIPE_CORE_RELEASE   4
>  
> @@ -165,7 +166,7 @@ static inline void ipipe_unmute_pic(void)
>  void __ipipe_early_core_setup(void);
>  void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
>  void __ipipe_root_localtimer(unsigned int irq, void *cookie);
> 

Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-06-14 Thread Philippe Gerum
On 06/12/2018 06:18 PM, Jeff Melvile wrote:
> Dmitriy (and Philippe),
> 
> Thanks for looking into this. I'm working with Raman.
> 
> On Tue, 22 May 2018, Dmitriy Cherkasov wrote:
> 
>> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
>>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
 Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
 Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
 Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24

 I've been having issues with semaphore latency when threads access 
 semaphores while executing on different cores. When both threads accessing 
 a semaphore execute on the same processor core, the latency between
 one thread posting a semaphore and another waking up after waiting on it 
 is fairly small. However, as soon as one of the threads is moved to a 
 different core, the latency between a semaphore post from one thread to a 
 waiting thread waking up in response starts to become large enough to 
 affect real time performance.  The latencies I've been seeing are on the 
 order of 100's of milliseconds.

>>>
>>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
>>> waking up threads on remote CPUs don't flow to the other end properly
>>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
>>>
>>> @Dmitriy: this may be an issue with the range of SGIs available to the
>>> kernel when a secure firmware is enabled, which may be restricted to
>>> SGI[0-7].
>>>
>>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
>>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
>>> may never be received on the remote end.
>>>
>>> Fixing this will require some work in the interrupt pipeline, typically
>>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
>>> fact, the same issue exists on the ARM side, but since running a secure
>>> firmware there is uncommon for Xenomai users, this went unnoticed (at
>>> least not reported yet AFAIR). We need to sync up on this not to
>>> duplicate work.
>>>
>>
>> I see this on Hikey with the latest ipipe-arm64 tree as well. I can confirm
>> the reschedule IPI isn't being received although it is sent. Rearranging
>> the IPIs to move reschedule up a few spots resolves the issue, so I think
>> this confirms the root cause.
> 
> Short term - what is the consequence of naively rearranging the IPIs? What 
> else breaks? FWIW secure firmware is not in use. Is your test patch 
> something we can apply to be able to test the multi-core aspects of our 
> software?
> 
> Let me know if there is anything either of us can do to help. We have 
> kernel development experience but admittedly not quite at this level.
> 

This issue may affect the ARM port in some cases as well, so I took a stab at 
it for ARM64 since the related code is very similar. Could you test that patch? 
TIA,

commit 765aa7853642b46e1c13fd1f21dfcb9d049f5bfa (HEAD -> wip/arm64-ipi-4.9)
Author: Philippe Gerum 
Date:   Wed Jun 13 19:16:27 2018 +0200

arm64/ipipe: multiplex IPIs

SGI8-15 can be reserved for the exclusive use of the firmware. The
ARM64 kernel currently uses six of them (NR_IPI), and the pipeline
needs to define three more for conveying out-of-band events
(i.e. reschedule, hrtimer and critical IPIs). Therefore we have to
multiplex nine inter-processor events over eight SGIs (SGI0-7).

This patch changes the IPI management in order to multiplex all
regular (in-band) IPIs over SGI0, reserving SGI1-3 for out-of-band
events.

diff --git a/arch/arm64/include/asm/ipipe.h b/arch/arm64/include/asm/ipipe.h
index b16f03b508d6..8e756be01906 100644
--- a/arch/arm64/include/asm/ipipe.h
+++ b/arch/arm64/include/asm/ipipe.h
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IPIPE_CORE_RELEASE 4
 
@@ -165,7 +166,7 @@ static inline void ipipe_unmute_pic(void)
 void __ipipe_early_core_setup(void);
 void __ipipe_hook_critical_ipi(struct ipipe_domain *ipd);
 void __ipipe_root_localtimer(unsigned int irq, void *cookie);
-void __ipipe_grab_ipi(unsigned svc, struct pt_regs *regs);
+void __ipipe_grab_ipi(unsigned int sgi, struct pt_regs *regs);
 void __ipipe_ipis_alloc(void);
 void __ipipe_ipis_request(void);
 
diff --git a/arch/arm64/include/asm/ipipe_base.h b/arch/arm64/include/asm/ipipe_base.h
index 867474e1b075..4d8beb560a2f 100644
--- a/arch/arm64/include/asm/ipipe_base.h
+++ b/arch/arm64/include/asm/ipipe_base.h
@@ -31,13 +31,15 @@
 
 #ifdef CONFIG_SMP
 
-extern unsigned __ipipe_first_ipi;
-
-#define IPIPE_CRITICAL_IPI __ipipe_first_ipi
-#define IPIPE_HRTIMER_IPI  (IPIPE_CRITICAL_IPI + 1)
-#define IPIPE_RESCHEDULE_IPI   (IPIPE_CRITICAL_IPI + 2)
-
-#define IPIPE_LAST_IPI IPIPE_RESCHEDULE_IPI
+/*
+ * Out-of-band IPIs are directly mapped to 

Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-06-12 Thread Jeff Melvile
Dmitriy (and Philippe),

Thanks for looking into this. I'm working with Raman.

On Tue, 22 May 2018, Dmitriy Cherkasov wrote:

> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> > On 05/18/2018 06:24 PM, Singh, Raman wrote:
> >> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
> >> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
> >> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
> >> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
> >>
> >> I've been having issues with semaphore latency when threads access 
> >> semaphores while executing on different cores. When both threads accessing 
> >> a semaphore execute on the same processor core, the latency between
> >> one thread posting a semaphore and another waking up after waiting on it 
> >> is fairly small. However, as soon as one of the threads is moved to a 
> >> different core, the latency between a semaphore post from one thread to a 
> >> waiting thread waking up in response starts to become large enough to 
> >> affect real time performance.  The latencies I've been seeing are on the 
> >> order of 100's of milliseconds.
> >>
> > 
> > Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> > waking up threads on remote CPUs don't flow to the other end properly
> > (ipipe_send_ipi()), which explains the behavior you have been seeing.
> > 
> > @Dmitriy: this may be an issue with the range of SGIs available to the
> > kernel when a secure firmware is enabled, which may be restricted to
> > SGI[0-7].
> > 
> > For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> > trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> > may never be received on the remote end.
> > 
> > Fixing this will require some work in the interrupt pipeline, typically
> > for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> > fact, the same issue exists on the ARM side, but since running a secure
> > firmware there is uncommon for Xenomai users, this went unnoticed (at
> > least not reported yet AFAIR). We need to sync up on this not to
> > duplicate work.
> > 
> 
> I see this on Hikey with the latest ipipe-arm64 tree as well. I can confirm
> the reschedule IPI isn't being received although it is sent. Rearranging
> the IPIs to move reschedule up a few spots resolves the issue, so I think
> this confirms the root cause.

Short term - what is the consequence of naively rearranging the IPIs? What 
else breaks? FWIW secure firmware is not in use. Is your test patch 
something we can apply to be able to test the multi-core aspects of our 
software?

Let me know if there is anything either of us can do to help. We have 
kernel development experience but admittedly not quite at this level.

> 
> Philippe, are there architectures that already do this type of
> multiplexing, or does this mechanism need to be designed from scratch?
> 
> ___
> Xenomai mailing list
> Xenomai@xenomai.org
> https://xenomai.org/mailman/listinfo/xenomai
> 

Thanks,
Jeff



Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-05-22 Thread Philippe Gerum
On 05/22/2018 07:06 AM, Dmitriy Cherkasov wrote:
> On 05/20/2018 08:07 AM, Philippe Gerum wrote:
>> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
>>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
>>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>>>
>>> I've been having issues with semaphore latency when threads access 
>>> semaphores while executing on different cores. When both threads accessing 
>>> a semaphore execute on the same processor core, the latency between
>>> one thread posting a semaphore and another waking up after waiting on it 
>>> is fairly small. However, as soon as one of the threads is moved to a 
>>> different core, the latency between a semaphore post from one thread to a 
>>> waiting thread waking up in response starts to become large enough to 
>>> affect real time performance.  The latencies I've been seeing are on the 
>>> order of 100's of milliseconds.
>>>
>>
>> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
>> waking up threads on remote CPUs don't flow to the other end properly
>> (ipipe_send_ipi()), which explains the behavior you have been seeing.
>>
>> @Dmitriy: this may be an issue with the range of SGIs available to the
>> kernel when a secure firmware is enabled, which may be restricted to
>> SGI[0-7].
>>
>> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
>> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
>> may never be received on the remote end.
>>
>> Fixing this will require some work in the interrupt pipeline, typically
>> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
>> fact, the same issue exists on the ARM side, but since running a secure
>> firmware there is uncommon for Xenomai users, this went unnoticed (at
>> least not reported yet AFAIR). We need to sync up on this not to
>> duplicate work.
>>
> 
> I see this on Hikey with the latest ipipe-arm64 tree as well. I can confirm
> the reschedule IPI isn't being received although it is sent. Rearranging
> the IPIs to move reschedule up a few spots resolves the issue, so I think
> this confirms the root cause.
> 
> Philippe, are there architectures that already do this type of
> multiplexing, or does this mechanism need to be designed from scratch?
> 

ppc implements a muxed IPI scheme for platforms with interrupt
controllers not providing enough IPI channels (i.e. fewer than 4). This
is done in the SMP support code, which enables the feature for all ICs
that would require it (CONFIG_PPC_SMP_MUXED_IPI).

We could use a similar approach, except that we may want to multiplex
all of the regular kernel inter-processor messages (i.e.
IPI_WAKEUP..IPI_CPU_BACKTRACE) on a single IPI vector, mapping I-pipe
messages 1:1 onto the remaining IPI vectors for efficiency. That would
leave us with 1 (mux) + 3 (HRTIMER, RESCHED and CRITICAL) SGIs used.

-- 
Philippe.



Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-05-21 Thread Dmitriy Cherkasov
On 05/20/2018 08:07 AM, Philippe Gerum wrote:
> On 05/18/2018 06:24 PM, Singh, Raman wrote:
>> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
>> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
>> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
>> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
>>
>> I've been having issues with semaphore latency when threads access 
>> semaphores while executing on different cores. When both threads accessing 
>> a semaphore execute on the same processor core, the latency between
>> one thread posting a semaphore and another waking up after waiting on it 
>> is fairly small. However, as soon as one of the threads is moved to a 
>> different core, the latency between a semaphore post from one thread to a 
>> waiting thread waking up in response starts to become large enough to 
>> affect real time performance.  The latencies I've been seeing are on the 
>> order of 100's of milliseconds.
>>
> 
> Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
> waking up threads on remote CPUs don't flow to the other end properly
> (ipipe_send_ipi()), which explains the behavior you have been seeing.
> 
> @Dmitriy: this may be an issue with the range of SGIs available to the
> kernel when a secure firmware is enabled, which may be restricted to
> SGI[0-7].
> 
> For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
> trigger SGI8 which may be reserved by the ATF in secure mode, therefore
> may never be received on the remote end.
> 
> Fixing this will require some work in the interrupt pipeline, typically
> for multiplexing our IPIs on a single SGI below SGI8. As a matter of
> fact, the same issue exists on the ARM side, but since running a secure
> firmware there is uncommon for Xenomai users, this went unnoticed (at
> least not reported yet AFAIR). We need to sync up on this not to
> duplicate work.
> 

I see this on Hikey with the latest ipipe-arm64 tree as well. I can confirm the
reschedule IPI isn't being received although it is sent. Rearranging the IPIs
to move reschedule up a few spots resolves the issue, so I think this confirms
the root cause.

Philippe, are there architectures that already do this type of multiplexing, or
does this mechanism need to be designed from scratch?



Re: [Xenomai] Xenomai 3 Multi-core Semaphore latency

2018-05-20 Thread Philippe Gerum
On 05/18/2018 06:24 PM, Singh, Raman wrote:
> Environment: ARM Cortex-A53 quad-core processor (ARM 64-bit) on a
> Zynq Ultrascale+ ZCU102 dev board, Xenomai 3 next branch from May 
> 14, 2018 (SHA1: 410a4cc1109ba4e0d05b7ece7b4a5210287e1183 ), 
> Cobalt configuration with POSIX skin, Linux Kernel version 4.9.24
> 
> I've been having issues with semaphore latency when threads access 
> semaphores while executing on different cores. When both threads accessing 
> a semaphore execute on the same processor core, the latency between
> one thread posting a semaphore and another waking up after waiting on it 
> is fairly small. However, as soon as one of the threads is moved to a 
> different core, the latency between a semaphore post from one thread to a 
> waiting thread waking up in response starts to become large enough to 
> affect real time performance.  The latencies I've been seeing are on the order
> of 100's of milliseconds.
> 

Reproduced on hikey here: the rescheduling IPIs Xenomai is sending for
waking up threads on remote CPUs don't flow to the other end properly
(ipipe_send_ipi()), which explains the behavior you have been seeing.

@Dmitriy: this may be an issue with the range of SGIs available to the
kernel when a secure firmware is enabled, which may be restricted to
SGI[0-7].

For the rescheduling IPI on ARM64, the interrupt pipeline attempts to
trigger SGI8 which may be reserved by the ATF in secure mode, therefore
may never be received on the remote end.

Fixing this will require some work in the interrupt pipeline, typically
for multiplexing our IPIs on a single SGI below SGI8. As a matter of
fact, the same issue exists on the ARM side, but since running a secure
firmware there is uncommon for Xenomai users, this went unnoticed (at
least not reported yet AFAIR). We need to sync up on this not to
duplicate work.

-- 
Philippe.
