Hello,

There is a potential problem in the SMP initialization procedure.

One processor in the system has a special role, the so-called boot processor. Currently this is the processor with index zero. The way to select the boot processor may change in the future, but what will not change is that we have a boot processor.

The boot processor initializes the data and BSS sections. It also performs the sequential part of the RTEMS initialization.

During the sequential initialization the function

/**
 * @brief Performs CPU specific SMP initialization in the context of the boot
 * processor.
 *
 * This function is invoked on the boot processor by RTEMS during
 * initialization.  All interrupt stacks are allocated at this point in case
 * the CPU port allocates the interrupt stacks.
 *
 * The CPU port should start secondary processors now.
 *
 * @param[in] configured_cpu_count The count of processors requested by the
 * application configuration.
 *
 * @return The count of processors available for the application in the system.
 * This value is less than or equal to the configured count of processors.
 */
uint32_t _CPU_SMP_Initialize( uint32_t configured_cpu_count );

is called. This function is currently implemented by the BSPs. An example which starts the processor on its own:

http://git.rtems.org/rtems/tree/c/src/lib/libbsp/sparc/leon3/smp/smp_leon3.c#n38

An example which uses U-Boot to start the second processor:

http://git.rtems.org/rtems/tree/c/src/lib/libbsp/powerpc/qoriq/startup/smp.c#n144

The return value of _CPU_SMP_Initialize() will tell the RTEMS system how many processors are present.

void _SMP_Handler_initialize( void )
{
  uint32_t max_cpus = rtems_configuration_get_maximum_processors();
  uint32_t cpu;

[...]

  /*
   * Discover and initialize the secondary cores in an SMP system.
   */
  max_cpus = _CPU_SMP_Initialize( max_cpus );

  _SMP_Processor_count = max_cpus;
}

If the BSP says "you have three processors", and one of them is actually not available, then we have a problem later.

Before the system starts multitasking there is a synchronization barrier. This synchronization barrier is necessary to have a defined starting point for the scheduler.

void _SMP_Request_start_multitasking( void )
{
  Per_CPU_Control *self_cpu = _Per_CPU_Get();
  uint32_t ncpus = _SMP_Get_processor_count();
  uint32_t cpu;

  _Per_CPU_State_change( self_cpu, PER_CPU_STATE_READY_TO_START_MULTITASKING );

  for ( cpu = 0 ; cpu < ncpus ; ++cpu ) {
    Per_CPU_Control *per_cpu = _Per_CPU_Get_by_index( cpu );

    _Per_CPU_State_change( per_cpu, PER_CPU_STATE_REQUEST_START_MULTITASKING );
  }
}

So before this function returns, ALL (!) processors must have changed into the PER_CPU_STATE_REQUEST_START_MULTITASKING state (or into PER_CPU_STATE_SHUTDOWN, which terminates the system immediately).

If one of the processors doesn't start, we will wait here FOREVER (unless a watchdog kills us).

There are now several ways to deal with this.

1. You can consider this a BSP bug. The BSP told the system via _CPU_SMP_Initialize() that a certain number of processors are available. If this is not the case, then the BSP lied and you should fix the BSP.

2. You can consider this a feature of the BSP that it tells you wrong numbers. So now what to do?

2.1. You can install a watchdog driver that kills you no matter what corrupt system state you have. If you analyze the per-CPU states in this case, you will notice that some of the processors didn't start.

2.2. You can limit the time spent waiting. If a timeout occurs then we can issue a fatal error that indicates exactly the problem area.

2.2.1 Now we need a facility to measure time (e.g. the CPU counter introduced recently).

2.2.2 Now we need a timeout.

2.2.2.1 The RTEMS kernel cannot know a proper timeout value.

2.2.2.2 The CPU/BSP may know the timeout value. How can the CPU/BSP tell the RTEMS kernel the timeout value?

2.2.2.3 We can add an application configuration item that specifies the timeout value and move the responsibility to the application developer.
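To make option 2.2 more concrete, here is a minimal sketch of a wait loop bounded by a free-running counter. Everything in it is a hypothetical stand-in for illustration: the example_* names, the state enum, and the simulated counter are not the RTEMS definitions, and a real port would read a hardware counter instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the RTEMS definitions. */
#define EXAMPLE_CPU_MAX 4

typedef enum {
  EXAMPLE_STATE_INITIAL,
  EXAMPLE_STATE_READY_TO_START_MULTITASKING
} example_cpu_state;

/* One state per processor; a secondary processor would set its own entry. */
volatile example_cpu_state example_states[ EXAMPLE_CPU_MAX ];

/* Simulated free-running counter; a real port would read hardware here. */
uint32_t example_counter;

uint32_t example_counter_read( void )
{
  return example_counter++;
}

/*
 * Polls the states of the processors 1 to processor_count - 1 until all of
 * them are ready or timeout_in_ticks counter ticks have elapsed.
 */
bool example_wait_for_ready(
  uint32_t processor_count,
  uint32_t timeout_in_ticks
)
{
  uint32_t start = example_counter_read();
  uint32_t cpu;

  assert( processor_count <= EXAMPLE_CPU_MAX );

  for ( cpu = 1 ; cpu < processor_count ; ++cpu ) {
    while (
      example_states[ cpu ] != EXAMPLE_STATE_READY_TO_START_MULTITASKING
    ) {
      /* Unsigned subtraction stays correct across a counter wrap-around. */
      if ( example_counter_read() - start >= timeout_in_ticks ) {
        return false;
      }
    }
  }

  return true;
}
```

The open question from 2.2.2 remains visible in the signature: somebody has to pick timeout_in_ticks, and the kernel alone cannot.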

I am in favor of 1. in combination with 2.1 and 2.2.2.2. For BSPs with an unreliable start of secondary processors we should add a support function, e.g.

/**
 * @brief Waits for all other processors to enter the ready to start
 * multitasking state with a timeout in microseconds.
 *
 * In case one processor enters the shutdown state, this function does not
 * return.
 *
 * This function should be called only in _CPU_SMP_Initialize() if required by
 * the CPU port or BSP.
 *
 * @param[in] processor_count The processor count which will later be returned
 * by _CPU_SMP_Initialize().
 * @param[in] timeout_in_us The timeout in microseconds.
 *
 * @retval true All other processors entered the ready to start multitasking
 * state.
 * @retval false Not all the other processors entered the ready to start
 * multitasking state and the timeout expired.
 */
bool _Per_CPU_State_wait_for_ready_to_start_multitasking(
  uint32_t processor_count,
  uint32_t timeout_in_us
);

This avoids burdening the application developer with knowing about a timeout configuration option and selecting a proper value. It moves the responsibility to deal with this issue to the BSP, which knows best what to do. In case false is returned, the BSP can either issue a fatal error or reduce the processor count.
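As a rough sketch of how a BSP might use such a function to reduce the processor count, consider the following. The names example_stub_wait() and example_cpu_smp_initialize() are hypothetical stand-ins for the proposed wait function and for a BSP's _CPU_SMP_Initialize(); the timeout of 1000 microseconds is an arbitrary placeholder, and the sketch assumes that processors fail to start from the highest index downwards, which a real BSP would have to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: emulates how many processors actually came up. */
uint32_t example_started_processor_count = 1;

/* Hypothetical stub standing in for the proposed wait function. */
bool example_stub_wait( uint32_t processor_count, uint32_t timeout_in_us )
{
  (void) timeout_in_us;

  return example_started_processor_count >= processor_count;
}

/* Sketch of what the body of a BSP's _CPU_SMP_Initialize() might do. */
uint32_t example_cpu_smp_initialize( uint32_t configured_cpu_count )
{
  uint32_t processor_count = configured_cpu_count;

  assert( configured_cpu_count >= 1 );

  /* A real BSP would start the secondary processors here. */

  /*
   * Reduce the count until all remaining processors report ready.  The boot
   * processor is always available, so the count never drops below one.
   */
  while (
    processor_count > 1
      && !example_stub_wait( processor_count, 1000 )
  ) {
    --processor_count;
  }

  return processor_count;
}
```

A BSP that prefers to fail loudly could issue a fatal error in place of the loop.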

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.