dangers of current NEED_CPU_H, CONFIG_SOFTMMU, CONFIG_USER_ONLY (was: [PATCH v11 7/7] cpu: introduce cpu_accel_instance_init)

Claudio Fontana Thu, 17 Dec 2020 11:46:37 -0800

Hi,

I would like to highlight the current dangerous state of NEED_CPU_H / 
CONFIG_SOFTMMU / CONFIG_USER_ONLY.

This patch of mine (the last in the i386 cleanup PART 2) breaks check-tcg.

The why is not obvious at all. I'll comment below it.

On 12/11/20 11:09 AM, Claudio Fontana wrote:
> centralize the calls to cpu->accel_cpu_interface
> 
> Signed-off-by: Claudio Fontana <cfont...@suse.de>
> ---
>  include/hw/core/cpu.h | 6 ++++++
>  hw/core/cpu.c         | 9 +++++++++
>  target/i386/cpu.c     | 9 ++-------
>  3 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 97e1dd8279..cc05c8fc96 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -664,6 +664,12 @@ void cpu_list_remove(CPUState *cpu);
>   */
>  void cpu_reset(CPUState *cpu);
>  
> +/**
> + * cpu_accel_instance_init:
> + * @cpu: The CPU that needs to do accel-specific object initializations.
> + */
> +void cpu_accel_instance_init(CPUState *cpu);
> +
>  /**
>   * cpu_class_by_name:
>   * @typename: The CPU base type.
> diff --git a/hw/core/cpu.c b/hw/core/cpu.c
> index f41c009e6c..873cf5e4ef 100644
> --- a/hw/core/cpu.c
> +++ b/hw/core/cpu.c
> @@ -242,6 +242,15 @@ void cpu_reset(CPUState *cpu)
>      trace_guest_cpu_reset(cpu);
>  }
>  
> +void cpu_accel_instance_init(CPUState *cpu)
> +{
> +    CPUClass *cc = CPU_GET_CLASS(cpu);
> +
> +    if (cc->accel_cpu_interface) {
> +        cc->accel_cpu_interface->cpu_instance_init(cpu);
> +    }
> +}
> +
>  static void cpu_common_reset(DeviceState *dev)
>  {
>      CPUState *cpu = CPU(dev);
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 5615d9e8bc..8ee39bea24 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -28,7 +28,6 @@
>  #include "sysemu/kvm.h"
>  #include "sysemu/reset.h"
>  #include "sysemu/hvf.h"
> -#include "hw/core/accel-cpu.h"
>  #include "sysemu/xen.h"
>  #include "kvm/kvm_i386.h"
>  #include "sev_i386.h"
> @@ -6621,8 +6620,6 @@ static void x86_cpu_initfn(Object *obj)
>  {
>      X86CPU *cpu = X86_CPU(obj);
>      X86CPUClass *xcc = X86_CPU_GET_CLASS(obj);
> -    CPUClass *cc = CPU_CLASS(xcc);
> -
>      CPUX86State *env = &cpu->env;
>      FeatureWord w;
>  
> @@ -6680,10 +6677,8 @@ static void x86_cpu_initfn(Object *obj)
>          x86_cpu_load_model(cpu, xcc->model);
>      }
>  
> -    /* if required, do the accelerator-specific cpu initialization */
> -    if (cc->accel_cpu_interface) {
> -        cc->accel_cpu_interface->cpu_instance_init(CPU(obj));
> -    }
> +    /* if required, do accelerator-specific cpu initializations */
> +    cpu_accel_instance_init(CPU(obj));
>  }
>  
>  static int64_t x86_cpu_get_arch_id(CPUState *cs)
> 

Seems a harmless change right?

Just extract the use of cc->accel_cpu_interface->cpu_instance_init from x86 so 
it can be a useful function for all architecture targets to start using,
as we continue the refactoring past x86 into arm etc.

Instead, it breaks at least check-tcg (linux-user), if not more.

vvv spoiler below vvvv

The reason comes down in the end to the fact that we have moved code that is 
using CPUClass from target/i386 to hw/core/cpu.c.

If we look at hw/core/meson.build , we notice that cpu.c is in common_ss.

common_ss code does NOT see CONFIG_USER_ONLY, ever.

So our struct TcgCpuOperations in include/hw/core/cpu.h,
which contains after this series:

#ifndef CONFIG_USER_ONLY
    /**                                                                         

     * @do_transaction_failed: Callback for handling failed memory transactions 

     * (ie bus faults or external aborts; not MMU faults)                       

     */
    void (*do_transaction_failed)(CPUState *cpu, hwaddr physaddr, vaddr addr,
                                  unsigned size, MMUAccessType access_type,
                                  int mmu_idx, MemTxAttrs attrs,
                                  MemTxResult response, uintptr_t retaddr);
    /**                                                                         

     * @do_unaligned_access: Callback for unaligned access handling             

     */
    void (*do_unaligned_access)(CPUState *cpu, vaddr addr,
                                MMUAccessType access_type,
                                int mmu_idx, uintptr_t retaddr);
#endif /* !CONFIG_USER_ONLY */

Now suddenly will have some of the objects (in target/...) seeing the struct as 
not having do_transaction_failed and do_unaligned_access,
and some of the objects (common_ss stuff) not seeing CONFIG_USER_ONLY and 
therefore instead _seeing_ do_transaction_failed and do_unaligned_access.

Result is a set of segfaults.

The reason we went on and tried to protect with CONFIG_USER_ONLY was to make 
sure that it is a compile time error to try to use these for linux-user,
but we end up making things worse.

Note that include/hw/core/cpu.h already uses CONFIG_USER_ONLY in other parts of 
the header file, and we might have hidden problems as a result we (or at least 
I) don't know about,
because code is being compiled in for linux-user which explicitly should not be 
compiled there.

There are multiple workarounds / fixes possible for my short term problem,
but would it not be a good idea to fix this problem at its root once and for 
all?

Otherwise, like I fell into this trap, others also probably will, and based on 
the existing cpu.h code already in mainline, indeed it seems already have.

Thoughts?

Thanks,

Claudio

dangers of current NEED_CPU_H, CONFIG_SOFTMMU, CONFIG_USER_ONLY (was: [PATCH v11 7/7] cpu: introduce cpu_accel_instance_init)

Reply via email to