On 15.08.2025 10:40, Nicola Vetrini wrote:
> On 2025-08-15 10:30, Jan Beulich wrote:
>> On 14.08.2025 20:20, Andrew Cooper wrote:
>>> On 14/08/2025 8:26 am, Jan Beulich wrote:
>>>> On 13.08.2025 13:36, Andrew Cooper wrote:
>>>>> On 12/08/2025 10:43 am, Nicola Vetrini wrote:
>>>>>> On 2025-08-08 22:23, Andrew Cooper wrote:
>>>>>>> diff --git a/xen/arch/x86/traps-setup.c 
>>>>>>> b/xen/arch/x86/traps-setup.c
>>>>>>> index 8ca379c9e4cb..13b8fcf0ba51 100644
>>>>>>> --- a/xen/arch/x86/traps-setup.c
>>>>>>> +++ b/xen/arch/x86/traps-setup.c
>>>>>>> @@ -19,6 +20,124 @@ boolean_param("ler", opt_ler);
>>>>>>>
>>>>>>>  void nocall entry_PF(void);
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Sets up system tables and descriptors for IDT devliery.
>>>>>>> + *
>>>>>>> + * - Sets up TSS with stack pointers, including ISTs
>>>>>>> + * - Inserts TSS selector into regular and compat GDTs
>>>>>>> + * - Loads GDT, IDT, TR then null LDT
>>>>>>> + * - Sets up IST references in the IDT
>>>>>>> + */
>>>>>>> +static void load_system_tables(void)
>>>>>>> +{
>>>>>>> +    unsigned int i, cpu = smp_processor_id();
>>>>>>> +    unsigned long stack_bottom = get_stack_bottom(),
>>>>>>> +        stack_top = stack_bottom & ~(STACK_SIZE - 1);
>>>>>>> +    /*
>>>>>>> +     * NB: define tss_page as a local variable because clang 3.5
>>>>>>> doesn't
>>>>>>> +     * support using ARRAY_SIZE against per-cpu variables.
>>>>>>> +     */
>>>>>>> +    struct tss_page *tss_page = &this_cpu(tss_page);
>>>>>>> +    idt_entry_t *idt = this_cpu(idt);
>>>>>>> +
>>>>>> Given the clang baseline this might not be needed anymore?
>>>>> Hmm.  While true, looking at 51461114e26, the code is definitely 
>>>>> better
>>>>> written with the tss_page variable and we wouldn't want to go back 
>>>>> to
>>>>> the old form.
>>>>>
>>>>> I think that I'll simply drop the comment.
>>>>>
>>>>> ~Andrew
>>>>>
>>>>> P.S.
>>>>>
>>>>> Generally speaking, because of the RELOC_HIDE() in this_cpu(), any 
>>>>> time
>>>>> you ever want two accesses to a variable, it's better (code gen 
>>>>> wise) to
>>>>> construct a pointer to it and use the point multiple times.
>>>>>
>>>>> I don't understand why there's a RELOC_HIDE() in this_cpu().  The
>>>>> justification doesn't make sense, but I've not had time to explore 
>>>>> what
>>>>> happens if we take it out.
>>>> There's no justification in xen/percpu.h?
>>>
>>> Well, it's given in compiler.h by RELOC_HIDE().
>>>
>>> /* This macro obfuscates arithmetic on a variable address so that gcc
>>>    shouldn't recognize the original var, and make assumptions about it 
>>> */
>>>
>>>
>>> But this is far from convincing.
>>>
>>>>
>>>> My understanding is that we simply may not expose any accesses to 
>>>> per_cpu_*
>>>> variables directly to the compiler, or there's a risk that it might 
>>>> access
>>>> the "master" variable (i.e. CPU0's on at least x86).
>>>
>>> RELOC_HIDE() doesn't do anything about the correctness of the pointer
>>> arithmetic expression to make the access work.
>>>
>>> I don't see how a correct expression can ever access CPU0's data by
>>> accident.
>>
>> Hmm, upon another look I agree. I wonder whether we inherited this from
>> Linux, where in turn it may have been merely a workaround to deal with
>> preemptible code not correctly accessing per-CPU data (i.e. not
>> accounting for get_per_cpu_offset() not being stable across 
>> preemption).
>> Yet then per_cpu() would have been of similar concern when "cpu" isn't
>> properly re-fetched after any possible preemption point ...
> 
> Probably inherited with a stripped-down comment on top of RELOC_HIDE, 
> see [1]. In a way it does make sense that the compiler may decide to 
> optimize based on this assumption, though I don't know whether wrapping 
> is meant to happen with per-CPU variables.

I wouldn't call it "meant to", but wrapping certainly is possible. This
is arch-independent code, and hence whether any wrapping would occur
depends on the VA layout of the individual architectures.

Jan

Reply via email to