Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-07 Thread Jan Beulich
On 06.11.2024 12:58, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich  wrote:
>>
>> On 06.11.2024 12:34, Frediano Ziglio wrote:
>>> On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich  wrote:

 On 06.11.2024 07:56, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
>>
>> On 05.11.2024 17:35, Frediano Ziglio wrote:
>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:

 On 05.11.2024 15:55, Frediano Ziglio wrote:
> This toolchain generates different object and map files.
> Account for these changes.

 At least briefly mentioning what exactly the differences are would be
 quite nice, imo.

>>>
>>> What about.
>>>
>>> Object have 3 additional sections which must be handled by the linker 
>>> script.
>>
>> I expect these sections are there in both cases. The difference, I 
>> assume,
>> is that for the GNU linker they don't need mentioning in the linker 
>> script.
>> Maybe that's what you mean to say, but to me at least the sentence can 
>> also
>> be interpreted differently.
>
> Why do you expect such sections? They are used for dynamic symbols in
> shared objects, we don't use shared objects here. Normal object
> symbols are not handled by these sections. GNU compiler+linker (we
> link multiple objects together) do not generate these sections. So the
> comment looks correct to me.

 About every ELF object will have .symtab and .strtab, and many also a
 separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
 confused by your reply.
>>>
>>> I checked the object files and there are no such sections using GNU 
>>> toolchain.
>>
>> I think I checked every *.o that's under boot/, and they all have these three
>> sections. Can you clarify which one(s) specifically you checked?
> 
> $ gcc --version
> gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
> Copyright (C) 2021 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ ld --version
> GNU ld (GNU Binutils for Ubuntu) 2.38
> Copyright (C) 2022 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public Licence version 3 or (at your option) a later version.
> This program has absolutely no warranty.
> 
> $ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
> 'objdump -x filename' | grep -e \\.
> shstrtab -e \\.strtab -e \\.symtab
> 
> (xen/normal and xen/pvh are the build directory, with different 
> configurations)
> 
> I'm saying that's possibly why the linker scripts didn't need to
> specify these sections.

Just to mention it here as well - objdump's -x option doesn't include "control"
sections. Considering the help text for -x this feels like a bug. However, as
documentation has it:

"Display all available header information, including the symbol table and
 relocation entries."

the symbol table and possible relocations _are_ being displayed, just not as
part of "Sections:".

Jan



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Frediano Ziglio
On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich  wrote:
>
> On 06.11.2024 07:56, Frediano Ziglio wrote:
> > On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
> >>
> >> On 05.11.2024 17:35, Frediano Ziglio wrote:
> >>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
> 
>  On 05.11.2024 15:55, Frediano Ziglio wrote:
> > This toolchain generates different object and map files.
> > Account for these changes.
> 
>  At least briefly mentioning what exactly the differences are would be
>  quite nice, imo.
> 
> >>>
> >>> What about.
> >>>
> >>> Object have 3 additional sections which must be handled by the linker 
> >>> script.
> >>
> >> I expect these sections are there in both cases. The difference, I assume,
> >> is that for the GNU linker they don't need mentioning in the linker script.
> >> Maybe that's what you mean to say, but to me at least the sentence can also
> >> be interpreted differently.
> >
> > Why do you expect such sections? They are used for dynamic symbols in
> > shared objects, we don't use shared objects here. Normal object
> > symbols are not handled by these sections. GNU compiler+linker (we
> > link multiple objects together) do not generate these sections. So the
> > comment looks correct to me.
>
> About every ELF object will have .symtab and .strtab, and many also a
> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> confused by your reply.
>
> Jan

I checked the object files and there are no such sections using GNU toolchain.

Frediano



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Roger Pau Monné
On Wed, Nov 06, 2024 at 11:58:19AM +, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich  wrote:
> >
> > On 06.11.2024 12:34, Frediano Ziglio wrote:
> > > On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich  wrote:
> > >>
> > >> On 06.11.2024 07:56, Frediano Ziglio wrote:
> > >>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
> > 
> >  On 05.11.2024 17:35, Frediano Ziglio wrote:
> > > On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
> > >>
> > >> On 05.11.2024 15:55, Frediano Ziglio wrote:
> > >>> This toolchain generates different object and map files.
> > >>> Account for these changes.
> > >>
> > >> At least briefly mentioning what exactly the differences are would be
> > >> quite nice, imo.
> > >>
> > >
> > > What about.
> > >
> > > Object have 3 additional sections which must be handled by the linker 
> > > script.
> > 
> >  I expect these sections are there in both cases. The difference, I 
> >  assume,
> >  is that for the GNU linker they don't need mentioning in the linker 
> >  script.
> >  Maybe that's what you mean to say, but to me at least the sentence can 
> >  also
> >  be interpreted differently.
> > >>>
> > >>> Why do you expect such sections? They are used for dynamic symbols in
> > >>> shared objects, we don't use shared objects here. Normal object
> > >>> symbols are not handled by these sections. GNU compiler+linker (we
> > >>> link multiple objects together) do not generate these sections. So the
> > >>> comment looks correct to me.
> > >>
> > >> About every ELF object will have .symtab and .strtab, and many also a
> > >> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> > >> confused by your reply.
> > >
> > > I checked the object files and there are no such sections using GNU 
> > > toolchain.
> >
> > I think I checked every *.o that's under boot/, and they all have these 
> > three
> > sections. Can you clarify which one(s) specifically you checked?
> >
> > Jan
> 
> $ gcc --version
> gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
> Copyright (C) 2021 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ ld --version
> GNU ld (GNU Binutils for Ubuntu) 2.38
> Copyright (C) 2022 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public Licence version 3 or (at your option) a later version.
> This program has absolutely no warranty.
> 
> $ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
> 'objdump -x filename' | grep -e \\.
> shstrtab -e \\.strtab -e \\.symtab
> 
> (xen/normal and xen/pvh are the build directory, with different 
> configurations)
> 
> I'm saying that's possibly why the linker scripts didn't need to
> specify these sections.

objdump -x doesn't seem to list .symtab, .strtab or .shstrtab, but
readelf -S does:

# readelf -SW xen/arch/x86/boot/reloc.o
[...]
  [11] .symtab   SYMTAB   0004b0 000190 10 12  21  4
  [12] .strtab   STRTAB   000640 92 00  0   0  1
  [13] .shstrtab STRTAB   000734 78 00  0   0  1

# objdump -x xen/arch/x86/boot/reloc.o
[...]
Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .group0008      0034  2**2
  CONTENTS, READONLY, GROUP, LINK_ONCE_DISCARD
  1 .text 041d      0040  2**4
  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .data       045d  2**0
  CONTENTS, ALLOC, LOAD, DATA
  3 .bss        045d  2**0
  ALLOC
  4 .rodata   0024      0460  2**2
  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .text.__x86.get_pc_thunk.bx 0004      0484  2**0
  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .comment  0028      0488  2**0
  CONTENTS, READONLY
  7 .note.GNU-stack       04b0  2**0
  CONTENTS, READONLY

It also seems to skip .rel sections.  It doesn't seem reliable to use
objdump to get ELF sections.

Regards, Roger.



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Frediano Ziglio
On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich  wrote:
>
> On 06.11.2024 12:34, Frediano Ziglio wrote:
> > On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich  wrote:
> >>
> >> On 06.11.2024 07:56, Frediano Ziglio wrote:
> >>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
> 
>  On 05.11.2024 17:35, Frediano Ziglio wrote:
> > On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
> >>
> >> On 05.11.2024 15:55, Frediano Ziglio wrote:
> >>> This toolchain generates different object and map files.
> >>> Account for these changes.
> >>
> >> At least briefly mentioning what exactly the differences are would be
> >> quite nice, imo.
> >>
> >
> > What about.
> >
> > Object have 3 additional sections which must be handled by the linker 
> > script.
> 
>  I expect these sections are there in both cases. The difference, I 
>  assume,
>  is that for the GNU linker they don't need mentioning in the linker 
>  script.
>  Maybe that's what you mean to say, but to me at least the sentence can 
>  also
>  be interpreted differently.
> >>>
> >>> Why do you expect such sections? They are used for dynamic symbols in
> >>> shared objects, we don't use shared objects here. Normal object
> >>> symbols are not handled by these sections. GNU compiler+linker (we
> >>> link multiple objects together) do not generate these sections. So the
> >>> comment looks correct to me.
> >>
> >> About every ELF object will have .symtab and .strtab, and many also a
> >> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> >> confused by your reply.
> >
> > I checked the object files and there are no such sections using GNU 
> > toolchain.
>
> I think I checked every *.o that's under boot/, and they all have these three
> sections. Can you clarify which one(s) specifically you checked?
>
> Jan

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public Licence version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
'objdump -x filename' | grep -e \\.
shstrtab -e \\.strtab -e \\.symtab

(xen/normal and xen/pvh are the build directory, with different configurations)

I'm saying that's possibly why the linker scripts didn't need to
specify these sections.

Frediano



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Jan Beulich
On 06.11.2024 12:34, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich  wrote:
>>
>> On 06.11.2024 07:56, Frediano Ziglio wrote:
>>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:

 On 05.11.2024 17:35, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
>>
>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>> This toolchain generates different object and map files.
>>> Account for these changes.
>>
>> At least briefly mentioning what exactly the differences are would be
>> quite nice, imo.
>>
>
> What about.
>
> Object have 3 additional sections which must be handled by the linker 
> script.

 I expect these sections are there in both cases. The difference, I assume,
 is that for the GNU linker they don't need mentioning in the linker script.
 Maybe that's what you mean to say, but to me at least the sentence can also
 be interpreted differently.
>>>
>>> Why do you expect such sections? They are used for dynamic symbols in
>>> shared objects, we don't use shared objects here. Normal object
>>> symbols are not handled by these sections. GNU compiler+linker (we
>>> link multiple objects together) do not generate these sections. So the
>>> comment looks correct to me.
>>
>> About every ELF object will have .symtab and .strtab, and many also a
>> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
>> confused by your reply.
> 
> I checked the object files and there are no such sections using GNU toolchain.

I think I checked every *.o that's under boot/, and they all have these three
sections. Can you clarify which one(s) specifically you checked?

Jan



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Andrew Cooper
On 06/11/2024 10:32 am, Alejandro Vallejo wrote:
> On Tue Nov 5, 2024 at 7:23 PM GMT, Andrew Cooper wrote:
>> On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
>>> diff --git a/xen/tools/combine_two_binaries.py 
>>> b/xen/tools/combine_two_binaries.py
>>> index 447c0d3bdb..79ae8900b1 100755
>>> --- a/xen/tools/combine_two_binaries.py
>>> +++ b/xen/tools/combine_two_binaries.py
>>> @@ -67,13 +67,22 @@ if args.exports is not None:
>>>  
>>>  # Parse mapfile, look for ther symbols we want to export.
>>>  if args.mapfile is not None:
>>> -symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>>> +symbol_re_clang = \
>>> +
>>> re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
>>> +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>> These are specific to the linker, not the compiler, so really should be
>> _ld and _lld rather than _gnu and _clang.
> GNU binutils ships not one, but two linkers.

Yes that's technically true.

But while only one of them is remotely capable of linking a kernel, it's
also not very relevant.

Neither Gold, or indeed Mold which you mention too, are liable to be LD
within Xen for the foreseeable future.

~Andrew



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Jan Beulich
On 06.11.2024 07:56, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
>>
>> On 05.11.2024 17:35, Frediano Ziglio wrote:
>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:

 On 05.11.2024 15:55, Frediano Ziglio wrote:
> This toolchain generates different object and map files.
> Account for these changes.

 At least briefly mentioning what exactly the differences are would be
 quite nice, imo.

>>>
>>> What about.
>>>
>>> Object have 3 additional sections which must be handled by the linker 
>>> script.
>>
>> I expect these sections are there in both cases. The difference, I assume,
>> is that for the GNU linker they don't need mentioning in the linker script.
>> Maybe that's what you mean to say, but to me at least the sentence can also
>> be interpreted differently.
> 
> Why do you expect such sections? They are used for dynamic symbols in
> shared objects, we don't use shared objects here. Normal object
> symbols are not handled by these sections. GNU compiler+linker (we
> link multiple objects together) do not generate these sections. So the
> comment looks correct to me.

About every ELF object will have .symtab and .strtab, and many also a
separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
confused by your reply.

Jan



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-06 Thread Alejandro Vallejo
On Tue Nov 5, 2024 at 7:23 PM GMT, Andrew Cooper wrote:
> On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
> > diff --git a/xen/tools/combine_two_binaries.py 
> > b/xen/tools/combine_two_binaries.py
> > index 447c0d3bdb..79ae8900b1 100755
> > --- a/xen/tools/combine_two_binaries.py
> > +++ b/xen/tools/combine_two_binaries.py
> > @@ -67,13 +67,22 @@ if args.exports is not None:
> >  
> >  # Parse mapfile, look for ther symbols we want to export.
> >  if args.mapfile is not None:
> > -symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> > +symbol_re_clang = \
> > +
> > re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
> > +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>
> These are specific to the linker, not the compiler, so really should be
> _ld and _lld rather than _gnu and _clang.

GNU binutils ships not one, but two linkers. _ld does not reflect the fact that
it's not the map format of a linker, but a family of them. Probably shared by
mold too. That's a good reason for _gnu to stay _gnu (or if changing, to
_binutils; but that's hardly relevant in this patch).

Otherwise _clang could become either _llvm or _lld. Given there's a single
linker shipped by LLVM the difference is less than cosmetic.

>
> The build environment does have CONFIG_LD_IS_GNU which is used for one
> LD vs LLD distinction.  It seems reasonable to re-use it here (so you're
> not trying both regexes in the hope that one works), although you'll
> need to plumb it into the script somehow because it's not exported into
> the environment currently.
>
>
> But, regexes are utterly opaque, and given that neither Binutils nor
> LLVM document their map format, you really need to provide at least one
> example of what a relevant mapfile looks like in a comment.
>
> Looking at built-in-32.offset.map in it's gnu form, it's presumably
> looking for the lines that are just a hex address and a name with no
> other punctuation.
>
> But the lld form is clearly different.
>
> ~Andrew

Cheers,
Alejandro



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-05 Thread Frediano Ziglio
On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich  wrote:
>
> On 05.11.2024 17:35, Frediano Ziglio wrote:
> > On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
> >>
> >> On 05.11.2024 15:55, Frediano Ziglio wrote:
> >>> This toolchain generates different object and map files.
> >>> Account for these changes.
> >>
> >> At least briefly mentioning what exactly the differences are would be
> >> quite nice, imo.
> >>
> >
> > What about.
> >
> > Object have 3 additional sections which must be handled by the linker 
> > script.
>
> I expect these sections are there in both cases. The difference, I assume,
> is that for the GNU linker they don't need mentioning in the linker script.
> Maybe that's what you mean to say, but to me at least the sentence can also
> be interpreted differently.
>

Why do you expect such sections? They are used for dynamic symbols in
shared objects, we don't use shared objects here. Normal object
symbols are not handled by these sections. GNU compiler+linker (we
link multiple objects together) do not generate these sections. So the
comment looks correct to me.

> >>> +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> >>>  for line in open(args.mapfile):
> >>> -m = symbol_re.match(line)
> >>> -if not m or m.group(2) not in exports:
> >>> +name = None
> >>> +m = symbol_re_clang.match(line)
> >>> +if m:
> >>> +name = m.group(5)
> >>> +else:
> >>> +m = symbol_re_gnu.match(line)
> >>> +if m:
> >>> +name = m.group(2)
> >>
> >> ... uniformly use m.group(2) here (outside of the conditional)?
> >>
> >
> > It could be done. The initial idea was testing that VMA and LMA fields
> > should be identical, which gives a bit more certainty about the format
> > used.
> > About map file format, both formats have some headers. Would it be
> > sensible to detect such headers? BTW, probably a mis-detection would
> > cause symbols not to be detected.
>
> Not sure either way, to be honest.
>
> Jan

Mumble... I'm looking for an alternative, maybe I can avoid using map files.

Frediano



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-05 Thread Andrew Cooper
On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
> diff --git a/xen/tools/combine_two_binaries.py 
> b/xen/tools/combine_two_binaries.py
> index 447c0d3bdb..79ae8900b1 100755
> --- a/xen/tools/combine_two_binaries.py
> +++ b/xen/tools/combine_two_binaries.py
> @@ -67,13 +67,22 @@ if args.exports is not None:
>  
>  # Parse mapfile, look for ther symbols we want to export.
>  if args.mapfile is not None:
> -symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> +symbol_re_clang = \
> +
> re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
> +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')

These are specific to the linker, not the compiler, so really should be
_ld and _lld rather than _gnu and _clang.

The build environment does have CONFIG_LD_IS_GNU which is used for one
LD vs LLD distinction.  It seems reasonable to re-use it here (so you're
not trying both regexes in the hope that one works), although you'll
need to plumb it into the script somehow because it's not exported into
the environment currently.


But, regexes are utterly opaque, and given that neither Binutils nor
LLVM document their map format, you really need to provide at least one
example of what a relevant mapfile looks like in a comment.

Looking at built-in-32.offset.map in it's gnu form, it's presumably
looking for the lines that are just a hex address and a name with no
other punctuation.

But the lld form is clearly different.

~Andrew



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-05 Thread Jan Beulich
On 05.11.2024 17:35, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
>>
>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>> This toolchain generates different object and map files.
>>> Account for these changes.
>>
>> At least briefly mentioning what exactly the differences are would be
>> quite nice, imo.
>>
> 
> What about.
> 
> Object have 3 additional sections which must be handled by the linker script.

I expect these sections are there in both cases. The difference, I assume,
is that for the GNU linker they don't need mentioning in the linker script.
Maybe that's what you mean to say, but to me at least the sentence can also
be interpreted differently.

>>> +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>>>  for line in open(args.mapfile):
>>> -m = symbol_re.match(line)
>>> -if not m or m.group(2) not in exports:
>>> +name = None
>>> +m = symbol_re_clang.match(line)
>>> +if m:
>>> +name = m.group(5)
>>> +else:
>>> +m = symbol_re_gnu.match(line)
>>> +if m:
>>> +name = m.group(2)
>>
>> ... uniformly use m.group(2) here (outside of the conditional)?
>>
> 
> It could be done. The initial idea was testing that VMA and LMA fields
> should be identical, which gives a bit more certainty about the format
> used.
> About map file format, both formats have some headers. Would it be
> sensible to detect such headers? BTW, probably a mis-detection would
> cause symbols not to be detected.

Not sure either way, to be honest.

Jan



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-05 Thread Frediano Ziglio
On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich  wrote:
>
> On 05.11.2024 15:55, Frediano Ziglio wrote:
> > This toolchain generates different object and map files.
> > Account for these changes.
>
> At least briefly mentioning what exactly the differences are would be
> quite nice, imo.
>

What about.

Object have 3 additional sections which must be handled by the linker script.
Map file has 7 columns: VMA, LMA, size, alignment, output section,
definition, symbols, for our symbols output section and definition are
empty.

> > --- a/xen/tools/combine_two_binaries.py
> > +++ b/xen/tools/combine_two_binaries.py
> > @@ -67,13 +67,22 @@ if args.exports is not None:
> >
> >  # Parse mapfile, look for ther symbols we want to export.
> >  if args.mapfile is not None:
> > -symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> > +symbol_re_clang = \
>
> Is "clang" really appropriate to use here? Even without the alternative
> being named "gnu", I'd expect "llvm" to be more to the point, as at
> the linking stage the original language (encoded in "clang") shouldn't
> matter much anymore.
>

I'll change to "llvm".

> > +
> > re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
>
> Just wondering:
> - How stable is their map file format?

No idea, unfortunately, map files are not really standard.

> - Do they not have an option to write GNU-compatible map files?

I try to search, but I could not find such an option.

> - Why do you declare 5 groups, when you're only after the 1st and last,
> which would then allow to ...
>
> > +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> >  for line in open(args.mapfile):
> > -m = symbol_re.match(line)
> > -if not m or m.group(2) not in exports:
> > +name = None
> > +m = symbol_re_clang.match(line)
> > +if m:
> > +name = m.group(5)
> > +else:
> > +m = symbol_re_gnu.match(line)
> > +if m:
> > +name = m.group(2)
>
> ... uniformly use m.group(2) here (outside of the conditional)?
>

It could be done. The initial idea was testing that VMA and LMA fields
should be identical, which gives a bit more certainty about the format
used.
About map file format, both formats have some headers. Would it be
sensible to detect such headers? BTW, probably a mis-detection would
cause symbols not to be detected.

> Jan
>
> > +if name is None or name not in exports:
> >  continue
> >  addr = int(m.group(1), 16)
> > -exports[m.group(2)] = addr
> > +exports[name] = addr
> >  for (name, addr) in exports.items():
> >  if addr is None:
> >  raise Exception("Required export symbols %s not found" % name)
>

Frediano



Re: [PATCH] x86/boot: Fix build with LLVM toolchain

2024-11-05 Thread Jan Beulich
On 05.11.2024 15:55, Frediano Ziglio wrote:
> This toolchain generates different object and map files.
> Account for these changes.

At least briefly mentioning what exactly the differences are would be
quite nice, imo.

> --- a/xen/tools/combine_two_binaries.py
> +++ b/xen/tools/combine_two_binaries.py
> @@ -67,13 +67,22 @@ if args.exports is not None:
>  
>  # Parse mapfile, look for ther symbols we want to export.
>  if args.mapfile is not None:
> -symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> +symbol_re_clang = \

Is "clang" really appropriate to use here? Even without the alternative
being named "gnu", I'd expect "llvm" to be more to the point, as at
the linking stage the original language (encoded in "clang") shouldn't
matter much anymore.

> +
> re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')

Just wondering:
- How stable is their map file format?
- Do they not have an option to write GNU-compatible map files?
- Why do you declare 5 groups, when you're only after the 1st and last,
which would then allow to ...

> +symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>  for line in open(args.mapfile):
> -m = symbol_re.match(line)
> -if not m or m.group(2) not in exports:
> +name = None
> +m = symbol_re_clang.match(line)
> +if m:
> +name = m.group(5)
> +else:
> +m = symbol_re_gnu.match(line)
> +if m:
> +name = m.group(2)

... uniformly use m.group(2) here (outside of the conditional)?

Jan

> +if name is None or name not in exports:
>  continue
>  addr = int(m.group(1), 16)
> -exports[m.group(2)] = addr
> +exports[name] = addr
>  for (name, addr) in exports.items():
>  if addr is None:
>  raise Exception("Required export symbols %s not found" % name)