Re: [lldb-dev] lldb10 can not hit break point on windows platform

2020-10-20 Thread le wang via lldb-dev
Thanks, I will take a look.

Greg Clayton wrote on Wednesday, October 21, 2020, 2:50 AM:

> So the good news is the DWARF seems to be valid.
>
> I think LLDB is having problems with this ELF file because it is an object
> file (e_type == ET_REL) or because it has no program headers.
>
> There were some changes made to LLDB where we would put any section
> headers that were contained in program headers inside of a section named
> for the program headers. So you will normally end up with sections in LLDB
> like:
>
>
> PT_LOAD[0]
>   .note.android.ident
>   .note.gnu.build-id
>   .dynsym
>   .gnu.version
>   .gnu.version_r
>   .gnu.hash
>   .hash
>   .dynstr
>   .rel.dyn
>   .ARM.exidx
>   .rel.plt
>   .ARM.extab
>   .rodata
>   .text
>   .plt
> PT_LOAD[1]
>   .fini_array
>   .data.rel.ro
>   .dynamic
>   .got
>   .got.plt
> PT_LOAD[2]
>   .data
>   .bss
> .comment
> .ARM.attributes
> .debug_str
> .debug_loc
> .debug_abbrev
> .debug_info
> .debug_ranges
> .debug_macinfo
> .debug_frame
> .debug_line
> .debug_aranges
> .symtab
> .shstrtab
> .strtab
>
> Note how any section that is contained within a program header is
> contained within a PT_LOAD[N] section.
>
> If I load your ELF binary, I can set a breakpoint:
>
> (lldb) b TestFunction.cpp:1
> Breakpoint 1: where = ELFData.txt`::2() + 4 at TestFunction.cpp:1:7, address = 0x0004
>
> I can also view the line table that LLDB was able to parse:
>
>
> (lldb) target modules dump line-table TestFunction.cpp
> Line table for /test/E:/test/TestFunction.cpp in `ELFData.txt
> 0x: E:/test/TestFunction.cpp
> 0x0004: E:/test/TestFunction.cpp:1:7
> 0x000c: E:/test/TestFunction.cpp:2:7
> 0x0014: E:/test/TestFunction.cpp:3:7
> 0x001c: E:/test/TestFunction.cpp:4:7
> 0x002a: E:/test/TestFunction.cpp:4:7
>
> But if we look at the sections, we see the sections had their addresses
> changed. If we look at what is in the ELF file:
>
> $ elf.py /tmp/ELFData.txt
> ELF: /tmp/ELFData.txt (x86_64)
> ELF Header:
> e_ident[EI_MAG0      ] = 0x7f
> e_ident[EI_MAG1      ] = 0x45 'E'
> e_ident[EI_MAG2      ] = 0x4c 'L'
> e_ident[EI_MAG3      ] = 0x46 'F'
> e_ident[EI_CLASS     ] = 0x02 ELFCLASS64
> e_ident[EI_DATA      ] = 0x01 ELFDATA2LSB
> e_ident[EI_VERSION   ] = 0x01
> e_ident[EI_OSABI     ] = 0x00 ELFOSABI_NONE
> e_ident[EI_ABIVERSION] = 0x00
> e_type      = 0x0001 ET_REL
> e_machine   = 0x003e EM_X86_64
> e_version   = 0x0001
> e_entry     = 0x
> e_phoff     = 0x
> e_shoff     = 0x0568
> e_flags     = 0x
> e_ehsize    = 0x0040
> e_phentsize = 0x
> e_phnum     = 0x
> e_shentsize = 0x0040
> e_shnum     = 0x0011
> e_shstrndx  = 0x0001
>
>
> Section Headers:
> Index   sh_name sh_type       sh_flags sh_addr    sh_offset sh_size sh_link sh_info sh_addr_a sh_entsize
> ======= ------- ------------- -------- ---------- --------- ------- ------- ------- --------- ----------
> [    0] 0x      SHT_NULL      0x       0x         0x        0x      0x      0x      0x        0x
> [    1] 0x008b  SHT_STRTAB    0x       0x         0x04c0    0x00a1  0x      0x      0x0001    0x         .strtab
> [    2] 0x000f  SHT_PROGBITS  0x0006   0x0137f103 0x0040    0x002a  0x      0x      0x0010    0x         .text ( SHF_ALLOC SHF_EXECINSTR  )
> [    3] 0x003f  SHT_PROGBITS  0x0030   0x         0x006a    0x003e  0x      0x      0x0001    0x0001     .debug_str ( SHF_MERGE SHF_STRINGS  )
> [    4] 0x0001  SHT_PROGBITS  0x       0x         0x00a8    0x0043  0x      0x      0x0001    0x         .debug_abbrev
> [    5] 0x004f  SHT_PROGBITS  0x       0x         0x00eb    0x008d  0x      0x      0x0001    0x         .debug_info
> [    6] 0x004a  SHT_RELA      0x       0x         0x0310    0x0150  0x0010  0x0005  0x0008    0x0018     .rela.debug_info
> [    7] 0x002f  SHT_PROGBITS  0x       0x         0x0178    0x001c  0x      0x      0x0001    0x         .debug_pubnames
> [    8] 0x002a  SHT_RELA      0x       0x         0x0460    0x0018  0x0010  0x0007  0x0008    0x0018     .rela.debug_pubnames
> [    9] 0x001a  SHT_PROGBITS  0x       0x         0x0194    0x0026  0x

Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

2020-10-20 Thread Ted Woodward via lldb-dev
I agree with Pavel about the larger picture - we need to know the driver behind 
address spaces before we can discuss a workable solution.

I've dealt with two use cases: Harvard architecture cores and low-level 
hardware debugging.

A Harvard architecture core has separate instruction and data memories. These 
often use the same addresses, so to distinguish between them you need address 
spaces. The Motorola DSP56300 had 1 program and 2 data memories, called p, x 
and y. p:100, x:100 and y:100 were all separate memories, so "address 100" 
isn't enough to get what the user needed to see.
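To make the Harvard case concrete, here is a toy C++ model in which an address is a (space, offset) pair. This is an editorial sketch, not code from LLDB or any debugger; the `Space` enum and the `HarvardMemory` class are hypothetical names. Because the space is part of the key, p:100, x:100 and y:100 are three different cells, which a bare `uint64_t` address cannot express.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical address-space tags for a DSP56300-style core:
// one program memory (p) and two data memories (x and y).
enum class Space : uint8_t { P, X, Y };

// On such a core an address is only unambiguous as a (space, offset) pair.
using SpacedAddr = std::pair<Space, uint64_t>;

// Toy memory model (not a real debugger API): every (space, offset) pair
// names its own storage cell, so p:100, x:100 and y:100 never alias.
class HarvardMemory {
public:
  void Write(Space space, uint64_t addr, uint32_t value) {
    cells_[SpacedAddr{space, addr}] = value;
  }

  uint32_t Read(Space space, uint64_t addr) const {
    auto it = cells_.find(SpacedAddr{space, addr});
    return it == cells_.end() ? 0 : it->second;  // unwritten cells read as 0
  }

private:
  std::map<SpacedAddr, uint32_t> cells_;
};
```

A debugger built on such a model would have to carry the space through every memory read or write request, which is the plumbing the RFC is about.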

For low level hardware debugging (often using JTAG), many devices let you 
access memories in ways like "virtual using the TLB", or "virtual == physical, 
through the core", or "physical, through the SoC, not cached". Memory spaces, 
done right, can give the user the flexibility to pick how to view memory.


Are these the use cases you were envisioning, Jonas?

> -----Original Message-----
> From: lldb-dev  On Behalf Of Pavel Labath
> via lldb-dev
> Sent: Tuesday, October 20, 2020 12:51 PM
> To: Jonas Devlieghere ; LLDB  d...@lists.llvm.org>
> Subject: [EXT] Re: [lldb-dev] [RFC] Segmented Address Space Support in
> LLDB
>
> There are a lot of things that are unclear to me about this proposal. The
> mechanics of representing a segmented address are one thing, but I think
> that the really interesting part will be the interaction with the rest of
> lldb. For example:
> - What's going to be the source of this address space information? Is it going
> to be statically baked into lldb (a function of the target architecture?), or
> dynamically retrieved from the target or platform we're debugging? How
> would that work?
> - How is this going to interact with Object/SymbolFile classes? Are you
> expecting to use existing object and symbol formats for address space
> information, or some custom ones? AFAIK, none of the existing formats
> actually support encoding address space information (though that hasn't
> stopped people from trying).
>
> Without understanding the bigger picture it's hard for me to say whether the
> proposed large scale refactoring is a good idea. Nonetheless, I am doubtful of
> the viability of that approach. Some of my reasons for that are:
> - not all addr_ts represent an actual address -- sometimes that is a
> difference between two addresses, which still uses addr_t, as that's
> guaranteed to fit.
> - related to that, there is a difference (I'd expect) between the operations
> supported by the two types. addr_t supports all integral operations (though I
> hope we don't use all of them), but I wouldn't expect to be able to do the
> same with a SegmentedAddress. For one, I'd expect it wouldn't be possible
> to add two SegmentedAddresses together (which is possible for addr_t).
> OTOH, adding a SegmentedAddress and an addr_t would probably be fine?
> Should subtracting two SegmentedAddresses result in an addr_t? But
> only if they have matching address spaces (and assert otherwise)?
> - I'd also be worried about over-generalizing specialized code which can
> afford to work with plain addresses, and where the added address space
> would be a nuisance (or a source of bugs). E.g. ELF has no notion of address
> space, so I don't think I'd find it helpful to replace all plain integer
> calculations in elf parsing code with something more complex.
> (I'm aware that some people are using elf to encode address space
> information, but this is a pretty nonstandard extension, and it'd take more
> than type substitution to support anything like that.)
> - large scale refactorings are very much not the norm in llvm
>
>
>
> On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:
> > We want to support segmented address spaces in LLDB. Currently, all of
> > LLDB’s external API, command line interface, and internals assume that
> > an address in memory can be addressed unambiguously as an addr_t (aka
> > uint64_t). To support a segmented address space we’d need to extend
> > addr_t with a discriminator (an aspace_t) to uniquely identify a
> > location in memory. This RFC outlines what would need to change and
> > how we propose to do that.
> >
> > ### Addresses in LLDB
> >
> > Currently, LLDB has two ways of representing an address:
> >
> >   - Address object. Mostly represents addresses as Section+offset for
> > a binary image loaded in the Target. An Address in this form can
> > persist across executions, e.g. an address breakpoint in a binary
> > image that loads at a different address every execution. An Address
> > object can represent memory not mapped to a binary image. Heap, stack,
> > and JITted items will all be represented as the uint64_t load address of
> > the object, and cannot persist across multiple executions. You must
> > have the Target object available to get the current load address of an
> > Address object in the current process run. Some parts of lldb do not
> > have a Target available to 

Re: [lldb-dev] lldb10 can not hit break point on windows platform

2020-10-20 Thread Greg Clayton via lldb-dev
So the good news is the DWARF seems to be valid.

I think LLDB is having problems with this ELF file because it is an object file 
(e_type == ET_REL) or because it has no program headers. 

There were some changes made to LLDB where we would put any section headers 
that were contained in program headers inside of a section named for the 
program headers. So you will normally end up with sections in LLDB like:


PT_LOAD[0]
  .note.android.ident
  .note.gnu.build-id
  .dynsym
  .gnu.version
  .gnu.version_r
  .gnu.hash
  .hash
  .dynstr
  .rel.dyn
  .ARM.exidx
  .rel.plt
  .ARM.extab
  .rodata
  .text
  .plt
PT_LOAD[1]
  .fini_array
  .data.rel.ro
  .dynamic
  .got
  .got.plt
PT_LOAD[2]
  .data
  .bss
.comment
.ARM.attributes
.debug_str
.debug_loc
.debug_abbrev
.debug_info
.debug_ranges
.debug_macinfo
.debug_frame
.debug_line
.debug_aranges
.symtab
.shstrtab
.strtab

Note how any section that is contained within a program header is contained 
within a PT_LOAD[N] section.
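A minimal sketch of the grouping described above, assuming a section belongs to a PT_LOAD segment when its file range lies inside the segment's file range. The structs and `GroupSectionsBySegment` are hypothetical simplifications, not LLDB's ObjectFileELF code. A pure file-offset test cannot place SHT_NOBITS sections such as .bss, which the listing above does show under PT_LOAD[2], so the real logic clearly does more. The sketch also shows why an object file with no program headers gets no PT_LOAD[N] containers at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical views of ELF headers: only the fields used here.
struct ProgramHeader {
  uint32_t p_type;
  uint64_t p_offset;
  uint64_t p_filesz;
};
struct SectionHeader {
  std::string name;
  uint64_t sh_offset;
  uint64_t sh_size;
};

constexpr uint32_t PT_LOAD = 1;  // p_type value for loadable segments (ELF gABI)

// For each section, return N if it lies inside the file range of the Nth
// PT_LOAD segment (its PT_LOAD[N] container), or -1 if it stays top-level.
std::vector<int> GroupSectionsBySegment(const std::vector<ProgramHeader> &phdrs,
                                        const std::vector<SectionHeader> &shdrs) {
  std::vector<int> owner(shdrs.size(), -1);
  for (std::size_t s = 0; s < shdrs.size(); ++s) {
    const SectionHeader &sh = shdrs[s];
    int load_index = 0;  // N counts PT_LOAD entries only, not every phdr
    for (const ProgramHeader &ph : phdrs) {
      if (ph.p_type != PT_LOAD)
        continue;
      if (sh.sh_offset >= ph.p_offset &&
          sh.sh_offset + sh.sh_size <= ph.p_offset + ph.p_filesz) {
        owner[s] = load_index;
        break;
      }
      ++load_index;
    }
  }
  return owner;
}
```

With an empty program header table, as in an ET_REL file without program headers, every entry of the result is -1, so .text and the debug sections all stay top-level.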

If I load your ELF binary, I can set a breakpoint:

(lldb) b TestFunction.cpp:1
Breakpoint 1: where = ELFData.txt`::2() + 4 at TestFunction.cpp:1:7, address = 0x0004

I can also view the line table that LLDB was able to parse:


(lldb) target modules dump line-table TestFunction.cpp
Line table for /test/E:/test/TestFunction.cpp in `ELFData.txt
0x: E:/test/TestFunction.cpp
0x0004: E:/test/TestFunction.cpp:1:7
0x000c: E:/test/TestFunction.cpp:2:7
0x0014: E:/test/TestFunction.cpp:3:7
0x001c: E:/test/TestFunction.cpp:4:7
0x002a: E:/test/TestFunction.cpp:4:7

But if we look at the sections, we see the sections had their addresses 
changed. If we look at what is in the ELF file:

$ elf.py /tmp/ELFData.txt 
ELF: /tmp/ELFData.txt (x86_64)
ELF Header:
e_ident[EI_MAG0      ] = 0x7f
e_ident[EI_MAG1      ] = 0x45 'E'
e_ident[EI_MAG2      ] = 0x4c 'L'
e_ident[EI_MAG3      ] = 0x46 'F'
e_ident[EI_CLASS     ] = 0x02 ELFCLASS64
e_ident[EI_DATA      ] = 0x01 ELFDATA2LSB
e_ident[EI_VERSION   ] = 0x01
e_ident[EI_OSABI     ] = 0x00 ELFOSABI_NONE
e_ident[EI_ABIVERSION] = 0x00
e_type      = 0x0001 ET_REL
e_machine   = 0x003e EM_X86_64
e_version   = 0x0001
e_entry     = 0x
e_phoff     = 0x
e_shoff     = 0x0568
e_flags     = 0x
e_ehsize    = 0x0040
e_phentsize = 0x
e_phnum     = 0x
e_shentsize = 0x0040
e_shnum     = 0x0011
e_shstrndx  = 0x0001


Section Headers:
Index   sh_name sh_type       sh_flags sh_addr    sh_offset sh_size sh_link sh_info sh_addr_a sh_entsize
======= ------- ------------- -------- ---------- --------- ------- ------- ------- --------- ----------
[    0] 0x      SHT_NULL      0x       0x         0x        0x      0x      0x      0x        0x
[    1] 0x008b  SHT_STRTAB    0x       0x         0x04c0    0x00a1  0x      0x      0x0001    0x         .strtab
[    2] 0x000f  SHT_PROGBITS  0x0006   0x0137f103 0x0040    0x002a  0x      0x      0x0010    0x         .text ( SHF_ALLOC SHF_EXECINSTR  )
[    3] 0x003f  SHT_PROGBITS  0x0030   0x         0x006a    0x003e  0x      0x      0x0001    0x0001     .debug_str ( SHF_MERGE SHF_STRINGS  )
[    4] 0x0001  SHT_PROGBITS  0x       0x         0x00a8    0x0043  0x      0x      0x0001    0x         .debug_abbrev
[    5] 0x004f  SHT_PROGBITS  0x       0x         0x00eb    0x008d  0x      0x      0x0001    0x         .debug_info
[    6] 0x004a  SHT_RELA      0x       0x         0x0310    0x0150  0x0010  0x0005  0x0008    0x0018     .rela.debug_info
[    7] 0x002f  SHT_PROGBITS  0x       0x         0x0178    0x001c  0x      0x      0x0001    0x         .debug_pubnames
[    8] 0x002a  SHT_RELA      0x       0x         0x0460    0x0018  0x0010  0x0007  0x0008    0x0018     .rela.debug_pubnames
[    9] 0x001a  SHT_PROGBITS  0x       0x         0x0194    0x0026  0x      0x      0x0001    0x         .debug_pubtypes
[   10] 0x0015  SHT_RELA      0x       0x         0x0478    0x0018  0x0010  0x0009  0x0008    0x0018     .rela.debug_pubtypes
[   11] 0x005b 

Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB

2020-10-20 Thread Pavel Labath via lldb-dev
There are a lot of things that are unclear to me about this proposal. The 
mechanics of representing a segmented address are one thing, but I 
think that the really interesting part will be the interaction with the 
rest of lldb. For example:
- What's going to be the source of this address space information? Is it 
going to be statically baked into lldb (a function of the target 
architecture?), or dynamically retrieved from the target or platform 
we're debugging? How would that work?
- How is this going to interact with Object/SymbolFile classes? Are you 
expecting to use existing object and symbol formats for address space 
information, or some custom ones? AFAIK, none of the existing formats 
actually support encoding address space information (though that hasn't 
stopped people from trying).


Without understanding the bigger picture it's hard for me to say whether 
the proposed large scale refactoring is a good idea. Nonetheless, I am 
doubtful of the viability of that approach. Some of my reasons for that are:
- not all addr_ts represent an actual address -- sometimes that is a 
difference between two addresses, which still uses addr_t, as that's 
guaranteed to fit.
- related to that, there is a difference (I'd expect) between the 
operations supported by the two types. addr_t supports all integral 
operations (though I hope we don't use all of them), but I wouldn't 
expect to be able to do the same with a SegmentedAddress. For one, I'd 
expect it wouldn't be possible to add two SegmentedAddresses together 
(which is possible for addr_t). OTOH, adding a SegmentedAddress and an 
addr_t would probably be fine? Should subtracting two SegmentedAddresses 
result in an addr_t? But only if they have matching address 
spaces (and assert otherwise)?
- I'd also be worried about over-generalizing specialized code which can 
afford to work with plain addresses, and where the added address space 
would be a nuisance (or a source of bugs). E.g. ELF has no notion of 
address space, so I don't think I'd find it helpful to replace all plain 
integer calculations in elf parsing code with something more complex. 
(I'm aware that some people are using elf to encode address space 
information, but this is a pretty nonstandard extension, and it'd take 
more than type substitution to support anything like that.)

- large scale refactorings are very much not the norm in llvm
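One way to pin down the operation set questioned above is a small value type, sketched here in C++. `SegmentedAddress`, `addr_t` and `aspace_t` follow the names used in this thread, but the class itself is hypothetical. It allows SegmentedAddress + addr_t, allows subtracting two SegmentedAddresses only within one address space (asserting otherwise), and defines no operator for adding two SegmentedAddresses, so such code fails to compile.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;    // plain integer address, as in LLDB today
using aspace_t = uint32_t;  // hypothetical discriminator type from the RFC

// Hypothetical SegmentedAddress: an offset tagged with its address space.
class SegmentedAddress {
public:
  SegmentedAddress(aspace_t space, addr_t offset) : space_(space), offset_(offset) {}

  aspace_t space() const { return space_; }
  addr_t offset() const { return offset_; }

  // SegmentedAddress + addr_t -> SegmentedAddress in the same address space.
  friend SegmentedAddress operator+(SegmentedAddress a, addr_t delta) {
    return SegmentedAddress(a.space_, a.offset_ + delta);
  }

  // SegmentedAddress - SegmentedAddress -> addr_t, only within one space.
  friend addr_t operator-(SegmentedAddress a, SegmentedAddress b) {
    assert(a.space_ == b.space_ && "subtracting addresses from different spaces");
    return a.offset_ - b.offset_;
  }

  friend bool operator==(SegmentedAddress a, SegmentedAddress b) {
    return a.space_ == b.space_ && a.offset_ == b.offset_;
  }

  // Deliberately no operator+(SegmentedAddress, SegmentedAddress), and no
  // implicit conversion to addr_t, so adding two addresses does not compile.

private:
  aspace_t space_;
  addr_t offset_;
};
```

The design choice here is to let the type system reject meaningless arithmetic, while a difference of two addresses becomes a plain addr_t again, which also covers the "addr_t used as a difference" case from the first bullet.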



On 19/10/2020 23:56, Jonas Devlieghere via lldb-dev wrote:
> We want to support segmented address spaces in LLDB. Currently, all of 
> LLDB’s external API, command line interface, and internals assume that 
> an address in memory can be addressed unambiguously as an addr_t (aka 
> uint64_t). To support a segmented address space we’d need to extend 
> addr_t with a discriminator (an aspace_t) to uniquely identify a 
> location in memory. This RFC outlines what would need to change and how 
> we propose to do that.
>
>
> ### Addresses in LLDB
>
> Currently, LLDB has two ways of representing an address:
>
>   - Address object. Mostly represents addresses as Section+offset for a 
> binary image loaded in the Target. An Address in this form can persist 
> across executions, e.g. an address breakpoint in a binary image that 
> loads at a different address every execution. An Address object can 
> represent memory not mapped to a binary image. Heap, stack, and JITted 
> items will all be represented as the uint64_t load address of the 
> object, and cannot persist across multiple executions. You must have the 
> Target object available to get the current load address of an Address 
> object in the current process run. Some parts of lldb do not have a 
> Target available to them, so they require that the Address can be 
> devolved to an addr_t (aka uint64_t) and passed in.
>   - The addr_t (aka uint64_t) type. Primarily used when receiving input 
> (e.g. from a user on the command line) or when interacting with the 
> inferior (reading/writing memory) for addresses that need not persist 
> across runs. Also used when reading DWARF and in our symbol tables to 
> represent file offset addresses, where the size of an Address object 
> would be objectionable.
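A minimal sketch of the distinction drawn in the RFC text above between a Section+offset Address and a raw addr_t load address. It assumes a hypothetical per-run map from section name to load address, standing in for what the Target knows; `SectionOffsetAddress`, `SectionLoadMap` and `GetLoadAddress` are illustrative names, not lldb_private APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

using addr_t = uint64_t;  // LLDB's plain address type (a uint64_t)

// Hypothetical stand-in for a Section+offset Address: it names a section of a
// binary image rather than a load address, so it stays valid across runs.
struct SectionOffsetAddress {
  std::string section;
  addr_t offset;
};

// Hypothetical per-run view of what the Target knows: where each section loaded.
using SectionLoadMap = std::map<std::string, addr_t>;

// Devolve the Address to a plain addr_t for the current run, or std::nullopt
// if its section is not loaded (nothing to resolve it against).
std::optional<addr_t> GetLoadAddress(const SectionOffsetAddress &addr,
                                     const SectionLoadMap &loads) {
  auto it = loads.find(addr.section);
  if (it == loads.end())
    return std::nullopt;
  return it->second + addr.offset;
}
```

The same `SectionOffsetAddress` resolves to different addr_t values when the image loads at different bases in two runs, which is why an address breakpoint can persist while a raw addr_t cannot.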


> ## Proposal
>
> ### Address + ProcessAddress
>
>   - The Address object gains a segment discriminator member variable. 
> Everything that creates an Address will need to provide this segment 
> discriminator.
>   - A ProcessAddress object, made up of a uint64_t and a segment 
> discriminator, as a replacement for addr_t. ProcessAddress objects would 
> not persist across multiple executions. Similar to how you can create an 
> addr_t from an Address+Target today, you can create a ProcessAddress 
> given an Address+Target. When we pass around addr_ts today, they would 
> be replaced with ProcessAddress, with the exception of symbol tables 
> where the added space would be significant, and we do not believe we 
> need segment discriminators today.


I'm strongly in favor of the first approach. The reason for that is that 
we have a lot of code that can only reasonably deal with one kind

Re: [lldb-dev] lldb10 can not hit break point on windows platform

2020-10-20 Thread le wang via lldb-dev
 Hi, all:
 Thanks for your answers. Perhaps my description was not clear enough.
MLExecuteTest.exe is just a shell that compiles and executes TestFunction.cpp.
The steps are:
  1. Call LLVM functions to compile TestFunction.cpp. This creates the
module, functions, blocks, and instructions, and generates binary code with
debug info.
  2. Use the JIT ExecutionEngine to execute the binary code.
What I am debugging with LLDB is the binary code generated from
TestFunction.cpp, so I am not concerned with the PDB of MLExecuteTest.exe.

  Following your suggestion, I debugged the LLDB source code and set a
breakpoint in SymbolFilePDB::ResolveSymbolContext, but execution never
reaches that function. I believe all of my debug info is generated by LLVM
together with the IR in the first phase, and is all contained in the binary
code. What I don't know is whether, in the second phase, when the JIT loads
the binary code into the executing process, the JIT also generates
additional debug info in PDB form on the Windows platform. If so, which LLVM
library does this work, and how does LLDB load those PDB symbols? If a PDB
needs to be generated, maybe that is what I missed.

  On the other hand, I debugged LLDB to see what happens in
CompileUnit::ResolveSymbolContext, and found that every line in the debug
info is parsed successfully by DWARFDebugLine.cpp. Then the function
FindLineEntryIndexByFileIndex in LineTable.cpp checks whether the file_addr
of each line entry is contained in the file address ranges of the section
list (I don't know why LLDB does this). The result is that the file_addr of
every line entry falls outside all of the section list's addresses, so in
the end no lines are found and the breakpoint is never hit.
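The check described above can be sketched as follows. This is a simplified, hypothetical model, not LLDB's LineTable code: a line entry survives only if its file address falls inside some section's [address, address + size) range. If the values in the elf.py dump earlier in this thread are what LLDB sees, .text has sh_addr 0x0137f103 and size 0x2a while the line-table addresses start at 0x0004, so every line entry would fail this test.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using addr_t = uint64_t;

// Hypothetical simplified section: the file-address range [file_addr, file_addr + size).
struct SectionRange {
  addr_t file_addr;
  addr_t size;
  bool Contains(addr_t addr) const {
    return addr >= file_addr && addr - file_addr < size;
  }
};

// A line entry is only kept if its file address lies inside some section.
bool LineEntryIsInSomeSection(addr_t line_file_addr,
                              const std::vector<SectionRange> &sections) {
  for (const SectionRange &s : sections)
    if (s.Contains(line_file_addr))
      return true;
  return false;
}
```

Under this model the line entries would only be found if the section addresses and the line-table addresses agree, either both unrelocated or both relocated to the JIT load address.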

I have added some code to the notifyObjectLoaded function in
GDBRegistrationListener.cpp to save the intermediate ELF data. Its sections
and debug_line details can be inspected with the Linux readelf command.
Could you help me analyze the debug info in the ELF and check whether
anything in it is wrong that would keep breakpoints from being hit?
TestFunction.cpp and its ELF data are attached.

Thanks,
le wang

Greg Clayton wrote on Tuesday, October 20, 2020, 7:23 AM:

> As long as the location TestFunction.cpp:1 has a valid line in the PDB
> line tables, this breakpoint should be hit if there is debug info and the
> internal PDB parser is parsing things correctly. If you have debug info in
> your binary, _and_ if LLDB is able to locate your PDB file, then you should
> end up seeing a location if it was found. We can see this isn't happening:
>
> (lldb)br s -fE:/test/TestFunction.cpp -l1
> Breakpoint 1: no locations(pending).
> WARNING :  Unable to resolve breakpoint to any actual locations.
>
> I would suggest debugging LLDB itself, specifically this function:
>
> uint32_t SymbolFilePDB::ResolveSymbolContext(const lldb_private::FileSpec
> &file_spec, uint32_t line, bool check_inlines, SymbolContextItem
> resolve_scope, lldb_private::SymbolContextList &sc_list);
>
> This function is what turns the file (in "file_spec") and line (in "line")
> into a symbol context list (in "sc_list"). A SymbolContext is a class that
> defines a complete symbol file context for something. It contains a Module
> (executable), CompileUnit (if there is debug info for the source file),
> Function (if there is debug info for the function), Block (the deepest
> lexical block that contains the start address for the file + line),
> LineEntry (source file and line number, which might have a line number
> greater than what you requested if there was no perfect match), and Symbol
> (symbol from the symbol table). We have a symbol context list because you
> might have multiple matches for a given source file and line if your
> function was inlined.
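A hypothetical sketch of the file + line resolution rule described above, reduced to a flat line table. `LineRow` and `ResolveFileLine` are illustrative names; this is not SymbolFilePDB::ResolveSymbolContext. It uses the requested line if present, otherwise the next greater line, and returns every match on that line (for example one per inlined copy) rather than a single result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical flattened line-table row; LLDB's real SymbolContext also
// carries Module, CompileUnit, Function, Block and Symbol.
struct LineRow {
  std::string file;
  uint32_t line;
  uint64_t addr;
  std::string function;  // function owning this address (inlined copies differ)
};

// Use the requested line if it exists, otherwise the smallest greater line in
// the same file, and return every row on that line.
std::vector<LineRow> ResolveFileLine(const std::vector<LineRow> &table,
                                     const std::string &file, uint32_t line) {
  const uint32_t kNone = std::numeric_limits<uint32_t>::max();
  uint32_t best = kNone;
  for (const LineRow &row : table)
    if (row.file == file && row.line >= line && row.line < best)
      best = row.line;

  std::vector<LineRow> matches;
  if (best == kNone)
    return matches;  // nothing at or after the requested line: no locations
  for (const LineRow &row : table)
    if (row.file == file && row.line == best)
      matches.push_back(row);
  return matches;
}
```

An empty result corresponds to the "no locations (pending)" outcome; two results for one line correspond to two breakpoint locations.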
>
> You might try just doing:
>
> (lldb) b TestFunction.cpp:1
>
> And seeing if that helps. If the path in the debug information is not
> exactly "E:/test/TestFunction.cpp" (for example, if it contains symlinks or
> is a relative path), the breakpoint might not be set.
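A sketch of why a bare file name can succeed where a full path fails, under an assumed matching rule: a bare name compares basenames only, while a request containing a directory must match the full path exactly. `Basename` and `FileMatches` are hypothetical helpers, and LLDB's actual FileSpec comparison is more involved. The test strings reuse the /test/E:/test/TestFunction.cpp path shown in the line-table dump earlier in this thread.

```cpp
#include <cassert>
#include <string>

// Return the final path component, accepting both '/' and '\\' separators.
std::string Basename(const std::string &path) {
  std::string::size_type pos = path.find_last_of("/\\");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Hypothetical rule: a bare file name matches on the basename only, while a
// request that contains a directory must match the recorded path exactly.
bool FileMatches(const std::string &requested, const std::string &in_debug_info) {
  if (requested.find_first_of("/\\") == std::string::npos)
    return Basename(in_debug_info) == requested;
  return requested == in_debug_info;
}
```

Under this rule "TestFunction.cpp" matches any recorded path ending in that name, while "E:/test/TestFunction.cpp" fails against a path recorded with a different prefix or with ".." components.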
>
> Does anyone know if the "pdb" file shows up when doing "image list" in
> LLDB? On Mac, if we have a standalone debug info file, we show it below the
> executable:
>
> (lldb) target create "/tmp/a.out"
> Current executable set to '/tmp/a.out' (x86_64).
> (lldb) image list a.out
> [  0] E76A2647-AFB4-3950-943A-CB1D701B7C07 0x0001 /tmp/a.out
>   /tmp/a.out.dSYM/Contents/Resources/DWARF/a.out
>
>
> If you do a "image list" on your executable, it would be interesting to
> see if the pdb file shows up in this output.
>
> Greg Clayton
>
> > On Oct 17, 2020, at 1:51 AM, le wang via lldb-dev <lldb-dev@lists.llvm.org> wrote:
> >
> > Hello, everyone:
> >   I have a problem: after downloading LLVM 10.0.1 and using LLDB to debug my
> > process on the Windows 10 x64 platform, no breakpoint is hit.
> > The commands are:
> > (lldb)target create "D:/code/MLExecuteTest.exe"
> > Current executable set to 'D:/code/MLExecuteTest.exe'  (x86_64)
> > (lldb)br s -fE:/test/TestFunction.cpp -l1
> > Breakpoint 1: