Re: [lldb-dev] LLDB Demangling

2018-07-26 Thread Pavel Labath via lldb-dev
On Wed, 25 Jul 2018 at 19:15, Stefan Gränitz  wrote:
> > right now the Symtab class builds a table containing all
> > demangled names. This is mostly unnecessary, and
> What exact table are you referring to here? The m_name_to_index map does
> store both, demangled and mangled strings. Not sure to what extend the
> demangled ones are necessary. It's a little hard to judge, but I can try
> and check. Or do you mean a different one?

Yes, that's the table I meant. You might also be interested in this

lldb-dev thread from May, which discussed a similar table in
SymbolFileDWARF and a couple of my commits from around that time
(which removed it).


>
> There is one more follow-up task here: Now that FastDemangle isn't used
> anymore in Mangled::GetDemangledName(), its last remaining usage is
> SubsPrimitiveParmItanium() in CPlusPlusLanguage.cpp. I will try to
> figure out if we still need it or if we can switch it to the IPD too.
> Then we could consider to get rid of FastMangle altogether.
>
> Any objections here?

The SubsPrimitiveParmItanium function has bugged me too. It does some
questionable manipulations of the mangled names, and it would be nice
to get rid of it. However, I have no idea how critical that
functionality is, nor I was able to find a reason for its existence
via a quick history search.


> With both demangle implementations, the old one and IPD, annotations
> cause demangling to fail/return empty name. I assume this branch has its
> right to exist and I would just keep it, but it would also be great to
> find an explanation and a repro to cover it.

I am going to assume you've already figured what the "linker
annotations" are in general (if not let me know). It looks to me like
this could be triggered if the symtab contained a name with linker
annotations, but the name itself was *not* mangled (e.g. something
like puts@GLIBC_2.5). In that case, it seems like the
Mangled::GetMangledName should return nothing, and the actual name
should be in Mangled::GetDemangledName. Have you tried something like
that?

pl
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] LLDB Demangling

2018-07-26 Thread Stefan Gränitz via lldb-dev
One more short question, as you seem to be much more familiar with the
code than me. Maybe you have an idea?

In the unit test I prepared, I didn't manage to cover this line in
Symtab::InitNameIndexes():
https://github.com/llvm-mirror/lldb/blob/master/source/Symbol/Symtab.cpp#L348

With both demangle implementations, the old one and IPD, annotations
cause demangling to fail/return empty name. I assume this branch has its
right to exist and I would just keep it, but it would also be great to
find an explanation and a repro to cover it.

Am 25.07.18 um 20:15 schrieb Stefan Gränitz:
> Hi Pavel, thanks for your feedback!
>
>> I would propose to create a new class (RichMangingInfo?), which would
>> wrap both ItaniumPartialDemangler and the existing string-chopping
>> implementation, and provide a unified interface on top of them.
> Right, good point. Adding an abstraction like this certainly makes sense.
>
>> we might be able
>> to get further speedup by having DemangleWithRichNameIndexInfo *not*
>> fill out the demangled string (as, I understand, building the actual
>> string is the most expensive operation), and just fetching the bits of
>> info we need from the IPD.
> Yes, if it turns out we really don't need the demangled names in a table
> and instead just render them on-demand, I guess we'd gain additional
> speedup plus reduction in memory usage. I will have a closer look on this.
>
>> right now the Symtab class builds a table containing all
>> demangled names. This is mostly unnecessary, and
> What exact table are you referring to here? The m_name_to_index map does
> store both, demangled and mangled strings. Not sure to what extend the
> demangled ones are necessary. It's a little hard to judge, but I can try
> and check. Or do you mean a different one?
>
> There is one more follow-up task here: Now that FastDemangle isn't used
> anymore in Mangled::GetDemangledName(), its last remaining usage is
> SubsPrimitiveParmItanium() in CPlusPlusLanguage.cpp. I will try to
> figure out if we still need it or if we can switch it to the IPD too.
> Then we could consider to get rid of FastMangle altogether.
>
> Any objections here?
>
>
> Am 25.07.18 um 13:11 schrieb Pavel Labath:
>> Hello Stefan,
>>
>> thank you for working on this. I believe this work will bring a huge
>> improvement to LLDB.
>>
>> Adding something like DemangleWithRichNameIndexInfo to the Mangled
>> class makes sense to me. However, I don't think that function should
>> be taking an ItaniumPartialDemangler as an argument. The reason for
>> that is that ItaniumPartialDemangler is, unsurprisingly, specific to
>> the itanium mangling scheme, and the Mangled class tries to hide the
>> specifics of different manglings.
>>
>> I would propose to create a new class (RichMangingInfo?), which would
>> wrap both ItaniumPartialDemangler and the existing string-chopping
>> implementation, and provide a unified interface on top of them.
>> Internally it would use IPD for the itanium scheme and the legacy
>> implementation for the MSVC mangling, but the user would not have to
>> care about that, as it can just ask questions like "is this a
>> constructor?" and get the correct answer either way. If ever we get
>> something similar to IPD for the MSVC mangling scheme, we can just
>> replace the legacy implementation with that one, and nobody should
>> know the difference.
>>
>> What do you think?
>>
>>
>> BTW, right now the Symtab class builds a table containing all
>> demangled names. This is mostly unnecessary, and in fact we have
>> recently removed a similar table for the demangled debug_info names.
>> The only present use for it that I am aware of is papering over a
>> clang "bug"*. If we are able to get rid of this, then we might be able
>> to get further speedup by having DemangleWithRichNameIndexInfo *not*
>> fill out the demangled string (as, I understand, building the actual
>> string is the most expensive operation), and just fetching the bits of
>> info we need from the IPD.
>>
>> (*) The "bug" is that sometimes clang will not generate the C1 (full
>> object) flavour of an constructor if it is identical to the C2 version
>> (even though the CU in question definitely constructs full objects of
>> the given class). This means that we are not able able to construct
>> some objects during expression evaluation as we cannot find the C1
>> version anywhere. The demangling makes this work accidentally, as were
>> are able to match up the constructors by demangled names, as they both
>> demangle to the same string. I have recently fixed clang to emit C1
>> always (at -O0), but the question remains on what to do with older
>> compilers.
>>
>>
>> On Tue, 24 Jul 2018 at 21:55, Stefan Gränitz  
>> wrote:
>>> Hello everyone
>>>
>>> I am relatively new to the LLDB sources and just submitted my first own
>>> patch for review in Phabricator. Based on this patch I would like to
>>> discuss a few details for further improvements on LLDB's demangling.
>>>
>>> First a 

Re: [lldb-dev] LLDB Demangling

2018-07-25 Thread Pavel Labath via lldb-dev
Hello Stefan,

thank you for working on this. I believe this work will bring a huge
improvement to LLDB.

Adding something like DemangleWithRichNameIndexInfo to the Mangled
class makes sense to me. However, I don't think that function should
be taking an ItaniumPartialDemangler as an argument. The reason for
that is that ItaniumPartialDemangler is, unsurprisingly, specific to
the itanium mangling scheme, and the Mangled class tries to hide the
specifics of different manglings.

I would propose to create a new class (RichMangingInfo?), which would
wrap both ItaniumPartialDemangler and the existing string-chopping
implementation, and provide a unified interface on top of them.
Internally it would use IPD for the itanium scheme and the legacy
implementation for the MSVC mangling, but the user would not have to
care about that, as it can just ask questions like "is this a
constructor?" and get the correct answer either way. If ever we get
something similar to IPD for the MSVC mangling scheme, we can just
replace the legacy implementation with that one, and nobody should
know the difference.

What do you think?


BTW, right now the Symtab class builds a table containing all
demangled names. This is mostly unnecessary, and in fact we have
recently removed a similar table for the demangled debug_info names.
The only present use for it that I am aware of is papering over a
clang "bug"*. If we are able to get rid of this, then we might be able
to get further speedup by having DemangleWithRichNameIndexInfo *not*
fill out the demangled string (as, I understand, building the actual
string is the most expensive operation), and just fetching the bits of
info we need from the IPD.

(*) The "bug" is that sometimes clang will not generate the C1 (full
object) flavour of an constructor if it is identical to the C2 version
(even though the CU in question definitely constructs full objects of
the given class). This means that we are not able able to construct
some objects during expression evaluation as we cannot find the C1
version anywhere. The demangling makes this work accidentally, as were
are able to match up the constructors by demangled names, as they both
demangle to the same string. I have recently fixed clang to emit C1
always (at -O0), but the question remains on what to do with older
compilers.


On Tue, 24 Jul 2018 at 21:55, Stefan Gränitz  wrote:
>
> Hello everyone
>
> I am relatively new to the LLDB sources and just submitted my first own
> patch for review in Phabricator. Based on this patch I would like to
> discuss a few details for further improvements on LLDB's demangling.
>
> First a short recap on the current state:
> * Name demangling used to be a lazy process, exclusively accessible via
> Mangled::GetDemangledName() - this is a valuable mechanism:
> https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Core/Mangled.cpp#L252
> * My patch wants to replace the existing combination of FastDemangle &
> itaniumDemangle() with LLVM's new ItaniumPartialDemangler (IPD)
> implementation and no fallbacks anymore. It slightly reduces complexity
> and slightly improves performance, but doesn't introduce conceptual
> changes: https://reviews.llvm.org/D49612
> * IPD provides rich information on names, e.g. isFuntion() or
> isCtorOrDtor(), but stores that in its own state rather than returning a
> queriable object:
> https://github.com/llvm-mirror/llvm/blob/a3de0cbb8f4d886a968d20a8c6a6e8aa01d28c2a/include/llvm/Demangle/Demangle.h#L36
> * IPD's rich info could help LLDB, where it currently parses mangled
> names on its own, on-top of demangling. Symtab::InitNameIndexes() seems
> to be the most prominent such place. LLDB builds an index with various
> categories from all its symbols here. This is performance-critical and
> it does not benefit from the laziness in GetDemangledName():
> https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Symbol/Symtab.cpp#L218
>
> My simple switch doesn't exploit IPD's rich demangling info yet and it
> uses a new IPD instance for each demangling request, which is considered
> quite costly as it uses a bump allocator internally. Over-all
> performance still didn't drop, but even seems to benefit.
>
> In order to fully exploit the remaining potential, I am thinking about
> the following changes:
>
> (1) In the Mangled class, add a separate entry-point for batch
> demangling, that allows to pass in an existing IPD:
> bool Mangled::DemangleWithRichNameIndexInfo(ItaniumPartialDemangler );
>
> (2) DemangleWithRichNameIndexInfo() will demangle explicitly, which is
> required to make sure we gather IPD's rich info. It's not lazy as
> GetDemangledName(), but it will store the demangled name and set the
> "MangledCounterpart" so that subsequent lazy requests will be fast.
>
> (3) DemangleWithRichNameIndexInfo() will be used by
> Symtab::InitNameIndexes(), which will have a single IPD instance that is
> reused for all symbols. 

[lldb-dev] LLDB Demangling

2018-07-24 Thread Stefan Gränitz via lldb-dev
Hello everyone

I am relatively new to the LLDB sources and just submitted my first own
patch for review in Phabricator. Based on this patch I would like to
discuss a few details for further improvements on LLDB's demangling.

First a short recap on the current state:
* Name demangling used to be a lazy process, exclusively accessible via
Mangled::GetDemangledName() - this is a valuable mechanism:
https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Core/Mangled.cpp#L252
* My patch wants to replace the existing combination of FastDemangle &
itaniumDemangle() with LLVM's new ItaniumPartialDemangler (IPD)
implementation and no fallbacks anymore. It slightly reduces complexity
and slightly improves performance, but doesn't introduce conceptual
changes: https://reviews.llvm.org/D49612
* IPD provides rich information on names, e.g. isFuntion() or
isCtorOrDtor(), but stores that in its own state rather than returning a
queriable object:
https://github.com/llvm-mirror/llvm/blob/a3de0cbb8f4d886a968d20a8c6a6e8aa01d28c2a/include/llvm/Demangle/Demangle.h#L36
* IPD's rich info could help LLDB, where it currently parses mangled
names on its own, on-top of demangling. Symtab::InitNameIndexes() seems
to be the most prominent such place. LLDB builds an index with various
categories from all its symbols here. This is performance-critical and
it does not benefit from the laziness in GetDemangledName():
https://github.com/llvm-mirror/lldb/blob/8ba903256fd92a2d8644b108a7c8a1a15efd90ad/source/Symbol/Symtab.cpp#L218

My simple switch doesn't exploit IPD's rich demangling info yet and it
uses a new IPD instance for each demangling request, which is considered
quite costly as it uses a bump allocator internally. Over-all
performance still didn't drop, but even seems to benefit.

In order to fully exploit the remaining potential, I am thinking about
the following changes:

(1) In the Mangled class, add a separate entry-point for batch
demangling, that allows to pass in an existing IPD:
bool Mangled::DemangleWithRichNameIndexInfo(ItaniumPartialDemangler );

(2) DemangleWithRichNameIndexInfo() will demangle explicitly, which is
required to make sure we gather IPD's rich info. It's not lazy as
GetDemangledName(), but it will store the demangled name and set the
"MangledCounterpart" so that subsequent lazy requests will be fast.

(3) DemangleWithRichNameIndexInfo() will be used by
Symtab::InitNameIndexes(), which will have a single IPD instance that is
reused for all symbols. Symtab::InitNameIndexes() is usually called
before anything else, so it is basically "warming the cache" here.

(4) Finally, with IPD's rich info, we can get rid of the additional
string parsing in Symtab::InitNameIndexes(). I expect a considerable
speedup here too.

What do you think about the plan?
Do you think it's a good idea to add DemangleWithRichNameIndexInfo()
like this?
Are you aware of more batch-processing places like
Symtab::InitNameIndexes(), that I should consider as clients for
DemangleWithRichNameIndexInfo()?
Do you know potential side-effects I must be aware of?
Would you consider the evidence on the performance benefits convincing,
or do you think it needs bulletproof benchmarking numbers?

When it comes to MSVC-mangled names:
* It is certainly necessary to keep a legacy version of the current
categorization mechanism for these. But in general, what do you think
about their importance for LLDB? (Personally I would like to see LLDB on
Windows, but I tried it only once and gave up quickly.)
* I saw there is a new MicrosoftDemangler now in LLVM. Does anyone know
more about it? Especially: Are there plans to provide rich demangling
information similar to the IPD?

So far I started writing a unit test for Symtab::InitNameIndexes(), so I
won't accidentally break its indexing. I also experimented with a
potential DemangleWithRichNameIndexInfo() and had a look on the numbers
of the internal LLDB timers. This was, however, not exhaustive and real
benchmarking is always hard.

Thanks for all kinds of feedback.

Best,
Stefan


___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev