Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On 14 May 2015 at 19:47, Joseph Myers jos...@codesourcery.com wrote: On Thu, 14 May 2015, Iain Buclaw wrote: On another note, I've found out why the remaining 20 symbols in my 75k sample failed. They don't fail at all! It's just that they were all greater than 33,000 characters in length, and my test used c++filt, which trims anything bigger than 32767 (whoops!). That would be a bug in c++filt (failure to follow the GNU Coding Standards regarding having no arbitrary limits). Good to know. I'll query the binutils maintainers on a fix for that then. Iain
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On 14 May 2015 at 17:30, Iain Buclaw ibuc...@gdcproject.org wrote: On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote: On 05/13/2015 02:51 AM, Iain Buclaw wrote: In my tests, this gives the demangler near-complete support. Of a sample of about 75k symbols pulled from the standard library unittester, all but 20 were successfully parsed. --- libiberty/ChangeLog: 2015-05-13 Iain Buclawibuc...@gdcproject.org * d-demangle.c (dlang_symbol_kinds): New enum. (dlang_parse_symbol): Update signature. Handle an ambiguity between mangle symbol for pascal and template value arguments. Only check for a type if parsing a function, or at the top level. Return failure if the entire symbol was not successfully demangled. (dlang_identifier): Update signature. Handle an ambiguity between two adjacent digits in a mangled symbol string. (dlang_type): Update call to dlang_parse_symbol. (dlang_template_args): Likewise. (dlang_parse_template): Likewise. (dlang_demangle): Likewise. * testsuite/d-demangle-expected: Fix bad tests found, and add problematic examples to the unittests. OK. I'm going to trust the code to dis-ambiguate the adjacent digits in dlang_identifier is correct. The rest of the changes were pretty easy to follow :-0 Jeff Actually, the one snippest that we should be OK without in that dis-ambiguate section is the while loop with the comment: handle any overflow. Also discovered that an infinite loop is possible in that dis-ambiguate section when was testing gdb with these changes on various programs. So FYI, I've removed the while loop and fixed the problem as mentioned above. Unfortunately it also seems that the second ambiguity with extern(Pascal) vs Template value parameters is a little bit more difficult to get right. I've got an idea for a way to handle it, but I don't like it, so I've raised a bug in upstream D. :-) Iain --- libiberty/d-demangle.c | 185 libiberty/testsuite/d-demangle-expected | 44 ++-- 2 files changed, 175 insertions(+), 54 deletions(-) diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c index 0af926c..c697b00 100644 --- a/libiberty/d-demangle.c +++ b/libiberty/d-demangle.c @@ -165,6 +165,21 @@ string_prepend (string *p, const char *s) } } +/* What kinds of symbol we could be parsing. */ +enum dlang_symbol_kinds +{ + /* Top-level symbol, needs it's type checked. */ + dlang_top_level, + /* Function symbol, needs it's type checked. */ + dlang_function, + /* Strongly typed name, such as for classes, structs and enums. */ + dlang_type_name, + /* Template identifier. */ + dlang_template_ident, + /* Template symbol parameter. */ + dlang_template_param +}; + /* Prototypes for forward referenced functions */ static const char *dlang_function_args (string *, const char *); @@ -172,7 +187,8 @@ static const char *dlang_type (string *, const char *); static const char *dlang_value (string *, const char *, const char *, char); -static const char *dlang_parse_symbol (string *, const char *); +static const char *dlang_parse_symbol (string *, const char *, + enum dlang_symbol_kinds); static const char *dlang_parse_tuple (string *, const char *); @@ -527,7 +543,7 @@ dlang_type (string *decl, const char *mangled) case 'E': /* enum T */ case 'T': /* typedef T */ mangled++; - return dlang_parse_symbol (decl, mangled); + return dlang_parse_symbol (decl, mangled, dlang_type_name); case 'D': /* delegate T */ { string mods; @@ -662,114 +678,162 @@ dlang_type (string *decl, const char *mangled) /* Extract the identifier from MANGLED and append it to DECL. Return the remaining string on success or NULL on failure. */ static const char * -dlang_identifier (string *decl, const char *mangled) +dlang_identifier (string *decl, const char *mangled, + enum dlang_symbol_kinds kind) { + char *endptr; + long len; + if (mangled == NULL || *mangled == '\0') return NULL; - if (ISDIGIT (*mangled)) + len = strtol (mangled, endptr, 10); + + if (endptr == NULL || len = 0) +return NULL; + + /* In template parameter symbols, the first character of the mangled + name can be a digit. This causes ambiguity issues because the + digits of the two numbers are adjacent. */ + if (kind == dlang_template_param) { - char *endptr; - long i = strtol (mangled, endptr, 10); + long psize = len; + char *pend; + int saved = string_length (decl); + + /* Work backwards until a match is found. */ + for (pend = endptr; endptr != NULL; pend--) + { + mangled = pend; - if (endptr == NULL || i = 0 || strlen (endptr) (size_t) i) + /* Reached the beginning of the pointer to the name length, + try parsing the entire symbol. */ + if (psize == 0) + { + psize = len; +
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On 05/13/2015 02:51 AM, Iain Buclaw wrote: In my tests, this gives the demangler near-complete support. Of a sample of about 75k symbols pulled from the standard library unittester, all but 20 were successfully parsed. --- libiberty/ChangeLog: 2015-05-13 Iain Buclawibuc...@gdcproject.org * d-demangle.c (dlang_symbol_kinds): New enum. (dlang_parse_symbol): Update signature. Handle an ambiguity between mangle symbol for pascal and template value arguments. Only check for a type if parsing a function, or at the top level. Return failure if the entire symbol was not successfully demangled. (dlang_identifier): Update signature. Handle an ambiguity between two adjacent digits in a mangled symbol string. (dlang_type): Update call to dlang_parse_symbol. (dlang_template_args): Likewise. (dlang_parse_template): Likewise. (dlang_demangle): Likewise. * testsuite/d-demangle-expected: Fix bad tests found, and add problematic examples to the unittests. OK. I'm going to trust the code to dis-ambiguate the adjacent digits in dlang_identifier is correct. The rest of the changes were pretty easy to follow :-0 Jeff
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote: On 05/13/2015 02:51 AM, Iain Buclaw wrote: In my tests, this gives the demangler near-complete support. Of a sample of about 75k symbols pulled from the standard library unittester, all but 20 were successfully parsed. --- libiberty/ChangeLog: 2015-05-13 Iain Buclawibuc...@gdcproject.org * d-demangle.c (dlang_symbol_kinds): New enum. (dlang_parse_symbol): Update signature. Handle an ambiguity between mangle symbol for pascal and template value arguments. Only check for a type if parsing a function, or at the top level. Return failure if the entire symbol was not successfully demangled. (dlang_identifier): Update signature. Handle an ambiguity between two adjacent digits in a mangled symbol string. (dlang_type): Update call to dlang_parse_symbol. (dlang_template_args): Likewise. (dlang_parse_template): Likewise. (dlang_demangle): Likewise. * testsuite/d-demangle-expected: Fix bad tests found, and add problematic examples to the unittests. OK. I'm going to trust the code to dis-ambiguate the adjacent digits in dlang_identifier is correct. The rest of the changes were pretty easy to follow :-0 This was the hardest one to get right! ;-) In the briefest of possible explanations, for the following symbol: S213std11parallelism3run We go through the following loop: (ptr = std11parallelism3run, len = 213) = Pointer doesn't start with a digit or _D, move back by 1. (ptr = 3std11parallelism3run, len = 21) = Parse returns std.parallelism.run, length matches, return success! In a fail example, for instance if we instead had 214: (ptr = std11parallelism3run, len = 214) = Pointer doesn't start with a digit or _D, move back by 1. (ptr = 4std11parallelism3run, len = 21) = Parse returns std1.p, length doesn't match, move pointer back by 1. (ptr = 14std11parallelism3run, len = 2) = Parse returns std11paralleli, length doesn't match, move pointer back by 1. (ptr = 214std11parallelism3run, len = 214) = Reached the beginning, reset length and parse the entire symbol. Length is too big, return NULL. Granted, there could be some very small room for false positives, but if we find an match that succeeds incorrectly, there is very little change that the rest of the symbol will be demangled correctly. Regards Iain.
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On Thu, 14 May 2015, Iain Buclaw wrote: On another note, I've found out why the remaining 20 symbols in my 75k sample failed. They don't fail at all! It's just that they were all greater than 33,000 characters in length, and my test used c++filt, which trims anything bigger than 32767 (whoops!). That would be a bug in c++filt (failure to follow the GNU Coding Standards regarding having no arbitrary limits). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On May 14, 2015, at 8:30 AM, Iain Buclaw ibuc...@gdcproject.org wrote: On another note, I've found out why the remaining 20 symbols in my 75k sample failed. They don't fail at all! It's just that they were all greater than 33,000 characters in length, and my test used c++filt, which trims anything bigger than 32767 (whoops!). We blew past 32k more than a decade or two ago.
Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates
On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote: On 05/13/2015 02:51 AM, Iain Buclaw wrote: In my tests, this gives the demangler near-complete support. Of a sample of about 75k symbols pulled from the standard library unittester, all but 20 were successfully parsed. --- libiberty/ChangeLog: 2015-05-13 Iain Buclawibuc...@gdcproject.org * d-demangle.c (dlang_symbol_kinds): New enum. (dlang_parse_symbol): Update signature. Handle an ambiguity between mangle symbol for pascal and template value arguments. Only check for a type if parsing a function, or at the top level. Return failure if the entire symbol was not successfully demangled. (dlang_identifier): Update signature. Handle an ambiguity between two adjacent digits in a mangled symbol string. (dlang_type): Update call to dlang_parse_symbol. (dlang_template_args): Likewise. (dlang_parse_template): Likewise. (dlang_demangle): Likewise. * testsuite/d-demangle-expected: Fix bad tests found, and add problematic examples to the unittests. OK. I'm going to trust the code to dis-ambiguate the adjacent digits in dlang_identifier is correct. The rest of the changes were pretty easy to follow :-0 Jeff Actually, the one snippest that we should be OK without in that dis-ambiguate section is the while loop with the comment: handle any overflow. On another note, I've found out why the remaining 20 symbols in my 75k sample failed. They don't fail at all! It's just that they were all greater than 33,000 characters in length, and my test used c++filt, which trims anything bigger than 32767 (whoops!). Iain.