Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-16 Thread Iain Buclaw
On 14 May 2015 at 19:47, Joseph Myers jos...@codesourcery.com wrote:
 On Thu, 14 May 2015, Iain Buclaw wrote:

 On another note, I've found out why the remaining 20 symbols in my 75k
 sample failed.  They don't fail at all!  It's just that they were all
 greater than 33,000 characters in length, and my test used c++filt,
 which trims anything bigger than 32767 (whoops!).

 That would be a bug in c++filt (failure to follow the GNU Coding Standards
 regarding having no arbitrary limits).


Good to know.  I'll query the binutils maintainers on a fix for that then.

Iain


Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-16 Thread Iain Buclaw
On 14 May 2015 at 17:30, Iain Buclaw ibuc...@gdcproject.org wrote:
 On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote:
 On 05/13/2015 02:51 AM, Iain Buclaw wrote:

 In my tests, this gives the demangler near-complete support.  Of a
 sample of about 75k symbols pulled from the standard library
 unittester, all but 20 were successfully parsed.

 ---
 libiberty/ChangeLog:

 2015-05-13 Iain Buclawibuc...@gdcproject.org

  * d-demangle.c (dlang_symbol_kinds): New enum.
  (dlang_parse_symbol): Update signature.  Handle an ambiguity between
 mangle
  symbol for pascal and template value arguments.  Only check for a
 type
  if parsing a function, or at the top level.  Return failure if the
  entire symbol was not successfully demangled.
  (dlang_identifier): Update signature.  Handle an ambiguity between
 two
  adjacent digits in a mangled symbol string.
  (dlang_type): Update call to dlang_parse_symbol.
  (dlang_template_args): Likewise.
  (dlang_parse_template): Likewise.
  (dlang_demangle): Likewise.
  * testsuite/d-demangle-expected: Fix bad tests found, and add
 problematic
  examples to the unittests.

 OK.

 I'm going to trust the code to dis-ambiguate the adjacent digits in
 dlang_identifier is correct.  The rest of the changes were pretty easy to
 follow :-0

 Jeff


 Actually, the one snippest that we should be OK without in that
 dis-ambiguate section is the while loop with the comment: handle any
 overflow.


Also discovered that an infinite loop is possible in that
dis-ambiguate section when was testing gdb with these changes on
various programs.

So FYI, I've removed the while loop and fixed the problem as mentioned above.

Unfortunately it also seems that the second ambiguity with
extern(Pascal) vs Template value parameters is a little bit more
difficult to get right.  I've got an idea for a way to handle it, but
I don't like it, so I've raised a bug in upstream D. :-)

Iain
---
 libiberty/d-demangle.c  | 185 
 libiberty/testsuite/d-demangle-expected |  44 ++--
 2 files changed, 175 insertions(+), 54 deletions(-)

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 0af926c..c697b00 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -165,6 +165,21 @@ string_prepend (string *p, const char *s)
 }
 }
 
+/* What kinds of symbol we could be parsing.  */
+enum dlang_symbol_kinds
+{
+  /* Top-level symbol, needs it's type checked.  */
+  dlang_top_level,
+  /* Function symbol, needs it's type checked.   */
+  dlang_function,
+  /* Strongly typed name, such as for classes, structs and enums.  */
+  dlang_type_name,
+  /* Template identifier.  */
+  dlang_template_ident,
+  /* Template symbol parameter.  */
+  dlang_template_param
+};
+
 /* Prototypes for forward referenced functions */
 static const char *dlang_function_args (string *, const char *);
 
@@ -172,7 +187,8 @@ static const char *dlang_type (string *, const char *);
 
 static const char *dlang_value (string *, const char *, const char *, char);
 
-static const char *dlang_parse_symbol (string *, const char *);
+static const char *dlang_parse_symbol (string *, const char *,
+   enum dlang_symbol_kinds);
 
 static const char *dlang_parse_tuple (string *, const char *);
 
@@ -527,7 +543,7 @@ dlang_type (string *decl, const char *mangled)
 case 'E': /* enum T */
 case 'T': /* typedef T */
   mangled++;
-  return dlang_parse_symbol (decl, mangled);
+  return dlang_parse_symbol (decl, mangled, dlang_type_name);
 case 'D': /* delegate T */
 {
   string mods;
@@ -662,114 +678,162 @@ dlang_type (string *decl, const char *mangled)
 /* Extract the identifier from MANGLED and append it to DECL.
Return the remaining string on success or NULL on failure.  */
 static const char *
-dlang_identifier (string *decl, const char *mangled)
+dlang_identifier (string *decl, const char *mangled,
+		  enum dlang_symbol_kinds kind)
 {
+  char *endptr;
+  long len;
+
   if (mangled == NULL || *mangled == '\0')
 return NULL;
 
-  if (ISDIGIT (*mangled))
+  len = strtol (mangled, endptr, 10);
+
+  if (endptr == NULL || len = 0)
+return NULL;
+
+  /* In template parameter symbols, the first character of the mangled
+ name can be a digit.  This causes ambiguity issues because the
+ digits of the two numbers are adjacent.  */
+  if (kind == dlang_template_param)
 {
-  char *endptr;
-  long i = strtol (mangled, endptr, 10);
+  long psize = len;
+  char *pend;
+  int saved = string_length (decl);
+
+  /* Work backwards until a match is found.  */
+  for (pend = endptr; endptr != NULL; pend--)
+	{
+	  mangled = pend;
 
-  if (endptr == NULL || i = 0 || strlen (endptr)  (size_t) i)
+	  /* Reached the beginning of the pointer to the name length,
+	 try parsing the entire symbol.  */
+	  if (psize == 0)
+	{
+	  psize = len;
+	  

Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-14 Thread Jeff Law

On 05/13/2015 02:51 AM, Iain Buclaw wrote:

In my tests, this gives the demangler near-complete support.  Of a
sample of about 75k symbols pulled from the standard library
unittester, all but 20 were successfully parsed.

---
libiberty/ChangeLog:

2015-05-13 Iain Buclawibuc...@gdcproject.org

 * d-demangle.c (dlang_symbol_kinds): New enum.
 (dlang_parse_symbol): Update signature.  Handle an ambiguity between mangle
 symbol for pascal and template value arguments.  Only check for a type
 if parsing a function, or at the top level.  Return failure if the
 entire symbol was not successfully demangled.
 (dlang_identifier): Update signature.  Handle an ambiguity between two
 adjacent digits in a mangled symbol string.
 (dlang_type): Update call to dlang_parse_symbol.
 (dlang_template_args): Likewise.
 (dlang_parse_template): Likewise.
 (dlang_demangle): Likewise.
 * testsuite/d-demangle-expected: Fix bad tests found, and add problematic
 examples to the unittests.

OK.

I'm going to trust the code to dis-ambiguate the adjacent digits in 
dlang_identifier is correct.  The rest of the changes were pretty easy 
to follow :-0


Jeff



Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-14 Thread Iain Buclaw
On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote:
 On 05/13/2015 02:51 AM, Iain Buclaw wrote:

 In my tests, this gives the demangler near-complete support.  Of a
 sample of about 75k symbols pulled from the standard library
 unittester, all but 20 were successfully parsed.

 ---
 libiberty/ChangeLog:

 2015-05-13 Iain Buclawibuc...@gdcproject.org

  * d-demangle.c (dlang_symbol_kinds): New enum.
  (dlang_parse_symbol): Update signature.  Handle an ambiguity between
 mangle
  symbol for pascal and template value arguments.  Only check for a
 type
  if parsing a function, or at the top level.  Return failure if the
  entire symbol was not successfully demangled.
  (dlang_identifier): Update signature.  Handle an ambiguity between
 two
  adjacent digits in a mangled symbol string.
  (dlang_type): Update call to dlang_parse_symbol.
  (dlang_template_args): Likewise.
  (dlang_parse_template): Likewise.
  (dlang_demangle): Likewise.
  * testsuite/d-demangle-expected: Fix bad tests found, and add
 problematic
  examples to the unittests.

 OK.

 I'm going to trust the code to dis-ambiguate the adjacent digits in
 dlang_identifier is correct.  The rest of the changes were pretty easy to
 follow :-0


This was the hardest one to get right! ;-)

In the briefest of possible explanations, for the following symbol:

S213std11parallelism3run

We go through the following loop:

(ptr = std11parallelism3run, len = 213)
  = Pointer doesn't start with a digit or _D, move back by 1.
(ptr = 3std11parallelism3run, len = 21)
  = Parse returns std.parallelism.run, length matches, return success!

In a fail example, for instance if we instead had 214:

(ptr = std11parallelism3run, len = 214)
  = Pointer doesn't start with a digit or _D, move back by 1.
(ptr = 4std11parallelism3run, len = 21)
  = Parse returns std1.p, length doesn't match, move pointer back by 1.
(ptr = 14std11parallelism3run, len = 2)
  = Parse returns std11paralleli, length doesn't match, move pointer back by 1.
(ptr = 214std11parallelism3run, len = 214)
  = Reached the beginning, reset length and parse the entire symbol.
Length is too big, return NULL.


Granted, there could be some very small room for false positives, but
if we find an match that succeeds incorrectly, there is very little
change that the rest of the symbol will be demangled correctly.

Regards
Iain.


Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-14 Thread Joseph Myers
On Thu, 14 May 2015, Iain Buclaw wrote:

 On another note, I've found out why the remaining 20 symbols in my 75k
 sample failed.  They don't fail at all!  It's just that they were all
 greater than 33,000 characters in length, and my test used c++filt,
 which trims anything bigger than 32767 (whoops!).

That would be a bug in c++filt (failure to follow the GNU Coding Standards 
regarding having no arbitrary limits).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-14 Thread Mike Stump
On May 14, 2015, at 8:30 AM, Iain Buclaw ibuc...@gdcproject.org wrote:
 
 On another note, I've found out why the remaining 20 symbols in my 75k
 sample failed.  They don't fail at all!  It's just that they were all
 greater than 33,000 characters in length, and my test used c++filt,
 which trims anything bigger than 32767 (whoops!).

We blew past 32k more than a decade or two ago.


Re: [PATCH 6/7] [D] libiberty: Improve support for demangling D2 templates

2015-05-14 Thread Iain Buclaw
On 14 May 2015 at 15:24, Jeff Law l...@redhat.com wrote:
 On 05/13/2015 02:51 AM, Iain Buclaw wrote:

 In my tests, this gives the demangler near-complete support.  Of a
 sample of about 75k symbols pulled from the standard library
 unittester, all but 20 were successfully parsed.

 ---
 libiberty/ChangeLog:

 2015-05-13 Iain Buclawibuc...@gdcproject.org

  * d-demangle.c (dlang_symbol_kinds): New enum.
  (dlang_parse_symbol): Update signature.  Handle an ambiguity between
 mangle
  symbol for pascal and template value arguments.  Only check for a
 type
  if parsing a function, or at the top level.  Return failure if the
  entire symbol was not successfully demangled.
  (dlang_identifier): Update signature.  Handle an ambiguity between
 two
  adjacent digits in a mangled symbol string.
  (dlang_type): Update call to dlang_parse_symbol.
  (dlang_template_args): Likewise.
  (dlang_parse_template): Likewise.
  (dlang_demangle): Likewise.
  * testsuite/d-demangle-expected: Fix bad tests found, and add
 problematic
  examples to the unittests.

 OK.

 I'm going to trust the code to dis-ambiguate the adjacent digits in
 dlang_identifier is correct.  The rest of the changes were pretty easy to
 follow :-0

 Jeff


Actually, the one snippest that we should be OK without in that
dis-ambiguate section is the while loop with the comment: handle any
overflow.

On another note, I've found out why the remaining 20 symbols in my 75k
sample failed.  They don't fail at all!  It's just that they were all
greater than 33,000 characters in length, and my test used c++filt,
which trims anything bigger than 32767 (whoops!).

Iain.